---
license: mit
base_model: microsoft/phi-2
tags:
  - trl
  - fietje
  - alignment-handbook
datasets:
  - uonlp/CulturaX
  - wikimedia/wikipedia
model-index:
  - name: fietje-2b
    results: []
language:
  - nl
pipeline_tag: text-generation
inference: false
---

Fietje banner

# Fietje 2B

An open and efficient LLM for Dutch.

🚀 Looking for the fast GGUF version? You can find it here, along with instructions for using it with ollama (command line) or LM Studio (graphical interface).

This model is an adapted version of microsoft/phi-2, fine-tuned for Dutch text generation. It underwent continued pretraining on 28B Dutch tokens, comprising the full Dutch portion of Wikipedia (accounting for around 15% of the data) supplemented with Dutch tokens from CulturaX. A newer version of this dataset, which also describes the filtering that took place, can be found here.
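
As a base (non-chat-tuned) model, Fietje 2B can be used for plain Dutch text completion. The snippet below is a minimal generation sketch with 🤗 Transformers; the repository id `BramVanroy/fietje-2` is assumed from the context of this card rather than stated in the text above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/fietje-2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Dutch prompt; the base model simply continues the text.
prompt = "Het mooiste Nederlandse woord is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```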

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 9e-05
- train_batch_size: 40
- eval_batch_size: 40
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 3
- total_train_batch_size: 1920
- total_eval_batch_size: 640
- optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- num_epochs: 1.0
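
As a hedged sketch, the values above map onto 🤗 Transformers `TrainingArguments` roughly as shown below. The per-device batch sizes combine with 16 devices and 3 accumulation steps to give the effective sizes listed; the multi-GPU launch itself and `output_dir` (a placeholder here) are not part of this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fietje-2b",          # placeholder path, not from the card
    learning_rate=9e-05,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=3,   # 40 per device * 16 GPUs * 3 steps = 1920
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-07,
)
```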

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6334        | 0.13  | 900  | 1.5937          |
| 1.5469        | 0.26  | 1800 | 1.5051          |
| 1.4937        | 0.4   | 2700 | 1.4628          |
| 1.4633        | 0.53  | 3600 | 1.4375          |
| 1.4485        | 0.66  | 4500 | 1.4203          |
| 1.4374        | 0.79  | 5400 | 1.4085          |
| 1.4278        | 0.92  | 6300 | 1.4013          |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2