---
license: mit
base_model: microsoft/phi-2
tags:
  - trl
  - fietje
  - alignment-handbook
datasets:
  - uonlp/CulturaX
  - wikimedia/wikipedia
model-index:
  - name: fietje-2b
    results: []
language:
  - nl
pipeline_tag: text-generation
inference: false
---

Fietje banner

# Fietje 2B

An open and efficient LLM for Dutch.

🚀 Looking for the fast GGUF version? You can find it here, along with instructions for using it with ollama (command line) or LM Studio (graphical interface).

This model is an adapted version of microsoft/phi-2, fine-tuned for Dutch text generation. It underwent continued pretraining on 28B Dutch tokens, comprising the full Dutch portion of Wikipedia (accounting for around 15% of the data) supplemented with Dutch tokens from CulturaX. A newer version of this dataset, which also describes the filtering that took place, can be found here.
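
As a base (non-chat-tuned) model, Fietje 2B can be used for plain Dutch text completion. The snippet below is a minimal generation sketch with 🤗 Transformers; the repository id `BramVanroy/fietje-2` is assumed from the context of this card rather than stated in the text above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/fietje-2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Dutch prompt; the base model simply continues the text.
prompt = "Het mooiste Nederlandse woord is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```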

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 9e-05
- train_batch_size: 40
- eval_batch_size: 40
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 3
- total_train_batch_size: 1920
- total_eval_batch_size: 640
- optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- num_epochs: 1.0
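
As a hedged sketch, the values above map onto 🤗 Transformers `TrainingArguments` roughly as shown below. The per-device batch sizes combine with 16 devices and 3 accumulation steps to give the effective sizes listed; the multi-GPU launch itself and `output_dir` (a placeholder here) are not part of this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fietje-2b",          # placeholder path, not from the card
    learning_rate=9e-05,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=3,   # 40 per device * 16 GPUs * 3 steps = 1920
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-07,
)
```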

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6334        | 0.13  | 900  | 1.5937          |
| 1.5469        | 0.26  | 1800 | 1.5051          |
| 1.4937        | 0.4   | 2700 | 1.4628          |
| 1.4633        | 0.53  | 3600 | 1.4375          |
| 1.4485        | 0.66  | 4500 | 1.4203          |
| 1.4374        | 0.79  | 5400 | 1.4085          |
| 1.4278        | 0.92  | 6300 | 1.4013          |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2