---
license: mit
base_model: microsoft/phi-2
tags:
- trl
- fietje
- alignment-handbook
datasets:
- uonlp/CulturaX
- wikimedia/wikipedia
model-index:
- name: fietje-2b
  results: []
language:
- nl
pipeline_tag: text-generation
inference: false
---
# Fietje 2B

An open and efficient LLM for Dutch.

🚀 Looking for the fast GGUF version? You can find it, and how to use it with ollama (command line) or LM Studio (interface), here.
This model is an adapted version of microsoft/phi-2, finetuned for Dutch text generation. It was continue-pretrained on 28B Dutch tokens, consisting of the full Dutch component of Wikipedia (accounting for around 15% of the data), supplemented with Dutch tokens from CulturaX. A newer version of this dataset, which also describes the filtering that was applied, can be found here.
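The model can be used like any other causal language model from the Hugging Face Hub. Below is a minimal usage sketch with the `transformers` text-generation pipeline; the repository id is assumed from the model name in the metadata above and the sampling settings are only illustrative.

```python
# Minimal usage sketch. The repo id below is assumed from the model
# name in the metadata ("fietje-2b"); adjust it to the repository
# you are actually viewing. Sampling settings are arbitrary.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="BramVanroy/fietje-2b",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Het mooiste Nederlandse woord is"
output = generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```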
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 9e-05
- train_batch_size: 40
- eval_batch_size: 40
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 3
- total_train_batch_size: 1920
- total_eval_batch_size: 640
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- num_epochs: 1.0
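For reference, the hyperparameters above roughly correspond to a `transformers` `TrainingArguments` configuration like the sketch below. This is a reconstruction for illustration only, not the actual alignment-handbook recipe; the output directory and precision setting are placeholders not taken from the list above.

```python
# Illustrative reconstruction of the listed hyperparameters as
# TrainingArguments; not the original training recipe.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fietje-2b",         # placeholder
    learning_rate=9e-05,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=3,  # 40 per device * 16 GPUs * 3 = 1920 total train batch size
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-07,
    bf16=True,                      # assumption; precision is not listed above
)
```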
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6334        | 0.13  | 900  | 1.5937          |
| 1.5469        | 0.26  | 1800 | 1.5051          |
| 1.4937        | 0.4   | 2700 | 1.4628          |
| 1.4633        | 0.53  | 3600 | 1.4375          |
| 1.4485        | 0.66  | 4500 | 1.4203          |
| 1.4374        | 0.79  | 5400 | 1.4085          |
| 1.4278        | 0.92  | 6300 | 1.4013          |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2