---
language:
- de
tags:
- question-generation
- german
- text2text-generation
- generated_from_trainer
datasets:
- lmqg/qg_dequad
metrics:
- bleu4
- f1
- rouge
- exact_match
model-index:
- name: german-jeopardy-mt5-base
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: lmqg/qg_dequad
      type: default
      args: default
    metrics:
    - name: BLEU-4
      type: bleu4
      value: 14.56
    - name: F1
      type: f1
      value: 39.53
    - name: ROUGE-1
      type: rouge1
      value: 40.62
    - name: ROUGE-2
      type: rouge2
      value: 21.49
    - name: ROUGE-L
      type: rougel
      value: 39.14
    - name: ROUGE-Lsum
      type: rougelsum
      value: 39.13
    - name: Exact Match
      type: exact_match
      value: 2.72
---

# german-jeopardy-mt5-base

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.66
- Brevity Penalty: 0.9025
- System Length: 18860
- Reference Length: 20793
- ROUGE-1: 40.62
- ROUGE-2: 21.49
- ROUGE-L: 39.14
- ROUGE-Lsum: 39.13
- Exact Match: 2.72
- BLEU: 14.56
- F1: 39.53

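The brevity penalty in the list above follows from the two reported lengths. As a minimal sketch, assuming the standard BLEU definition (a penalty of 1 when the system output is at least as long as the reference, otherwise `exp(1 - reference / system)`), the reported value can be reproduced to about three decimal places. The function name `brevity_penalty` is illustrative and is not taken from the training or evaluation code.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty (assumed definition, not the author's code)."""
    if system_length <= 0:
        return 0.0
    if system_length >= reference_length:
        return 1.0
    return math.exp(1.0 - reference_length / system_length)


# Lengths reported in the evaluation summary above.
bp = brevity_penalty(system_length=18860, reference_length=20793)
print(f"{bp:.4f}")
```

A value below 1 indicates that the generated questions are, in total, shorter than the reference questions.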

## Model description

See [google/mt5-base](https://huggingface.co/google/mt5-base) for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

## Intended uses & limitations

This model can be used for question generation on German text.

## Training and evaluation data

See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 7
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adafactor
- lr_scheduler_type: constant
- num_epochs: 20

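For readers who want to reproduce the setup, the values above map roughly onto the usual Hugging Face `Seq2SeqTrainingArguments` fields as shown below. This is a sketch only: the original training script is not part of this card, the keyword names are assumed to be the standard Trainer ones, and `hyperparameters` is an illustrative dictionary rather than the author's code.

```python
# Hyperparameters from the list above, keyed by the corresponding
# Seq2SeqTrainingArguments field names (assumed mapping, not the author's script).
hyperparameters = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 7,
    "gradient_accumulation_steps": 16,
    "optim": "adafactor",
    "lr_scheduler_type": "constant",
    "num_train_epochs": 20,
}

# The effective batch size is the per-device batch size multiplied by the
# number of gradient accumulation steps and the number of devices.
num_devices = 1  # the card states that a single GPU was used
total_train_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
    * num_devices
)
print(total_train_batch_size)  # 64
```

Under these assumptions the dictionary could be unpacked into `Seq2SeqTrainingArguments` together with an `output_dir` of your choice. The product matches the `total_train_batch_size` of 64 listed above.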

### Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:------------:|:------------:|:------------:|:------------:|:---------------:|:-------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-----------:|:-------:|:---------------------:|:------:|
| 5.5131 | 1.0 | 145 | 1.8698 | 6032 | 1668 | 626 | 216 | 16023 | 13819 | 11615 | 9411 | 37.6459 | 12.0703 | 5.3896 | 2.2952 | 0.7216 | 16023 | 21250 | 0.2485 | 0.1011 | 0.2368 | 0.2366 | 0.0018 | 6.2485 | 12.6166 | 0.2406 |
| 2.3946 | 2.0 | 291 | 1.5888 | 7325 | 2554 | 1178 | 558 | 16853 | 14649 | 12445 | 10241 | 43.4641 | 17.4346 | 9.4656 | 5.4487 | 0.7704 | 16853 | 21250 | 0.3226 | 0.1585 | 0.31 | 0.31 | 0.0145 | 10.8315 | 12.2582 | 0.3148 |
| 2.0101 | 3.0 | 436 | 1.4997 | 7623 | 2764 | 1304 | 629 | 17042 | 14838 | 12634 | 10430 | 44.7307 | 18.6278 | 10.3214 | 6.0307 | 0.7812 | 17042 | 21250 | 0.3403 | 0.1723 | 0.3263 | 0.3263 | 0.0154 | 11.7891 | 12.6783 | 0.3315 |
| 1.8073 | 4.0 | 582 | 1.4610 | 7728 | 2916 | 1415 | 707 | 16654 | 14450 | 12246 | 10042 | 46.4033 | 20.1799 | 11.5548 | 7.0404 | 0.7588 | 16654 | 21250 | 0.3461 | 0.1818 | 0.3324 | 0.3326 | 0.0168 | 12.6068 | 12.2963 | 0.3387 |
| 1.6851 | 4.99 | 727 | 1.4357 | 7964 | 3059 | 1483 | 727 | 17381 | 15177 | 12973 | 10769 | 45.8201 | 20.1555 | 11.4314 | 6.7509 | 0.8004 | 17381 | 21250 | 0.3558 | 0.1888 | 0.3415 | 0.3414 | 0.0159 | 13.0784 | 12.7436 | 0.3483 |
| 1.5642 | 6.0 | 873 | 1.4003 | 8299 | 3224 | 1592 | 788 | 17351 | 15147 | 12943 | 10739 | 47.8301 | 21.2847 | 12.3001 | 7.3377 | 0.7987 | 17351 | 21250 | 0.3814 | 0.2025 | 0.3684 | 0.3685 | 0.0204 | 13.9065 | 12.9569 | 0.3736 |
| 1.4756 | 6.99 | 1018 | 1.3779 | 8640 | 3430 | 1712 | 879 | 17669 | 15465 | 13261 | 11057 | 48.8992 | 22.1791 | 12.91 | 7.9497 | 0.8165 | 17669 | 21250 | 0.3971 | 0.2133 | 0.3828 | 0.3826 | 0.025 | 14.9146 | 13.1084 | 0.3892 |
| 1.3792 | 8.0 | 1164 | 1.3624 | 8732 | 3417 | 1712 | 871 | 17996 | 15792 | 13588 | 11384 | 48.5219 | 21.6375 | 12.5994 | 7.6511 | 0.8346 | 17996 | 21250 | 0.4003 | 0.2131 | 0.3852 | 0.3849 | 0.0245 | 14.8859 | 13.3748 | 0.3917 |
| 1.3133 | 9.0 | 1310 | 1.3630 | 8804 | 3500 | 1754 | 920 | 17661 | 15457 | 13253 | 11049 | 49.85 | 22.6435 | 13.2347 | 8.3265 | 0.8161 | 17661 | 21250 | 0.4078 | 0.219 | 0.3932 | 0.3935 | 0.025 | 15.3264 | 13.2019 | 0.4 |
| 1.261 | 10.0 | 1455 | 1.3685 | 8910 | 3602 | 1849 | 1000 | 17709 | 15505 | 13301 | 11097 | 50.3134 | 23.2312 | 13.9012 | 9.0114 | 0.8188 | 17709 | 21250 | 0.4135 | 0.223 | 0.3991 | 0.3992 | 0.0295 | 16.0163 | 13.1892 | 0.4055 |
| 1.1897 | 11.0 | 1601 | 1.3639 | 9096 | 3690 | 1902 | 1012 | 18261 | 16057 | 13853 | 11649 | 49.8111 | 22.9806 | 13.7299 | 8.6874 | 0.849 | 18261 | 21250 | 0.4201 | 0.2289 | 0.4059 | 0.4057 | 0.0281 | 16.3202 | 13.5077 | 0.4121 |
| 1.1453 | 11.99 | 1746 | 1.3610 | 9106 | 3735 | 1932 | 1023 | 18329 | 16125 | 13921 | 11717 | 49.6808 | 23.1628 | 13.8783 | 8.7309 | 0.8527 | 18329 | 21250 | 0.4173 | 0.2303 | 0.4026 | 0.4025 | 0.0281 | 16.4772 | 13.8013 | 0.4099 |
| 1.0858 | 13.0 | 1892 | 1.3716 | 9245 | 3778 | 1955 | 1049 | 18556 | 16352 | 14148 | 11944 | 49.8222 | 23.1042 | 13.8182 | 8.7827 | 0.8649 | 18556 | 21250 | 0.4244 | 0.2327 | 0.409 | 0.409 | 0.0322 | 16.7204 | 13.8144 | 0.417 |
| 1.0472 | 13.99 | 2037 | 1.3770 | 9166 | 3756 | 1946 | 1054 | 18315 | 16111 | 13907 | 11703 | 50.0464 | 23.3133 | 13.993 | 9.0062 | 0.8519 | 18315 | 21250 | 0.4216 | 0.2311 | 0.4068 | 0.4067 | 0.0309 | 16.6825 | 13.8099 | 0.4143 |
| 0.9953 | 15.0 | 2183 | 1.3881 | 9342 | 3926 | 2046 | 1108 | 18132 | 15928 | 13724 | 11520 | 51.5222 | 24.6484 | 14.9082 | 9.6181 | 0.842 | 18132 | 21250 | 0.4328 | 0.2418 | 0.4171 | 0.4171 | 0.0327 | 17.3937 | 13.5023 | 0.4258 |
| 0.9509 | 16.0 | 2329 | 1.4016 | 9330 | 3894 | 2024 | 1084 | 18672 | 16468 | 14264 | 12060 | 49.9679 | 23.6459 | 14.1896 | 8.9884 | 0.871 | 18672 | 21250 | 0.4269 | 0.237 | 0.4123 | 0.4122 | 0.0313 | 17.1618 | 13.956 | 0.4198 |
| 0.9183 | 17.0 | 2474 | 1.4152 | 9303 | 3824 | 1979 | 1084 | 18476 | 16272 | 14068 | 11864 | 50.3518 | 23.5005 | 14.0674 | 9.1369 | 0.8606 | 18476 | 21250 | 0.4269 | 0.2345 | 0.4121 | 0.4122 | 0.0327 | 16.995 | 13.7854 | 0.4199 |
| 0.8696 | 18.0 | 2620 | 1.4404 | 9184 | 3798 | 1993 | 1085 | 18379 | 16175 | 13971 | 11767 | 49.9701 | 23.4807 | 14.2653 | 9.2207 | 0.8554 | 18379 | 21250 | 0.4218 | 0.2333 | 0.4076 | 0.4074 | 0.034 | 16.9541 | 13.726 | 0.4148 |
| 0.8389 | 19.0 | 2765 | 1.4360 | 9476 | 4000 | 2092 | 1139 | 19003 | 16799 | 14595 | 12391 | 49.8658 | 23.8109 | 14.3337 | 9.1922 | 0.8885 | 19003 | 21250 | 0.4307 | 0.2406 | 0.4161 | 0.416 | 0.0299 | 17.67 | 14.2064 | 0.4239 |
| 0.7993 | 19.92 | 2900 | 1.4545 | 9464 | 3970 | 2078 | 1126 | 18741 | 16537 | 14333 | 12129 | 50.4989 | 24.0068 | 14.498 | 9.2835 | 0.8747 | 18741 | 21250 | 0.4349 | 0.2424 | 0.4194 | 0.4192 | 0.0327 | 17.5799 | 13.9959 | 0.4269 |

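The BLEU column can be recomputed from the n-gram columns of each row. The sketch below assumes the standard corpus-level BLEU formula (the geometric mean of the four n-gram precisions multiplied by the brevity penalty, with no smoothing). It also assumes that `Counts n` and `Totals n` are the matched and total candidate n-grams, as in SacreBLEU-style output. `bleu_from_counts` is an illustrative helper, not the evaluation code used for this model.

```python
import math


def bleu_from_counts(counts, totals, system_length, reference_length):
    """Corpus BLEU on a 0-100 scale from matched/total n-gram counts.

    Assumes uniform weights over the n-gram orders and no smoothing.
    """
    if min(counts) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precisions = [math.log(c / t) for c, t in zip(counts, totals)]
    geometric_mean = math.exp(sum(log_precisions) / len(log_precisions))
    if system_length >= reference_length:
        bp = 1.0
    else:
        bp = math.exp(1.0 - reference_length / system_length)
    return 100.0 * bp * geometric_mean


# Values from the epoch 1.0 row of the table.
bleu = bleu_from_counts(
    counts=[6032, 1668, 626, 216],
    totals=[16023, 13819, 11615, 9411],
    system_length=16023,
    reference_length=21250,
)
print(f"{bleu:.4f}")
```

Under these assumptions the result should land close to the 6.2485 reported for that row. In this table the precisions and BLEU are on a 0 to 100 scale, while the ROUGE, Exact Match, and F1 columns are fractions between 0 and 1.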

### Framework versions

- Transformers 4.32.1
- PyTorch 2.1.0
- Datasets 2.12.0
- Tokenizers 0.13.3