Andrey Kutuzov committed
Commit 92b4692
Parent(s): ff0ec79
Camera ready
- README.md +26 -38
- config.json +1 -1
README.md
CHANGED
@@ -19,32 +19,44 @@ datasets:
   - marksverdhei/wordnet-definitions-en-2021
 ---
 
-# 
+# mT0-Definition-En XL
 
-This model is a version of [
+This model is a version of [mT0 XL](https://huggingface.co/bigscience/mt0-xl) finetuned on a dataset of English definitions and usage examples.
 
-It
-
-- Rouge1: 41.5067
-- Rouge2: 23.7149
-- Rougel: 39.138
-- Rougelsum: 39.1647
-- Gen Len: 15.1578
+It generates definitions of English words in context.
+Its input is the usage example and the instruction question "What is the definition of TARGET_WORD?"
 
 ## Model description
 
-
+See details in the paper `Enriching Word Usage Graphs with Cluster Definitions` (LREC-COLING'2024) by
+Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.
 
 ## Intended uses & limitations
 
-
+The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.
+Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.
 
 ## Training and evaluation data
 
-
+Three datasets were used to fine-tune the model:
+- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021)
+- *Oxford dictionary or CHA* ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/))
+- English subset of *CodWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/))
+
+## Training results
+
+mT0-Definition-En XL achieves the following results on concatenated validation sets from WordNet and the Oxford dictionary:
+- Loss: 1.7210
+- Rouge1: 41.5067
+- Rouge2: 23.7149
+- Rougel: 39.138
+- Rougelsum: 39.1647
+- Gen Len: 15.1578
 
 ## Training procedure
 
+mT0-Definition-En XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -61,35 +73,11 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 20.0
 
-### Training results
-
-| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
-|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
-| 2.1171        | 1.0   | 1370  | 1.8175          | 27.0261 | 8.6429  | 25.2826 | 25.2952   | 11.8798 |
-| 1.8186        | 2.0   | 2740  | 1.7112          | 29.1583 | 9.9747  | 27.3432 | 27.3647   | 11.7919 |
-| 1.643         | 3.0   | 4110  | 1.6442          | 30.9045 | 11.2256 | 28.7826 | 28.788    | 12.4125 |
-| 1.499         | 4.0   | 5480  | 1.5978          | 32.1126 | 12.6674 | 29.97   | 29.9843   | 12.3129 |
-| 1.3772        | 5.0   | 6850  | 1.5720          | 33.6113 | 13.8451 | 31.3468 | 31.3599   | 12.6887 |
-| 1.2742        | 6.0   | 8220  | 1.5564          | 34.4899 | 15.1005 | 32.3177 | 32.3291   | 12.2003 |
-| 1.1785        | 7.0   | 9590  | 1.5466          | 35.4729 | 16.2035 | 33.2166 | 33.2295   | 12.4487 |
-| 1.0941        | 8.0   | 10960 | 1.5571          | 36.4885 | 17.5396 | 34.2494 | 34.2759   | 12.7543 |
-| 1.0202        | 9.0   | 12330 | 1.5541          | 37.4019 | 18.5568 | 35.1341 | 35.1473   | 12.8603 |
-| 0.9552        | 10.0  | 13700 | 1.5642          | 38.127  | 19.4057 | 35.9008 | 35.9163   | 12.6987 |
-| 0.8963        | 11.0  | 15070 | 1.5772          | 38.5073 | 20.0584 | 36.3304 | 36.3399   | 12.7052 |
-| 0.8443        | 12.0  | 16440 | 1.5955          | 39.2323 | 20.9237 | 36.9863 | 37.0049   | 13.0395 |
-| 0.7982        | 13.0  | 17810 | 1.6089          | 39.7947 | 21.6422 | 37.5619 | 37.5815   | 13.1400 |
-| 0.7586        | 14.0  | 19180 | 1.6293          | 40.2922 | 22.2301 | 38.0755 | 38.0757   | 12.8589 |
-| 0.7234        | 15.0  | 20550 | 1.6493          | 40.6358 | 22.5355 | 38.3523 | 38.3659   | 13.1102 |
-| 0.6946        | 16.0  | 21920 | 1.6701          | 40.7708 | 22.906  | 38.5037 | 38.5174   | 13.1035 |
-| 0.6688        | 17.0  | 23290 | 1.6902          | 41.0847 | 23.1663 | 38.8126 | 38.8149   | 13.2951 |
-| 0.6484        | 18.0  | 24660 | 1.7005          | 41.2075 | 23.3967 | 38.9529 | 38.9545   | 13.2707 |
-| 0.6342        | 19.0  | 26030 | 1.7116          | 41.2454 | 23.5187 | 39.0203 | 39.0396   | 13.2173 |
-| 0.6234        | 20.0  | 27400 | 1.7210          | 41.3073 | 23.5691 | 39.0662 | 39.074    | 13.2558 |
-
-
 ### Framework versions
 
 - Transformers 4.30.2
 - Pytorch 1.13.1+rocm5.2
 - Datasets 2.12.0
 - Tokenizers 0.12.1
+
+## Citation
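Based on the input format the new README describes (a usage example followed by the instruction question), inference could be sketched as below. This is an illustrative sketch only: `build_prompt` and `generate_definition` are helper names introduced here, not part of the repository, and the repo id passed to `from_pretrained` must be replaced with this model's actual Hugging Face id.

```python
# Sketch of querying the model, assuming the prompt format from the README.
# The repo id is a placeholder: substitute this model's actual HF repo id.

def build_prompt(usage_example: str, target_word: str) -> str:
    # Compose the model input: the usage example plus the instruction question.
    return f"{usage_example} What is the definition of {target_word}?"


def generate_definition(model_id: str, usage_example: str, target_word: str) -> str:
    # Deferred import so the prompt helper is usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_definition("<this-model-repo-id>", "He sat on the river bank.", "bank")` should return a short dictionary-style gloss of *bank* in that context.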
config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "mt0-xl
+  "_name_or_path": "mt0-xl",
   "architectures": [
     "MT5ForConditionalGeneration"
   ],
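The config.json change above fixes the `_name_or_path` entry so the file parses as valid JSON (a comma must separate it from `architectures`). A quick stdlib check on a minimal fragment (fields outside the diff are omitted here, so this is not the full config):

```python
import json

# Minimal fragment mirroring the fixed lines of config.json; fields not
# shown in the diff are omitted for brevity.
config_text = """
{
  "_name_or_path": "mt0-xl",
  "architectures": [
    "MT5ForConditionalGeneration"
  ]
}
"""

config = json.loads(config_text)
print(config["architectures"][0])  # MT5ForConditionalGeneration
```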