JohnnyBoy00 committed
Commit: 00eecff
Parent(s): d160911
Update README.md

README.md CHANGED
@@ -78,14 +78,14 @@ The following hyperparameters were utilized during training:
## Evaluation results

-The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.
+The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.

The following results were achieved.

-| Split | SacreBLEU | ROUGE | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
-| --------------------- | :-------: |
-| test_unseen_answers | 39.5 | 29.8
-| test_unseen_questions | 0.3 | 0.5
+| Split | SacreBLEU | ROUGE-2 | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
+| --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
+| test_unseen_answers | 39.5 | 29.8 | 63.3 | 63.1 | 80.1 | 80.3 | 80.7 |
+| test_unseen_questions | 0.3 | 0.5 | 33.8 | 31.3 | 48.7 | 46.5 | 40.6 |

The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
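For reference, the sketch below shows one way such metrics could be computed with the Hugging Face `evaluate` library and scikit-learn. It is an illustration only, not the repository's actual `evaluation.py`; the example predictions, references, labels, and the 0-100 scaling used to match the table are all assumptions.

```python
# Illustrative sketch only -- not the repository's evaluation.py.
# Assumes: pip install evaluate sacrebleu rouge_score nltk bert_score scikit-learn
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder generated feedback and gold reference feedback
predictions = ["The answer is partially correct but misses the second step."]
references = ["The response is partially correct; the second step is missing."]

# Placeholder predicted and gold labels
pred_labels = ["Partially correct"]
gold_labels = ["Partially correct"]

# Text-generation metrics from the Hugging Face evaluate hub
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

results = {
    # SacreBLEU expects a list of reference lists and already reports 0-100
    "SacreBLEU": sacrebleu.compute(
        predictions=predictions, references=[[r] for r in references]
    )["score"],
    # ROUGE and METEOR return fractions; scaled to 0-100 here (assumption)
    "ROUGE-2": rouge.compute(predictions=predictions, references=references)["rouge2"] * 100,
    "METEOR": meteor.compute(predictions=predictions, references=references)["meteor"] * 100,
}

# BERTScore returns one score per example; report the mean F1
bs = bertscore.compute(predictions=predictions, references=references, lang="en")
results["BERTscore"] = 100 * sum(bs["f1"]) / len(bs["f1"])

# Label metrics from scikit-learn
results["Accuracy"] = accuracy_score(gold_labels, pred_labels) * 100
results["Weighted F1"] = f1_score(gold_labels, pred_labels, average="weighted") * 100
results["Macro F1"] = f1_score(gold_labels, pred_labels, average="macro") * 100

for name, value in results.items():
    print(f"{name}: {value:.1f}")
```

In practice, the predictions, references, and label lists would come from running the model over the corresponding test split (e.g. test_unseen_answers or test_unseen_questions).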