JohnnyBoy00 committed
Commit: 00eecff
Parent(s): d160911
Update README.md

README.md CHANGED
@@ -78,14 +78,14 @@ The following hyperparameters were utilized during training:
## Evaluation results

-The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.
+The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.

The following results were achieved.

-| Split | SacreBLEU | ROUGE | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
-| --------------------- | :-------: |
-| test_unseen_answers | 39.5 | 29.8
-| test_unseen_questions | 0.3 | 0.5
+| Split | SacreBLEU | ROUGE-2 | METEOR | BERTscore | Accuracy | Weighted F1 | Macro F1 |
+| --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
+| test_unseen_answers | 39.5 | 29.8 | 63.3 | 63.1 | 80.1 | 80.3 | 80.7 |
+| test_unseen_questions | 0.3 | 0.5 | 33.8 | 31.3 | 48.7 | 46.5 | 40.6 |

The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
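For reference, the sketch below shows one way such metrics could be computed with the Hugging Face `evaluate` library and scikit-learn. It is an illustration only, not the repository's actual `evaluation.py`; the example predictions, references, labels, and the 0-100 scaling used to match the table are all assumptions.

```python
# Illustrative sketch only -- not the repository's evaluation.py.
# Assumes: pip install evaluate sacrebleu rouge_score nltk bert_score scikit-learn
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder generated feedback and gold reference feedback
predictions = ["The answer is partially correct but misses the second step."]
references = ["The response is partially correct; the second step is missing."]

# Placeholder predicted and gold labels
pred_labels = ["Partially correct"]
gold_labels = ["Partially correct"]

# Text-generation metrics from the Hugging Face evaluate hub
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

results = {
    # SacreBLEU expects a list of reference lists and already reports 0-100
    "SacreBLEU": sacrebleu.compute(
        predictions=predictions, references=[[r] for r in references]
    )["score"],
    # ROUGE and METEOR return fractions; scaled to 0-100 here (assumption)
    "ROUGE-2": rouge.compute(predictions=predictions, references=references)["rouge2"] * 100,
    "METEOR": meteor.compute(predictions=predictions, references=references)["meteor"] * 100,
}

# BERTScore returns one score per example; report the mean F1
bs = bertscore.compute(predictions=predictions, references=references, lang="en")
results["BERTscore"] = 100 * sum(bs["f1"]) / len(bs["f1"])

# Label metrics from scikit-learn
results["Accuracy"] = accuracy_score(gold_labels, pred_labels) * 100
results["Weighted F1"] = f1_score(gold_labels, pred_labels, average="weighted") * 100
results["Macro F1"] = f1_score(gold_labels, pred_labels, average="macro") * 100

for name, value in results.items():
    print(f"{name}: {value:.1f}")
```

In practice, the predictions, references, and label lists would come from running the model over the corresponding test split (e.g. test_unseen_answers or test_unseen_questions).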