JohnnyBoy00 committed on
Commit
638b288
1 Parent(s): d0b77dd

Update README.md

  1. README.md +91 -18
README.md CHANGED
@@ -1,49 +1,77 @@
1
  ---
 
 
 
 
2
  tags:
3
  - generated_from_trainer
4
  model-index:
5
  - name: mbart-finetuned-saf-micro-job
6
  results: []
 
 
7
  ---
8
 
9
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
10
- should probably proofread and complete it, then remove this comment. -->
11
-
12
  # mbart-finetuned-saf-micro-job
13
 
14
- This model is a fine-tuned version of [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25) on the None dataset.
15
 
16
  ## Model description
17
 
18
- More information needed
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
 
20
  ## Intended uses & limitations
21
 
22
- More information needed
 
 
23
 
24
  ## Training and evaluation data
25
 
26
- More information needed
 
 
 
 
 
 
 
 
 
27
 
28
  ## Training procedure
29
 
 
 
 
 
30
  ### Training hyperparameters
31
 
32
- The following hyperparameters were used during training:
 
 
33
  - learning_rate: 5e-05
 
34
  - train_batch_size: 1
35
- - eval_batch_size: 4
36
- - seed: 42
37
  - gradient_accumulation_steps: 4
38
- - total_train_batch_size: 4
39
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
40
- - lr_scheduler_type: linear
41
- - num_epochs: 10
42
  - mixed_precision_training: Native AMP
43
-
44
- ### Training results
45
-
46
-
47
 
48
  ### Framework versions
49
 
@@ -51,3 +79,48 @@ The following hyperparameters were used during training:
51
  - Pytorch 1.12.1+cu113
52
  - Datasets 2.7.1
53
  - Tokenizers 0.13.2

1
  ---
2
+ language: de
3
+ dataset:
4
+ - type: JohnnyBoy00/saf_micro_job_german
5
+ - name: SAF - Micro Job - German
6
  tags:
7
  - generated_from_trainer
8
  model-index:
9
  - name: mbart-finetuned-saf-micro-job
10
  results: []
11
+ widget:
12
+ - text: "Antwort: Ich gebe mich zu erkennen und zeige das Informationsschreiben vor Lösung: Der Jobber soll sich in diesem Fall dem Personal gegenüber zu erkennen geben (0.25 P) und das entsprechende Informationsschreiben in der App vorzeigen (0.25 P). Zusätzlich muss notiert werden, zu welchem Zeitpunkt (0.25 P) des Jobs der Jobber enttarnt wurde. Zentrale Frage ist dabei, ob ein neutrales, unvoreingenommenes Verkaufsgespräch stattgefunden hat. Der Job soll mit Erlaubnis der Mitarbeiter bis zum Ende durchgeführt (0.25 P) werden. Frage: Frage 1: Wie reagierst du, wenn du auf deine Tätigkeit angesprochen wirst?"
13
  ---
14
 
 
 
 
15
  # mbart-finetuned-saf-micro-job
16
 
17
+ This model is a fine-tuned version of [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25) on the [saf_micro_job_german](https://huggingface.co/datasets/JohnnyBoy00/saf_micro_job_german) dataset for Short Answer Feedback (SAF), as proposed in [Filighera et al., ACL 2022](https://aclanthology.org/2022.acl-long.587).
18
 
19
  ## Model description
20
 
21
+ This model was built on top of [mBART](https://arxiv.org/abs/2001.08210), which is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages.
22
+
23
+ It expects inputs in the following format:
24
+ ```
+ Antwort: [answer] Lösung: [reference_answer] Frage: [question]
+ ```
27
+
28
+ In the example above, `[answer]`, `[reference_answer]` and `[question]` should be replaced by the provided answer, the reference answer and the question to which they refer, respectively.
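
As a minimal sketch of the input template, the helper below assembles an input string from its three parts. The function name `build_input` and its parameter names are illustrative assumptions and are not part of the model repository; the example strings are shortened versions of the widget text.

```python
def build_input(answer: str, reference_answer: str, question: str) -> str:
    """Assemble the model input in the order answer, reference answer, question."""
    # Strip stray whitespace so the three fields are separated by single spaces.
    return (
        f"Antwort: {answer.strip()} "
        f"Lösung: {reference_answer.strip()} "
        f"Frage: {question.strip()}"
    )


# Shortened, illustrative strings (hypothetical abbreviations of the widget example).
example = build_input(
    answer="Ich gebe mich zu erkennen",
    reference_answer="Der Jobber soll sich zu erkennen geben (0.25 P).",
    question="Wie reagierst du, wenn du auf deine Tätigkeit angesprochen wirst?",
)
print(example)
```

The resulting string can be passed to the tokenizer in place of a hand-written input.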
29
+
30
+
31
+ The outputs are formatted as follows:
32
+ ```
+ [verification_feedback] Feedback: [feedback]
+ ```
35
+
36
+ In this case, `[verification_feedback]` will be one of `Correct`, `Partially correct` or `Incorrect`, while `[feedback]` will be the textual feedback generated by the model according to the given answer.
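
The output format can be split back into its two parts with plain string handling. The sketch below is one possible way to do this; the names `parse_output`, `LABELS` and `SEPARATOR` are assumptions introduced for illustration, and returning `None` for an unexpected label is a design choice of this sketch rather than documented model behaviour.

```python
from typing import Optional, Tuple

LABELS = ("Correct", "Partially correct", "Incorrect")
SEPARATOR = "Feedback:"


def parse_output(text: str) -> Tuple[Optional[str], str]:
    """Split a generated string into (verification_feedback, feedback)."""
    head, found, tail = text.partition(SEPARATOR)
    if not found:
        # No separator present: treat the whole string as free-text feedback.
        return None, text.strip()
    label = head.strip()
    # Return None for the label if it is not one of the three expected values.
    return (label if label in LABELS else None), tail.strip()


label, feedback = parse_output(
    "Partially correct Feedback: Sollte das Personal dies gestatten, "
    "kannst du den Check auch gerne noch abschließen."
)
print(label)  # Partially correct
```

Separating the label from the feedback text makes it possible to score the classification part and the generated text independently.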
37
 
38
  ## Intended uses & limitations
39
 
40
+ This model is intended for Short Answer Feedback generation in the context of micro-job training (as conducted on the crowd-worker platform appJobber). It is therefore not expected to perform well on questions and answers outside this scope.
41
+
42
+ The model underperforms when given a question that was not seen during training. In such cases, it tends to classify most answers as correct and does not provide relevant feedback. This limitation could be partially overcome by extending the dataset with the desired question (and associated answers) and fine-tuning the model for a few epochs on the new data.
43
 
44
  ## Training and evaluation data
45
 
46
+ As mentioned previously, the model was trained on the [saf_micro_job_german](https://huggingface.co/datasets/JohnnyBoy00/saf_micro_job_german) dataset, which is divided into the following splits.
47
+
48
+ | Split                 | Number of examples |
+ | --------------------- | ------------------ |
+ | train                 | 1226               |
+ | validation            | 308                |
+ | test_unseen_answers   | 271                |
+ | test_unseen_questions | 602                |
54
+
55
+ Evaluation was performed on the `test_unseen_answers` and `test_unseen_questions` splits.
56
 
57
  ## Training procedure
58
 
59
+ The [Trainer API](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Seq2SeqTrainer) was used to fine-tune the model. The code used for pre-processing and training was mostly adapted from the [summarization script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) made available by Hugging Face.
60
+
61
+ Training was completed in a little under 1 hour on a GPU on Google Colab.
62
+
63
  ### Training hyperparameters
64
 
65
+ The following hyperparameters were utilized during training:
66
+ - num_epochs: 10
67
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
68
  - learning_rate: 5e-05
69
+ - lr_scheduler_type: linear
70
  - train_batch_size: 1
 
 
71
  - gradient_accumulation_steps: 4
72
+ - eval_batch_size: 4
 
 
 
73
  - mixed_precision_training: Native AMP
74
+ - PyTorch seed: 42
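
For illustration, the listed hyperparameters can be collected in a dictionary. The keys below follow argument names of `transformers.Seq2SeqTrainingArguments`, but the mapping from the listed values to those names is an assumption of this sketch, as is training on a single device.

```python
# Keys follow argument names of transformers.Seq2SeqTrainingArguments;
# the mapping from the listed hyperparameters is an assumption.
hparams = {
    "num_train_epochs": 10,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "linear",
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "fp16": True,  # assumed equivalent of "Native AMP" mixed precision
    "seed": 42,
}

# Gradients are accumulated over several small batches before each optimizer
# step, so the effective batch size on a single device is the product below.
effective_batch_size = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 4
```

Such a dictionary could be unpacked into the training arguments, for example `Seq2SeqTrainingArguments(output_dir="out", **hparams)`, where the `output_dir` value is hypothetical.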
 
 
 
75
 
76
  ### Framework versions
77
 
 
79
  - Pytorch 1.12.1+cu113
80
  - Datasets 2.7.1
81
  - Tokenizers 0.13.2
82
+
83
+ ## Evaluation results
84
+
85
+ The model was evaluated with the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from Hugging Face, as well as the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn.
86
+
87
+ The following results were achieved.
88
+
89
+ | Split                 | SacreBLEU | ROUGE | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
+ | --------------------- | :-------: | :---: | :----: | :-------: | :------: | :---------: | :------: |
+ | test_unseen_answers   | 39.5      | 29.8  | 63.3   | 63.1      | 80.1     | 80.3        | 80.7     |
+ | test_unseen_questions | 0.3       | 0.5   | 33.8   | 31.3      | 48.7     | 46.5        | 40.6     |
93
+
94
+
95
+ The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
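
To make the three classification metrics concrete, the sketch below computes accuracy, macro F1 and weighted F1 over predicted labels in plain Python. It is not the repository's `evaluation.py`, which is not shown here and relies on scikit-learn; the function name `classification_scores` and the toy labels are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def classification_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Compute accuracy, macro F1 and weighted F1 over string labels."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per label
    f1 = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1[label] = 2 * tp / denom if denom else 0.0
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        # Macro F1 averages the per-label scores with equal weight.
        "macro_f1": sum(f1.values()) / len(labels),
        # Weighted F1 weights each per-label score by its support.
        "weighted_f1": sum(support[label] * f1[label] for label in labels) / n,
    }


# Toy labels, made up for this sketch.
scores = classification_scores(
    ["Correct", "Correct", "Incorrect", "Partially correct"],
    ["Correct", "Incorrect", "Incorrect", "Partially correct"],
)
print(scores)
```

Macro F1 treats the three classes equally, while weighted F1 follows the class distribution, which is why the two can differ noticeably on imbalanced splits.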
96
+
97
+ ## Usage
98
+
99
+ The example below shows how the model can be used to generate textual feedback for a given answer.
100
+
101
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ model = AutoModelForSeq2SeqLM.from_pretrained('JohnnyBoy00/mbart-finetuned-saf-micro-job')
+ tokenizer = AutoTokenizer.from_pretrained('JohnnyBoy00/mbart-finetuned-saf-micro-job')
+
+ example_input = 'Antwort: Ich gebe mich zu erkennen und zeige das Informationsschreiben vor Lösung: Der Jobber soll sich in diesem Fall dem Personal gegenüber zu erkennen geben (0.25 P) und das entsprechende Informationsschreiben in der App vorzeigen (0.25 P). Zusätzlich muss notiert werden, zu welchem Zeitpunkt (0.25 P) des Jobs der Jobber enttarnt wurde. Zentrale Frage ist dabei, ob ein neutrales, unvoreingenommenes Verkaufsgespräch stattgefunden hat. Der Job soll mit Erlaubnis der Mitarbeiter bis zum Ende durchgeführt (0.25 P) werden. Frage: Frage 1: Wie reagierst du, wenn du auf deine Tätigkeit angesprochen wirst?'
+ inputs = tokenizer(example_input, max_length=256, padding='max_length', truncation=True, return_tensors='pt')
+
+ generated_tokens = model.generate(
+     inputs['input_ids'],
+     attention_mask=inputs['attention_mask'],
+     max_length=128
+ )
+ output = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
+ ```
117
+
118
+ The output generated by the model then looks as follows:
119
+
120
+ ```
+ Partially correct Feedback: Sollte das Personal dies gestatten, kannst du den Check auch gerne noch abschließen. Bitte halte nur in fest, wann genau du auf deine Tätigkeit angesprochen wurdest.
+ ```
123
+
124
+ ## Related Work
125
+
126
+ [Filighera et al., ACL 2022](https://aclanthology.org/2022.acl-long.587) trained a [T5 model](https://huggingface.co/docs/transformers/model_doc/t5) on this dataset, providing a baseline for SAF generation. The entire code used to define and train the model can be found on [GitHub](https://github.com/SebOchs/SAF).