Behnamm committed on
Commit
b5c5474
1 Parent(s): f97bde5

Update src/about.py

Files changed (1)
  1. src/about.py +2 -2
src/about.py CHANGED
@@ -70,8 +70,8 @@ For all these evaluations, a higher score is a better score.
  We chose these benchmarks for now, but several other benchmarks are going to be added later to help us perform a more thorough examination of models.
 
  The last two benchmarks, ParsiNLU NLI and ParsiNLU QQP are evaluated in different few-shot settings and then the maximum score is returned as the final evaluation.
- We argue that is indeed a fair evaluation method since many light-weight models (around ~7B and less) can have a pooor in-context learning and thus they perform better
- in small shots. We wish to not hold this against the model by trying to measure performances in different settings and take the maximum score achieved .
+ We argue that this is indeed a fair evaluation scheme since many light-weight models (around ~7B and less) can have a poor in-context learning and thus perform better
+ in small shots (or have a small knowledge capacity and perform poorly in zero-shot). We wish to not hold this against the model by trying to measure performances in different settings and take the maximum score achieved .
 
  ## REPRODUCIBILITY
  The parameters used for evaluation along with instructions and prompts will be available once the framework is release. (TO BE COMPLETED)
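
The max-over-settings scheme described in the changed text can be illustrated with a small sketch. This is not the leaderboard's evaluation code, which the text says is not yet released. The function name `best_few_shot_score`, the shot counts, and the score values are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Tuple


def best_few_shot_score(scores_by_shots: Mapping[int, float]) -> Tuple[int, float]:
    """Return (num_shots, score) for the highest-scoring few-shot setting.

    `scores_by_shots` maps a number of in-context examples to the score
    the model obtained in that setting (higher is better).
    """
    if not scores_by_shots:
        raise ValueError("need at least one few-shot setting")
    # Pick the shot count whose score is largest; on ties, the first
    # setting encountered in iteration order wins.
    best_shots = max(scores_by_shots, key=scores_by_shots.get)
    return best_shots, scores_by_shots[best_shots]


# Hypothetical scores for one model on one benchmark (made-up values).
example_scores = {0: 0.41, 1: 0.47, 3: 0.52, 5: 0.49}
shots, score = best_few_shot_score(example_scores)
print(f"reported score: {score} (from the {shots}-shot setting)")
```

Under this sketch, a model that peaks at a low shot count and one that needs more examples are each reported at their best setting, which matches the fairness argument made in the added lines.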