persian_llm_leaderboard

Behnamm committed on Aug 28

Commit 04eb2c0 (1 parent: 88fb0ea)

Update src/about.py

Files changed (1)

src/about.py

@@ -69,7 +69,7 @@ For all these evaluations, a higher score is a better score.
 We use the given *test* subset (for those benchmarks that also have *train* and *dev* subsets) for all these evaluations.
-We chose these benchmarks for now, but several other benchmarks are going to be added later to help us perform a more thorough examination of models.
+These benchmarks are picked for now, but several other benchmarks are going to be added later to help us perform a more thorough examination of models.
 The last two benchmarks, ParsiNLU NLI and ParsiNLU QQP are evaluated in different few-shot settings and then the maximum score is returned as the final evaluation.
 We argue that this is indeed a fair evaluation scheme since many light-weight models (around ~7B and less) can have a poor in-context learning and thus perform better
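
The max-over-few-shot-settings rule described in the context lines can be sketched as below. This is a minimal illustration, not the leaderboard's actual code: the function name `final_score`, the shot counts, and the scores are all hypothetical.

```python
from typing import Mapping


def final_score(scores_by_shots: Mapping[int, float]) -> float:
    """Return the best score across the few-shot settings that were run.

    Keys are the number of in-context examples (hypothetical values);
    values are the benchmark scores, where higher is better.
    """
    if not scores_by_shots:
        raise ValueError("no few-shot results to aggregate")
    return max(scores_by_shots.values())


# Made-up scores for one model on one benchmark at 0, 1, 3 and 5 shots.
example_scores = {0: 0.61, 1: 0.58, 3: 0.64, 5: 0.63}
best = final_score(example_scores)
```

Taking the maximum means a small model that scores best with zero shots is not penalized for degrading when more examples are added to the prompt.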