yellowvm committed
Commit 2b10f59
1 Parent(s): 3f922ac

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -192,7 +192,7 @@ The model training took roughly two months.
 
 ## Benchmarks
 
-We evaluate our model on all benchmarks of the leaderboard's version 2 using the `lm-evaluation-harness` package, and we evaluate it on the benchmarks of version 1 using `lighteval`. The reported evaluation results on the leaderboard version 2 are normalized following HuggingFace score normalization.
+We evaluate our model on all benchmarks of the new version of the leaderboard using the `lm-evaluation-harness` package, and then normalize the evaluation results following HuggingFace score normalization.
 
 
 | `model name` |`IFEval`| `BBH` |`MATH LvL5`| `GPQA`| `MUSR`|`MMLU-PRO`|`Average`|
@@ -212,6 +212,8 @@ We evaluate our model on all benchmarks of the leaderboard's version 2 using the
 | `gemma-7B` | 26.59 | 21.12 | 6.42 | 4.92 | 10.98 | 21.64 |**15.28**|
 
 
+We also evaluate our model on the benchmarks of the first version of the leaderboard using `lighteval`.
+
 
 | `model name` |`ARC`|`HellaSwag` |`MMLU` |`Winogrande`|`TruthfulQA`|`GSM8K`|`Average` |
 |:-----------------------------|:------:|:---------:|:-----:|:----------:|:----------:|:-----:|:----------------:|
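
For reference, a minimal sketch of how a leaderboard-v2-style evaluation could be launched through the `lm-evaluation-harness` Python API (v0.4+). The model id, dtype, and batch size are placeholders rather than the authors' exact setup, and `leaderboard` is the group task that recent harness releases provide for the v2 benchmarks (IFEval, BBH, MATH Lvl5, GPQA, MuSR, MMLU-PRO):

```python
# Sketch only, assuming lm-evaluation-harness >= 0.4 with the "leaderboard"
# task group available; model id and batch size are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                          # Hugging Face transformers backend
    model_args="pretrained=tiiuae/falcon-mamba-7b,dtype=bfloat16",
    tasks=["leaderboard"],               # IFEval, BBH, MATH Lvl5, GPQA, MuSR, MMLU-PRO
    batch_size=8,
)
print(results["results"])                # raw (un-normalized) per-task scores
```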
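The HuggingFace score normalization mentioned in the diff rescales each raw score so that the task's random baseline maps to 0 and a perfect score maps to 100. A minimal sketch of that rule (the function below is illustrative, not the leaderboard's actual code):

```python
def normalize_score(raw: float, random_baseline: float) -> float:
    """Map the random baseline to 0 and a perfect score to 100.
    Scores at or below the baseline are clipped to 0."""
    if raw <= random_baseline:
        return 0.0
    return 100.0 * (raw - random_baseline) / (1.0 - random_baseline)

# Example: a 4-way multiple-choice task has a random baseline of 0.25,
# so a raw accuracy of 0.40 normalizes to 20.0.
print(normalize_score(0.40, 0.25))
```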