Update README.md
README.md
CHANGED
@@ -192,7 +192,7 @@ The model training took roughly two months.
 
 ## Benchmarks
 
-We evaluate our model on all benchmarks of the leaderboard's version
+We evaluate our model on all benchmarks of the new leaderboard's version using the `lm-evaluation-harness` package, and then normalize the evaluation results with HuggingFace score normalization.
 
 
 | `model name` |`IFEval`| `BBH` |`MATH LvL5`| `GPQA`| `MUSR`|`MMLU-PRO`|`Average`|
@@ -212,6 +212,8 @@ We evaluate our model on all benchmarks of the leaderboard's version 2 using the
 | `gemma-7B` | 26.59 | 21.12 | 6.42 | 4.92 | 10.98 | 21.64 |**15.28**|
 
 
+Also, we evaluate our model on the benchmarks of the first leaderboard using `lighteval`.
+
 
 | `model name` |`ARC`|`HellaSwag` |`MMLU` |`Winogrande`|`TruthfulQA`|`GSM8K`|`Average` |
 |:-----------------------------|:------:|:---------:|:-----:|:----------:|:----------:|:-----:|:----------------:|
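The added lines name the `lm-evaluation-harness` package but do not show how the scores were produced. A minimal sketch of such a run through the package's Python API, assuming a recent `lm_eval` release that ships the Open LLM Leaderboard v2 `leaderboard` task group and using a placeholder model id (both assumptions, not taken from this commit):

```python
# Sketch only: assumes lm-evaluation-harness >= 0.4.3, which exposes
# simple_evaluate() and a "leaderboard" task group covering IFEval, BBH,
# MATH Lvl 5, GPQA, MUSR, and MMLU-PRO. The model id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face transformers backend
    model_args="pretrained=your-org/your-model",  # placeholder model id
    tasks=["leaderboard"],                        # assumed leaderboard v2 task group name
    batch_size=8,
)

# Raw per-task scores; the leaderboard additionally normalizes them
# (rescaling above the random-guess baseline) before averaging.
for task, metrics in results["results"].items():
    print(task, metrics)
```

The first-leaderboard numbers referenced in the second hunk would come from an analogous `lighteval` run; its exact invocation is not shown in the commit, so no command is reproduced here.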