Update README.md
README.md
CHANGED
@@ -126,8 +126,6 @@ Use Gemma 2 format.
 
 ## Benchmarks and Leaderboard Rankings
 
-OpenLLM: Pending in Queue
-
 Creative Writing V2 - Score: 82.64 (Rank 1)
 
 https://eqbench.com/creative_writing.html
@@ -138,6 +136,20 @@ That's right, much to everyone's surprise (mine included) this model has topped
 
 ![Leaderboard](https://i.imgur.com/gJd9Pab.png)
 
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lemon07r__Gemma-2-Ataraxy-9B)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |22.43|
+|IFEval (0-Shot) |30.09|
+|BBH (3-Shot) |42.03|
+|MATH Lvl 5 (4-Shot)| 0.83|
+|GPQA (0-shot) |11.30|
+|MuSR (0-shot) |14.47|
+|MMLU-PRO (5-shot) |35.85|
+
+
 ## Preface and Rambling
 
 My favorite Gemma 2 9B models are the SPPO iter3 and SimPO finetunes, but I felt the slerp merge between the two (nephilim v3) wasn't as good for some reason. The Gutenberg Gemma 2 finetune by nbeerbower is another of my favorites. It's trained on one of my favorite datasets, and it actually improves the SPPO model's Open LLM Leaderboard 2 average score a bit, on top of improving its writing capabilities and making the LLM sound less AI-like. However, I still liked the original SPPO finetune just a bit more.
@@ -192,16 +204,4 @@ slices:
 model: nbeerbower/gemma2-gutenberg-9B
 ```
 
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lemon07r__Gemma-2-Ataraxy-9B)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |22.43|
-|IFEval (0-Shot) |30.09|
-|BBH (3-Shot) |42.03|
-|MATH Lvl 5 (4-Shot)| 0.83|
-|GPQA (0-shot) |11.30|
-|MuSR (0-shot) |14.47|
-|MMLU-PRO (5-shot) |35.85|
 
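For readers unfamiliar with mergekit (the tool behind the `slices:` config whose tail appears in the diff above): merges are declared in a YAML file. The actual Ataraxy config is truncated in this view, so below is a minimal hypothetical sketch of what a slerp merge of two Gemma 2 9B finetunes, of the kind discussed in the preface, can look like. The model pairing, layer range, and `t` value are illustrative assumptions, not the recipe above.

```yaml
# Hypothetical mergekit slerp config sketch -- NOT the actual Ataraxy recipe,
# which is truncated in the diff above. Pairing, layer range, and t are assumptions.
slices:
  - sources:
      - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3  # assumed first source (SPPO iter3)
        layer_range: [0, 42]                      # Gemma 2 9B has 42 layers
      - model: nbeerbower/gemma2-gutenberg-9B     # appears in the config above
        layer_range: [0, 42]
merge_method: slerp
base_model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
parameters:
  t: 0.5        # t=0 keeps the base model, t=1 the other; 0.5 is an even blend
dtype: bfloat16
```

Saved as `config.yaml`, a config like this would be run with `mergekit-yaml config.yaml ./merged-model`, which writes the merged weights to the output directory.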