lemon07r committed on
Commit
3dd9200
1 Parent(s): 32bdf6d

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -126,8 +126,6 @@ Use Gemma 2 format.
 
 ## Benchmarks and Leaderboard Rankings
 
-OpenLLM: Pending in Queue
-
 Creative Writing V2 - Score: 82.64 (Rank 1)
 
 https://eqbench.com/creative_writing.html
@@ -138,6 +136,20 @@ That's right, much to everyone's surprise (mine included) this model has topped
 
 ![Leaderboard](https://i.imgur.com/gJd9Pab.png)
 
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lemon07r__Gemma-2-Ataraxy-9B)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |22.43|
+|IFEval (0-Shot) |30.09|
+|BBH (3-Shot) |42.03|
+|MATH Lvl 5 (4-Shot)| 0.83|
+|GPQA (0-shot) |11.30|
+|MuSR (0-shot) |14.47|
+|MMLU-PRO (5-shot) |35.85|
+
+
 ## Preface and Rambling
 
 My favorite Gemma 2 9B models are the SPPO iter3 and SimPO finetunes, but I felt the slerp merge between the two (nephilim v3) wasn't as good for some reason. The Gutenberg Gemma 2 finetune by nbeerbower is another of my favorites. It's trained on one of my favorite datasets, and it actually improves the SPPO model's Open LLM Leaderboard 2 average score a bit, on top of improving its writing capabilities and making the LLM sound less AI-like. However, I still liked the original SPPO finetune just a bit more.
@@ -192,16 +204,4 @@ slices:
 model: nbeerbower/gemma2-gutenberg-9B
 ```
 
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lemon07r__Gemma-2-Ataraxy-9B)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |22.43|
-|IFEval (0-Shot) |30.09|
-|BBH (3-Shot) |42.03|
-|MATH Lvl 5 (4-Shot)| 0.83|
-|GPQA (0-shot) |11.30|
-|MuSR (0-shot) |14.47|
-|MMLU-PRO (5-shot) |35.85|
 
 
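The `slices:` context in the last hunk belongs to a mergekit merge recipe, of which the diff shows only one line. For reference, a minimal mergekit SLERP config of that general shape might look like the sketch below. Only the `nbeerbower/gemma2-gutenberg-9B` line is confirmed by the diff; the second source model, the layer range, and the interpolation weight are illustrative assumptions, not the commit's actual settings.

```yaml
# Hypothetical mergekit SLERP recipe of the shape the truncated hunk suggests.
# Only nbeerbower/gemma2-gutenberg-9B appears in the diff; the second source
# model and all parameter values below are illustrative assumptions.
slices:
  - sources:
      - model: nbeerbower/gemma2-gutenberg-9B
        layer_range: [0, 42]  # Gemma 2 9B has 42 transformer layers
      - model: princeton-nlp/gemma-2-9b-it-SimPO  # assumed second endpoint
        layer_range: [0, 42]
merge_method: slerp
base_model: nbeerbower/gemma2-gutenberg-9B
parameters:
  t: 0.5  # assumed constant interpolation weight between the two endpoints
dtype: bfloat16
```

A config like this is run with mergekit's CLI, e.g. `mergekit-yaml config.yaml ./merged-model`.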
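The first hunk's heading context says "Use Gemma 2 format." For readers landing here, that refers to Gemma 2's chat template, which marks each turn with start- and end-of-turn tokens:

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{response}<end_of_turn>
```

In transformers, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` produces this layout automatically, so the template rarely needs to be written by hand.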