lemon07r committed on
Commit 3f3e275
1 Parent(s): 37f96d6

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -31,9 +31,9 @@ Use Gemma 2 format.
 
 ## Preface and Rambling
 
-My favorite Gemma 2 9B models are the SPPO iter3 and SimPO finetunes, but I felt the slerp merge between the two (nephilim v3) wasn't as good for some reason. The Gutenberg Gemma 2 finetune by nbeerbower is another my favorites. It's trained on one of my favorite datasets, and actually improves the SPPO model's average openllm leaderboard 2 average score by a bit, on top of improving it's writing capabilities and making the LLM sound less AI-like. However I still like the original SPPO finetune a bit more, I think because the gutenberg finetune may have been slightly overfit on the gutenberg dataset.
+My favorite Gemma 2 9B models are the SPPO iter3 and SimPO finetunes, but I felt the slerp merge between the two (nephilim v3) wasn't as good for some reason. The Gutenberg Gemma 2 finetune by nbeerbower is another my favorites. It's trained on one of my favorite datasets, and actually improves the SPPO model's average openllm leaderboard 2 average score by a bit, on top of improving it's writing capabilities and making the LLM sound less AI-like. However I still liked the original SPPO finetune just a bit more.
 
-Someone suggested that merging the base model on top of the gutenberg may help with the overfitting, which gave me a (possibly) better idea; slerp merging the SimPO finetune on top of the Gutenberg finetune, which is similar to the pretty popular Nephilim v3 recipe, using the Gutenberg finetune in place of the SPPO model, which I thought may give us better results since Gutenberg was trained on top of SPPO.
+Someone suggested that merging the base model on top of the gutenberg may help with tame it back down, which gave me a (possibly) better idea; slerp merging the SimPO finetune on top of the Gutenberg finetune, which is similar to the pretty popular Nephilim v3 recipe, using the Gutenberg finetune in place of the SPPO model, which I thought may give us better results since Gutenberg was trained on top of SPPO.
 
 I wasn't entirely too sure, since if nephilim v3 is anything to go by, it was probably going to also end up worse than the parent models. Normally when I try merges like these, they dont go anywhere. I'm pretty picky, and very skeptical usually, so most times I find that the merge is usually just not better than the original models or only marginally better. Tried this merge anyways to see how it goes, and much to my surprise, this time, I feel like I got very good results. Figured I'd share, and hopefully this wont be just me introducing more useless slop into a world that already has way too many unnecessary merges.
 
@@ -47,7 +47,7 @@ We use gutenberg 9b, which is finetuned over SPPO cause of how good the gutenber
 
 See https://github.com/arcee-ai/mergekit/issues/390
 
-Model is not actually 10b, mergekit is randomly adding lm_head for some reason when doing SLERP merge with Gemma 2 models. I believe Nephilim v3 had a similar issue before the used some sort of workaround that I'm not aware of. Doesn't seem like this affects the GGUF quants, as they're the correct size, so I will leave it as is until mergekit gets a commit that addresses this issue.
+Model is not actually 10b, mergekit is randomly adding lm_head for some reason when doing SLERP merge with Gemma 2 models. I believe Nephilim v3 had a similar issue before they used some sort of workaround that I'm not aware of. Doesn't seem like this affects the GGUF quants, as they're the correct size, so I will leave it as is until mergekit gets a commit that addresses this issue.
 
 ## Merge Details
 ### Merge Method
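The README text in this diff relies on "slerp merging" without saying what it does. The sketch below is illustrative only and is not mergekit's actual code. Roughly, SLERP (spherical linear interpolation) blends each pair of corresponding weight tensors along the arc between them rather than along the straight line, controlled by an interpolation factor `t`. Here a tensor is treated as a flat list of floats, and the function falls back to plain linear interpolation when the two vectors are nearly parallel. The names `slerp`, `t`, and `eps` are hypothetical.

```python
import math
from typing import List


def slerp(t: float, v0: List[float], v1: List[float], eps: float = 1e-8) -> List[float]:
    """Spherically interpolate between two flattened weight tensors (illustrative sketch).

    t = 0 returns v0 and t = 1 returns v1. Intermediate values follow the arc
    between the two vectors instead of the straight chord.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two tensors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_omega = max(-1.0, min(1.0, cos_omega))

    # Nearly (anti)parallel tensors: sin(omega) is close to 0, so use linear interpolation.
    if abs(cos_omega) > 1.0 - 1e-6:
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Two orthogonal unit vectors: the SLERP midpoint stays on the unit circle,
    # whereas plain linear interpolation would give [0.5, 0.5].
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

The midpoint of two orthogonal unit vectors comes out at roughly `[0.7071, 0.7071]`, which keeps the norm at 1. A straight average would shrink the vector to about 0.71 of that length, and preserving the magnitude is the usual motivation for SLERP over plain averaging.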
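The lm_head note in the second hunk says the merged model only appears to be 10B. A back-of-the-envelope count suggests why one extra tensor is enough to cause that. Gemma 2 ties its output projection to the input embedding, so a separately saved `lm_head` weight duplicates a full vocab_size × hidden_size matrix. The values below (vocab_size = 256000, hidden_size = 3584) are assumed from the public Gemma 2 9B config and do not come from this commit. The helper name is hypothetical.

```python
# Assumed values from the public Gemma 2 9B config.json (not part of this commit).
VOCAB_SIZE = 256_000
HIDDEN_SIZE = 3_584


def untied_lm_head_params(vocab_size: int, hidden_size: int) -> int:
    """Parameter count of a separate lm_head: one weight row per vocabulary token, no bias."""
    return vocab_size * hidden_size


extra = untied_lm_head_params(VOCAB_SIZE, HIDDEN_SIZE)

if __name__ == "__main__":
    print(f"{extra:,} extra parameters (~{extra / 1e9:.2f}B)")
```

This comes to 917,504,000 parameters, or about 0.92B. Added to the roughly 9.2B parameters of the base model, the total is about 10.2B, which matches the "10b" size the README says is misleading.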