---
base_model:
- princeton-nlp/gemma-2-9b-it-SimPO
- nbeerbower/gemma2-gutenberg-9B
library_name: transformers
tags:
- mergekit
- merge
license: gemma
---
# Gemma-2-Ataraxy-9B

![Ataraxy](https://i.imgur.com/aP03a5d.png)

## GGUF Quants

Huge thanks to [@mradermacher](https://huggingface.co/mradermacher) and [@bartowski](https://huggingface.co/bartowski) for making these GGUF quants available to us.

Bartowski quants (imatrix): [bartowski/Gemma-2-Ataraxy-9B-GGUF](https://huggingface.co/bartowski/Gemma-2-Ataraxy-9B-GGUF)

Mradermacher quants (static): [mradermacher/Gemma-2-Ataraxy-9B-GGUF](https://huggingface.co/mradermacher/Gemma-2-Ataraxy-9B-GGUF)

Mradermacher quants (imatrix): [mradermacher/Gemma-2-Ataraxy-9B-i1-GGUF](https://huggingface.co/mradermacher/Gemma-2-Ataraxy-9B-i1-GGUF)

I think bartowski and mradermacher use different calibration data for imatrix quants, or maybe you prefer static quants. Pick your poison :).

## Format

Use Gemma 2 format.

## Preface and Rambling

My favorite Gemma 2 9B models are the SPPO iter3 and SimPO finetunes, but I felt the slerp merge between the two (Nephilim v3) wasn't as good for some reason. The Gutenberg Gemma 2 finetune by nbeerbower is another of my favorites. It's trained on one of my favorite datasets, and actually improves the SPPO model's average Open LLM Leaderboard 2 score by a bit, on top of improving its writing capabilities and making the LLM sound less AI-like. However, I still like the original SPPO finetune a bit more, I think because the Gutenberg finetune may have been slightly overfit on the Gutenberg dataset. Someone suggested that merging the base model on top of the Gutenberg finetune may help with the overfitting, which gave me a (possibly) better idea: slerp merging the SimPO finetune on top of the Gutenberg finetune. This is similar to the pretty popular Nephilim v3 recipe, but uses the Gutenberg finetune in place of the SPPO model, which I thought might give us better results since Gutenberg was trained on top of SPPO.
I wasn't too sure, since if Nephilim v3 is anything to go by, this was probably also going to end up worse than the parent models. Normally when I try merges like these, they don't go anywhere. I'm pretty picky, and usually very skeptical, so most times I find that the merge is just not better than the original models, or only marginally better. I tried this merge anyway to see how it goes, and much to my surprise, this time I feel like I got very good results. Figured I'd share, and hopefully this won't be just me introducing more useless slop into a world that already has way too many unnecessary merges.

If you're looking for a Mistral Nemo 12B model instead, I HIGHLY recommend Mistral Nemo Gutenberg v2 by nbeerbower. It's head and shoulders above the many other Mistral Nemo finetunes I've tried (the first version of Mistral Nemo Gutenberg, romulus simpo, and magnum mini 1.1 being close second favorites).

## Why is it 10b??

See https://github.com/arcee-ai/mergekit/issues/390

The model is not actually 10b; mergekit is randomly adding lm_head for some reason when doing a SLERP merge with Gemma 2 models. I believe Nephilim v3 had a similar issue before they used some sort of workaround that I'm not aware of. This doesn't seem to affect the GGUF quants, as they're the correct size, so I will leave it as is until mergekit gets a commit that addresses this issue.

## Merge Details
### Merge Method

This model was merged using the SLERP merge method.
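
For anyone unfamiliar with SLERP (spherical linear interpolation), here is a minimal pure-Python sketch of the idea on two flat weight vectors. This is not mergekit's actual code; the function name `slerp`, the `eps` parameter, and the fallback threshold are my own illustrative choices. The convention assumed here is that `t = 0` returns the first vector and `t = 1` returns the second.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    Hypothetical helper for illustration; not taken from mergekit.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    # Nearly colinear vectors: plain linear interpolation is stable here.
    if abs(dot) > 1.0 - 1e-6:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    # Weights follow the arc between the vectors instead of the chord.
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

Unlike a plain average, the halfway point between two orthogonal unit vectors keeps unit length, which is the usual argument for preferring SLERP over linear interpolation when blending weights.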
### Models Merged

The following models were included in the merge:
* [princeton-nlp/gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO)
* [nbeerbower/gemma2-gutenberg-9B](https://huggingface.co/nbeerbower/gemma2-gutenberg-9B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: nbeerbower/gemma2-gutenberg-9B
dtype: bfloat16
merge_method: slerp
parameters:
  t:
  - filter: self_attn
    value: [0.0, 0.5, 0.3, 0.7, 1.0]
  - filter: mlp
    value: [1.0, 0.5, 0.7, 0.3, 0.0]
  - value: 0.5
slices:
- sources:
  - layer_range: [0, 42]
    model: princeton-nlp/gemma-2-9b-it-SimPO
  - layer_range: [0, 42]
    model: nbeerbower/gemma2-gutenberg-9B
```
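
The `t` lists in the config are gradients: as far as I understand it, mergekit spreads the five anchor values across the layer range and interpolates between them, with `t = 0` meaning the base model (Gutenberg here) and `t = 1` meaning the other model (SimPO). The sketch below assumes plain linear interpolation over relative layer depth, which may not match mergekit's implementation exactly; `gradient_value` is a hypothetical helper, not a mergekit function.

```python
def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate anchor values at a layer's relative depth.

    Hypothetical helper; assumes anchors are evenly spaced over the layers.
    """
    if num_layers < 2:
        return anchors[0]
    # Map the layer index onto the anchor axis [0, len(anchors) - 1].
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]

# Anchor lists copied from the config above.
SELF_ATTN = [0.0, 0.5, 0.3, 0.7, 1.0]
MLP = [1.0, 0.5, 0.7, 0.3, 0.0]
NUM_LAYERS = 42  # from layer_range: [0, 42]

for layer in (0, 10, 20, 30, 41):
    attn_t = gradient_value(SELF_ATTN, layer, NUM_LAYERS)
    mlp_t = gradient_value(MLP, layer, NUM_LAYERS)
    print(f"layer {layer:2d}: self_attn t={attn_t:.3f}, mlp t={mlp_t:.3f}")
```

Under this reading, the two gradients mirror each other: early layers take attention weights mostly from Gutenberg and MLP weights mostly from SimPO, and the balance flips toward the last layers. Tensors matching neither filter fall back to the flat `0.5`.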