---
base_model:
- princeton-nlp/gemma-2-9b-it-SimPO
- nbeerbower/gemma2-gutenberg-9B
---

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).

## GGUF Quants

Huge thanks to [@mradermacher](https://huggingface.co/mradermacher) and [@bartowski](https://huggingface.co/bartowski) for making these GGUF quants available to us.

Bartowski quants (imatrix): [bartowski/Gemma-2-Ataraxy-9B-GGUF](https://huggingface.co/bartowski/Gemma-2-Ataraxy-9B-GGUF)

Mradermacher quants (static): [mradermacher/Gemma-2-Ataraxy-9B-GGUF](https://huggingface.co/mradermacher/Gemma-2-Ataraxy-9B-GGUF)

Mradermacher quants (imatrix): [mradermacher/Gemma-2-Ataraxy-9B-i1-GGUF](https://huggingface.co/mradermacher/Gemma-2-Ataraxy-9B-i1-GGUF)

I think bartowski and mradermacher use different calibration data for their imatrix quants, and maybe you prefer static quants anyway. Pick your poison :).
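
If you want a quick way to try one of these quants, here's a minimal sketch using llama-cpp-python. The exact `.gguf` filename below is an assumption on my part; check the quant repo's file listing for the quant you actually want.

```python
# Minimal sketch: download one GGUF quant and chat with it locally.
# The filename is a guess at bartowski's naming scheme; verify it in the repo.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="bartowski/Gemma-2-Ataraxy-9B-GGUF",
    filename="Gemma-2-Ataraxy-9B-Q4_K_M.gguf",  # assumed filename; pick your quant
)

llm = Llama(model_path=model_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short scene set in a lighthouse."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```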

## Format

Use Gemma 2 format.
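
For reference, the Gemma 2 turn template looks like this (there is no system role in the official format):

```
<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{response}<end_of_turn>
```

If you load the model with transformers, `tokenizer.apply_chat_template` will produce this formatting for you.
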
## Preface and Rambling

Someone suggested that merging the base model on top of the gutenberg finetune may help.

I wasn't entirely sure, since if nephilim v3 is anything to go by, this one was probably going to end up worse than its parent models too. Normally when I try merges like these, they don't go anywhere. I'm pretty picky, and usually very skeptical, so most of the time I find the merge is either no better than the original models or only marginally better. I tried this merge anyway to see how it would go, and much to my surprise, this time I feel like I got very good results. I figured I'd share, and hopefully this won't just be me introducing more useless slop into a world that already has way too many unnecessary merges.

If you're looking for a Mistral Nemo 12B model instead, I HIGHLY recommend Mistral Nemo Gutenberg v2 by nbeerbower. It's head and shoulders above the many other Mistral Nemo finetunes I've tried (the first version of Mistral Nemo Gutenberg, Romulus SimPO, and Magnum Mini 1.1 being close second favorites).

## Why is it 10b??

See https://github.com/arcee-ai/mergekit/issues/390

The model is not actually 10B; mergekit is adding an lm_head tensor for some reason when doing a SLERP merge with Gemma 2 models. I believe Nephilim v3 had a similar issue before they used some sort of workaround that I'm not aware of. This doesn't seem to affect the GGUF quants, which are the correct size, so I will leave it as is until mergekit gets a commit that addresses the issue.
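
If you want to see the stray tensor for yourself, you can inspect the safetensors index without downloading the full weights. Gemma 2 ties its input and output embeddings, so a separate `lm_head.weight` shouldn't normally be listed:

```python
# Sketch: download only the safetensors index and look for lm_head entries.
import json

from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    repo_id="lemon07r/Gemma-2-Ataraxy-9B",
    filename="model.safetensors.index.json",
)
with open(index_path) as fp:
    weight_map = json.load(fp)["weight_map"]

# Prints any lm_head tensors; an empty list would mean the issue is fixed.
print([name for name in weight_map if "lm_head" in name])
```
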
## Merge Details
### Merge Method

This model was merged using the SLERP merge method.

### Configuration

The `slices` section of the YAML configuration used to produce this model:

```yaml
slices:
- sources:
  - layer_range: [0, 42]
    model: princeton-nlp/gemma-2-9b-it-SimPO
  - layer_range: [0, 42]
    model: nbeerbower/gemma2-gutenberg-9B
```
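
If you want to reproduce a merge like this, the usual route is mergekit's `mergekit-yaml` CLI (`mergekit-yaml config.yaml ./output-directory`). If you'd rather drive it from Python, something like the sketch below should work; it assumes mergekit's documented Python entry points (`MergeConfiguration`, `MergeOptions`, `run_merge`), so double-check them against the version you have installed.

```python
# Sketch: run a mergekit merge from Python instead of the CLI.
# Assumes mergekit's documented entry points; verify against your version.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    "./Gemma-2-Ataraxy-9B",  # output directory
    options=MergeOptions(
        cuda=False,           # set True to merge on GPU
        copy_tokenizer=True,  # copy tokenizer files into the output
    ),
)
```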