---
base_model: [Sao10K/Fimbulvetr-11B-v2]
library_name: transformers
tags:
- mergekit
- merge
---

![cute](https://huggingface.co/matchaaaaa/Chaifighter-v3-20B/resolve/main/chaifighter-v3-cute.png)

# Chaifighter-v3-20B

Meet Chaifighter-v3! A flagship frankenmerge blend brewed with love by yours truly!

Chaifighter-v3 brings back the hyper-attention of the original, vastly expands on the fixes used to make v2 usable, and is built on a modified [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B). Moreover, it RoPEs up to 8K perfectly, and should work well at 12K and beyond.

*Native Context Length: 4K/4096 (can be extended to 8K/8192 or more with RoPE)*

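Here's a rough sketch of what linear RoPE scaling to 8K might look like if you load the model with `transformers` (assuming a reasonably recent release whose config supports the standard `rope_scaling` field); backends like koboldcpp and llama.cpp expose their own RoPE settings instead, and the 2.0 factor (4096 × 2 = 8192) is just one sensible choice.

```python
# Rough sketch: linear RoPE scaling to ~8K context when loading with transformers.
# Backends like koboldcpp/llama.cpp have their own RoPE options instead.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "matchaaaaa/Chaifighter-v3-20B"

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "linear", "factor": 2.0}  # 4096 * 2 = 8192 tokens

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    device_map="auto",  # needs accelerate installed
)
```
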
## Prompt Template: Alpaca/Alpaca-based

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
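
If you're scripting against the model directly instead of using a front-end (SillyTavern and friends apply an equivalent Alpaca preset for you), the template boils down to a simple format string. The example instruction below is just a placeholder.

```python
# Rough sketch: building a prompt in the Alpaca format shown above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)

user_prompt = "Write a short scene where two rivals share a pot of tea."  # placeholder
full_prompt = ALPACA_TEMPLATE.format(prompt=user_prompt)
print(full_prompt)
```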

## Recommended Settings: Universal-Light

Here are some setting ranges that tend to work for my models. I used these when testing, and they're pretty safe bets. Feel free to tweak according to taste or do whatever you want (but maybe it might maybe break, maybe). There's a quick sketch of wiring these up right after the list.

* Temperature: **1.0** to **1.25**
* Min-P: **0.05** to **0.1**
* Repetition Penalty: **1.05** to **1.1**
* Rep. Penalty Range: **256** or **512**
* *(all other samplers disabled)*
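
For `transformers` users, here's a rough sketch of the same ranges as sampling arguments. Caveats: `min_p` needs a fairly recent `transformers` release, and the repetition-penalty *range* has no direct equivalent there; it's a backend setting in koboldcpp, text-generation-webui, and the like.

```python
from transformers import GenerationConfig

# Rough sketch: "Universal-Light" mapped onto transformers sampling knobs.
universal_light = GenerationConfig(
    do_sample=True,
    temperature=1.1,          # 1.0 to 1.25
    min_p=0.05,               # 0.05 to 0.1 (recent transformers only)
    repetition_penalty=1.05,  # 1.05 to 1.1; the 256/512 range is backend-specific
    top_p=1.0,                # leave the other samplers neutral/disabled
    top_k=0,
    max_new_tokens=512,
)
# Then: model.generate(**inputs, generation_config=universal_light)
```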

## The Deets

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:

* [Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
* Pop-Taro-11B, *a variation of [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B)*
* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B)
* [crestf411/daybreak-kunoichi-2dpo-7b](https://huggingface.co/crestf411/daybreak-kunoichi-2dpo-7b)
* [KatyTheCutie/LemonadeRP-4.5.3](https://huggingface.co/KatyTheCutie/LemonadeRP-4.5.3)
* [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
* [Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora](https://huggingface.co/Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora)

### The Special Sauce

The following YAML configuration was used to produce this model:

```yaml
slices: # modified Big-Lemon-Cookie recipe
  - sources:
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [0, 24]
  - sources:
      - model: crestf411/daybreak-kunoichi-2dpo-7b # this was Silicon-Maid in the OG
        layer_range: [8, 24]
  - sources:
      - model: KatyTheCutie/LemonadeRP-4.5.3
        layer_range: [24, 32]
merge_method: passthrough
dtype: float32
name: pre-Taro-11B
---
models: # this is what FallenMerick did for the Chunky-Lemon-Cookie
  - model: pre-Taro-11B
    parameters:
      weight: 0.85
  - model: Sao10K/Fimbulvetr-11B-v2
    parameters:
      weight: 0.15
merge_method: linear
dtype: float32
name: Taro-11B
---
models: # further healing with PEFT qLoRA, thanks undi
  - model: Taro-11B
    parameters:
      weight: 0.68 # these values were a good balance
  - model: Taro-11B+Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora # picked PIPPA because I'm old school
    parameters:
      weight: 0.32 # good balance pt. 2
merge_method: linear
dtype: float32
name: Pop-Taro-11B
---
slices: # this is the really cursed part
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [0, 40]
  - sources:
      - model: Pop-Taro-11B # probably will release this later, especially if it's good on its own and there's interest for it
        layer_range: [0, 48] # includes the first 8 layers to boost attention, why does it work???
merge_method: passthrough
dtype: float32
name: Chaifighter-v3-20B
```
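
For anyone wondering what `merge_method: linear` does in the middle two stages: conceptually, every output tensor is just a weight-normalized sum of the matching tensors from the input models, e.g. 0.68 × Taro-11B plus 0.32 × Taro-11B with the PIPPA qLoRA applied. Here's a simplified illustration of that idea (not mergekit's actual code):

```python
import torch

# Simplified illustration of a linear merge on a single tensor.
def linear_merge(tensors: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    """Weight-normalized sum of matching tensors from each input model."""
    total = sum(weights)
    return sum((w / total) * t for w, t in zip(weights, tensors))

# e.g. merged = linear_merge([taro_tensor, taro_plus_pippa_tensor], [0.68, 0.32])
```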

All merging was done at float32 precision to minimize quality loss.

### The Thought Process

**Alternate title: "Input Layers Placed Halfway Through Your Frankenmerge Is All You Need"**

Note: much of this is conjecture. Thanks to [@ToastyPigeon](https://huggingface.co/ToastyPigeon) and the "Jeb's mad science 11B and 16B" thread on the Kobold discord. Without them, my understanding of this model would be much, much worse. Their help and insights were crucial in making this model happen!

This model started with the original recipe. According to everything my friends and I know, it just shouldn't have worked nearly as well as it did. I wondered what it would take to make it work, and as it turns out, it was the repeated Mistral "output layers" (meaning, the last 8 or so hidden layers) that caused most of the model's trouble. There was still stack damage, though. Essentially, this is a 7B base model expanded to 19.5B parameters. If that sounds like a lot, that's because it is a lot.
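
Where does 19.5B come from? A rough back-of-the-envelope count, assuming the standard Mistral-7B/SOLAR layer shape (hidden size 4096, MLP size 14336, grouped-query attention with a 1024-wide KV projection) and the 40 + 48 layers stacked by the final passthrough:

```python
# Rough arithmetic: approximate parameter count of an 88-layer Mistral-shaped stack.
hidden, inter, vocab, kv_dim = 4096, 14336, 32000, 1024

attn = hidden * (hidden + kv_dim + kv_dim + hidden)  # q, k, v, o projections
mlp = 3 * hidden * inter                             # gate, up, down projections
per_layer = attn + mlp                               # ~218M params (norms ignored)

layers = 40 + 48                                     # Fimbulvetr slice + Pop-Taro slice
total = layers * per_layer + 2 * vocab * hidden      # plus embeddings and LM head
print(f"~{total / 1e9:.1f}B parameters")             # ~19.5B
```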

One of the core reasons it works, we believe, is that [Fimbulvetr-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) is based on [SOLAR 10.7B](https://huggingface.co/upstage/SOLAR-10.7B-v1.0). When SOLAR was being made, it received finetuning after being stacked up to 48 layers to heal the "stack damage". We think this finetuning differentiated the layers enough from Mistral, particularly the "input layers" (the first 8 or so hidden layers), for the whole model to actually function. Jeb's mad lads did a lot of testing and concluded that one of the countless ways to break a model is to repeat these "input layers", and well, apparently (evidently) SOLAR somehow allows this cursedness to work.

As a side note, [Fimbulvetr-v2.1-16K](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K) was also tested in this merge. For some reason, it just wasn't happy there, and it caused all kinds of problems and a first-person bias (which I thought might be annoying to most people).

[Kunoichi](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) has always been one of my favorites because of how great it is at prompt-following and awareness. That's why its second set of "input layers" was chosen. [@Kromeurus](https://huggingface.co/kromeurus) recommended [daybreak-kunoichi-2dpo-7b](https://huggingface.co/crestf411/daybreak-kunoichi-2dpo-7b), which was trained on the [Storyteller's Diamond Law](https://files.catbox.moe/d15m3g.txt) and in theory should increase the model's knowledge a little. [LemonadeRP-4.5.3](https://huggingface.co/KatyTheCutie/LemonadeRP-4.5.3) is a solid performer as well, being part of [@FallenMerick's](https://huggingface.co/FallenMerick) [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B) and, by extension, [Honey-Yuzu-13B](https://huggingface.co/matchaaaaa/Honey-Yuzu-13B) (by me). It was also part of Chaifighter-v2's recipe, and as such, v3's writing should be familiar (in a good way :skull:) for those who liked v2.

Finally, this second stack of models was merged with Fimbulvetr-v2 to help heal the stack damage. In theory, this helps "smoothen" the layers together and make the model more put together overall. I went further with this idea by using [a PIPPA qLoRA trained for a Mistral 11B DUS stack called OmniMix](https://huggingface.co/Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora). I played with the values to ensure the end result was sufficiently stable without being overpowered by PIPPA.

There was a lot of trial and error involved in the creation of this model. Additionally, many, many great minds helped shape it. Thank you very much to everyone who kindly gave feedback, encouragement, or helped in any other way, big or small, with the development of this model and any past model. I really, really appreciate it. <3

And thank YOU for taking the time to read this and for checking out my model!

Have feedback? Comments? Questions? Don't hesitate to let me know! As always, have a fantastic day, and remember to take care of yourself! :)