---
base_model:
- Sao10K/Fimbulvetr-11B-v2
library_name: transformers
tags:
- mergekit
- merge
- mistral
- roleplay
license: apache-2.0
pipeline_tag: text-generation
---

![cute](https://huggingface.co/matchaaaaa/Chaifighter-20B-v3/resolve/main/chaifighter-v3-cute.png)

# Chaifighter-v3-20B

Meet Chaifighter-v3! A flagship frankenmerge blend brewed with love by yours truly! Chaifighter-v3 brings back the hyper-attention of the original, vastly expands on the fixes used to make v2 usable, and is built on a modified [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B). Moreover, it RoPEs up to 8K perfectly, and should work well at 12K and beyond.

*Native Context Length: 4K/4096 (can be extended to 8K/8192 or more with RoPE)*

## Prompt Template: Alpaca/Alpaca-based

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```

## Recommended Settings: Universal-Light

Here are some setting ranges that tend to work for my models. I used these when testing, and they're pretty safe bets. Feel free to tweak according to taste or do whatever you want (but maybe it might maybe break, maybe).

* Temperature: **1.0** to **1.25**
* Min-P: **0.05** to **0.1**
* Repetition Penalty: **1.05** to **1.1**
* Rep. Penalty Range: **256** or **512**
* *(all other samplers disabled)*

(If you'd rather see these values in code, there's a small sketch at the very bottom of this card.)

## The Deets

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:

* [Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
* Pop-Taro-11B, *a variation of [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B)*, built from:
  * [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B)
  * [crestf411/daybreak-kunoichi-2dpo-7b](https://huggingface.co/crestf411/daybreak-kunoichi-2dpo-7b)
  * [KatyTheCutie/LemonadeRP-4.5.3](https://huggingface.co/KatyTheCutie/LemonadeRP-4.5.3)
  * [Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
  * [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  * [Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora](https://huggingface.co/Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora)

### The Special Sauce

The following YAML configuration was used to produce this model:

```yaml
slices: # modified Big-Lemon-Cookie recipe
  - sources:
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [0, 24]
  - sources:
    - model: crestf411/daybreak-kunoichi-2dpo-7b # this was Silicon-Maid in the OG
      layer_range: [8, 24]
  - sources:
    - model: KatyTheCutie/LemonadeRP-4.5.3
      layer_range: [24, 32]
merge_method: passthrough
dtype: float32
name: pre-Taro-11B
---
models: # this is what FallenMerick did for the Chunky-Lemon-Cookie
  - model: pre-Taro-11B
    parameters:
      weight: 0.85
  - model: Sao10K/Fimbulvetr-11B-v2
    parameters:
      weight: 0.15
merge_method: linear
dtype: float32
name: Taro-11B
---
models: # further healing with PEFT qLoRA, thanks undi
  - model: Taro-11B
    parameters:
      weight: 0.68 # these values were a good balance
  - model: Taro-11B+Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora # picked PIPPA because I'm old school
    parameters:
      weight: 0.32 # good balance pt. 2
merge_method: linear
dtype: float32
name: Pop-Taro-11B
---
slices: # this is the really cursed part
  - sources:
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [0, 40]
  - sources:
    - model: Pop-Taro-11B # probably will release this later especially if it's good on its own and there's interest for it
      layer_range: [0, 48] # includes the first 8 layers to boost attention, why does it work???
merge_method: passthrough
dtype: float32
name: Chaifighter-v3-20B
```

All merging was done at float32 precision to minimize quality loss.

### The Thought Process

**Alternate title: "Input Layers Placed Halfway Through Your Frankenmerge Is All You Need"**

Note: much of this is conjecture. Thanks to [@ToastyPigeon](https://huggingface.co/ToastyPigeon) and the "Jeb's mad science 11B and 16B" thread on the Kobold Discord. Without them, my understanding of this model would be much, much worse. Their help and insights were crucial in making this model happen!

This model started with the original recipe. According to everything my friends and I know, it just shouldn't have worked nearly as well as it did. I wondered what it would take to make it work, and as it turns out, it was the repeated Mistral "output layers" (meaning, the last 8 or so hidden layers) that caused most of the model's trouble. There was still stack damage, though.

Essentially, this is a 7B base model expanded to 19.5B parameters. If that sounds like a lot, that's because it is a lot. One of the core reasons it works, we believe, is because of [Fimbulvetr-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) being based on [SOLAR 10.7B](https://huggingface.co/upstage/SOLAR-10.7B-v1.0). When SOLAR was being made, it received finetuning after being stacked up to 48 layers to heal the "stack damage". We think that this finetuning differentiated the layers enough from Mistral's for the whole model to actually function despite the repeated "input layers" (the first 8 or so hidden layers). Jeb's mad lads did a lot of testing and have concluded that one of the countless ways to break a model is to repeat these "input layers", and well, apparently (evidently) SOLAR somehow allows this cursedness to work.

As a side note, [Fimbulvetr-v2.1-16K](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K) was also tested in this merge. For some reason, it just wasn't happy there, and it caused all kinds of problems and a first-person bias (which I thought might be annoying to most people).

[Kunoichi](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) has always been one of my favorites because of how great it is at prompt-following and awareness. That's why it was chosen to supply the second set of "input layers". [@Kromeurus](https://huggingface.co/kromeurus) recommended [daybreak-kunoichi-2dpo-7b](https://huggingface.co/crestf411/daybreak-kunoichi-2dpo-7b), which was trained on the [Storyteller's Diamond Law](https://files.catbox.moe/d15m3g.txt) and in theory should increase the model's knowledge a little. [LemonadeRP-4.5.3](https://huggingface.co/KatyTheCutie/LemonadeRP-4.5.3) is a solid performer as well, being part of [@FallenMerick's](https://huggingface.co/FallenMerick) [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B) and, by extension, [Honey-Yuzu-13B](https://huggingface.co/matchaaaaa/Honey-Yuzu-13B) (by me). It was also part of Chaifighter-v2's recipe, and as such, v3's writing should be familiar (in a good way :skull:) for those who liked v2.

Finally, this second stack of models was merged with Fimbulvetr-v2 to help heal the stack damage.
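If you're wondering what a `linear` merge actually does under the hood, it's essentially just a per-tensor weighted average of the models being blended. Here's a tiny conceptual sketch (illustrative only, with hypothetical state-dict variables; mergekit handles the real merge along with config and tokenizer details):

```python
# Conceptual sketch only: a "linear" merge is a per-tensor weighted average of
# the input models' weights. mergekit does the real work; linear_merge and the
# commented example below are just for intuition.
import torch

def linear_merge(state_dicts, weights):
    """Return a state dict where each tensor is the weighted average of the inputs."""
    total = sum(weights)  # the weights in this card's config already sum to 1.0
    return {
        name: sum((w / total) * sd[name].float() for sd, w in zip(state_dicts, weights))
        for name in state_dicts[0]
    }

# e.g. the Chunky-Lemon-Cookie-style healing step from the config above:
# taro_11b = linear_merge([pre_taro_sd, fimbulvetr_sd], weights=[0.85, 0.15])
```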
In theory, this healing merge helps "smoothen" the layers together and make the model more put together overall. I went further with this idea by using [a PIPPA qLoRA trained for a Mistral 11B DUS stack called OmniMix](https://huggingface.co/Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora). I played with the values to ensure the end result was sufficiently stable without being overpowered by PIPPA. There was a lot of trial and error involved in the creation of this model.

Additionally, many, many great minds helped shape this model. Thank you very much to everyone who kindly gave feedback, encouragement, or helped in any other way, big or small, with the development of this model and any past model. I really, really appreciate it. <3

And thank YOU for taking the time to read this and for checking out my model! Have feedback? Comments? Questions? Don't hesitate to let me know! As always, have a fantastic day, and remember to take care of yourself! :)
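P.S. If you'd like to try the Alpaca template and the recommended samplers in plain Hugging Face `transformers`, here's a minimal sketch. The repo id is assumed from this card's URL, `min_p` needs a fairly recent `transformers` release, and the rep. penalty *range* knob is a frontend setting (KoboldCpp, SillyTavern, etc.) rather than a `generate()` argument:

```python
# Minimal sketch, not an official quick-start. Assumes: the repo id below (taken
# from this card's image URL), a recent transformers release (for min_p), and
# enough VRAM/RAM for a ~20B model (device_map="auto" will offload if needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "matchaaaaa/Chaifighter-20B-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca template from the "Prompt Template" section above
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Write a short scene set in a cozy tea shop.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=1.1,          # recommended: 1.0 to 1.25
    min_p=0.05,               # recommended: 0.05 to 0.1
    repetition_penalty=1.05,  # recommended: 1.05 to 1.1
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```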