Oh no, he's dumb too! I have a working hypothesis. Inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, I assume due to the unbalanced nature of my recipe. More study is required.
### Recipe
- merge_method: dare_ties
- base_model: athirdpath/BigMistral-13b
- models:
  - athirdpath/CleverMage-Mistral-13b (weight: 0.60, density: 0.35)
  - athirdpath/CleverMage-Mistral-13b-INV (weight: 0.40, density: 0.30)
- int8_mask: true
- dtype: bfloat16
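
If you want to reproduce this, here's a minimal mergekit-style YAML sketch of the recipe above, assuming the usual dare_ties layout where weight and density sit under each model's parameters block; the exact nesting in my original config file may differ slightly.

```yaml
# Sketch of the recipe above in mergekit's config format (nesting assumed)
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
models:
  - model: athirdpath/CleverMage-Mistral-13b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverMage-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true   # mask overflow when merging int8-quantized weights
dtype: bfloat16
```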