Oh no, he's dumb too! I have a working hypothesis: inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, which I assume is due to the unbalanced nature of my recipe. More study is required.

### Recipe

merge_method: dare_ties

- base_model: athirdpath/BigMistral-13b
- model: athirdpath/CleverMage-Mistral-13b
  weight: 0.60 / density: 0.35
- model: athirdpath/CleverMage-Mistral-13b-INV
  weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
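
For anyone wanting to reproduce this, the recipe above should translate to a mergekit config along these lines. This is a sketch, assuming mergekit's standard dare_ties YAML schema; the exact nesting is my reading of the recipe, not a verbatim copy of the file used.

```yaml
# Sketch of the recipe above as a mergekit dare_ties config (schema assumed, not verbatim).
models:
  - model: athirdpath/CleverMage-Mistral-13b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverMage-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
parameters:
  int8_mask: true   # store intermediate masks in int8 to save memory
dtype: bfloat16
```

Saved as, e.g., recipe.yml (placeholder name), it would be run with `mergekit-yaml recipe.yml ./merged-model`.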