Oh no, he's dumb too! I have a working hypothesis. Inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, which I assume is due to the unbalanced nature of my recipe. More study is required.

### Recipe
merge_method: dare_ties

- base_model: athirdpath/BigMistral-13b
- model: athirdpath/CleverMage-Mistral-13b
  - weight: 0.60 / density: 0.35
- model: athirdpath/CleverMage-Mistral-13b-INV
  - weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
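
For reference, the recipe above maps onto a mergekit-style YAML config roughly as follows. This is a sketch assuming the standard dare_ties schema; the exact field layout is my reconstruction, not the config file actually used:

```yaml
# Hypothetical mergekit config reconstructed from the recipe above.
models:
  - model: athirdpath/CleverMage-Mistral-13b
    parameters:
      weight: 0.60      # contribution of the "upright" model
      density: 0.35     # fraction of delta weights kept by DARE
  - model: athirdpath/CleverMage-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
parameters:
  int8_mask: true
dtype: bfloat16
```

If saved as e.g. `merge.yml` (placeholder name), it should run with something like `mergekit-yaml merge.yml ./merged-model`, assuming mergekit is installed.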