athirdpath
/

CleverMage-Mistral-13b-DARE_blended-FAILURE



	Oh no, he's dumb too! I have a working hypothesis: inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, which I assume is due to the unbalanced nature of my recipe. More study is required.

	### Recipe
	merge_method: dare_ties

	- base_model: athirdpath/BigMistral-13b

	- model: athirdpath/CleverMage-Mistral-13b

	weight: 0.60 / density: 0.35

	- model: athirdpath/CleverMage-Mistral-13b-INV

	weight: 0.40 / density: 0.30

	int8_mask: true

	dtype: bfloat16
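
For readers unfamiliar with what `weight` and `density` control in a `dare_ties` merge, here is a toy sketch on plain Python lists. It is an illustration only, not mergekit's implementation: the helper names (`dare_sparsify`, `sign_elect_merge`, `toy_dare_ties`), the 8-element "models", and the seed are all hypothetical, and the merge step is simplified (no normalization, no `int8_mask`, no `bfloat16`). Only the weights and densities come from the recipe above.

```python
import random

# Weights and densities copied from the recipe; everything else is illustrative.
RECIPE = [
    {"name": "CleverMage-Mistral-13b", "weight": 0.60, "density": 0.35},
    {"name": "CleverMage-Mistral-13b-INV", "weight": 0.40, "density": 0.30},
]


def dare_sparsify(delta, density, rng):
    """DARE step: keep each delta entry with probability `density`,
    and rescale the survivors by 1/density so the expected value is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def sign_elect_merge(deltas, weights):
    """Simplified TIES-style step: per coordinate, pick the sign of the
    weighted sum, then add up only the weighted deltas that agree with it."""
    merged = []
    for column in zip(*deltas):
        weighted = [w * d for w, d in zip(weights, column)]
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        merged.append(sum(wd for wd in weighted if wd * sign > 0))
    return merged


def toy_dare_ties(base, finetuned, recipe, seed=0):
    """Merge toy 'models' (flat lists of floats) onto a shared base."""
    rng = random.Random(seed)
    deltas = []
    for model, entry in zip(finetuned, recipe):
        delta = [m - b for m, b in zip(model, base)]  # task vector vs. base
        deltas.append(dare_sparsify(delta, entry["density"], rng))
    merged_delta = sign_elect_merge(deltas, [e["weight"] for e in recipe])
    return [b + d for b, d in zip(base, merged_delta)]


# Made-up parameters, just to exercise the functions.
base = [0.0] * 8
model_a = [0.5, -0.2, 0.1, 0.4, -0.3, 0.2, 0.0, 0.6]
model_b = [-0.5, 0.2, 0.1, -0.4, 0.3, 0.2, 0.1, -0.6]
print(toy_dare_ties(base, [model_a, model_b], RECIPE))
```

With densities of 0.35 and 0.30, roughly two thirds of each model's delta is dropped before merging, and the surviving entries are scaled up by about 2.9x and 3.3x respectively. Whether that interacts badly with the uneven 0.60 / 0.40 weighting, as the hypothesis above suggests, is not something this sketch can settle.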