AgoraMix is crafted using semi-automated processes to merge top-performing models.
Note: This model was prompted by seeing density parameters on model_stock merges on Hugging Face, and it began as an experiment to see their effect. However, checking the main branch of mergekit's merge_methods shows that model_stock ignores those parameters, and I don't maintain a fork that would support them. So this is a normal model_stock merge, but its performance is still promising.
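For context, here is a minimal sketch of why extra density/weight knobs have nothing to act on in model_stock: the method derives its interpolation ratio from the geometry of the fine-tuned checkpoints themselves. This is only an illustration based on the Model Stock paper, not mergekit's actual code, and `model_stock_layer` is a hypothetical helper name.

```python
# Rough sketch of the Model Stock idea for one weight tensor, assuming N
# fine-tuned models that share a single base. Illustrative only; not
# mergekit's implementation.
import numpy as np

def model_stock_layer(base: np.ndarray, finetuned: list[np.ndarray]) -> np.ndarray:
    n = len(finetuned)
    deltas = [w - base for w in finetuned]  # task vectors
    # Average pairwise cosine similarity between task vectors.
    cos_vals = []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = deltas[i].ravel(), deltas[j].ravel()
            cos_vals.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    cos_theta = float(np.mean(cos_vals)) if cos_vals else 1.0
    # Interpolation ratio as described in the paper: t = N*cos / (1 + (N-1)*cos).
    t = n * cos_theta / (1 + (n - 1) * cos_theta)
    w_avg = np.mean(finetuned, axis=0)
    # Pull the average of the fine-tuned weights toward the base by (1 - t);
    # no per-model density or weight parameter appears anywhere in this rule.
    return t * w_avg + (1 - t) * base
```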
## Ancestor Models
- **VAGOsolutions/SauerkrautLM-v2-14b-DPO**: solid instruction-following and problem-solving capabilities.
- **arcee-ai/SuperNova-Medius**: brings a large knowledge base distilled from Llama 3.1 into the Qwen architecture. Moderate density and gentle weighting were intended to draw on its knowledge pool while keeping behavior predictable.
- **CultriX/Qwen2.5-14B-Wernicke**: a heavily emphasized ancestor, because its problem-solving, factuality, and comprehension are exceptional for the model size. Shout out to CultriX, whose methods helped inspire this merge.
- **rombodawg/Rombos-LLM-V2.6-Qwen-14b**: lightly applied to enhance reasoning abilities.
- **underwoods/medius-erebus-magnum-14b**: subtly incorporated to improve prose quality.
## Merge Configuration

The following YAML configuration was used to produce this model:
```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B
tokenizer_source: base
parameters:
  int8_mask: false
  normalize: true
  rescale: false
models:
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
  - model: arcee-ai/SuperNova-Medius
  - model: CultriX/Qwen2.5-14B-Wernicke
  - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
  - model: underwoods/medius-erebus-magnum-14b
dtype: bfloat16
out_dtype: bfloat16
```
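To reproduce the merge, a config like the one above can be fed to mergekit's CLI, e.g. `mergekit-yaml agoramix.yaml ./AgoraMix --cuda`. The config filename and output path here are placeholders, and the available flags may vary across mergekit versions.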