(GGUFs)
Slush is a two-stage model trained with high LoRA dropout. Stage 1 is a continued pretraining run on the base model, aimed at boosting the model's creativity and writing ability. The resulting LoRA is then merged into the instruct model, and Stage 2 is a fine-tuning pass on top of that merge, intended to further enhance its roleplaying capabilities and/or to repair any damage caused by the Stage 1 merge.
This is still at an early stage. As always, feedback is welcome, and begone if you demand perfection.
The second stage, like the Sunfall series, follows the SillyTavern preset (Mistral V2 & V3; V3-Tekken also works fine), so your mileage may vary, particularly if you use a different tool and/or preset.
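If you're not using SillyTavern, one low-effort way to get the expected prompt format is to pull it from the instruct tokenizer's chat template rather than hand-building the Mistral tags. A minimal sketch, assuming the model carries the Mistral-Nemo-Instruct-2407 template:

```python
# Minimal sketch: derive the Mistral instruct format from the tokenizer's own
# chat template instead of hand-writing [INST] tags. Assumes the tokenizer is
# the one from Mistral-Nemo-Instruct-2407.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
messages = [{"role": "user", "content": "Describe the harbor at dusk, slowly."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # pass this to whatever backend you use
```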
Parameter suggestions:
I did all my testing with temperature 1, min-p 0.1, and DRY 0.8.
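For what it's worth, here is a minimal sketch of how those values map onto a llama.cpp server request. It assumes a recent llama.cpp build with DRY support; the endpoint and the exact field names (`min_p`, `dry_multiplier`) may differ in other backends.

```python
# Hedged sketch: the suggested samplers (temp 1, min-p 0.1, DRY 0.8) sent to a
# locally running llama.cpp server. Endpoint and field names are assumptions
# based on llama.cpp's /completion API and may differ in your backend.
import requests

payload = {
    "prompt": "[INST] Open a slow-burn scene in a rain-soaked port town. [/INST]",
    "temperature": 1.0,     # temp 1
    "min_p": 0.1,           # min-p 0.1
    "dry_multiplier": 0.8,  # DRY 0.8
    "n_predict": 300,
}
resp = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=300)
print(resp.json()["content"])
```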
Training details (an illustrative sketch of these settings follows the list):
- Stage 1 (continued pretraining)
- Target: mistralai/Mistral-Nemo-Base-2407 (resulting LoRA merged into mistralai/Mistral-Nemo-Instruct-2407)
- LoRA dropout 0.5 (motivation)
- LoRA rank 64, alpha 128 (motivation)
- LR cosine 4e-6
- LoRA+ with LR Ratio: 15
- Context size: 16384
- Gradient accumulation steps: 4
- Epochs: 1
- Stage 2 (fine tune)
- Target: Stage 1 model
- LoRA dropout 0.5
- LoRA rank 32, alpha 64
- LR cosine 5e-6 (min 5e-7)
- LoRA+ with LR Ratio: 15
- Context size: 16384
- Gradient accumulation steps: 4
- Epochs: 2
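The card does not state which trainer was used, so purely as an illustration, here is how the Stage 1 numbers could be expressed in a PEFT + TRL setup. Argument names vary between library versions, the dataset path is a placeholder, and LoRA+ (LR ratio 15) needs a separate optimizer (e.g. the loraplus package), so it is only noted in a comment; Stage 2 differs only in the values listed above.

```python
# Illustrative sketch only -- not the author's actual training script.
# Maps the Stage 1 hyperparameters onto PEFT + TRL; exact argument names
# depend on your TRL/PEFT versions. LoRA+ (LR ratio 15) requires a custom
# optimizer (e.g. the `loraplus` package) and is not implemented here.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "mistralai/Mistral-Nemo-Base-2407"   # Stage 2 targets the Stage 1 model instead
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

peft_config = LoraConfig(
    r=64,              # Stage 2: 32
    lora_alpha=128,    # Stage 2: 64
    lora_dropout=0.5,
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="slush-stage1",
    learning_rate=4e-6,              # Stage 2: 5e-6 (cosine min 5e-7)
    lr_scheduler_type="cosine",
    max_seq_length=16384,
    gradient_accumulation_steps=4,
    num_train_epochs=1,              # Stage 2: 2
    bf16=True,
    logging_steps=10,
)

# Hypothetical dataset path; the actual training data is not part of this card.
dataset = load_dataset("json", data_files="stage1_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,   # `tokenizer=` in older TRL versions
)
trainer.train()
```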
Merge Details
Merge Method
This model was merged using the TIES merge method, with mistralai/Mistral-Nemo-Base-2407 as the base.
Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: stage1-on-instruct
    parameters:
      weight: 1
      density: 1
  - model: stage2-on-stage1
    parameters:
      weight: 0.7
      density: 1
  - model: mistralai/Mistral-Nemo-Instruct-2407
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: mistralai/Mistral-Nemo-Base-2407
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: mistralai/Mistral-Nemo-Instruct-2407
dtype: bfloat16
```
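For reference, a config like this is normally executed with mergekit, either via the CLI (`mergekit-yaml config.yaml ./output-model`) or programmatically. The sketch below is an assumption based on mergekit's Python API and expects a local `config.yaml` containing the YAML above.

```python
# Sketch of running the merge programmatically; assumes mergekit's Python API.
# The CLI equivalent is `mergekit-yaml config.yaml ./output-model`.
# `config.yaml` is assumed to contain the YAML configuration above.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    "./output-model",
    options=MergeOptions(cuda=False, copy_tokenizer=True),
)
```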