---
library_name: transformers
tags:
  - not-for-all-audiences
  - mergekit
datasets:
  - crestf411/LimaRP-DS
  - Gryphe/Sonnet3.5-Charcard-Roleplay
  - anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system
  - anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system
  - anthracite-org/kalo-opus-instruct-3k-filtered-no-system
  - anthracite-org/nopm_claude_writing_fixed
base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
---

![slush.jpg](slush.jpg)

(GGUFs)

Slush is a two-stage model trained with high LoRA dropout. Stage 1 is a pretraining continuation on the base model, aimed at boosting the model's creativity and writing capabilities. The resulting LoRA is then merged into the instruction-tuned model, and stage 2 is a fine-tuning step on top of this to further enhance its roleplaying capabilities and/or to repair any damage caused in the stage 1 merge.
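
As an illustration of the stage-1 merge step, here is a minimal sketch using `peft`; the adapter and output paths are hypothetical placeholders, not the actual training artifacts:

```python
# Sketch: bake a stage-1 LoRA (trained on the base model) into the instruct model.
# Paths are illustrative placeholders, not the author's actual artifacts.
from transformers import AutoModelForCausalLM
from peft import PeftModel

instruct = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407", torch_dtype="bfloat16"
)
merged = PeftModel.from_pretrained(instruct, "path/to/stage1-lora").merge_and_unload()
merged.save_pretrained("stage1-on-instruct")  # name matches the merge config below
```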

This is still early-stage work. As always, feedback is welcome, and begone if you demand perfection.

The second stage, like the Sunfall series, follows the SillyTavern preset (Mistral V2 & V3, though V3-Tekken works fine), so YMMV, in particular if you use some other tool and/or preset.
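
Outside SillyTavern, one way to stay close to the expected format is to use the chat template shipped with the tokenizer (inherited from Mistral-Nemo-Instruct-2407). A minimal sketch, assuming the repo id is `crestf411/MN-Slush`:

```python
# Sketch: build a Mistral-instruct-formatted prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("crestf411/MN-Slush")  # assumed repo id
messages = [
    {"role": "user", "content": "Continue the scene: the innkeeper eyes the stranger warily."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # shows the [INST] ... [/INST] wrapping the preset mirrors
```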

Parameter suggestions:

I did all my testing with temperature 1, min-p 0.1, and DRY 0.8.
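
For example, against a llama.cpp `llama-server` instance those settings translate roughly as follows; this is a sketch that assumes a build with DRY sampling and reads "DRY 0.8" as the DRY multiplier:

```python
# Sketch: suggested sampler settings as a llama.cpp /completion request.
# Assumes a llama.cpp build with DRY sampling; "DRY 0.8" is taken as dry_multiplier.
import requests

payload = {
    "prompt": "<formatted prompt here>",
    "temperature": 1.0,
    "min_p": 0.1,
    "dry_multiplier": 0.8,
    "n_predict": 512,
}
resp = requests.post("http://localhost:8080/completion", json=payload, timeout=600)
print(resp.json()["content"])
```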

Training details (a code sketch of the LoRA settings follows the list):

  • Stage 1 (continued pretraining)
    • Target: mistralai/Mistral-Nemo-Base-2407 (resulting LoRA merged into mistralai/Mistral-Nemo-Instruct-2407)
    • LoRA dropout 0.5 (motivation)
    • LoRA rank 64, alpha 128 (motivation)
    • LR cosine 4e-6
    • LoRA+ with LR Ratio: 15
    • Context size: 16384
    • Gradient accumulation steps: 4
    • Epochs: 1
  • Stage 2 (fine tune)
    • Target: Stage 1 model
    • LoRA dropout 0.5
    • LoRA rank 32, alpha 64
    • LR cosine 5e-6 (min 5e-7)
    • LoRA+ with LR Ratio: 15
    • Context size: 16384
    • Gradient accumulation steps: 4
    • Epochs: 2
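
The LoRA hyperparameters above map onto a `peft` config roughly like this; a sketch only, since the learning-rate schedule, the LoRA+ ratio, and the target modules live in the trainer config rather than in `LoraConfig`:

```python
# Sketch of the LoRA settings listed above in peft terms.
# Learning-rate schedule and the LoRA+ LR ratio are trainer/optimizer settings,
# not LoraConfig fields; target modules are omitted because they are not stated.
from peft import LoraConfig

stage1_lora = LoraConfig(r=64, lora_alpha=128, lora_dropout=0.5, task_type="CAUSAL_LM")
stage2_lora = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.5, task_type="CAUSAL_LM")
```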

Merge Details

Merge Method

This model was merged using the TIES merge method, with mistralai/Mistral-Nemo-Base-2407 as the base.

Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: stage1-on-instruct
    parameters:
      weight: 1
      density: 1
  - model: stage2-on-stage1
    parameters:
      weight: 0.7
      density: 1
  - model: mistralai/Mistral-Nemo-Instruct-2407
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: mistralai/Mistral-Nemo-Base-2407
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: mistralai/Mistral-Nemo-Instruct-2407
dtype: bfloat16
```
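
A config like this is typically applied with mergekit's CLI; a minimal sketch with illustrative paths:

```python
# Sketch: run the TIES merge with mergekit's command-line entry point.
# "slush.yml" and the output directory are illustrative placeholders.
import subprocess

subprocess.run(["mergekit-yaml", "slush.yml", "./MN-Slush"], check=True)
```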