Roleplay Quantization in EXL2 format for Magnum v1

Quantized using the cleaned PIPPA roleplay dataset as calibration data.

2.25bpw6h quants (tested and working on a single RTX 3090 24GiB at 16k context length)
2.4bpw6h quants (may not load on 24GiB VRAM machines!)
3.0bpw8h quants
4.0bpw8h quants (tested and working on two 3090s at 32k context/cache)
4.4bpw8h quants (tested and working on two 3090s with tabbyAPI at 32k context with a 64k Q4 cache, for CFG or parallelism)
4.5bpw8h quants
6.0bpw8h quants
8.0bpw8h quants

All tests were performed on a headless Linux instance with no active desktop environment, to maximize available VRAM.
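
As a rough guide to which quant might fit a given card, the weight footprint can be approximated from the bits-per-weight figure. This is only a back-of-the-envelope sketch: the parameter count of about 72.7 billion is an assumption for a Qwen-2 72B class model, the helper name estimate_weight_gib is made up for illustration, and the estimate ignores the context cache, the separately sized head layer (the 6h/8h suffix), and runtime overhead, so real VRAM use will be higher.

```python
# Back-of-the-envelope weight size for a quant at a given bits per weight (bpw).
# ASSUMPTION: ~72.7e9 parameters; not a figure stated in this model card.
# Ignores the context cache, the head layer bit width, and runtime overhead.

GIB = 1024 ** 3  # bytes per GiB

ASSUMED_PARAMS = 72.7e9  # hypothetical parameter count used for illustration


def estimate_weight_gib(n_params: float, bpw: float) -> float:
    """Return the approximate size of the quantized weights in GiB."""
    total_bits = n_params * bpw
    return total_bits / 8 / GIB


if __name__ == "__main__":
    for bpw in (2.25, 2.4, 3.0, 4.0, 4.4, 4.5, 6.0, 8.0):
        size = estimate_weight_gib(ASSUMED_PARAMS, bpw)
        print(f"{bpw:>5} bpw -> roughly {size:.1f} GiB of weights")
```

Whatever is left after the weights has to hold the cache for the chosen context length, which is why the tested configurations above are a better guide than this estimate alone.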

Other quants are available on request; feel free to ask!

See original model for further details.

Original Model card

This is the first in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Qwen-2 72B Instruct.

Prompting

The model has been instruct-tuned with ChatML formatting. A typical input would look like this:

"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
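
For anyone assembling prompts by hand, the layout above can be produced from a list of messages. This is a minimal sketch rather than an official template: the function name build_chatml_prompt and the message dictionary shape are assumptions made for illustration, and most frontends will apply the ChatML template for you.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Join role/content messages into a single ChatML-formatted string.

    Hypothetical helper for illustration; it is not part of any library.
    """
    parts = []
    for message in messages:
        # Each turn: start token, role, newline, content, end token, newline.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

prompt = build_chatml_prompt(example_messages)
print(prompt)
```

The prompt ends with an open assistant turn, so generation should stop when the model emits the <|im_end|> token.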

Credits

This model has been a team effort. Credits go to:

Sao10K for help with (and cleaning up!) the dataset.
alpindale for the training.
kalomaze for helping with the hyperparameter tuning.
Various other people for their continued help as we tuned the parameters and restarted failed runs. In no particular order: Doctor Shotgun, Lucy, Nopm, Mango, and the rest of the Silly Tilly.

And last but not least, we'd like to thank Kearm for sponsoring the compute needed to train this model.

Training

The training was done with 55 million tokens of high-quality RP data, over 1.5 epochs. We used 8x AMD Instinct™ MI300X Accelerators for the full-parameter fine-tuning of the model.

Safety

...