GGUF quants
Description
Danube3 500M model fine-tuned on adamo1139/Fal7acy_4chan_archive_ShareGPT, which is essentially 250M tokens of chat data from 4chan, organized into coherent threads and covering various boards.
Uses the ChatML prompt format. Use system prompts such as "A chat on 4chan board /3/" or "A chat on 4chan board /biz/", as these are what the model was trained with.
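For reference, a minimal sketch of prompting the model with llama-cpp-python; the GGUF filename, context size, and sampling settings below are placeholders, not values from this card:

```python
from llama_cpp import Llama

# Hypothetical quant filename; substitute whichever GGUF you downloaded.
llm = Llama(model_path="danube3-500m-4chan.Q8_0.gguf", n_ctx=8192)

# ChatML template with a board-style system prompt, matching the training setup.
prompt = (
    "<|im_start|>system\n"
    "A chat on 4chan board /biz/<|im_end|>\n"
    "<|im_start|>user\n"
    ">tfw no gains\n"
    "what do<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=128, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```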
This is a very small 500M-parameter model, so it's not very smart.
Issues
The dataset doesn't have correctly formatted newlines, so quoted content doesn't render correctly.
Instead of this:
>what did you say
I didn't say nothing
it will print:
>what did you say I didn't say nothing
Training details
1 epoch, 16-bit LoRA, 8192 sequence length, 256 rank, 256 alpha, rslora True, batch size 8, learning rate 0.00004, embedding learning rate 0.00001, target modules ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"], cosine learning rate scheduler.
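A sketch of the LoRA setup these hyperparameters imply, expressed with the peft library; how the original run was actually launched (trainer, framework, handling of the separate embedding learning rate) is an assumption on my part:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    use_rslora=True,  # rslora True
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)

# Matching trainer settings: 1 epoch, batch size 8, lr 4e-5, cosine schedule.
# A distinct embedding learning rate (1e-5) is not a stock TrainingArguments
# option; frameworks such as Unsloth expose it as a separate parameter.
training_args = TrainingArguments(
    output_dir="out",           # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=4e-5,
    lr_scheduler_type="cosine",
    bf16=True,                  # 16-bit training
)
```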