---
license: cc-by-nc-4.0
---
|
|
|
# FlatDolphinMaid-8x7B 3.5bpw |
|
ExLlamaV2 (exl2) quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)
|
|
|
You probably want this version: it just fits in 24 GB of VRAM at half context (16384 tokens).
|
|
|
If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with a GGUF version at a higher quant.
|
|
|
I also made a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.
|
|
|
|
|
|
|
Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)
|
|
|
Make sure you **enable the 8-bit cache**.
|
|
|
|
|
### Prompt format
|
|
|
```
### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}
```
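As a sketch, the template above can be assembled with a small helper; the function name and argument names below are illustrative, not part of any library:

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Fill in the Alpaca-style template used by this model.

    Leaves the "### Response:" section empty so the model
    generates the reply.
    """
    return (
        f"### Instruction:\n{system_prompt}\n\n"
        f"### Input:\n{user_input}\n\n"
        f"### Response:\n"
    )


prompt = build_prompt("You are a helpful assistant.", "Say hello.")
```

Pass the resulting string to your frontend or inference code as the full prompt.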
|
|
|
### Contact |
|
Kooten on Discord.
|
|