---
license: cc-by-nc-4.0
---
# FlatDolphinMaid-8x7B 3.5bpw
Exllama quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)
You probably want this version: it just fits in 24 GB of VRAM at half context (16384).
If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with a GGUF version at higher quants.
I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.
Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)
Make sure you **enable the 8-bit cache**.
### Prompt format:
```
### Instruction:
{system prompt}
### Input:
{input}
### Response:
{reply}
```
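As a sketch, the template above can be assembled with a small helper function (the function and variable names here are illustrative, not part of the model's tooling):

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Assemble a prompt string matching the template above."""
    return (
        "### Instruction:\n"
        f"{system_prompt}\n"
        "### Input:\n"
        f"{user_input}\n"
        "### Response:\n"
    )

# The model's reply is then generated as a continuation after "### Response:".
prompt = build_prompt("You are a helpful assistant.", "Say hello.")
```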
### Contact
Kooten on discord.