Kooten's picture
Update README.md
2b57b65
|
raw
history blame contribute delete
No virus
979 Bytes
metadata
license: cc-by-nc-4.0

FlatDolphinMaid-8x7B 3.5bpw

Exllama quant of Undi95/FlatDolphinMaid-8x7B

You probably want this version. It just fits in 24gb of vram at half context (16384).

If you really want the larger context 3bpw should do it but you are probably better of with the gguf version with higher quants.

I did make a 4bpw, it might work in a headless or multigpu setup.

Other BPW's 3.0bpw, 3.5bpw, 4.0bpw

Make sure you enable 8bit cache.

Promt format:

### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}

Contact

Kooten on discord.