metadata

license: cc-by-nc-4.0

FlatDolphinMaid-8x7B 3.5bpw

You probably want this version. It just fits in 24gb of vram at half context (16384).

If you really want the larger context 3bpw should do it but you are probably better of with the gguf version with higher quants.

I did make a 4bpw, it might work in a headless or multigpu setup.

Make sure you enable 8bit cache.

### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}

Kooten on discord.