Update README.md
README.md CHANGED
@@ -7,9 +7,9 @@ Exllama quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/Fla
You probably want this version. It just fits in 24GB of VRAM at half context (16384).

-If you really want the larger context [
+If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with the GGUF version at higher quants.

-I did make a [
+I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.
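As context for the "fits in 24GB at half context (16384)" note above, here is a minimal sketch of loading an exl2 quant with exllamav2's Python API at a reduced `max_seq_len`. The local model path, sampler settings, and prompt are placeholders, not part of this repo; this is an illustrative assumption about usage, not the author's documented setup.

```python
# Minimal sketch, assuming exllamav2's Python loader/generator API.
# The model_dir below is a placeholder for a local download of an exl2 quant.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/FlatDolphinMaid-8x7B-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 16384  # half context, per the README's 24GB VRAM note

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # cache allocated while the model loads
model.load_autosplit(cache)               # splits layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # illustrative value

# Generate up to 64 new tokens from a throwaway prompt.
print(generator.generate_simple("The quick brown fox", settings, 64))
```

`load_autosplit` is also the path that would matter for the 4bpw quant mentioned above, since it spreads the weights across multiple GPUs when a single 24GB card is not enough.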