thirteenbit committed
Commit 7abb801
1 Parent(s): 9876284

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -17,3 +17,17 @@ use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible soft
 
 Converted to gguf using llama.cpp [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py)
 and quantized using llama.cpp llama-quantize, llama.cpp version [b3325](https://github.com/ggerganov/llama.cpp/commits/b3325).
+
+
+ ## Provided files
+
+ | Name | Quant method | Bits | Size | VRAM required |
+ | ---- | ---- | ---- | ---- | ---- |
+ | [model-q3_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q3_k_m.gguf) | Q3_K_M | 3 | 4.9 GB | 5.7 GB |
+ | [model-q4_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q4_k_m.gguf) | Q4_K_M | 4 | 6.3 GB | 7.1 GB |
+ | [model-q5_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q5_k_m.gguf) | Q5_K_M | 5 | 7.2 GB | 7.9 GB |
+ | [model-q6_k.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q6_k.gguf) | Q6_K | 6 | 8.2 GB | 8.9 GB |
+ | [model-q8_0.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q8_0.gguf) | Q8_0 | 8 | 11 GB | 11.3 GB |
+
+ **Note**: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
+
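For context on the convert-and-quantize step described above, a rough sketch of that pipeline is shown below, wrapped in Python via `subprocess`. The local checkpoint path, output filenames, and the f16 intermediate step are assumptions for illustration, not the uploader's exact commands.

```python
# Illustrative sketch of the convert-then-quantize pipeline mentioned above.
# The checkpoint path, output names, and the f16 intermediate are assumptions.
import subprocess

# 1. Convert the Hugging Face checkpoint to a full-precision GGUF file.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "path/to/madlad400-10b-mt",        # assumed local checkpoint directory
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2. Quantize the f16 GGUF to one of the listed types, e.g. Q4_K_M.
subprocess.run(
    ["./llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)
```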
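To use one of the quants in the table, one option is to fetch it from the Hub and point llama.cpp (or a compatible runtime) at the local path. A minimal sketch, assuming the `huggingface_hub` package is installed and picking Q4_K_M as an example:

```python
# Minimal sketch: download one of the quants listed above and print its local
# path. Requires `pip install huggingface_hub`.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="thirteenbit/madlad400-10b-mt-gguf",
    filename="model-q4_k_m.gguf",  # ~6.3 GB download, ~7.1 GB VRAM fully offloaded
)
print(gguf_path)

# The file can then be run with llama.cpp, e.g.:
#   ./llama-cli -m <gguf_path> -ngl 999 -p "<2de> How are you?"
# -ngl offloads all layers to the GPU, which matches the setup the VRAM
# figures above were measured with. The "<2xx>" target-language prefix is the
# MADLAD-400 prompt convention (an assumption based on the upstream model card).
```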