Zoyd/LLM360_K2-Chat-3_5bpw_exl2

Zoyd committed on Jun 1

Commit f2923ec • 1 Parent(s): 3046134

Upload folder using huggingface_hub

Files changed (1)

README.md +17 -0

@@ -1,6 +1,23 @@
 ---
 license: apache-2.0
 ---
+**ExLlamaV2** quant (**exl2** / **3.5 bpw**) made with ExLlamaV2 v0.1.1
+Other EXL2 quants:
+| **Quant** | **Model Size** | **lm_head** |
+| ----- | ---------- | ------- |
+|<center>**[2.2](https://huggingface.co/Zoyd/LLM360_K2-Chat-2_2bpw_exl2)**</center> | <center>17685 MB</center> | <center>6</center> |
+|<center>**[2.5](https://huggingface.co/Zoyd/LLM360_K2-Chat-2_5bpw_exl2)**</center> | <center>20000 MB</center> | <center>6</center> |
+|<center>**[3.0](https://huggingface.co/Zoyd/LLM360_K2-Chat-3_0bpw_exl2)**</center> | <center>23857 MB</center> | <center>6</center> |
+|<center>**[3.5](https://huggingface.co/Zoyd/LLM360_K2-Chat-3_5bpw_exl2)**</center> | <center>27721 MB</center> | <center>6</center> |
+|<center>**[3.75](https://huggingface.co/Zoyd/LLM360_K2-Chat-3_75bpw_exl2)**</center> | <center>29647 MB</center> | <center>6</center> |
+|<center>**[4.0](https://huggingface.co/Zoyd/LLM360_K2-Chat-4_0bpw_exl2)**</center> | <center>31549 MB</center> | <center>6</center> |
+|<center>**[4.25](https://huggingface.co/Zoyd/LLM360_K2-Chat-4_25bpw_exl2)**</center> | <center>33505 MB</center> | <center>6</center> |
+|<center>**[5.0](https://huggingface.co/Zoyd/LLM360_K2-Chat-5_0bpw_exl2)**</center> | <center>39300 MB</center> | <center>6</center> |
+|<center>**[6.0](https://huggingface.co/Zoyd/LLM360_K2-Chat-6_0bpw_exl2)**</center> | <center>46927 MB</center> | <center>8</center> |
+|<center>**[6.5](https://huggingface.co/Zoyd/LLM360_K2-Chat-6_5bpw_exl2)**</center> | <center>50613 MB</center> | <center>8</center> |
+|<center>**[8.0](https://huggingface.co/Zoyd/LLM360_K2-Chat-8_0bpw_exl2)**</center> | <center>49516 MB</center> | <center>8</center> |
 # K2-Chat: a fully-reproducible large language model outperforming Llama 2 70B Chat using 35% less compute
 K2 Chat is finetuned from [K2-65B](https://huggingface.co/LLM360/K2). K2 Chat outperforms Llama 2-70B-Chat on all evaluations conducted. The model also outperforms Llama 3-70B-Instruct on coding tasks.
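
As a rough illustration of how the quant table above might be used, the sketch below picks the highest-bpw quant whose listed size fits a given memory budget and builds the matching repository name. The sizes and the `Zoyd/LLM360_K2-Chat-<bpw>bpw_exl2` naming pattern are copied from the table; the function names and the `headroom_mb` default (space reserved for the cache and activations) are hypothetical and not part of the original README, so adjust them for your setup.

```python
# Model sizes in MB, copied from the quant table (bits per weight -> size).
QUANT_SIZES_MB = {
    2.2: 17685,
    2.5: 20000,
    3.0: 23857,
    3.5: 27721,
    3.75: 29647,
    4.0: 31549,
    4.25: 33505,
    5.0: 39300,
    6.0: 46927,
    6.5: 50613,
    8.0: 49516,
}


def repo_id(bpw):
    """Build the repository name following the pattern used in the table links."""
    return f"Zoyd/LLM360_K2-Chat-{str(bpw).replace('.', '_')}bpw_exl2"


def largest_quant_that_fits(budget_mb, headroom_mb=4096):
    """Return the highest-bpw quant that fits in budget_mb, or None.

    headroom_mb is an assumed allowance for cache and activations;
    it is a placeholder value, not a measured requirement.
    """
    usable_mb = budget_mb - headroom_mb
    fitting = [bpw for bpw, size in QUANT_SIZES_MB.items() if size <= usable_mb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    choice = largest_quant_that_fits(48 * 1024)  # e.g. 48 GB of total GPU memory
    if choice is not None:
        print(choice, repo_id(choice))
```

Actual memory use also depends on context length and cache settings, so treat the result as a starting point rather than a guarantee.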