mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq

mobicham committed on Mar 27

Commit 2dc97bb • 1 Parent(s): 342bf46

fix wiki 2.8K count

Files changed (1)

README.md +1 -1

@@ -10,7 +10,7 @@ This is an experimental <a href="https://github.com/mobiusml/hqq/">HQQ</a> 1-bit
 ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/636b945ef575d3705149e982/3fOfrg-5WtJwC5cpcVDub.gif)
 Quantizing small models at extreme low-bits is a challenging task. The purpose of this model is to show the community what to expect when fine-tuning such models.
-We notice that, 1-bit quantization doesn't work well when applied directly on small models such as the Llama2-7B. However, when fine-tuned, the model's ouput significantly improves. In fact, the 1-bit base model outperforms Quip# 2-bit after fine-tuning on ~2.9K samples.
+We notice that, 1-bit quantization doesn't work well when applied directly on small models such as the Llama2-7B. However, when fine-tuned, the model's ouput significantly improves. In fact, the 1-bit base model outperforms Quip# 2-bit after fine-tuning on ~2.8K samples.
 Note that the weights here are unsigned 1-bit (0 or 1), <a href="https://arxiv.org/abs/2402.17764">not ternary like the recent 1.58-bit work </a>. This is a more challenging task since we lose the sign of the weights and only fine-tune a small fraction of the parameters (~94MB worth of weights).
 The dequantization step can be rewriten as a 1-bit matmul which could potentially require only additions + a very low-rank matmul which is fast to compute.
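
Reviewer note on the last context line: the claim that dequantization can be folded into a 1-bit matmul plus a low-rank term can be checked with a small NumPy sketch. This is not the HQQ implementation. It assumes a simplified layout with one scale `s` and one zero-point `z` per output row, so that `W = s * (B - z)` with `B` in {0, 1}; the model in this repo uses group size 8, where the correction term would be low-rank rather than rank-1. The function names are hypothetical.

```python
import numpy as np

def dequant_matmul(B, s, z, x):
    """Reference path: dequantize W = s * (B - z) row-wise, then multiply."""
    W = s[:, None] * (B - z[:, None])
    return W @ x

def one_bit_matmul(B, s, z, x):
    """Same result without materializing W (assumed per-row s and z).

    B @ x only sums the entries of x where the bit is 1 (additions only);
    the zero-point contributes a rank-1 correction (s * z) * sum(x).
    """
    mask = B.astype(bool)
    bx = np.array([x[row].sum() for row in mask])  # additions only
    return s * bx - (s * z) * x.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.integers(0, 2, size=(4, 16)).astype(np.float64)
    s = rng.normal(size=4)
    z = rng.random(size=4)
    x = rng.normal(size=16)
    print(np.allclose(dequant_matmul(B, s, z, x), one_bit_matmul(B, s, z, x)))
```

The two paths agree because `s * (B - z) @ x` expands to `s * (B @ x) - s * z * sum(x)`; only the first term touches the 1-bit weights.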