## Quantization

The quantization was applied using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with 512 random examples from the [anydef-kilt-tasks-v2](https://huggingface.co/datasets/daisd-ai/anydef-kilt-tasks-v2) dataset. We tested other numbers of examples, but saw no noticeable improvement from using more during quantization.

The recipe for quantization:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]
```
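The `W4A16` scheme in the recipe stores weights as 4-bit integers while keeping activations in 16-bit precision. As a rough illustration of the underlying idea only (this is not LLM Compressor's implementation, and the function names are made up for this sketch), a minimal per-group symmetric int4 quantize/dequantize could look like:

```python
# Illustrative sketch of symmetric 4-bit weight quantization (W4A16-style).
# Hypothetical helper functions for exposition; real kernels pack two int4
# values per byte and dequantize on the fly inside the matmul.

def quantize_w4(weights, group_size=128):
    """Map a flat list of float weights to int4 values with per-group scales."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scheme: scale the largest magnitude onto the int4 range [-8, 7].
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize_w4(qweights, scales, group_size=128):
    """Recover approximate float weights from int4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]
```

GPTQ goes further than this naive rounding: it quantizes weights column by column and updates the not-yet-quantized weights to compensate for the rounding error, which is why it needs the calibration examples mentioned above.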
## Inference