Update README.md
Browse files
README.md
CHANGED
@@ -12,6 +12,28 @@ license: apache-2.0
 ```
 
+#### Script to AWQ quantization
+
+```
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer
+
+model_path = 'PATH_TO Poro-34B'
+quant_path = 'Poro-34B-AWQ'
+quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
+
+# Load model
+model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+# Quantize
+model.quantize(tokenizer, quant_config=quant_config)
+
+# Save quantized model
+model.save_quantized(quant_path)
+tokenizer.save_pretrained(quant_path)
+```
+
+
 #### Work supported by https://datacrunch.io/
 ##### Quantized by: gradjitta