Commit 9292676 (parent: 56b7801): Update README.md

README.md CHANGED
@@ -125,7 +125,7 @@ Coming soon!
 ## Quantization Reproduction
 
 > [!NOTE]
-> In order to quantize Llama 3.1 8B Instruct using AutoAWQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~8GiB, and an NVIDIA GPU with
+> In order to quantize Llama 3.1 8B Instruct using AutoAWQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~8GiB, and an NVIDIA GPU with 16GiB of VRAM to quantize it.
 
 In order to quantize Llama 3.1 8B Instruct, first install `torch` and `autoawq` as follows:
 
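The hunk ends before the README's actual install and quantization commands, so for orientation only, here is a minimal sketch of what quantizing Llama 3.1 8B Instruct with AutoAWQ typically looks like; the model ID, output path, and quantization settings are assumptions, not the repository's documented steps.

```python
# Minimal AutoAWQ quantization sketch (assumed workflow, not the repository's
# documented commands). Assumes `torch` and `autoawq` are already installed
# (e.g., via pip), as the README instructs.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed source model ID
quant_path = "Meta-Llama-3.1-8B-Instruct-AWQ-INT4"    # assumed output directory

# Common 4-bit AWQ settings from the AutoAWQ examples:
# zero-point quantization, group size 128, GEMM kernels
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision weights into CPU RAM; AutoAWQ moves layers to the GPU
# one at a time during calibration
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration and quantize the weights to 4-bit
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

This matches the memory profile described in the note above: the whole model is held in CPU RAM while the GPU handles the per-layer quantization.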