ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16

Text Generation • text-generation-inference • Inference Endpoints


SpiridonSunRotator committed on May 3

Commit 75d8ed3 • 1 parent: 712c4fa

Added initial model card

Files changed (1)

README.md +20 -0

README.md ADDED

	@@ -0,0 +1,20 @@

+---
+library_name: transformers
+tags:
+- llama
+- facebook
+- meta
+- llama-3
+- conversational
+- text-generation-inference
+---
+Official [AQLM](https://arxiv.org/abs/2401.06118) quantization of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B).
+For this quantization, we used a single codebook with 16-bit codes (2^16 entries), i.e. the 1x16 scheme.
+Results (in progress):
+| Model | Quantization | Model size, GB |
+|------|------|------|
+| meta-llama/Meta-Llama-3-70B | - | 141.2 |
+| | 1x16 | 21.9 |
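One way to picture what "a single codebook with 16-bit codes" means: every group of consecutive weights is stored as one index into a learned codebook, and dequantization looks that vector up and rescales it. With a single codebook, AQLM's additive quantization reduces to this kind of vector quantization; with several codebooks the looked-up vectors would be summed. The toy sketch below shrinks the codes to 2 bits and the groups to 4 weights so the output stays readable. The per-row scale, the names `dequantize`, `CODE_BITS` and `GROUP_SIZE`, and the plain-list layout are illustrative assumptions, not the storage format or kernels of the reference AQLM implementation.

```python
# Toy dequantization for a single-codebook ("1xB") AQLM-style layer.
# Simplifying assumptions (illustrative only, not the reference implementation):
#   * each group of GROUP_SIZE consecutive weights in a row is stored as ONE code,
#   * a code indexes one codebook of 2**CODE_BITS learned vectors,
#   * each output row carries a single scale factor.
# The model card's setting is 16-bit codes; 2-bit codes keep this example small.
from typing import List

CODE_BITS = 2
GROUP_SIZE = 4


def dequantize(codes: List[List[int]], codebook: List[List[float]],
               scales: List[float]) -> List[List[float]]:
    """Rebuild a dense weight matrix from per-group codes, a codebook and row scales."""
    assert len(codebook) == 2 ** CODE_BITS
    assert all(len(vector) == GROUP_SIZE for vector in codebook)
    weights = []
    for row_codes, scale in zip(codes, scales):
        row: List[float] = []
        for code in row_codes:
            # One code expands into GROUP_SIZE weights.
            row.extend(scale * value for value in codebook[code])
        weights.append(row)
    return weights


codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, 0.5, 0.5],
    [-1.0, 0.0, 1.0, 0.0],
]
codes = [[1, 2], [3, 0]]   # 2 rows x 2 groups -> a 2 x 8 weight matrix
scales = [2.0, 0.5]

W = dequantize(codes, codebook, scales)
print(W[0])  # [2.0, -2.0, 2.0, -2.0, 1.0, 1.0, 1.0, 1.0]
print(W[1])  # [-0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In this toy the codes cost `CODE_BITS / GROUP_SIZE` = 0.5 bits per weight, plus the shared codebook and the row scales.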
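The "2Bit" in the repository name and the two sizes in the results table can be cross-checked with a little arithmetic. The sketch below assumes that each 16-bit code covers a group of 8 weights. The model card does not state this group size; it is the value that turns 16 bits per code into 2 bits per weight. The two sizes are read straight from the table as decimal gigabytes, and the baseline is treated as 16 bits per parameter.

```python
# Cross-check of the "1x16" / "2Bit" naming against the sizes in the results table.
# Assumption (not stated in the model card): each 16-bit code encodes a group of
# 8 consecutive weights.

NUM_CODEBOOKS = 1      # "1x16": one codebook ...
BITS_PER_CODE = 16     # ... indexed by 16-bit codes (2**16 entries)
GROUP_SIZE = 8         # assumed weights per code

FP16_SIZE_GB = 141.2   # meta-llama/Meta-Llama-3-70B, from the table
AQLM_SIZE_GB = 21.9    # 1x16 quantization, from the table


def code_bits_per_weight(num_codebooks: int, bits_per_code: int, group_size: int) -> float:
    """Bits spent on codes per quantized weight (ignores codebooks, scales, unquantized layers)."""
    return num_codebooks * bits_per_code / group_size


def average_bits_per_param(size_gb: float, fp16_size_gb: float) -> float:
    """Average bits per parameter implied by a checkpoint size relative to its 16-bit size."""
    return 16 * size_gb / fp16_size_gb


code_bits = code_bits_per_weight(NUM_CODEBOOKS, BITS_PER_CODE, GROUP_SIZE)
avg_bits = average_bits_per_param(AQLM_SIZE_GB, FP16_SIZE_GB)
compression = FP16_SIZE_GB / AQLM_SIZE_GB

print(f"code bits per weight: {code_bits:.2f}")          # 2.00
print(f"average bits per parameter: {avg_bits:.2f}")     # 2.48
print(f"compression vs 16-bit: {compression:.2f}x")      # 6.45x
```

The checkpoint averages somewhat more than 2 bits per parameter. That is plausible if parts of it, such as the codebooks, scales, embeddings and output head, are kept at higher precision; this breakdown is an assumption, not something the card reports.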