Update README.md

Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
## Quantization
This model is quantized using the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) library; a sketch of the typical AutoAWQ workflow appears after the table below.

We have prepared two quantized model options, GGUF and AWQ. The table below shows the performance degradation due to quantization.

| Model | ELYZA-tasks-100 GPT4 score |
| :-------------------------------- | ---: |
| Llama-3-ELYZA-JP-8B | 3.655 |
| Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) | 3.57 |
| Llama-3-ELYZA-JP-8B-AWQ | 3.39 |
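For reference, the sketch below shows the usual AutoAWQ quantization workflow. It is illustrative only: the base-model repository id and the `quant_config` values (4-bit weights, group size 128, zero-point, GEMM kernels) are AutoAWQ's common defaults, not the confirmed settings used to produce this model.

```python
# Illustrative AutoAWQ workflow (assumed defaults, not the exact recipe used here).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "elyza/Llama-3-ELYZA-JP-8B"  # assumed base-model repository id
quant_path = "Llama-3-ELYZA-JP-8B-AWQ"    # local output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run activation-aware quantization; AutoAWQ calibrates on a default dataset.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer for later loading.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```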
## Use with vLLM
Install vLLM.
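A typical install is via pip (recent vLLM releases include AWQ kernel support):

```
pip install vllm
```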
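After installation, the AWQ checkpoint can be loaded with vLLM's offline `LLM` API. A minimal sketch, assuming the model is published as `elyza/Llama-3-ELYZA-JP-8B-AWQ` (check the model card for the actual repository id); the prompt and sampling settings are illustrative:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Assumed repository id for the AWQ checkpoint.
model_id = "elyza/Llama-3-ELYZA-JP-8B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, quantization="awq")

# Format a single user turn with the model's chat template (Llama 3 instruct format).
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Illustrative sampling settings.
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```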