tyoyo committed d6ac9a3 (1 parent: 0796d05)

Update README.md

Files changed (1): README.md (+12 −0)
README.md CHANGED
@@ -19,6 +19,18 @@ Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama
For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

We performed quantization using [llama.cpp](https://github.com/ggerganov/llama.cpp) and converted the model to GGUF format. Currently, we only offer a quantized model in the Q4_K_M format.

We provide two quantized variants, GGUF and AWQ. The table below shows the performance degradation caused by quantization, as measured by the ELYZA-tasks-100 GPT-4 score.

| Model                             | ELYZA-tasks-100 GPT-4 score |
| :-------------------------------- | --------------------------: |
| Llama-3-ELYZA-JP-8B               |                       3.655 |
| Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) |                        3.57 |
| Llama-3-ELYZA-JP-8B-AWQ           |                        3.39 |
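As a rough back-of-the-envelope sketch of what Q4_K_M quantization buys in memory, the snippet below compares an FP16 baseline against a ~4.85 bits/weight estimate for Q4_K_M. Both the parameter count (~8.03B, typical for Llama 3 8B) and the effective bit width are assumptions for illustration, not official figures from this repository.

```python
def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk model size in GiB for a given bit width."""
    return n_params * bits_per_weight / 8 / 1024**3

# Assumed values: ~8.03B parameters and ~4.85 effective bits/weight
# for Q4_K_M -- rough estimates, not official numbers.
N_PARAMS = 8.03e9

fp16_gib = gguf_size_gib(N_PARAMS, 16.0)   # unquantized baseline
q4km_gib = gguf_size_gib(N_PARAMS, 4.85)   # Q4_K_M estimate

print(f"FP16   ~ {fp16_gib:.1f} GiB")
print(f"Q4_K_M ~ {q4km_gib:.1f} GiB ({fp16_gib / q4km_gib:.1f}x smaller)")
```

Under these assumptions the quantized file comes out around 4.5 GiB versus roughly 15 GiB for FP16, about a 3.3x reduction, which is why a Q4_K_M GGUF fits comfortably on consumer hardware.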

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)
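A minimal usage sketch: install llama.cpp via Homebrew, then let `llama-cli` pull the GGUF directly from the Hugging Face Hub. The `--hf-repo` and `--hf-file` values below follow ELYZA's Hugging Face naming and are assumptions, not verified against the repository's actual file listing.

```shell
# Install llama.cpp (macOS and Linux)
brew install llama.cpp

# Run the Q4_K_M model straight from the Hugging Face Hub.
# Repo and file names are assumptions based on ELYZA's naming scheme.
llama-cli \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  -p "Hello" -n 128
```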