---
library_name: transformers
license: llama3
language:
- ja
- en
tags:
- llama-cpp
---
# Llama-3-ELYZA-JP-8B-GGUF
![Llama-3-ELYZA-JP-8B-image](./key_visual.png)
## Model Description
**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama3)
For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
## Quantization
We provide two quantized versions of the model, GGUF and AWQ. This repository contains the GGUF (Q4_K_M) model, converted using [llama.cpp](https://github.com/ggerganov/llama.cpp).
The following table shows the performance degradation due to quantization:
| Model | ELYZA-tasks-100 GPT4 score |
| :-------------------------------- | ---: |
| [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
| [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
| [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |
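If you only need the quantized GGUF file itself (for example, to use it with another runtime), it can be fetched directly from the Hub. A minimal sketch using the `huggingface_hub` library; the filename matches the file shipped in this repository:
```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M GGUF file from the Hugging Face Hub and
# return the local path of the cached copy.
model_path = hf_hub_download(
    repo_id="elyza/Llama-3-ELYZA-JP-8B-GGUF",
    filename="Llama-3-ELYZA-JP-8B-q4_k_m.gguf",
)
print(model_path)
```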
## Use with llama.cpp
Install llama.cpp via Homebrew (works on macOS and Linux):
```bash
brew install llama.cpp
```
Invoke the llama.cpp server:
```bash
llama-server \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  --port 8080
```
Call the API using curl (the Japanese system prompt tells the model to always answer in Japanese; the user asks what to know when studying ancient Greece):
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "temperature": 0.6,
    "max_tokens": -1,
    "stream": false
  }'
```
Call the API using Python:
```python
import openai

# The llama.cpp server exposes an OpenAI-compatible API, so the official
# client works as-is; the API key and model name are required but ignored.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy_api_key",
)
completion = client.chat.completions.create(
    model="dummy_model_name",
    messages=[
        # System prompt: "You are a sincere and excellent Japanese assistant.
        # Unless instructed otherwise, always respond in Japanese."
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        # User prompt: "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ]
)
print(completion.choices[0].message.content)
```
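For interactive use you may prefer streamed output. A minimal sketch of the same request with streaming enabled, assuming the server started above is still running:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy_api_key",
)

# stream=True yields chunks as the server generates tokens,
# instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="dummy_model_name",
    messages=[
        # "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```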
## Use with Desktop App
Various desktop applications can run GGUF models; here we show how to use the model in the no-code environment [LM Studio](https://lmstudio.ai/).
- **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
- **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
- **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now freely chat with the local LLM.
- **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
- **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server (see the client sketch after this list).
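Since the LM Studio Local Server is also OpenAI API-compatible, the same Python client shown earlier works with it. A minimal sketch, assuming LM Studio's default server address of `http://localhost:1234/v1` (check the Local Server tab for the actual port; the API key is not checked):
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:1234/v1",  # assumed default LM Studio address; verify in the app
    api_key="lm-studio",                  # placeholder; LM Studio does not validate it
)
completion = client.chat.completions.create(
    model="elyza/Llama-3-ELYZA-JP-8B-GGUF",
    messages=[
        # "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
)
print(completion.choices[0].message.content)
```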
![lmstudio-demo](./lmstudio-demo.gif)
This demo showcases Llama-3-ELYZA-JP-8B-GGUF running smoothly on a MacBook Pro (M1 Pro), achieving an inference speed of approximately 20 tokens per second.
## Developers
Listed in alphabetical order.
- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)
## License
[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)
## How to Cite
```tex
@misc{elyzallama2024,
title={elyza/Llama-3-ELYZA-JP-8B},
url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
year={2024},
}
```
## Citations
```tex
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```