---
library_name: transformers
license: llama3
language:
- ja
- en
tags:
- llama-cpp
---

# Llama-3-ELYZA-JP-8B-GGUF

![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

## Model Description

**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning. (Built with Meta Llama 3)

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

We provide two quantized variants of the model, GGUF and AWQ. This is the GGUF (Q4_K_M) model, converted using [llama.cpp](https://github.com/ggerganov/llama.cpp).

The following table shows the performance degradation due to quantization:

| Model | ELYZA-tasks-100 GPT4 score |
| :-------------------------------- | ---: |
| [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
| [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
| [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |
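
To put the table in relative terms, the short sketch below computes each quantized model's drop as a percentage of the unquantized score. The scores are copied from the table; the `relative_drop` helper and the dictionary labels are illustrative names, not part of any ELYZA tooling.

```python
# Scores copied from the table above (ELYZA-tasks-100 GPT4 score).
BASELINE = 3.655  # Llama-3-ELYZA-JP-8B (unquantized)
QUANTIZED = {
    "GGUF (Q4_K_M)": 3.57,
    "AWQ": 3.39,
}


def relative_drop(baseline: float, score: float) -> float:
    """Return the score drop as a percentage of the baseline (hypothetical helper)."""
    return (baseline - score) / baseline * 100


for name, score in QUANTIZED.items():
    drop = relative_drop(BASELINE, score)
    print(f"{name}: {BASELINE - score:.3f} points lower ({drop:.1f}% of baseline)")
```

On these numbers the GGUF (Q4_K_M) model loses roughly 2.3% of the baseline score and the AWQ model roughly 7.3%.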

## Use with llama.cpp

Install llama.cpp with Homebrew (works on macOS and Linux):

```bash
brew install llama.cpp
```

Start the llama.cpp server:

```bash
llama-server \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  --port 8080
```

Call the API using curl:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "temperature": 0.6,
    "max_tokens": -1,
    "stream": false
  }'
```
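
If neither curl nor a third-party client is available, the same request can be sketched with the Python standard library alone. This is a minimal illustration that assumes the server is running on `localhost:8080` as started above; `build_chat_request` and `send` are hypothetical helper names, and the response is assumed to follow the OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on port 8080, as started above.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(messages, temperature=0.6, base_url=BASE_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": -1,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(send(build_chat_request(messages)))` prints the reply, where `messages` is the same list of system and user messages used in the curl call.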

Call the API using Python:

```python
import openai

# Point the OpenAI client at the local llama.cpp server.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy_api_key",  # placeholder value
)

completion = client.chat.completions.create(
    model="dummy_model_name",  # placeholder name
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
)

# Print the assistant's reply.
print(completion.choices[0].message.content)
```
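
The calls above wait for the complete answer because streaming is disabled. When `"stream": true` is sent instead, OpenAI-compatible servers typically reply with server-sent events, one `data: {...}` line per chunk. The sketch below parses such lines using only the standard library. It assumes the OpenAI-style chunk layout (`choices[0].delta.content`) and a final `data: [DONE]` sentinel; `iter_stream_text` is a hypothetical helper, and the sample lines are hand-written, not real server output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample in the assumed format, not real server output.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "古代"}}]}',
    'data: {"choices": [{"delta": {"content": "ギリシャ"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

Because the function accepts any iterable of lines, it can also be fed the response object returned by `urllib.request.urlopen`, which yields the body line by line as bytes.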

## Use with Desktop App

Various desktop applications can run GGUF models. This section shows how to use the model in [LM Studio](https://lmstudio.ai/), a no-code environment.

- **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
- **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
- **Start Chatting**: Click 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now chat freely with the local LLM.
- **Setting Options**: Options are available in the sidebar on the right. For faster inference, set Quick GPU Offload to Max in the GPU Settings.
- **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and open the Local Server tab. Select the model and click Start Server to launch an OpenAI-compatible API server.
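
Once the LM Studio server from the last step is running, a client can ask it which models are available. The sketch below assumes the server listens on `http://localhost:1234/v1` (check the Local Server tab for the actual address) and that its OpenAI-style `/models` endpoint returns a `data` list of entries with an `id` field. `fetch_models` and `find_model_id` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: LM Studio's local server address; verify it in the Local Server tab.
LMSTUDIO_BASE_URL = "http://localhost:1234/v1"


def fetch_models(base_url=LMSTUDIO_BASE_URL, timeout=10):
    """Return the parsed JSON from the OpenAI-style GET /models endpoint."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        return json.load(response)


def find_model_id(models_payload, keyword="ELYZA"):
    """Return the first model id containing `keyword` (case-insensitive), or None."""
    for entry in models_payload.get("data", []):
        model_id = entry.get("id", "")
        if keyword.lower() in model_id.lower():
            return model_id
    return None
```

With the server running, `find_model_id(fetch_models())` returns an identifier that can be passed as `model` in a chat completion request.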

![lmstudio-demo](./lmstudio-demo.gif)

This demo shows Llama-3-ELYZA-JP-8B-GGUF running on a MacBook Pro (M1 Pro) at an inference speed of approximately 20 tokens per second.

## Developers

Listed in alphabetical order.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
      title={elyza/Llama-3-ELYZA-JP-8B},
      url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
      author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
      year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```