---
library_name: transformers
license: llama3
language:
- ja
- en
---

# Llama-3-ELYZA-JP-8B-AWQ

![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

## Model Description

**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc.](https://elyza.ai/)
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been adapted for Japanese through additional pre-training and instruction tuning.

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

We provide two quantized versions of this model: GGUF and AWQ. This repository contains the AWQ version, quantized with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

The following table shows how quantization affects performance on ELYZA-tasks-100 (scores graded by GPT-4; higher is better):

| Model | ELYZA-tasks-100 score (GPT-4 graded) |
| :-------------------------------- | ---: |
| Llama-3-ELYZA-JP-8B | 3.655 |
| Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) | 3.57 |
| Llama-3-ELYZA-JP-8B-AWQ | 3.39 |

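To put the quantization table in relative terms, the short Python sketch below computes each quantized model's score loss as a percentage of the unquantized score. It uses only the three scores listed in the table; the names `BASELINE`, `QUANTIZED`, and `relative_drop` are illustrative, not part of any ELYZA tooling.

```python
# Scores copied from the ELYZA-tasks-100 table (GPT-4 graded).
BASELINE = 3.655  # Llama-3-ELYZA-JP-8B (unquantized)
QUANTIZED = {
    "Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)": 3.57,
    "Llama-3-ELYZA-JP-8B-AWQ": 3.39,
}


def relative_drop(baseline: float, score: float) -> float:
    """Return the score loss as a percentage of the baseline score."""
    return (baseline - score) / baseline * 100


# Print the relative degradation for each quantized variant.
for name, score in QUANTIZED.items():
    print(f"{name}: -{relative_drop(BASELINE, score):.2f}%")
```

This prints a drop of about 2.33% for the Q4_K_M GGUF model and about 7.25% for the AWQ model.
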
## Use with vLLM

Install vLLM:

```bash
pip install vllm
```

### vLLM Offline Batched Inference

```python
from vllm import LLM, SamplingParams

# Load the model, explicitly selecting vLLM's AWQ quantization method.
llm = LLM(model="elyza/Llama-3-ELYZA-JP-8B-AWQ", quantization="awq")
tokenizer = llm.get_tokenizer()

# System prompt. In English: "You are a sincere and excellent Japanese assistant.
# Unless otherwise instructed, always answer in Japanese."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1000)

messages_batch = [
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # In English: "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # In English: "Please write a short story in which a bear goes to the seaside,
        # becomes friends with a seal, and eventually returns home."
        {"role": "user", "content": "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"},
    ],
]

# Render each conversation with the model's chat template, appending the
# header that cues the assistant's reply.
prompts = [
    tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    for messages in messages_batch
]

# Generate completions for all prompts in a single batch.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    print(output.outputs[0].text)
    print("=" * 50)
```

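`tokenizer.apply_chat_template` turns each conversation into a single prompt string. As an illustration only, assuming this model keeps the Llama 3 Instruct chat template of its base model (inspect `tokenizer.chat_template` to confirm), the rendering works roughly like the sketch below. `render_llama3_prompt` is a hypothetical helper, not part of vLLM or Transformers.

```python
# Hypothetical helper that mimics the Llama 3 Instruct chat template.
# Assumption: Llama-3-ELYZA-JP-8B inherits this template from its base model.
BOS = "<|begin_of_text|>"


def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a Llama 3 prompt string."""
    parts = [BOS] if messages else []
    for message in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write its reply next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(prompt)
```

Relying on the tokenizer's own template, as the vLLM example does, keeps prompts in sync with whatever template ships with the model, even if it differs from this assumption.
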
### vLLM OpenAI-Compatible Server

Start the API server on `localhost:8000`:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model elyza/Llama-3-ELYZA-JP-8B-AWQ \
    --port 8000 \
    --host localhost \
    --quantization awq
```

Call the API using curl (the system prompt is the same `DEFAULT_SYSTEM_PROMPT` used in the offline example):

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
        "messages": [
            { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
            { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
        ],
        "temperature": 0.6,
        "max_tokens": 1000,
        "stream": false
    }'
```

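Call the API using the OpenAI Python client (`pip install openai`):

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    # vLLM only checks the key if the server was started with --api-key;
    # otherwise any placeholder string works.
    api_key="dummy_api_key",
)

completion = client.chat.completions.create(
    model="elyza/Llama-3-ELYZA-JP-8B-AWQ",
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        # In English: "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"},
    ],
)

# Print the assistant's reply.
print(completion.choices[0].message.content)
```
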
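If installing the `openai` package is not an option, the same endpoint can also be called with Python's standard library alone. This is a minimal sketch assuming the vLLM server started above is listening on `localhost:8000`; `build_chat_request` and `chat` are hypothetical helper names. The request body mirrors the curl example, and the reply text is read from `choices[0].message.content` of the OpenAI-style response.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # Assumption: the vLLM server started above
MODEL = "elyza/Llama-3-ELYZA-JP-8B-AWQ"
SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"


def build_chat_request(user_message, temperature=0.6, max_tokens=1000):
    """Build the JSON body for POST /v1/chat/completions (same fields as the curl example)."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
    }


def chat(user_message, base_url=BASE_URL, timeout=300):
    """Send one chat request and return the assistant's reply text (hypothetical helper)."""
    body = json.dumps(build_chat_request(user_message)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses put the generated text in choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

With the server running, `print(chat("古代ギリシャを学ぶ上で知っておくべきポイントは?"))` sends the same question as the curl example ("What are the key points to know when studying ancient Greece?").
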
## Developers

Listed in alphabetical order.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
      title={elyza/Llama-3-ELYZA-JP-8B},
      url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
      author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
      year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```