File size: 4,212 Bytes
764f017 1f9f284 764f017 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 |
---
library_name: transformers
license: llama3
language:
- ja
- en
---
# Llama-3-ELYZA-JP-8B-AWQ
![Llama-3-ELYZA-JP-8B-image](./key_visual.png)
## Model Description
**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
## AWQ Quantization
This model is quantized using the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
## Use with vLLM
Install vLLM.
```bash
pip install vllm
```
### vLLM Offline Batched Inference
```python
from vllm import LLM, SamplingParams
llm = LLM(model="elyza/Llama-3-ELYZA-JP-8B-AWQ", quantization="awq")
tokenizer = llm.get_tokenizer()
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1000)
messages_batch = [
[
{"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
{"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
],
[
{"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
{"role": "user", "content": "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"}
]
]
prompts = [
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
for messages in messages_batch
]
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
print(output.outputs[0].text)
print("=" * 50)
```
### vLLM OpenAI Compatible Server
Start the API server.
```bash
python -m vllm.entrypoints.openai.api_server \
--model elyza/Llama-3-ELYZA-JP-8B-AWQ \
--port 8000 \
--host localhost \
--quantization awq
```
Call the API using curl.
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
"messages": [
{ "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
{ "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
],
"temperature": 0.6,
"max_tokens": 1000,
"stream": false
}'
```
Call the API using Python.
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8000/v1",
api_key = "dummy_api_key"
)
completion = client.chat.completions.create(
model="elyza/Llama-3-ELYZA-JP-8B-AWQ",
messages=[
{"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
{"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
]
)
```
## Developers
Listed in alphabetical order.
- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)
## License
[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)
## How to Cite
```tex
@misc{elyzallama2024,
title={elyza/Llama-3-ELYZA-JP-8B},
url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
year={2024},
}
```
## Citations
```tex
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
|