---
license: cc-by-nc-sa-4.0
language:
- ko
- zh
---
|
Fine-tuned from [Sakura-14B-Qwen2beta-Base-v2](https://huggingface.co/SakuraLLM/Sakura-14B-Qwen2beta-Base-v2) on Korean light novel translation data (parallel Korean and Chinese translations of 550 Japanese light novels, plus Chinese translations of 14 Korean light novels).
|
|
|
The model only supports Korean → Simplified Chinese translation.
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

model_path = 'CjangCjengh/LN-Korean-14B-v0.2'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)

# Separate paragraphs with \n
text = '''์ฌ์์ ๋ค์ด ์์ ๋ค์ ์ฒซ ๊ฒฝํ์ ๋ํ ์ด์ผ๊ธฐ๋ฅผ ํ๋ ๊ฑธ ๋ค์ ์ ์ด ์๋๊ฐ. |
|
๋ฌผ๋ก ์ฌ๊ธฐ์ ์ฒซ ๊ฒฝํ์ด๋ผ๋ ๊ฒ์ ์ฒ์์ผ๋ก ์ผ์๋ฅผ ์จ๋ค๋ ๊ฐ ์ฒ์์ผ๋ก ์ ์ ๋ง์
๋ดค๋ค๋ ๊ฐ ๊ทธ๋ฐ ๊ฒ์ด ์๋๋ผ, ๋ช
์ค๊ณตํ ๊ทธ๋ ๊ณ ๊ทธ๋ฐ ์๋ฏธ์์์ ์ฒซ ๊ฒฝํ์ด๋ค. |
|
โ์ฐ, ์ฐ๋ฆฌ๊ฐโฆโฆ ์ฒ์์ผ๋ก ๊ทธ, ๊ทธ๊ฑธ ํ ๊ฑฐ๋ ๋ง์ด์ผ.โ |
|
๊ทธ๋ ๊ฒ ๋งํ ๊ฒ์ ์ํ์ ์์ ์๋ ๊ฐ์ ๊ต๋ณต์ ์๋
์๋ค. ๋ฅ๊ทผ ์ผ๊ตด์ ์ปค๋ค๋ ๊ฐ์ ๋๋์๋ฅผ ์ง๋, ๋ถ๋๋ฌ์ด ๋จธ๋ฆฌ์นด๋ฝ์ ์ด๊นจ ์๋ก ๋์ด๋จ๋ฆฌ๊ณ ์๋ ์๋
๋ค. ์ ๋ฐ์ ์ผ๋ก ์์ ํ ๋ชจ๋ฒ์ ๊ฐ์ ๋ณด์ด๋ ์ธ์์ด๊ณ ๋ชธ์ง๋ ์๋ดํ ํธ์ด์ง๋ง, ๊ต๋ณต ์์๋ฅผ ๋งคํน์ ์ผ๋ก ๋ถํ์ด ์ค๋ฅด๊ฒ ํ๊ณ ์๋ ๊ฐ์ด๋งํผ์ ์์ ํ์ง๋ ์๋ดํ์ง๋ ์์๋ค. ๋ชธ์ ์์ธ ๋ฆฐ ์์ธ ํ์ ๋ ํ์ด ๊ฐ์ด์ ์์์์ ์๋ฐํ๊ณ ์์ด, ๋ชธ์ ์์ง์ผ ๋๋ง๋ค ๊ทธ ์ค๊ณฝ์ด ๋ถ๋๋ฝ๊ฒ ์ผ๊ทธ๋ฌ์ก๋ค.''' |
|
|
|
# Keep the input text under 1024 characters
assert len(text) < 1024

messages = [
    # System prompt: "You are a light novel translator, good at translating foreign-language light novels into Chinese"
    {'role': 'system', 'content': '你是一个轻小说译者，善于将外文轻小说翻译成中文'},
    # User prompt: "Translate into Chinese:" followed by the source text
    {'role': 'user', 'content': f'翻译成中文：\n{text}'}
]
|
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors='pt').to('cuda')

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1024
)

# Keep only the newly generated tokens (drop the echoed prompt) before decoding
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
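
The snippet above translates a single short excerpt. For longer texts, the comments in the card imply two constraints: paragraphs are separated by `\n`, and each request should stay under 1024 characters. Below is a minimal sketch (not part of the original card) of how one might pack whole paragraphs into chunks that respect that limit and translate them in sequence; the `translate_chunk` and `translate_long` helpers are illustrative names, and the sketch reuses the `model` and `tokenizer` objects loaded above.

```python
# Illustrative sketch (not from the original card): chunked translation of longer texts
def translate_chunk(chunk: str) -> str:
    # Build the same chat prompt as in the example above for one chunk of Korean text
    messages = [
        {'role': 'system', 'content': '你是一个轻小说译者，善于将外文轻小说翻译成中文'},
        {'role': 'user', 'content': f'翻译成中文：\n{chunk}'}
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([prompt], return_tensors='pt').to('cuda')
    output_ids = model.generate(inputs.input_ids, max_new_tokens=1024)
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

def translate_long(text: str, limit: int = 1024) -> str:
    # Greedily pack whole paragraphs (separated by \n) into chunks under the length limit
    paragraphs = [p for p in text.split('\n') if p.strip()]
    chunks, current = [], ''
    for p in paragraphs:
        candidate = f'{current}\n{p}' if current else p
        if len(candidate) < limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = p  # assumes a single paragraph is itself under the limit
    if current:
        chunks.append(current)
    return '\n'.join(translate_chunk(c) for c in chunks)
```

A paragraph that is itself longer than the limit would still need to be split further (for example at sentence boundaries), and translating chunks independently loses cross-chunk context, so chunk boundaries are best kept at natural scene breaks.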