---
library_name: peft
base_model: google/gemma-2b
language:
- ko
- en
tags:
- translation
- gemma
---

# Model Card for brildev7/gemma-2b-translation-koen-sft-qlora

## Model Details

### Model Description

A QLoRA adapter for google/gemma-2b, fine-tuned to translate Korean text into English.

- **Developed by:** Kang Seok Ju
- **Contact:** brildev7@gmail.com

## Training Details

### Training Data

https://huggingface.co/datasets/traintogpb/aihub-koen-translation-integrated-tiny-100k

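The card does not describe how dataset rows were turned into training text. As a minimal sketch, assuming the fine-tuning strings reused the inference prompt shown on this card and that the split is named `train`, a row could be formatted as follows. `load_training_split` and `format_sft_example` are hypothetical helper names, and the dataset's actual column names should be checked before mapping.

```python
# Dataset ID taken from the URL in the Training Data section.
DATASET_ID = "traintogpb/aihub-koen-translation-integrated-tiny-100k"

# Same prompt wording as the inference examples on this card
# ("Translate the following text into English." / "Translation:").
PROMPT_TEMPLATE = "다음 내용을 영어로 번역하세요.:\n{}\n\n번역:\n"


def load_training_split(split="train"):
    """Download one split from the Hub (requires `pip install datasets`).

    The split name "train" is an assumption, not something the card states.
    """
    # Imported lazily so format_sft_example works without the package.
    from datasets import load_dataset
    return load_dataset(DATASET_ID, split=split)


def format_sft_example(korean, english):
    """Join the prompt and the reference translation into one string.

    This layout is an assumption about the training format.
    """
    return PROMPT_TEMPLATE.format(korean.strip()) + english.strip()


# Usage (needs network access):
#   ds = load_training_split()
#   print(ds.column_names)  # inspect the real column names first
```

Keeping the training and inference prompts identical is the usual reason for a layout like this: the adapter then sees the same wording at generation time as it did during fine-tuning.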
# Inference Examples

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_id = "google/gemma-2b"
peft_model_id = "brildev7/gemma-2b-translation-koen-sft-qlora"

# Quantize the base model to 4-bit NF4; computation runs in float16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch.float32,
    attn_implementation="sdpa",
)
# Attach the adapter weights to the quantized base model.
model = PeftModel.from_pretrained(model, peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Reuse the end-of-sequence token for padding.
tokenizer.pad_token_id = tokenizer.eos_token_id

# Example 1
# The prompt reads "Translate the following text into English."; "번역:" means "Translation:".
prompt_template = """다음 내용을 영어로 번역하세요.:
{}

번역:
"""
sentences = "위중설이 나돌던 윌리엄 영국 왕세자의 부인 케이트 미들턴 왕세자빈(42)이 결국 암 진단을 받았다. 로이터 통신에 따르면 왕세자빈은 22일(현지시간) 인스타그램 영상 메시지를 통해 지난 1월 복부 수술을 받은 뒤 실시한 후속 검사에서 암이 발견돼 현재 화학치료를 받고 있다고 밝혔다. 왕세자빈은 '의료진은 예방적 차원에서 화학치료를 권고했다'면서 '물론 이것은 큰 충격으로 다가왔지만 윌리엄과 저는 어린 가족들을 위해 이 문제를 해결하고자 최선을 다하고 있다'고 말했다. 그러면서 '현재 암으로 인해 영향을 받은 모든 사람들을 생각하고 있다'며 '믿음과 희망을 잃지 말아 달라. 여러분은 혼자가 아니다'라고 덧붙였다."
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output:
# Prince William's wife Kate Middleton, 42, has been diagnosed with cancer after undergoing surgery for her abdominal pain, according to Reuters news agency. In an Instagram message on the 22nd (local time), Kate Middleton, the wife of Prince William, said that she was diagnosed with cancer after undergoing surgery for her abdominal pain in January and is currently undergoing chemical therapy. She said that the medical team recommended chemical therapy as a measure to prevent the spread of the disease, but that she and Prince William are trying to resolve the issue for their young family. She added that "The medical team recommended chemical therapy as a measure to prevent the spread of the disease.

# Example 2
# Same prompt as above: "Translate the following text into English."
prompt_template = """다음 내용을 영어로 번역하세요.:
{}

번역:
"""
sentences = "애플이 주력 시장 중의 하나인 중국에서 현지 스마트폰 제조사들에게 밀리며 위기감이 증폭된 가운데 중국 소비자 잡기에 나서고 있다. 팀 쿡 CEO(최고경영자)가 직접 중국을 방문해 투자를 약속하고, '아이폰' 등 자사 기기에 중국 바이두의 AI(인공지능) 모델을 탑재하는 방안도 검토하고 있다. 중국 본토의 아이폰 할인 공세에 이어 전방위적 투자를 늘리는 모양새다."
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output:
# With Apple becoming a target in China, a major market, the company is taking a stance in a Chinese consumer magazine. CEO Tim Cook is visiting China and is planning to invest, and is also considering adding Chinese Big Data AI models on Apple's products such as 'iPhone'. It seems that China is making a wide-ranging investment following the iPhone discounting wave on the mainland.
```
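
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated tokens, so decoding `outputs[0]` prints the prompt along with the translation. The following is a minimal sketch of a wrapper that returns only the translation, assuming the `model` and `tokenizer` objects loaded in the examples above. `build_prompt` and `translate` are hypothetical helper names, and the 512-token default is an arbitrary choice rather than a recommendation from the author.

```python
# Prompt wording copied from the examples on this card
# ("Translate the following text into English." / "Translation:").
PROMPT_TEMPLATE = "다음 내용을 영어로 번역하세요.:\n{}\n\n번역:\n"


def build_prompt(korean_text):
    """Wrap the Korean source text in the card's prompt template."""
    return PROMPT_TEMPLATE.format(korean_text.strip())


def translate(model, tokenizer, korean_text, max_new_tokens=512):
    """Generate a translation and return only the newly generated text."""
    prompt = build_prompt(korean_text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The returned sequence starts with the prompt tokens; skip past them.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = outputs[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With the model loaded as above, `print(translate(model, tokenizer, sentences))` would print the English text without the echoed Korean prompt.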