---
library_name: peft
base_model: google/gemma-2b
language:
- ko
- en
tags:
- translation
- gemma
---
# Model Card for brildev7/gemma-2b-translation-koen-sft-qlora
## Model Details
### Model Description
Translates Korean text into English. The model is a PEFT (QLoRA) adapter for `google/gemma-2b`.
- **Developed by:** Kang Seok Ju
- **Contact:** brildev7@gmail.com
## Training Details
### Training Data
https://huggingface.co/datasets/traintogpb/aihub-koen-translation-integrated-tiny-100k
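This card does not say how the dataset's Korean/English pairs were turned into training text. The sketch below is hypothetical. It assumes supervised fine-tuning reused the same prompt template as the inference examples, appended the English reference, and ended with Gemma's `<eos>` token so the model learns where to stop. `format_sft_example` is an illustrative name, not code from this repository, and no dataset column names are assumed.

```python
# Hypothetical sketch: render one Korean/English pair into a single SFT training string.
# ASSUMPTIONS (not documented in this card): training reused the inference prompt
# template, the target is the English reference, and "<eos>" is Gemma's EOS token text.

# "Translate the following into English.:" / <source text> / "Translation:"
PROMPT_TEMPLATE = "다음 내용을 영어로 번역하세요.:\n{}\n번역:\n"


def format_sft_example(ko: str, en: str, eos_token: str = "<eos>") -> str:
    """Return prompt + reference translation + EOS as one causal-LM training sequence."""
    return PROMPT_TEMPLATE.format(ko.strip()) + en.strip() + eos_token


# The resulting string is the prompt followed by "Hello.<eos>".
example = format_sft_example("안녕하세요.", "Hello.")
```

With this layout, a trainer would typically compute the loss only on the tokens after `번역:\n`. Whether this adapter was trained that way is not stated here.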
## Inference Examples
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_id = "google/gemma-2b"
peft_model_id = "brildev7/gemma-2b-translation-koen-sft-qlora"

# Load the base model with 4-bit NF4 quantization (bitsandbytes); the 4-bit layers compute in float16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch.float32,
    attn_implementation="sdpa",
)
# Attach the fine-tuned adapter weights to the quantized base model.
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Reuse the EOS token as the padding token.
tokenizer.pad_token_id = tokenizer.eos_token_id

# Example 1
# Prompt (Korean): "Translate the following into English.:" / <source text> / "Translation:"
prompt_template = """다음 내용을 영어로 번역하세요.:
{}
번역:
"""
# Korean news excerpt: Kate Middleton reveals her cancer diagnosis (according to Reuters).
sentences = "위중설이 나돌던 윌리엄 영국 왕세자의 부인 케이트 미들턴 왕세자빈(42)이 결국 암 진단을 받았다. 로이터 통신에 따르면 왕세자빈은 22일(현지시간) 인스타그램 영상 메시지를 통해 지난 1월 복부 수술을 받은 뒤 실시한 후속 검사에서 암이 발견돼 현재 화학치료를 받고 있다고 밝혔다. 왕세자빈은 '의료진은 예방적 차원에서 화학치료를 권고했다'면서 '물론 이것은 큰 충격으로 다가왔지만 윌리엄과 저는 어린 가족들을 위해 이 문제를 해결하고자 최선을 다하고 있다'고 말했다. 그러면서 '현재 암으로 인해 영향을 받은 모든 사람들을 생각하고 있다'며 '믿음과 희망을 잃지 말아 달라. 여러분은 혼자가 아니다'라고 덧붙였다."
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# generate() returns the prompt followed by the continuation; decode only the new tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Sample output:
# Prince William's wife Kate Middleton, 42, has been diagnosed with cancer after undergoing surgery for her abdominal pain, according to Reuters news agency. In an Instagram message on the 22nd (local time), Kate Middleton, the wife of Prince William, said that she was diagnosed with cancer after undergoing surgery for her abdominal pain in January and is currently undergoing chemical therapy. She said that the medical team recommended chemical therapy as a measure to prevent the spread of the disease, but that she and Prince William are trying to resolve the issue for their young family. She added that "The medical team recommended chemical therapy as a measure to prevent the spread of the disease.

# Example 2
# Prompt (Korean): "Translate the following into English.:" / <source text> / "Translation:"
prompt_template = """다음 내용을 영어로 번역하세요.:
{}
번역:
"""
# Korean news excerpt: Apple courts Chinese consumers as local smartphone makers gain ground.
sentences = "애플이 주력 시장 중의 하나인 중국에서 현지 스마트폰 제조사들에게 밀리며 위기감이 증폭된 가운데 중국 소비자 잡기에 나서고 있다. 팀 쿡 CEO(최고경영자)가 직접 중국을 방문해 투자를 약속하고, '아이폰' 등 자사 기기에 중국 바이두의 AI(인공지능) 모델을 탑재하는 방안도 검토하고 있다. 중국 본토의 아이폰 할인 공세에 이어 전방위적 투자를 늘리는 모양새다."
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# generate() returns the prompt followed by the continuation; decode only the new tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Sample output:
# With Apple becoming a target in China, a major market, the company is taking a stance in a Chinese consumer magazine. CEO Tim Cook is visiting China and is planning to invest, and is also considering adding Chinese Big Data AI models on Apple's products such as 'iPhone'. It seems that China is making a wide-ranging investment following the iPhone discounting wave on the mainland.
```
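Both examples repeat the same format, generate, and decode steps. The two string steps are wrapping the source text in the prompt and cutting the full decoded output at the final `번역:` line. This minimal pure-Python sketch of them can be tested without loading the model. `build_prompt`, `extract_translation`, and `ANSWER_MARKER` are hypothetical names introduced here. The sketch assumes the model's answer follows the last `번역:` line of the prompt.

```python
# Minimal sketch, assuming the prompt template used in the examples above.
# build_prompt / extract_translation / ANSWER_MARKER are hypothetical helper names.

# "Translate the following into English.:" / <source text> / "Translation:"
PROMPT_TEMPLATE = "다음 내용을 영어로 번역하세요.:\n{}\n번역:\n"
ANSWER_MARKER = "번역:\n"  # last line of the template ("Translation:")


def build_prompt(korean_text: str) -> str:
    """Wrap Korean source text in the translation prompt."""
    return PROMPT_TEMPLATE.format(korean_text.strip())


def extract_translation(decoded: str) -> str:
    """Return only the text after the last answer marker.

    A decoder-only model's full output contains the prompt followed by the
    continuation, so everything up to and including the last marker is dropped.
    If the marker is absent, the stripped input is returned unchanged.
    """
    _, sep, answer = decoded.rpartition(ANSWER_MARKER)
    return answer.strip() if sep else decoded.strip()


# Usage with a fake decoded string (no model needed):
fake_decoded = build_prompt("안녕하세요.") + "Hello.\n"
print(extract_translation(fake_decoded))  # Hello.
```

With the model loaded, `extract_translation(tokenizer.decode(outputs[0], skip_special_tokens=True))` should return just the English continuation. Matching on the marker string is simpler than tracking token offsets. However, it gives the wrong result if the generated text itself contains `번역:` followed by a newline.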