This model was created by fine-tuning lcw99/t5-large-korean-text-summary on the klue-ynat dataset.

Input = ['IT과학', '경제', '사회', '생활문화', '세계', '스포츠', '정치'] (the KLUE-YNAT topic labels: IT/Science, Economy, Society, Life & Culture, World, Sports, Politics)

Output = a news article title that matches the given label.

If you want to run inference in batches, use batch_encode_plus (a batch sketch follows the main example below).

git: https://github.com/taemin6697
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_dir = "kfkas/t5-large-korean-news-title-klue-ynat"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
model.to(device)

label_list = ['IT과학', '경제', '사회', '생활문화', '세계', '스포츠', '정치']
text = "IT과학"

input_ids = tokenizer.encode(text, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(
        input_ids,
        do_sample=True,  # use sampling instead of greedy decoding
        max_length=128,  # maximum decoding length
        top_k=50,        # sample only from the 50 most probable tokens
        top_p=0.95,      # nucleus sampling: keep the smallest candidate set with 95% cumulative probability
    )
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
print(decoded_output)  # e.g. SK텔레콤 스마트 모바일 요금제 시즌1 출시 ("SK Telecom launches smart mobile plan Season 1")
```
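As mentioned above, batch_encode_plus handles batched inputs. Below is a minimal batch-inference sketch under the same generation settings; feeding all seven labels as one batch to get one title per label is an illustrative choice, not part of the original card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_dir = "kfkas/t5-large-korean-news-title-klue-ynat"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir).to(device)

label_list = ['IT과학', '경제', '사회', '생활문화', '세계', '스포츠', '정치']

# batch_encode_plus pads all labels to a common length and returns a tensor batch
encodings = tokenizer.batch_encode_plus(
    label_list,
    padding=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=encodings["input_ids"],
        attention_mask=encodings["attention_mask"],
        do_sample=True,
        max_length=128,
        top_k=50,
        top_p=0.95,
    )

# one sampled headline per label
for label, title in zip(label_list, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{label}: {title}")
```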
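The card does not publish its training script or hyperparameters. For orientation only, here is a minimal sketch of how such a fine-tune could be set up: the preprocessing (label text in, headline out) follows the input/output description above, while every hyperparameter value is a placeholder assumption, not the author's setting.

```python
# Hypothetical fine-tuning sketch; all hyperparameter values are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

base = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

dataset = load_dataset("klue", "ynat")
label_names = dataset["train"].features["label"].names  # ['IT과학', '경제', ...]

def preprocess(batch):
    # source: the topic label as text; target: the news headline
    enc = tokenizer([label_names[i] for i in batch["label"]], truncation=True)
    enc["labels"] = tokenizer(text_target=batch["title"], truncation=True)["input_ids"]
    return enc

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-large-korean-news-title-klue-ynat",
    per_device_train_batch_size=8,  # placeholder value
    learning_rate=5e-5,             # placeholder value
    num_train_epochs=3,             # placeholder value
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```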