formal_classifier

formal classifier (also called an honorific classifier)

Korean formal (존댓말, jondaetmal) vs. informal (반말, banmal) speech classifier

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned KcBERT classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("j5ng/kcbert-formal-classifier")
tokenizer = AutoTokenizer.from_pretrained("j5ng/kcbert-formal-classifier")

formal_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer)

# "Do you remember the professor told us to bring the materials last time?" (informal)
print(formal_classifier("저번에 교수님께서 자료 가져오라했는데 기억나?"))
# [{'label': 'LABEL_0', 'score': 0.9999139308929443}]
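The pipeline returns raw label ids. Going by the examples and results in this card, LABEL_0 is informal speech (반말) and LABEL_1 is formal speech (존댓말). The helper below is a minimal sketch under that assumption that turns one prediction into a readable line like those in the Results section; `LABEL_NAMES` and `describe` are hypothetical names, not part of the model or of the transformers library.

```python
# Hypothetical helper: maps the pipeline's raw label ids to readable names.
# Assumes LABEL_0 = informal (banmal) and LABEL_1 = formal (jondaetmal), as in this card's examples.
LABEL_NAMES = {
    "LABEL_0": "informal (반말)",
    "LABEL_1": "formal (존댓말)",
}


def describe(sentence: str, prediction: dict) -> str:
    """Format one text-classification prediction as 'sentence : style, probability NN.NN%'."""
    name = LABEL_NAMES.get(prediction["label"], prediction["label"])  # fall back to the raw id
    return f"{sentence} : {name}, probability {prediction['score']:.2%}"


if __name__ == "__main__":
    # Uses the output printed above, so no model download is needed here
    sample = {"label": "LABEL_0", "score": 0.9999139308929443}
    print(describe("저번에 교수님께서 자료 가져오라했는데 기억나?", sample))
```

With the real pipeline, a single string gives a one-element list, so `describe(text, formal_classifier(text)[0])` formats its top prediction.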

๋ฐ์ดํ„ฐ ์…‹ ์ถœ์ฒ˜

์Šค๋งˆ์ผ๊ฒŒ์ดํŠธ ๋งํˆฌ ๋ฐ์ดํ„ฐ ์…‹(korean SmileStyle Dataset)

: https://github.com/smilegate-ai/korean_smile_style_dataset

AI ํ—ˆ๋ธŒ ๊ฐ์„ฑ ๋Œ€ํ™” ๋ง๋ญ‰์น˜

: https://www.aihub.or.kr/

๋ฐ์ดํ„ฐ์…‹ ๋‹ค์šด๋กœ๋“œ(AIํ—ˆ๋ธŒ๋Š” ์ง์ ‘๋‹ค์šด๋กœ๋“œ๋งŒ ๊ฐ€๋Šฅ)

wget https://raw.githubusercontent.com/smilegate-ai/korean_smile_style_dataset/main/smilestyle_dataset.tsv
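The card does not show how the SmileStyle TSV becomes labeled sentences. A minimal stdlib sketch, assuming the downloaded file has `formal` and `informal` columns holding parallel sentences (check the header of your copy first), maps each non-empty cell to label 1 or 0. `load_formality_pairs` and `STYLE_TO_LABEL` are hypothetical names, and the AI Hub corpus would need its own conversion.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Assumed column names in smilestyle_dataset.tsv; verify against the real file header.
STYLE_TO_LABEL = {"informal": 0, "formal": 1}


def load_formality_pairs(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Turn SmileStyle-style TSV rows into (sentence, label) pairs.

    label 0 = informal (banmal), label 1 = formal (jondaetmal); empty cells are skipped.
    """
    # QUOTE_NONE keeps quote characters inside sentences as literal text
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs: List[Tuple[str, int]] = []
    for row in reader:
        for column, label in STYLE_TO_LABEL.items():
            text = (row.get(column) or "").strip()
            if text:
                pairs.append((text, label))
    return pairs


if __name__ == "__main__":
    # Tiny illustrative sample; with the real file you would use:
    #   with open("smilestyle_dataset.tsv", encoding="utf-8", newline="") as f:
    #       pairs = load_formality_pairs(f)
    sample = io.StringIO("formal\tinformal\n기억나세요?\t기억나?\n\t힘들어\n")
    for sentence, label in load_formality_pairs(sample):
        print(label, sentence)
```

Reading through `csv` rather than pandas keeps the sketch dependency-free; `pandas.read_csv(..., sep="\t")` would work equally well given the pinned pandas version.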

Development environment

Python 3.9
torch==1.13.1
transformers==4.26.0
pandas==1.5.3
emoji==2.2.0
soynlp==0.0.493
datasets==2.10.1

Base model

beomi/kcbert-base


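Examples (label 0 = informal / 반말, label 1 = formal / 존댓말)

label  sentence (English gloss)
0      공부를 열심히 해도 열심히 한 만큼 성적이 잘 나오지 않아 (Even when I study hard, my grades don't reflect the effort I put in.)
1      아들에게 보내는 문자를 통해 관계가 회복되길 바랄게요 (I hope your relationship with your son recovers through the messages you send him.)
1      참 열심히 사신 보람이 있으시네요 (All your hard work in life has truly paid off.)
0      나도 스시 좋아함 이번 달부터 영국 갈 듯 (I like sushi too. I'll probably go to the UK starting this month.)
0      본부장님이 내가 할 수 없는 업무를 계속 주셔서 힘들어 (It's tough because the division head keeps giving me work I can't do.)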

๋ถ„ํฌ

label train test
0 133,430 34,908
1 112,828 29,839
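From these counts, both splits are mildly imbalanced toward informal sentences (about 54%), and the test split holds roughly 21% of all sentences. That also means always predicting label 0 would already reach about 54% accuracy on the test set, a useful floor when reading the results. The check below recomputes those figures; `COUNTS` and `split_summary` are hypothetical names for illustration.

```python
# Label counts copied from the distribution table in this card
COUNTS = {
    "train": {0: 133_430, 1: 112_828},
    "test": {0: 34_908, 1: 29_839},
}


def split_summary(counts):
    """Return per-split totals and the share of label 0 (informal) in each split."""
    summary = {}
    for split, by_label in counts.items():
        total = sum(by_label.values())
        summary[split] = {"total": total, "informal_share": by_label[0] / total}
    return summary


if __name__ == "__main__":
    summary = split_summary(COUNTS)
    grand_total = sum(s["total"] for s in summary.values())
    for split, s in summary.items():
        print(f"{split}: {s['total']:,} sentences, {s['informal_share']:.1%} informal")
    # Majority-class (always label 0) accuracy on test equals the test informal share
    print(f"test share of all data: {summary['test']['total'] / grand_total:.1%}")
```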

Results

저번에 교수님께서 자료 가져오라하셨는데 기억나세요? : formal speech (존댓말), probability 99.19%
저번에 교수님께서 자료 가져오라했는데 기억나? : informal speech (반말), probability 92.86%

Both sentences mean "Do you remember that the professor asked us to bring the materials last time?"; they differ only in their verb endings (가져오라하셨는데 … 기억나세요? vs. 가져오라했는데 … 기억나?).

Citation

@misc{SmilegateAI2022KoreanSmileStyleDataset,
  title         = {SmileStyle: Parallel Style-variant Corpus for Korean Multi-turn Chat Text Dataset},
  author        = {Seonghyun Kim},
  year          = {2022},
  howpublished  = {\url{https://github.com/smilegate-ai/korean_smile_style_dataset}},
}
@inproceedings{lee2020kcbert,
  title         = {KcBERT: Korean Comments BERT},
  author        = {Lee, Junbum},
  booktitle     = {Proceedings of the 32nd Annual Conference on Human and Cognitive Language Technology},
  pages         = {437--440},
  year          = {2020},
}