metadata

tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: KoELECTRA-small-v3-modu-ner
    results: []
language:
  - ko
pipeline_tag: token-classification
widget:
  - text: 서울역으로 안내해줘.
    example_title: Example 1
  - text: 에어컨 온도 3도 올려줘.
    example_title: Example 2
  - text: 아이유 노래 검색해줘.
    example_title: Example 3

KoELECTRA-small-v3-modu-ner

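This model is a fine-tuned version of monologg/koelectra-small-v3-discriminator, trained on the Named Entity Analysis Corpus 2021 from the National Institute of Korean Language's Modu Corpus (described under Training and evaluation data). It achieves the following results on the evaluation set: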

Loss: 0.1443
Precision: 0.8176
Recall: 0.8401
F1: 0.8287
Accuracy: 0.9615
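
As a quick consistency check, the reported F1 can be recomputed from the precision and recall above. This is a minimal sketch that assumes F1 is the usual harmonic mean of precision and recall; the card does not state which evaluation library or averaging mode produced the numbers, and the helper name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
reported_precision = 0.8176
reported_recall = 0.8401

print(round(f1_score(reported_precision, reported_recall), 4))
```

Rounded to four decimals, the result agrees with the F1 value listed above.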

Model description

Tagging scheme: BIO

B-(begin): the token starts a named entity
I-(inside): the token is inside a named entity
O(outside): the token is not part of a named entity
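
The BIO scheme above can be turned back into entity spans with a small decoder. The following is a sketch only: the function name `bio_to_spans` and the example tag sequence are hypothetical, and treating a stray I- tag as the start of a new span is a lenient design choice, not something the card specifies.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a BIO tag sequence into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only when its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if any.
        if label is not None:
            spans.append((label, start, i))
            label = None
        # B- opens a new span; a stray I- is also treated as a span start.
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hand-written tags for illustration, not actual model output.
example_tags = ["B-PS", "I-PS", "O", "B-QT", "I-QT", "O"]
print(bio_to_spans(example_tags))
```

The returned indices refer to token positions, so mapping them back to text requires the tokenizer's offsets.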

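A set of 15 tags that follows the top-level categories defined by the Telecommunications Technology Association of Korea (TTA)

Category	Tag	Definition
ARTIFACTS	AF	Man-made objects, including cultural assets, buildings, musical instruments, roads, weapons, vehicles, titles of works, and product names
ANIMAL	AM	Animals other than humans
CIVILIZATION	CV	Civilization and culture
DATE	DT	Periods and seasons, eras
EVENT	EV	Names of specific incidents, accidents, and events
STUDY_FIELD	FD	Academic fields, schools of thought, and factions
LOCATION	LC	Regions and places, including topographic and geographic names
MATERIAL	MT	Elements and metals, rocks and gems, chemical substances
ORGANIZATION	OG	Names of institutions and groups
PERSON	PS	Personal names and aliases (including names of person-like figures)
PLANT	PT	Flowers and trees, land plants, seaweeds, mushrooms, mosses
QUANTITY	QT	Quantities and amounts, order and sequence, expressions made up of numerals
TIME	TI	Clock times and time ranges
TERM	TM	Named entities not covered by the subcategories defined under the other entity types
THEORY	TR	Specific theories, laws, principles, etc.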
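
Combining the 15 tags with the BIO prefixes gives the full label inventory a token classifier would predict. The sketch below enumerates it; the ordering is an assumption made for illustration, and the authoritative mapping for this model is whatever `model.config.id2label` contains after loading.

```python
# The 15 tag abbreviations from the table above.
TAGS = ["AF", "AM", "CV", "DT", "EV", "FD", "LC", "MT",
        "OG", "PS", "PT", "QT", "TI", "TM", "TR"]

# One "O" label plus a B- and an I- label per tag.
# Assumed ordering; the model's own config may order labels differently.
labels = ["O"] + [f"{prefix}-{tag}" for tag in TAGS for prefix in ("B", "I")]

print(len(labels))
print(labels[:5])
```

This yields 31 labels in total (2 × 15 + 1).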

Intended uses & limitations

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Leo97/KoELECTRA-small-v3-modu-ner")
model = AutoModelForTokenClassification.from_pretrained("Leo97/KoELECTRA-small-v3-modu-ner")

# Build an NER pipeline; without aggregation, each result is one token with its tag and score
ner = pipeline("ner", model=model, tokenizer=tokenizer)

example = "서울역으로 안내해줘."
ner_results = ner(example)
print(ner_results)
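
The call above returns one entry per token. To get whole entities instead, the pipeline accepts an `aggregation_strategy` argument. The following is a sketch under that setup: `load_ner` and `summarize` are hypothetical helper names, and how cleanly subword pieces are merged for this particular tokenizer has not been verified here.

```python
from typing import Any, Dict, List, Tuple


def load_ner(model_id: str = "Leo97/KoELECTRA-small-v3-modu-ner"):
    """Build a token-classification pipeline that groups B-/I- tokens into entities."""
    # Imported lazily so that summarize() can be used without loading the model.
    from transformers import pipeline
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")


def summarize(entities: List[Dict[str, Any]]) -> List[Tuple[str, str, float]]:
    """Reduce aggregated pipeline output to (text, tag, rounded score) tuples."""
    return [(e["word"], e["entity_group"], round(float(e["score"]), 3))
            for e in entities]


# Usage (downloads the model on first call):
#   ner = load_ner()
#   print(summarize(ner("서울역으로 안내해줘.")))
```

With aggregation enabled, each result carries an `entity_group` key (the tag without its B-/I- prefix) along with `word`, `score`, `start`, and `end`.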

Training and evaluation data

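Training dataset for the named entity recognition (NER) model

Ministry of Culture, Sports and Tourism > National Institute of Korean Language > Modu Corpus (모두의 말뭉치) > Named Entity Analysis Corpus 2021
https://corpus.korean.go.kr/request/reausetMain.do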

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 3787
num_epochs: 18 (= 10 + 3 + 5)
mixed_precision_training: Native AMP
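
To make the `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 3787` settings concrete, the sketch below reproduces the shape of a linear warmup followed by linear decay to zero. It is a plain-Python illustration, not the trainer's own code. The default `total_steps=18940` is an assumption (5 epochs × 3788 steps, matching the last run in the results table); the card does not say which of the three runs the warmup value applies to, or whether the schedule restarted between runs.

```python
def linear_lr(step: int,
              base_lr: float = 5e-05,
              warmup_steps: int = 3787,
              total_steps: int = 18940) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1000, 3787, 10000, 18940):
    print(step, linear_lr(step))
```

Under these assumptions the warmup covers roughly the first epoch, since one epoch is 3788 steps.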

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	3788	0.3021	0.6356	0.6380	0.6368	0.9223
No log	2.0	7576	0.1905	0.7397	0.7441	0.7419	0.9431
No log	3.0	11364	0.1612	0.7611	0.7897	0.7751	0.9505
No log	4.0	15152	0.1494	0.7855	0.7998	0.7926	0.9544
No log	5.0	18940	0.1427	0.7833	0.8194	0.8009	0.9559
No log	6.0	22728	0.1398	0.7912	0.8223	0.8064	0.9572
No log	7.0	26516	0.1361	0.8035	0.8240	0.8136	0.9587
No log	8.0	30304	0.1360	0.8047	0.8280	0.8162	0.9592
No log	9.0	34092	0.1346	0.8058	0.8299	0.8177	0.9596
0.2256	10.0	37880	0.1350	0.8068	0.8308	0.8186	0.9598
3 additional training epochs
No log	1.0	3788	0.1367	0.8089	0.8240	0.8164	0.9595
No log	2.0	7576	0.1345	0.8130	0.8331	0.8229	0.9604
0.0953	3.0	11364	0.1370	0.8146	0.8349	0.8246	0.9609
5 additional training epochs
No log	1.0	3788	0.1511	0.8095	0.8257	0.8176	0.9594
No log	2.0	7576	0.1461	0.8121	0.8339	0.8228	0.9600
No log	3.0	11364	0.1417	0.8139	0.8372	0.8254	0.9607
No log	4.0	15152	0.1418	0.8238	0.8346	0.8292	0.9617
0.0748	5.0	18940	0.1443	0.8176	0.8401	0.8287	0.9615
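
The headline metrics at the top of this card match the last row of the table (epoch 5 of the final run). A short sketch for comparing checkpoints in that final run follows; the rows are copied from the table above, while the variable names are illustrative and the card does not state which checkpoint's weights were published.

```python
# (epoch, validation_loss, precision, recall, f1, accuracy) for the final 5-epoch run
final_run = [
    (1, 0.1511, 0.8095, 0.8257, 0.8176, 0.9594),
    (2, 0.1461, 0.8121, 0.8339, 0.8228, 0.9600),
    (3, 0.1417, 0.8139, 0.8372, 0.8254, 0.9607),
    (4, 0.1418, 0.8238, 0.8346, 0.8292, 0.9617),
    (5, 0.1443, 0.8176, 0.8401, 0.8287, 0.9615),
]

# Pick the epoch with the highest F1 and the one with the lowest validation loss.
best_by_f1 = max(final_run, key=lambda row: row[4])
best_by_loss = min(final_run, key=lambda row: row[1])

print("best F1:", best_by_f1[0], best_by_f1[4])
print("lowest validation loss:", best_by_loss[0], best_by_loss[1])
```

In this run the highest F1 and the lowest validation loss fall on different epochs, and neither is the final one, so the choice of checkpoint depends on which metric matters for the application.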

Framework versions

Transformers 4.27.4
PyTorch 2.0.0+cu118
Datasets 2.11.0
Tokenizers 0.13.2