Tolerblanc committed on
Commit 584dd9a
Parent(s): 0bf8dc4
Update README.md
README.md CHANGED
@@ -5,5 +5,42 @@ datasets:
language:
- ko
---
# K-urse_Detection_with_BERT

![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white) ![Keras](https://img.shields.io/badge/Keras-%23D00000.svg?style=for-the-badge&logo=Keras&logoColor=white)

## Overview

**K-urse_Detection_with_BERT**: detection of Korean cursing expressions with a fine-tuned klue-BERT model.

This project was produced for KWU's "Text Mining" course in the first semester of 2023.

Project overview (in Korean): [Notion](https://www.notion.so/tolerblanc/4d70c776b3f74dbe8e03a38ccda27fbb?pvs=4)

This model on GitHub: [Link](https://github.com/Tolerblanc/K-urse_Detection_with_BERT)

## Evaluation

- Comparison model: JminJ's [Bad_text_classifier](https://github.com/JminJ/Bad_text_classifier)
- Evaluation data: [2runo's Curse-detection-data](https://github.com/2runo/Curse-detection-data)

| Model / Metric | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Comparison (ELECTRA base) | 0.81 | 0.69 | **0.87** | **0.77** |
| klue-BERT base (our best result) | **0.83** | **0.76** | 0.75 | 0.75 |
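
The F1 column is the harmonic mean of precision $P$ and recall $R$. As a quick consistency check (our own arithmetic, not part of the original evaluation), recomputing F1 from the rounded precision and recall in the table above gives values consistent with the reported 0.77 and 0.75; because $P$ and $R$ are themselves rounded to two decimals, agreement is only expected to within about 0.01.

$$
F_1 = \frac{2PR}{P + R}
$$

$$
\text{Comparison: } \frac{2 \cdot 0.69 \cdot 0.87}{0.69 + 0.87} = \frac{1.2006}{1.56} \approx 0.7696,
\qquad
\text{klue-BERT: } \frac{2 \cdot 0.76 \cdot 0.75}{0.76 + 0.75} = \frac{1.14}{1.51} \approx 0.7550
$$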

- Evaluation data: YouTube comments

| Model / Metric | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Comparison (ELECTRA base) | 0.77 | 0.52 | **0.90** | 0.66 |
| klue-BERT base (our best result) | **0.89** | **0.75** | 0.80 | **0.78** |
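
Both tables report the same four metrics for a binary task. The sketch below shows one way to compute them from gold labels and model predictions. It is a minimal pure-Python illustration, assuming label `1` marks a cursing text (the positive class) and `0` a clean one; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from this project's evaluation code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumption (hypothetical helper): 1 = cursing text (positive class), 0 = clean.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    # Confusion-matrix counts, treating label 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Precision and recall are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels, not project data.
    gold = [1, 1, 1, 0, 0, 0, 0, 1]
    pred = [1, 1, 0, 1, 0, 0, 0, 1]
    print(binary_metrics(gold, pred))
```

With the toy labels there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out to 0.75. For real evaluations, scikit-learn's `precision_recall_fscore_support` (with `average="binary"`) computes the same precision, recall and F1 quantities.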

## Demo with Hugging Face Spaces 🤗

Try the demo here: [Go to the Hugging Face Space](https://huggingface.co/datasets/Tolerblanc/Demo_Kurse_detection)

## References

- Smilegate-AI's Korean Unsmile Dataset: [Link](https://huggingface.co/datasets/smilegate-ai/kor_unsmile)
- JeanLee's K-MHaS Dataset and Paper: [Link](https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech)
- KLUE (Korean Language Understanding Evaluation) BERT: [Link](https://github.com/KLUE-benchmark/KLUE)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: [Link](https://arxiv.org/abs/1810.04805)
- 2runo's Curse Detection Dataset: [Link](https://github.com/2runo/Curse-detection-data)
- JminJ's Bad Text Classifier: [Link](https://github.com/JminJ/Bad_text_classifier)