metadata
license: mit
model-index:
  - name: xlm-roberta-base-offensive-text-detection-da
    results: []
widget:
  - text: Din store idiot

Danish Offensive Text Detection based on XLM-RoBERTa-base

This model is a fine-tuned version of xlm-roberta-base, trained on approximately 5 million comments from DR's public Facebook pages. The labels were generated automatically through weak supervision with the Snorkel framework.
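A minimal usage sketch, assuming the `transformers` library is installed and the model can be downloaded from the Hugging Face Hub under the identifier listed in the results table. The helper name `classify` is hypothetical, and this card does not document the label names the model returns, so the example simply prints whatever comes back.

```python
# Model identifier as listed in this card's results table.
MODEL_ID = "alexandrainst/xlm-roberta-base-offensive-text-detection-da"


def classify(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text.

    Hypothetical helper: the import is done lazily so that defining the
    function does not require the library or trigger a model download.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(texts))


def main():
    # "Din store idiot" is the example text from this card's widget.
    for prediction in classify(["Din store idiot"]):
        print(prediction["label"], round(prediction["score"], 4))
```

Calling `main()` downloads the model on first use. For many texts, building the pipeline once and reusing it avoids reloading the weights on every call.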

The model ranks second by F1-score on a test set of 500 Facebook comments annotated by two people, of which 41.2% were labelled as offensive:

| Model | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| alexandrainst/electra-small-offensive-text-detection-da | 85.45% | 91.26% | 88.26% |
| alexandrainst/xlm-roberta-base-offensive-text-detection-da (this) | 83.48% | 93.20% | 88.07% |
| A-ttack | 99.17% | 58.25% | 73.39% |
| DaNLP/da-electra-hatespeech-detection | 92.19% | 57.28% | 70.66% |
| Guscode/DKbert-hatespeech-detection | 84.91% | 43.69% | 57.69% |
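The F1-score is the harmonic mean of precision and recall, so the last column can be recomputed from the other two. The sketch below does this for the reported figures and ranks the models by F1-score. The names `f1_score` and `REPORTED` are illustrative, and small differences are expected because the inputs are already rounded to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table above.
REPORTED = {
    "alexandrainst/electra-small-offensive-text-detection-da": (85.45, 91.26, 88.26),
    "alexandrainst/xlm-roberta-base-offensive-text-detection-da": (83.48, 93.20, 88.07),
    "A-ttack": (99.17, 58.25, 73.39),
    "DaNLP/da-electra-hatespeech-detection": (92.19, 57.28, 70.66),
    "Guscode/DKbert-hatespeech-detection": (84.91, 43.69, 57.69),
}

# Recompute F1 from the rounded precision and recall values.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in REPORTED.items()}

# Rank models by their reported F1-score, best first.
ranking = sorted(REPORTED, key=lambda name: REPORTED[name][2], reverse=True)

for name in ranking:
    print(f"{name}: reported {REPORTED[name][2]:.2f}%, recomputed {recomputed[name]:.2f}%")
```

The comparison also shows the trade-off behind the ranking: A-ttack has the highest precision but much lower recall, which pulls its F1-score below the two alexandrainst models.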

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 32
  • seed: 4242
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • max_steps: 500000
  • fp16: True
  • eval_steps: 1000
  • early_stopping_patience: 100
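The following sketch works out what these settings imply, in plain Python. It rests on assumptions the list above does not state: no warmup steps, an evaluation every `eval_steps` steps, and patience counted in evaluations, which is how early stopping behaves in Transformers. The function `linear_lr` is a hypothetical stand-in for a linear scheduler, not the training code itself.

```python
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 1
MAX_STEPS = 500_000
EVAL_STEPS = 1_000
EARLY_STOPPING_PATIENCE = 100


def linear_lr(step, base_lr=LEARNING_RATE, max_steps=MAX_STEPS, warmup_steps=0):
    """Linear warmup (assumed zero here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


# total_train_batch_size of 32 with a per-device batch of 32 and no gradient
# accumulation is consistent with training on a single device.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Upper bound on training examples processed if training runs to max_steps.
max_examples_seen = MAX_STEPS * effective_batch_size

# Steps without improvement that are tolerated before early stopping triggers.
patience_in_steps = EVAL_STEPS * EARLY_STOPPING_PATIENCE

print(linear_lr(0), linear_lr(MAX_STEPS // 2), linear_lr(MAX_STEPS))
print(effective_batch_size, max_examples_seen, patience_in_steps)
```

Under these assumptions, training stops early only after 100,000 consecutive steps without improvement on the evaluation metric, and processes at most 16 million examples.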

Framework versions

  • Transformers 4.20.1
  • PyTorch 1.11.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1