
Multilingual Hate Speech Classifier for Social Media Content

A hate speech classification model based on multilingual XLM-R (covering 100 languages), fine-tuned on English, Italian and Slovenian data. A paper describing the model is forthcoming.

Authors: Patricia-Carla Grigor, Bojan Evkoski, Petra Kralj Novak

Data available here: English; Italian; Slovenian

Model output

The model classifies each input into one of four distinct classes:

  • 0 - appropriate
  • 1 - inappropriate
  • 2 - offensive
  • 3 - violent
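Because each input receives exactly one of these four classes, a single label is obtained by taking the highest-scoring class. The sketch below assumes the model emits one raw logit per class in the order 0 to 3 (as it does with `function_to_apply="none"` in the usage example) and converts them to probabilities with a softmax. `softmax` and `predict_class` are illustrative helpers, not part of the released model.

```python
import math

# Class indices as listed in this model card.
CLASS_NAMES = {0: "appropriate", 1: "inappropriate", 2: "offensive", 3: "violent"}

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return (class name, probability) for the highest-scoring of the four classes."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Hypothetical logits for a single input.
name, prob = predict_class([0.2, 2.1, -0.5, -1.3])
print(name, round(prob, 3))
```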

Training data

  • 103k English YouTube comments
  • 119k Italian YouTube comments
  • 50k Slovenian Twitter comments

Evaluation data

  • 20k English YouTube comments
  • 21k Italian YouTube comments
  • 10k Slovenian Twitter comments

Fine-tuning hyperparameters

  • num_train_epochs = 3
  • train_batch_size = 8
  • learning_rate = 6e-6
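For a sense of scale, the following sketch estimates the number of optimizer steps implied by these settings. It assumes (this card does not say) that the three training sets were combined into a single run on one device without gradient accumulation, and it uses the rounded counts listed under Training data, so the result is approximate. `total_optimizer_steps` is a hypothetical helper.

```python
import math

# Approximate training-set sizes from this card (rounded to thousands).
TRAIN_SIZES = {"English": 103_000, "Italian": 119_000, "Slovenian": 50_000}

# Hyperparameters reported in this card.
NUM_TRAIN_EPOCHS = 3
TRAIN_BATCH_SIZE = 8

def total_optimizer_steps(n_examples, batch_size, epochs):
    """Optimizer steps for plain mini-batch training, assuming one device and no
    gradient accumulation; a final partial batch still counts as one step."""
    return math.ceil(n_examples / batch_size) * epochs

n_train = sum(TRAIN_SIZES.values())
steps = total_optimizer_steps(n_train, TRAIN_BATCH_SIZE, NUM_TRAIN_EPOCHS)
print(n_train, steps)  # 272000 102000
```

Under these assumptions, roughly 272k examples at batch size 8 give about 34,000 steps per epoch, or about 102,000 steps over three epochs.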

Evaluation results

Model-annotator agreement (accuracy) compared with inter-annotator agreement, on a scale from 0 (no agreement) to 100 (perfect agreement):

| Language | Model-annotator agreement | Inter-annotator agreement |
|---|---|---|
| English | 79.97 | 82.91 |
| Italian | 82.00 | 81.79 |
| Slovenian | 78.84 | 79.43 |
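The card reports model agreement as accuracy. As a minimal sketch, assuming simple percent agreement over items labelled by both parties (the card does not describe how items with several annotations were aggregated), the score can be computed as below; under the same assumption, applying the function to two annotators' labels gives a pairwise inter-annotator score. `percent_agreement` is a hypothetical helper and the labels are toy data.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items (0-100) on which two label sequences assign the same class."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)

# Toy example with hypothetical labels (0 = appropriate, ..., 3 = violent).
model_preds = [0, 0, 2, 1, 3]
annotator = [0, 1, 2, 1, 2]
print(percent_agreement(model_preds, annotator))  # 60.0
```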

Per-class F1 scores of the model:

| Language | Appropriate | Inappropriate | Offensive | Violent |
|---|---|---|---|---|
| English | 86.10 | 39.16 | 68.24 | 27.82 |
| Italian | 89.77 | 58.45 | 60.42 | 44.97 |
| Slovenian | 84.30 | 45.22 | 69.69 | 24.79 |
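Each per-class F1 score is the harmonic mean of that class's precision and recall, treating the class one-vs-rest, which simplifies to 2·TP / (2·TP + FP + FN). The sketch below implements that standard definition on a 0 to 100 scale; the card does not state which implementation the authors used, and `per_class_f1` is a hypothetical helper.

```python
def per_class_f1(gold, pred, classes=(0, 1, 2, 3)):
    """One-vs-rest F1 (0-100) for each class, from gold and predicted labels."""
    scores = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that never occurs in gold or pred gets 0.0 here by convention.
        scores[c] = 100.0 * 2 * tp / denom if denom else 0.0
    return scores

# Toy labels (0 = appropriate, ..., 3 = violent).
gold = [0, 0, 1, 2, 3]
pred = [0, 1, 1, 2, 2]
print(per_class_f1(gold, pred))
```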

Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

MODEL = "IMSyPP/hate_speech_multilingual"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# top_k=None returns scores for all four classes (it supersedes return_all_scores=True);
# function_to_apply="none" returns raw logits rather than probabilities;
# device=0 selects the first GPU (use device=-1 to run on CPU).
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=None,
                                  function_to_apply="none", device=0)

# One input each in English, Italian and Slovenian.
results = pipe([
    "Thank you for using our model",
    "Grazie per aver utilizzato il nostro modello",
    "Hvala za uporabo našega modela",
])
print(results)
```
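With `return_all_scores=True` (or its replacement `top_k=None`) and a list of inputs, the pipeline returns one list of `{"label": ..., "score": ...}` dicts per input; with `function_to_apply="none"` the scores are raw logits. The sketch below picks the top entry for one input and maps it back to this card's class names. It assumes the labels follow the default `LABEL_<index>` naming, which may not hold for this checkpoint (check `model.config.id2label`); `top_class` is a hypothetical helper and the example scores are made up.

```python
# Class indices from this card; the label strings come from the model config,
# so parsing "LABEL_<k>" below is an assumption.
CLASS_NAMES = {0: "appropriate", 1: "inappropriate", 2: "offensive", 3: "violent"}

def top_class(scores):
    """Pick the highest-scoring entry from one input's list of {'label', 'score'} dicts."""
    best = max(scores, key=lambda d: d["score"])
    index = int(best["label"].rsplit("_", 1)[-1])  # assumes labels like "LABEL_2"
    return CLASS_NAMES[index], best["score"]

# Hypothetical output for a single input, in the per-input shape described above.
example = [{"label": "LABEL_0", "score": 3.1}, {"label": "LABEL_1", "score": -0.4},
           {"label": "LABEL_2", "score": -1.2}, {"label": "LABEL_3", "score": -2.5}]
print(top_class(example))  # ('appropriate', 3.1)
```

To label a whole batch, apply `top_class` to each element of the list returned by `pipe([...])`.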
Model size: 278M parameters (Safetensors format, F32 tensors)