EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus

Text Classification


EIStakovskii committed on Oct 25, 2022

Commit a9cc25a • 1 Parent(s): 5e139d7

Update README.md

Files changed (1)

README.md +8 -0

README.md CHANGED

@@ -27,3 +27,11 @@ widget:
 license: other
 ---
+This model is trained for multilingual toxicity labeling. Label_1 means TOXIC, Label_0 means NOT_TOXIC.
+The model was fine-tuned from the xlm_roberta_base model for 4 languages: EN, RU, FR, DE.
+The validation accuracy is 92%.
+The model was fine-tuned on a total of 100933k sentences. The training data for English and Russian came from https://github.com/s-nlp/multilingual_detox. The French data comprised French translations of the data from https://github.com/s-nlp/multilingual_detox as well as all the French data from the Jigsaw dataset. The German data was composed similarly, using translations and semi-manual data collection techniques.
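
The label convention described in the added README text can be sketched in Python as follows. This is a minimal illustration, not part of the commit: it assumes the `transformers` library is installed, that the model loads through the generic `text-classification` pipeline, and that the pipeline reports labels as `LABEL_0` / `LABEL_1` (the README writes them as `Label_0` / `Label_1`, so the lookup below ignores case). The helper names `readable` and `classify` are hypothetical.

```python
from typing import Dict, List

# Repository name as shown on this page.
MODEL_ID = "EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus"

# Mapping stated in the README: Label_1 = TOXIC, Label_0 = NOT_TOXIC.
# Keys are lowercased because the exact casing emitted by the model is an assumption.
LABEL_NAMES = {"label_0": "NOT_TOXIC", "label_1": "TOXIC"}


def readable(prediction: Dict) -> Dict:
    """Replace a raw pipeline label with the name given in the README."""
    raw = str(prediction["label"]).lower()
    # Unknown labels are passed through unchanged rather than guessed.
    return {"label": LABEL_NAMES.get(raw, prediction["label"]),
            "score": prediction["score"]}


def classify(texts: List[str]) -> List[Dict]:
    """Run the classifier on a list of texts (downloads the model on first use)."""
    # Imported lazily so the mapping helper can be used without transformers.
    from transformers import pipeline

    clf = pipeline("text-classification", model=MODEL_ID)
    return [readable(p) for p in clf(texts)]


if __name__ == "__main__":
    # One short input per language is enough for a smoke test.
    for result in classify(["Have a nice day!", "Bonne journée !"]):
        print(result)
```

Only the top label per input is returned here; scores are left as the pipeline reports them, so any decision threshold is up to the caller.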