---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- rubert
- emotion
- emotion-classification
datasets:
- cedr
---

This is a [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for __emotion classification__ of short __Russian__ texts. The task is a __multi-label classification__ with the following labels:

```yaml
0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
```

## Usage

```python
from transformers import pipeline

model = pipeline(model="seara/rubert-tiny2-cedr")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9605025053024292}]
```

## Dataset

This model was trained on the [CEDR dataset](https://huggingface.co/datasets/cedr). An overview of the training data can be found in the source [article](https://www.sciencedirect.com/science/article/pii/S1877050921013247).

## Training

Training was done in this [project](https://github.com/searayeah/vkr-bert) with these parameters:

```yaml
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 30
```

## Eval results (on the test split)

|         |no_emotion|joy   |sadness|surprise|fear  |anger |micro avg|macro avg|weighted avg|samples avg|
|---------|----------|------|-------|--------|------|------|---------|---------|------------|-----------|
|precision|0.8176    |0.8371|0.8425 |0.7902  |0.7833|0.5467|0.811    |0.7696   |0.8034      |0.7811     |
|recall   |0.8365    |0.83  |0.847  |0.6647  |0.6667|0.328 |0.776    |0.6955   |0.776       |0.7792     |
|f1-score |0.8269    |0.8336|0.8447 |0.722   |0.7203|0.41  |0.7931   |0.7263   |0.787       |0.7788     |
|support  |734.0     |353.0 |379.0  |170.0   |141.0 |125.0 |1902.0   |1902.0   |1902.0      |1902.0     |
|auc-roc  |0.9241    |0.9649|0.9557 |0.913   |0.9118|0.7732|0.9355   |0.9071   |0.9261      |           |
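Because the task is multi-label, a single text can carry several emotions at once, so in practice you may want all per-label scores rather than only the top one (the `transformers` text-classification pipeline returns every label when called with `top_k=None`). A minimal sketch of filtering those scores down to predicted labels; the 0.5 threshold and the `predicted_labels` helper are illustrative choices, not part of this model card:

```python
# scores: a list of {'label': str, 'score': float} dicts, in the shape
# returned per text by
#   pipeline(model="seara/rubert-tiny2-cedr", top_k=None)
def predicted_labels(scores, threshold=0.5):
    """Keep every label whose score clears the threshold."""
    return [s["label"] for s in scores if s["score"] >= threshold]

# Example with hand-written scores standing in for pipeline output:
example = [
    {"label": "joy", "score": 0.96},
    {"label": "surprise", "score": 0.55},
    {"label": "no_emotion", "score": 0.03},
]
print(predicted_labels(example))  # ['joy', 'surprise']
```

Lowering the threshold trades precision for recall; the per-label test metrics above can guide where to set it for your use case.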