---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- rubert
- emotion
- emotion-classification
datasets:
- cedr
---

This is a [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for __emotion classification__ of short __Russian__ texts. The task is a __multi-label classification__ with the following labels:

```yaml
0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
```

## Usage

```python
from transformers import pipeline

model = pipeline(model="seara/rubert-tiny2-cedr")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9605025053024292}]
```

## Dataset

This model was trained on the [CEDR dataset](https://huggingface.co/datasets/cedr). An overview of the training data can be found in the source [article](https://www.sciencedirect.com/science/article/pii/S1877050921013247).

## Training

Training was done in this [project](https://github.com/searayeah/vkr-bert) with these parameters:

```yaml
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 30
```

## Eval results (on the test split)

|         |no_emotion|joy   |sadness|surprise|fear  |anger |micro avg|macro avg|weighted avg|samples avg|
|---------|----------|------|-------|--------|------|------|---------|---------|------------|-----------|
|precision|0.8176    |0.8371|0.8425 |0.7902  |0.7833|0.5467|0.811    |0.7696   |0.8034      |0.7811     |
|recall   |0.8365    |0.83  |0.847  |0.6647  |0.6667|0.328 |0.776    |0.6955   |0.776       |0.7792     |
|f1-score |0.8269    |0.8336|0.8447 |0.722   |0.7203|0.41  |0.7931   |0.7263   |0.787       |0.7788     |
|support  |734.0     |353.0 |379.0  |170.0   |141.0 |125.0 |1902.0   |1902.0   |1902.0      |1902.0     |
|auc-roc  |0.9241    |0.9649|0.9557 |0.913   |0.9118|0.7732|0.9355   |0.9071   |0.9261      |           |
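Because the task is multi-label, a single text can carry several emotions at once, so in practice you may want all per-label scores rather than only the top one (the `transformers` text-classification pipeline returns every label when called with `top_k=None`). A minimal sketch of filtering those scores down to predicted labels; the 0.5 threshold and the `predicted_labels` helper are illustrative choices, not part of this model card:

```python
# scores: a list of {'label': str, 'score': float} dicts, in the shape
# returned per text by
#   pipeline(model="seara/rubert-tiny2-cedr", top_k=None)
def predicted_labels(scores, threshold=0.5):
    """Keep every label whose score clears the threshold."""
    return [s["label"] for s in scores if s["score"] >= threshold]

# Example with hand-written scores standing in for pipeline output:
example = [
    {"label": "joy", "score": 0.96},
    {"label": "surprise", "score": 0.55},
    {"label": "no_emotion", "score": 0.03},
]
print(predicted_labels(example))  # ['joy', 'surprise']
```

Lowering the threshold trades precision for recall; the per-label test metrics above can guide where to set it for your use case.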