README.md · seara/rubert-tiny2-russian-sentiment at dd9f2a394a71e38dd635dcb1ca0af5dc7ca166a5

seara

Create README.md

dd9f2a3 about 1 year ago

	---
	license: mit
	language:
	- ru
	metrics:
	- f1
	- roc_auc
	- precision
	- recall
	pipeline_tag: text-classification
	tags:
	- rubert
	- sentiment
	datasets:
	- sismetanin/rureviews
	- RuSentiment
	- LinisCrowd2015
	- LinisCrowd2016
	- KaggleRussianNews
	---

	This is the [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for __sentiment classification__ of short __Russian__ texts.
	The task is __multi-class classification__ with the following labels:

	```yaml
	0: neutral
	1: positive
	2: negative
	```
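As a small illustration of how these ids relate to model output, the sketch below turns a vector of three raw logits into a label and a probability using a softmax. `ID2LABEL` copies the mapping above; the `softmax` and `decode` helpers and the example logits are made up for illustration and are not taken from the model's code.

```python
import math

# Label ids as listed in this model card.
ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, not real model output.
label, prob = decode([0.2, 2.5, -1.0])
print(label, round(prob, 3))
```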

	## Usage

	```python
	from transformers import pipeline
	model = pipeline(model="seara/rubert-tiny2-russian-sentiment")
	model("Привет, ты мне нравишься!")
	# [{'label': 'positive', 'score': 0.9398769736289978}]
	```
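The pipeline also accepts a list of strings and returns one `{'label', 'score'}` dict per input. The helper below is a minimal sketch of batch use that additionally marks low-confidence predictions. The name `label_texts`, the `"uncertain"` marker and the `0.6` threshold are illustrative assumptions, not part of this model or of `transformers`.

```python
def label_texts(classifier, texts, min_score=0.6):
    """Classify texts in one call and mark low-confidence predictions.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": str, "score": float} dicts, one per input.
    """
    texts = list(texts)
    predictions = classifier(texts)
    labeled = []
    for text, pred in zip(texts, predictions):
        # Keep the predicted label only if the score clears the threshold.
        label = pred["label"] if pred["score"] >= min_score else "uncertain"
        labeled.append((text, label, pred["score"]))
    return labeled


# With the real model (downloads weights on first use):
#   from transformers import pipeline
#   clf = pipeline(model="seara/rubert-tiny2-russian-sentiment")
#   label_texts(clf, ["Привет, ты мне нравишься!", "Сегодня вторник."])
```

The threshold should be tuned on held-out data rather than taken from this sketch.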

	## Dataset

	This model was trained on the union of the following datasets:

	- Kaggle Russian News Dataset
	- Linis Crowd 2015
	- Linis Crowd 2016
	- RuReviews
	- RuSentiment

	An overview of the training data can be found in [S. Smetanin's GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).

	__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [repository](https://github.com/searayeah/russian-sentiment-emotions-datasets).__

	## Training

	Training was done in this [project](https://github.com/searayeah/vkr-bert) with these parameters:

	```yaml
	max_length: 512
	batch_size: 64
	optimizer: adam
	lr: 0.00001
	weight_decay: 0
	num_epochs: 5
	```

	Train/validation/test splits are 80%/10%/10%.
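A minimal sketch of an 80%/10%/10% split, assuming a plain random shuffle. Whether the linked project shuffles, stratifies by label, or fixes a seed is not stated here, so `split_dataset` and the seed value are hypothetical.

```python
import random


def split_dataset(examples, seed=42):
    """Shuffle and split examples into 80% train, 10% validation, 10% test."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no example is dropped
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```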

	## Eval results (on test split)


	|           | neutral | positive | negative | macro avg | weighted avg |
	|-----------|---------|----------|----------|-----------|--------------|
	| precision | 0.69    | 0.83     | 0.74     | 0.75      | 0.75         |
	| recall    | 0.73    | 0.82     | 0.68     | 0.75      | 0.75         |
	| f1-score  | 0.71    | 0.83     | 0.71     | 0.75      | 0.75         |
	| support   | 5196    | 3831     | 3599     | 12626     | 12626        |
	| auc-roc   | 0.84    | 0.95     | 0.90     | 0.90      | 0.89         |
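
For readers unfamiliar with the two average columns, the sketch below recomputes them for the precision row, assuming the usual definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The helper names are illustrative. The inputs are the rounded table values, so recomputed averages for other rows can differ from the table in the last digit.

```python
# Per-class precision and support copied from the table (already rounded).
precision = {"neutral": 0.69, "positive": 0.83, "negative": 0.74}
support = {"neutral": 5196, "positive": 3831, "negative": 3599}


def macro_avg(scores):
    """Unweighted mean over classes."""
    return sum(scores.values()) / len(scores)


def weighted_avg(scores, support):
    """Mean over classes weighted by the number of test examples per class."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


print(round(macro_avg(precision), 2), round(weighted_avg(precision, support), 2))
```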