---
license: mit
language: ["ru"]
tags:
- russian
- classification
- emotion
- emotion-detection
- emotion-recognition
- multiclass
widget:
- text: "Как дела?"
- text: "Дурак твой дед"
- text: "Только попробуй!!!"
- text: "Не хочу в школу("
- text: "Сейчас ровно час дня"
- text: "А ты уверен, что эти полоски снизу не врут? Точно уверен? Вот прям 100 процентов?"
datasets:
- Aniemore/cedr-m7
model-index:
- name: RuBERT tiny2 For Russian Text Emotion Detection by Ilya Lubenets
  results:
  - task:
      name: Multilabel Text Classification
      type: multilabel-text-classification
    dataset:
      name: CEDR M7
      type: Aniemore/cedr-m7
      args: ru
    metrics:
    - name: multilabel accuracy
      type: accuracy
      value: 85%
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: CEDR M7
      type: Aniemore/cedr-m7
      args: ru
    metrics:
    - name: accuracy
      type: accuracy
      value: 76%
---
# First, you should prepare a few functions to talk to the model
```python
import torch
from transformers import BertForSequenceClassification, AutoTokenizer

LABELS = ['neutral', 'happiness', 'sadness', 'enthusiasm', 'fear', 'anger', 'disgust']
tokenizer = AutoTokenizer.from_pretrained('Aniemore/rubert-tiny2-russian-emotion-detection')
model = BertForSequenceClassification.from_pretrained('Aniemore/rubert-tiny2-russian-emotion-detection')

@torch.no_grad()
def predict_emotion(text: str) -> str:
    """
    Tokenize the input text, pass it through the model, and return the single most likely label.

    :param text: The text to be classified
    :type text: str
    :return: The predicted emotion
    """
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
    outputs = model(**inputs)
    predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
    predicted = torch.argmax(predicted, dim=1).numpy()
    return LABELS[predicted[0]]

@torch.no_grad()
def predict_emotions(text: str) -> dict:
    """
    Tokenize the input text, pass it through the model, and return a dictionary mapping
    each emotion label to its probability.

    :param text: The text you want to classify
    :type text: str
    :return: A dictionary of emotions and their probabilities.
    """
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
    outputs = model(**inputs)
    predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
    scores = predicted.numpy()[0]
    return {label: float(score) for label, score in zip(LABELS, scores)}
```
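Because the tokenizer call already pads, the same setup extends naturally to scoring several texts at once. A minimal sketch, assuming the `tokenizer`, `model`, and `LABELS` defined above (the `predict_emotions_batch` name is ours, for illustration, not part of the card):

```python
@torch.no_grad()
def predict_emotions_batch(texts: list) -> list:
    """Return the most likely emotion label for each text in a list."""
    inputs = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
    outputs = model(**inputs)
    # argmax over the class dimension yields one label index per input text
    predicted = torch.argmax(outputs.logits, dim=1).numpy()
    return [LABELS[i] for i in predicted]
```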
# And then, just gently ask the model to predict your emotion
```python
simple_prediction = predict_emotion("Какой же сегодня прекрасный день, братья")
not_simple_prediction = predict_emotions("Какой же сегодня прекрасный день, братья")
print(simple_prediction)
print(not_simple_prediction)
# happiness
# {'neutral': 0.0004941817605867982, 'happiness': 0.9979524612426758, 'sadness': 0.0002536600804887712, 'enthusiasm': 0.0005498139653354883, 'fear': 0.00025326196919195354, 'anger': 0.0003583927755244076, 'disgust': 0.00013807788491249084}
```
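The helpers above run on CPU. If a GPU is available, the usual transformers idiom is to move both the model and the tokenized inputs to the same device. A minimal sketch, reusing the objects defined above (the `predict_emotion_on_device` helper is ours, for illustration):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device).eval()

@torch.no_grad()
def predict_emotion_on_device(text: str) -> str:
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
    inputs = {k: v.to(device) for k, v in inputs.items()}  # keep tensors on the model's device
    logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=1).item())]
```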
# Or just use [our package (GitHub)](https://github.com/aniemore/Aniemore), which can do whatever you want (or maybe not)
🤗
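For reference, text emotion recognition through the package looks roughly like the snippet below. The entry points shown (`TextRecognizer`, `HuggingFaceModel.Text.Bert_Tiny2`) follow the pattern from the project's README at the time of writing and may change between versions, so treat the repository as the source of truth:

```python
import torch
from aniemore.recognizers.text import TextRecognizer
from aniemore.models import HuggingFaceModel

# Bert_Tiny2 refers to this very checkpoint in the package's model registry
model = HuggingFaceModel.Text.Bert_Tiny2
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tr = TextRecognizer(model=model, device=device)
print(tr.recognize('Какой же сегодня прекрасный день, братья', return_single_label=True))
```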
# Citations
```
@misc{Aniemore,
  author = {Artem Amentes and Ilya Lubenets and Nikita Davidchuk},
  title = {Open Artificial Intelligence Library for Analyzing and Detecting the Emotional Undertones of Human Speech},
  year = {2022},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.com/aniemore/Aniemore}},
  email = {hello@socialcode.ru}
}
```