Model bertin_base_sentiment_analysis_es

A finetuned model for Sentiment analysis in Spanish

This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container, The base model is Bertin base which is a RoBERTa-base model pre-trained on the Spanish portion of mC4 using Flax. It was trained by the Bertin Project.Link to base model

Article: BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

Dataset

The dataset is a collection of movie reviews in Spanish, about 50,000 reviews. The dataset is balanced and provides every review in english, in spanish and the label in both languages.

Sizes of datasets:

  • Train dataset: 42,500
  • Validation dataset: 3,750
  • Test dataset: 3,750

Intended uses & limitations

This model is intented for Sentiment Analysis for spanish corpus and finetuned specially for movie reviews but it can be applied to other kind of reviews.

Hyperparameters

{
"epochs": "4",
"train_batch_size": "32",    
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"bertin-project/bertin-roberta-base-spanish\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\"",
}

Evaluation results

  • Accuracy = 0.8989333333333334

  • F1 Score = 0.8989063750333421

  • Precision = 0.877147319104633

  • Recall = 0.9217724288840262

Test results

Model in action

Usage for Sentiment Analysis

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("edumunozsala/bertin_base_sentiment_analysis_es")
model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/bertin_base_sentiment_analysis_es")

text ="Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
outputs = model(input_ids)
output = outputs.logits.argmax(1)

Created by Eduardo Muñoz/@edumunozsala

Downloads last month
49
Safetensors
Model size
125M params
Tensor type
I64
·
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Evaluation results