base_model: cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual
tags:
  - generated_from_trainer
datasets:
  - all
metrics:
  - precision
  - recall
  - f1
model-index:
  - name: twitter-xlmr-clip-finetuned-all-123
    results: []

twitter-xlmr-clip-finetuned-all-123

This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, trained on the dataset named "all". It achieves the following results on the evaluation set:

  • Loss: 0.7405
  • Precision: 0.6431
  • Recall: 0.6554
  • F1: 0.6401
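The reported F1 (0.6401) is not the harmonic mean of the reported precision and recall (that would be about 0.649). This is consistent with metrics that are averaged per class, although the card does not state which averaging was used. The following is a minimal sketch of macro-averaged precision, recall, and F1 over the three sentiment labels, assuming macro averaging; the toy labels and the `macro_prf` helper are illustrative and not taken from the training code.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def macro_prf(y_true, y_pred, labels=LABELS):
    """Return macro-averaged (precision, recall, f1) over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data (illustrative only)
y_true = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]

precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because F1 is computed per class and then averaged, the macro F1 generally differs from the harmonic mean of the macro precision and macro recall, as it does in the toy data here.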

Model description

More information needed

Usage

To run the model, use the script below. It relies on the Transform and VisionTextDualEncoderModel classes, which are defined in app.py; copy or import those definitions before running the script.

import torch
import torch.nn as nn

import torchvision
from torchvision import transforms
from torchvision.io import ImageReadMode, read_image
from torchvision.transforms import CenterCrop, ConvertImageDtype, Normalize, Resize
from torchvision.transforms.functional import InterpolationMode

from huggingface_hub import hf_hub_download
from safetensors.torch import load_model
from transformers import (
    AutoConfig,
    AutoImageProcessor,
    AutoModel,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    CLIPModel,
    logging,
)

# Transform and VisionTextDualEncoderModel are defined in app.py.
# Paste or import those class definitions here; several of the imports
# above are kept only because those classes may need them.

id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {"negative": 0, "neutral": 1, "positive": 2}

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual")

model = VisionTextDualEncoderModel(num_classes=3)
config = model.vision_encoder.config

# https://huggingface.co/FFZG-cleopatra/M2SA/blob/main/model.safetensors
sf_filename = hf_hub_download("FFZG-cleopatra/M2SA", filename="model.safetensors")

load_model(model, sf_filename)
model.eval()  # switch off dropout and other training-only behavior for inference
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

def predict_sentiment(text, image):
    # Read the image file from disk as an RGB uint8 tensor of shape (C, H, W).
    image = read_image(image, mode=ImageReadMode.RGB)

    # Tokenize the text, padding or truncating to 512 tokens.
    text_inputs = tokenizer(
        text,
        max_length=512,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )

    # Build the image preprocessing pipeline from the vision encoder's
    # image size and the CLIP image processor's mean and std.
    image_transformations = Transform(
        config.vision_config.image_size,
        image_processor.image_mean,
        image_processor.image_std,
    )
    image_transformations = torch.jit.script(image_transformations)
    pixel_values = image_transformations(image)
    text_inputs["pixel_values"] = pixel_values.unsqueeze(0)  # add a batch dimension

    # Run inference without tracking gradients and pick the highest-scoring class.
    with torch.no_grad():
        outputs = model(**text_inputs)
    prediction = torch.argmax(outputs["logits"], dim=-1)

    label = id2label[prediction[0].item()]
    print(label)
    return label

text = "I feel good today"
image = "path/to/image.jpg"  # placeholder: local file path (read_image reads from disk, not from a URL)
predict_sentiment(text, image)
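The script above returns only the highest-scoring label. If per-class probabilities are also useful, a softmax over the logits gives them. The sketch below uses plain Python so that it runs without the model; the `label_probabilities` helper and the example logits are hypothetical. With the real model, and assuming `outputs["logits"]` has shape (1, 3), the input could be obtained with `outputs["logits"][0].tolist()`, or the same result computed directly with `torch.softmax(outputs["logits"], dim=-1)`.

```python
import math

id2label = {0: "negative", 1: "neutral", 2: "positive"}


def label_probabilities(logits):
    """Apply a numerically stable softmax to a flat list of logits.

    Returns a dict mapping each label name to its probability.
    """
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift by max to avoid overflow
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Hypothetical logits for a single text/image pair (not real model output)
example_logits = [-1.2, 0.3, 2.1]

probs = label_probabilities(example_logits)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```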

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 123
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
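To make the linear scheduler concrete, here is a rough sketch of how the learning rate would decay from 5e-05 over the planned 50 epochs. Several values are assumptions rather than facts from this card: no warmup is listed, so warmup is taken as 0, and the number of steps per epoch is estimated from the training log (step 25000 is reported at epoch 3.01), which is only approximate because the logged epochs are rounded.

```python
BASE_LR = 5e-5
NUM_EPOCHS = 50

# Assumption: estimated from the training log, where step 25000 is at epoch 3.01.
STEPS_PER_EPOCH = round(25000 / 3.01)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Assumption: the card lists no warmup, so none is applied here.
WARMUP_STEPS = 0


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 11500, 26500, TOTAL_STEPS):
    print(step, f"{linear_lr(step):.3e}")
```

Under these assumptions the learning rate has barely decayed by the last logged step (26500), since that step is only a small fraction of the estimated total.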

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 0.6444 | 0.06 | 500 | 0.8771 | 0.6905 | 0.4537 | 0.4197 |
| 0.5499 | 0.12 | 1000 | 0.8167 | 0.7197 | 0.4260 | 0.4117 |
| 0.5357 | 0.18 | 1500 | 0.8084 | 0.7263 | 0.4696 | 0.4424 |
| 0.5175 | 0.24 | 2000 | 0.8704 | 0.6666 | 0.4266 | 0.3717 |
| 0.5285 | 0.3 | 2500 | 0.9067 | 0.7529 | 0.4565 | 0.4221 |
| 0.5081 | 0.36 | 3000 | 0.7414 | 0.7655 | 0.6114 | 0.6356 |
| 0.506 | 0.42 | 3500 | 0.8713 | 0.5830 | 0.6591 | 0.5786 |
| 0.5049 | 0.48 | 4000 | 0.7514 | 0.5551 | 0.4568 | 0.4464 |
| 0.4999 | 0.54 | 4500 | 0.7584 | 0.6519 | 0.5502 | 0.5767 |
| 0.507 | 0.6 | 5000 | 0.8072 | 0.6479 | 0.5626 | 0.5636 |
| 0.5048 | 0.66 | 5500 | 0.8080 | 0.6260 | 0.5725 | 0.5730 |
| 0.4907 | 0.72 | 6000 | 0.7966 | 0.6976 | 0.5138 | 0.5224 |
| 0.493 | 0.78 | 6500 | 0.8193 | 0.7099 | 0.4949 | 0.4922 |
| 0.4668 | 0.84 | 7000 | 0.7502 | 0.6282 | 0.6942 | 0.6501 |
| 0.4717 | 0.9 | 7500 | 0.7636 | 0.6372 | 0.5109 | 0.5191 |
| 0.4774 | 0.96 | 8000 | 0.7652 | 0.7513 | 0.5360 | 0.5587 |
| 0.4676 | 1.02 | 8500 | 0.8482 | 0.6372 | 0.5918 | 0.5836 |
| 0.4361 | 1.08 | 9000 | 0.7456 | 0.6687 | 0.5177 | 0.5175 |
| 0.4536 | 1.14 | 9500 | 0.8449 | 0.7363 | 0.5160 | 0.5156 |
| 0.4277 | 1.2 | 10000 | 0.8648 | 0.6382 | 0.5247 | 0.5173 |
| 0.4444 | 1.26 | 10500 | 0.8723 | 0.5871 | 0.6622 | 0.5959 |
| 0.4269 | 1.32 | 11000 | 0.7856 | 0.6151 | 0.5521 | 0.5526 |
| 0.4322 | 1.38 | 11500 | 0.7405 | 0.6431 | 0.6554 | 0.6401 |
| 0.4435 | 1.44 | 12000 | 0.7682 | 0.6568 | 0.5751 | 0.5923 |
| 0.4429 | 1.5 | 12500 | 0.8824 | 0.5956 | 0.6006 | 0.5545 |
| 0.4381 | 1.56 | 13000 | 0.7879 | 0.4457 | 0.4727 | 0.4395 |
| 0.4389 | 1.62 | 13500 | 0.7555 | 0.6260 | 0.6984 | 0.6502 |
| 0.4529 | 1.68 | 14000 | 0.7981 | 0.6621 | 0.5546 | 0.5663 |
| 0.4509 | 1.74 | 14500 | 0.7827 | 0.6160 | 0.6321 | 0.6172 |
| 0.4413 | 1.8 | 15000 | 0.7895 | 0.6381 | 0.6357 | 0.6285 |
| 0.4198 | 1.86 | 15500 | 0.8345 | 0.5940 | 0.5526 | 0.5602 |
| 0.4415 | 1.92 | 16000 | 0.8746 | 0.6615 | 0.6612 | 0.6459 |
| 0.443 | 1.98 | 16500 | 0.8155 | 0.6516 | 0.5265 | 0.5352 |
| 0.4068 | 2.04 | 17000 | 0.7642 | 0.5838 | 0.6220 | 0.5975 |
| 0.3905 | 2.1 | 17500 | 0.7929 | 0.6720 | 0.5555 | 0.5740 |
| 0.3969 | 2.16 | 18000 | 0.8949 | 0.5330 | 0.4771 | 0.4687 |
| 0.3841 | 2.22 | 18500 | 0.9233 | 0.6028 | 0.5410 | 0.5492 |
| 0.4031 | 2.28 | 19000 | 0.7720 | 0.6089 | 0.5719 | 0.5776 |
| 0.3878 | 2.34 | 19500 | 0.9046 | 0.6265 | 0.5358 | 0.5318 |
| 0.4001 | 2.41 | 20000 | 0.8451 | 0.6960 | 0.5622 | 0.5761 |
| 0.3997 | 2.47 | 20500 | 0.8964 | 0.6170 | 0.5665 | 0.5541 |
| 0.3945 | 2.53 | 21000 | 0.8001 | 0.5553 | 0.5180 | 0.5195 |
| 0.4005 | 2.59 | 21500 | 0.8357 | 0.5519 | 0.5100 | 0.5170 |
| 0.3907 | 2.65 | 22000 | 0.8017 | 0.5884 | 0.5409 | 0.5552 |
| 0.3858 | 2.71 | 22500 | 0.8283 | 0.6036 | 0.5792 | 0.5862 |
| 0.3973 | 2.77 | 23000 | 0.9024 | 0.5770 | 0.5665 | 0.5393 |
| 0.3969 | 2.83 | 23500 | 0.8341 | 0.5642 | 0.5528 | 0.5558 |
| 0.3911 | 2.89 | 24000 | 0.8966 | 0.6045 | 0.5088 | 0.5070 |
| 0.3856 | 2.95 | 24500 | 0.8349 | 0.6021 | 0.5586 | 0.5689 |
| 0.3961 | 3.01 | 25000 | 0.9364 | 0.6119 | 0.5412 | 0.5585 |
| 0.3301 | 3.07 | 25500 | 0.9542 | 0.5757 | 0.6084 | 0.5813 |
| 0.3385 | 3.13 | 26000 | 1.0137 | 0.5563 | 0.5294 | 0.5346 |
| 0.3475 | 3.19 | 26500 | 0.9311 | 0.6359 | 0.5675 | 0.5822 |
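The evaluation results quoted at the top of this card (loss 0.7405, F1 0.6401) match the row at step 11500, which has the lowest validation loss in the table, while the highest F1 (0.6502) appears at step 13500. The card does not say how the final checkpoint was chosen, so the following is only a sketch of how one might pick a checkpoint from such a log under either criterion. The rows are a hand-copied subset of the table above, and the `EvalRow` type is illustrative.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    precision: float
    recall: float
    f1: float


# Subset of rows copied from the training results table (not the full log).
rows = [
    EvalRow(3000, 0.7414, 0.7655, 0.6114, 0.6356),
    EvalRow(7000, 0.7502, 0.6282, 0.6942, 0.6501),
    EvalRow(9000, 0.7456, 0.6687, 0.5177, 0.5175),
    EvalRow(11500, 0.7405, 0.6431, 0.6554, 0.6401),
    EvalRow(13500, 0.7555, 0.6260, 0.6984, 0.6502),
    EvalRow(16000, 0.8746, 0.6615, 0.6612, 0.6459),
    EvalRow(26500, 0.9311, 0.6359, 0.5675, 0.5822),
]

best_by_loss = min(rows, key=lambda r: r.val_loss)  # lower loss is better
best_by_f1 = max(rows, key=lambda r: r.f1)          # higher F1 is better

print("lowest validation loss:", best_by_loss)
print("highest F1:", best_by_f1)
```

The two criteria select different checkpoints here, so it is worth deciding which one matters for the intended use before comparing runs.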

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2