Llama-3.2-1B-imdb / README.md
metadata
license: llama3.2
datasets:
  - stanfordnlp/imdb
language:
  - en
metrics:
  - accuracy
base_model:
  - meta-llama/Llama-3.2-1B
new_version: yash3056/Llama-3.2-1B-imdb
pipeline_tag: text-classification
library_name: transformers
tags:
  - transformers
  - pytorch
  - llama
  - llama-3
  - 1b

Model Details

Model Description

Uses

This model is designed for text classification, specifically binary sentiment analysis on datasets like IMDb, where the goal is to classify text as positive or negative. Data scientists, researchers, developers, and businesses can use it to build applications for sentiment analysis, content moderation, or customer feedback analysis, and to automate text analysis at scale. It can also be fine-tuned for other binary or multi-class classification tasks in domains like social media monitoring, product reviews, and support ticket triage.

Direct Use

This model can be used directly to identify sentiment in text-based reviews, such as classifying whether a movie or product review is positive or negative. Without further fine-tuning, it reached a test accuracy of 0.9604 on IMDb (see Evaluation) and can be used out of the box for applications like analyzing customer feedback, monitoring social media opinions, or automating sentiment tagging. It suits scenarios where sentiment must be assessed quickly from text without further customization.

Downstream Use

Fine-tuning for Binary Classification

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

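# Load IMDb dataset for binary classification
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")

# Llama tokenizers may ship without a padding token; fall back to EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize the dataset
def preprocess(example):
    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

tokenized_datasets = dataset.map(preprocess, batched=True)

# Load model for binary classification (num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("yash3056/Llama-3.2-1B-imdb", num_labels=2)
# The classification head needs the pad token id to locate the last real token in padded batches
model.config.pad_token_id = tokenizer.pad_token_id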

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

# Fine-tune the model
trainer.train()

Fine-tuning for Multi-Class Classification

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

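# Load AG News dataset for multi-class classification (4 labels)
dataset = load_dataset("ag_news")
tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")

# Fall back to EOS for padding if the tokenizer defines no padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize the dataset
def preprocess(example):
    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

tokenized_datasets = dataset.map(preprocess, batched=True)

# Load model for multi-class classification (num_labels=4).
# Assuming the checkpoint stores a 2-label head, ignore_mismatched_sizes=True
# lets the head be re-initialized with 4 outputs instead of raising a shape error.
model = AutoModelForSequenceClassification.from_pretrained(
    "yash3056/Llama-3.2-1B-imdb", num_labels=4, ignore_mismatched_sizes=True
)
model.config.pad_token_id = tokenizer.pad_token_id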

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

# Fine-tune the model
trainer.train()


Bias, Risks, and Limitations

While this model is effective for text classification and sentiment analysis, it has certain limitations and potential biases. The training data, such as the IMDb dataset, may contain inherent biases related to language use, cultural context, or demographics of reviewers, which could influence the model’s predictions. For example, the model might struggle with nuanced sentiment, sarcasm, or slang, leading to misclassifications. Additionally, it could exhibit biases toward particular opinions or groups if those were overrepresented or underrepresented in the training data.

The model is also limited to binary sentiment classification, meaning it may oversimplify more complex emotional states expressed in text. Users should be cautious when applying the model in sensitive domains such as legal, medical, or psychological settings, where misclassification could have serious consequences. Proper review and adjustment of predictions are recommended, especially in high-stakes applications.

Recommendations

Users (both direct and downstream) should be aware of the potential risks, biases, and limitations inherent in this model. Given that the model may reflect biases present in the training data, it is recommended that users critically evaluate the model’s performance on specific datasets or contexts where fairness and accuracy are essential.

For applications in sensitive areas like legal, healthcare, or hiring decisions, additional care should be taken to review the model's predictions, possibly combining them with human oversight. Fine-tuning the model on domain-specific data or implementing bias mitigation techniques can help reduce unintended bias. Additionally, regular re-evaluation and monitoring of the model in production environments are encouraged to ensure it continues to meet desired ethical and performance standards.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")
# num_labels=2 matches the binary sentiment task; use another value only when fine-tuning on a task with a different number of labels
model = AutoModelForSequenceClassification.from_pretrained("yash3056/Llama-3.2-1B-imdb", num_labels=2)
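
Once the model is loaded, a single review could be classified roughly as follows. This is a minimal sketch: it assumes the two output logits follow the label encoding described under Preprocessing (0 = negative, 1 = positive), and the names `ID2LABEL`, `softmax`, `classify`, and `demo` are illustrative rather than part of the published model. The checkpoint's own `config.id2label` may use different names.

```python
import math

# Assumed label order, taken from the card's preprocessing notes (0 = negative, 1 = positive)
ID2LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, tokenizer, model, max_length=512):
    """Return (label, confidence) for a single review."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # One text at a time, so no padding (and no padding token) is required
    inputs = tokenizer(text, truncation=True, max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].float().tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def demo():
    """Download the checkpoint and classify one example review."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "yash3056/Llama-3.2-1B-imdb"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
    model.eval()
    print(classify("A moving, beautifully shot film.", tokenizer, model))
```

Calling `demo()` downloads the weights and prints a label with its probability. Classifying several texts in one batch would additionally need a padding token on the tokenizer and in the model config.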

Training Details

Training Data

The model was trained on the IMDb dataset, a widely used benchmark for binary sentiment classification tasks. The dataset consists of movie reviews labeled as positive or negative, making it suitable for training models to understand sentiment in text. The dataset contains 50,000 reviews in total, evenly split between positive and negative labels, providing a balanced dataset for training and evaluation. Preprocessing involved tokenizing the text using the AutoTokenizer from Hugging Face's Transformers library, truncating and padding the sequences to a maximum length of 512 tokens. The training data was further split into training and validation sets with an 80-20 ratio.
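
The 80-20 split could be reproduced along these lines. This is a sketch, not the author's script: it assumes the validation set was carved out of IMDb's 25,000-review `train` split, and the seed, the stratification, and the helper names are illustrative choices.

```python
def split_sizes(n_examples, val_fraction=0.2):
    """Number of (train, validation) examples for a given validation fraction."""
    n_val = round(n_examples * val_fraction)
    return n_examples - n_val, n_val


def make_splits(seed=42):
    """Load IMDb and carve a validation set out of the train split."""
    from datasets import load_dataset  # lazy import: needs the `datasets` package

    train = load_dataset("stanfordnlp/imdb", split="train")
    # stratify_by_column keeps the positive/negative balance in both parts
    parts = train.train_test_split(test_size=0.2, seed=seed, stratify_by_column="label")
    return parts["train"], parts["test"]  # "test" here is the validation part


print(split_sizes(25_000))
```

Under that assumption the split leaves 20,000 reviews for training and 5,000 for validation, with the official 25,000-review test split untouched.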

More information about the IMDb dataset can be found on its Hugging Face dataset page (stanfordnlp/imdb).

Training Procedure

The Llama-3.2-1B model was adapted to the binary sentiment classification task and trained for 10 epochs with a batch size of 8, using the AdamW optimizer with a learning rate of 3e-5. The learning rate followed a linear schedule with a warmup over the first 40% of the total steps. The model was fine-tuned on the IMDb training data and evaluated on a separate test set.
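
To make the schedule concrete, the sketch below computes the step counts implied by these hyperparameters and the learning rate at a few points. It assumes 20,000 training examples (80% of IMDb's 25,000-review train split), a single device, and no gradient accumulation, none of which the card states explicitly; the function names are illustrative.

```python
import math

EPOCHS, BATCH_SIZE, PEAK_LR, WARMUP_RATIO = 10, 8, 3e-5, 0.4
N_TRAIN = 20_000  # assumption: 80% of the 25,000 IMDb training reviews


def schedule_steps(n_train=N_TRAIN, batch_size=BATCH_SIZE, epochs=EPOCHS, warmup_ratio=WARMUP_RATIO):
    """Return (total optimizer steps, warmup steps)."""
    total = math.ceil(n_train / batch_size) * epochs
    return total, int(total * warmup_ratio)


def linear_lr(step, total, warmup, peak_lr=PEAK_LR):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


total, warmup = schedule_steps()
for step in (0, warmup // 2, warmup, total):
    print(step, linear_lr(step, total, warmup))
```

In a PyTorch training loop the same shape is available through `transformers.get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)`.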

Validation and evaluation metrics were calculated after each epoch, including accuracy, precision, recall, F1-score, and ROC-AUC. The final model was saved after the last epoch, along with the tokenizer. Several plots, such as loss curves, accuracy curves, confusion matrix, and ROC curve, were generated to visually assess the model's performance.
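
The listed metrics can be computed from the true labels and the positive-class scores as sketched below. This is a dependency-free illustration rather than the author's evaluation code: the 0.5 decision threshold and the function name are assumptions, and libraries such as scikit-learn provide equivalent functions.

```python
def binary_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, recall, F1 and ROC-AUC for 0/1 labels and positive-class scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC-AUC as the probability that a random positive outscores a random negative
    # (ties count as half). Quadratic in the data size, which is fine for a sketch.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "recall": recall, "f1": f1, "roc_auc": auc}


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Note that ROC-AUC is computed from the continuous scores, while the other four metrics depend on the thresholded predictions.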

Preprocessing

Text data was preprocessed by tokenizing with the Llama-3.2-1B model tokenizer. Sequences were truncated and padded to a maximum length of 512 tokens to ensure consistent input sizes for the model. Labels were encoded as integers (0 for negative and 1 for positive) for compatibility with the model.

Evaluation

- Training: loss 0.0030, accuracy 0.9999
- Validation: loss 0.1196, accuracy 0.9628

Testing Data, Factors & Metrics

Testing Data

- Test loss: 0.1315
- Test accuracy: 0.9604
- Precision: 0.9604
- Recall: 0.9604
- F1-score: 0.9604
- AUC: 0.9604

Summary

Technical Specifications

Hardware

Intel® Data Center GPU Max 1550

Model Card Authors

- Yash Prakash Narayan (github)