metadata
language:
  - fr
library_name: transformers
tags:
  - pytorch
datasets:
  - martinvanaud/scenario-2043-05042024
pipeline_tag: text-classification

raccord/scenAIrio-classification

Model Description

The scenAIrio-classification model classifies segments of a movie script (screenplay) into one of three categories: NOTES, DIALOGUE, or SEQUENCE. It uses a BERT transformer architecture to classify each segment from the contextual cues typical of scripts.

Intended Use

This model is intended for use in applications involving the processing and analysis of movie scripts or scenarios. It can help scriptwriters, editors, and directors to automatically categorize script segments, facilitating easier script breakdowns and edits.
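As a rough illustration of that workflow, the sketch below labels a list of script segments. It assumes the Hub repository id matches the card title (`raccord/scenAIrio-classification`) and that the checkpoint loads through the standard `transformers` text-classification pipeline; neither is confirmed by this card. The helper names `load_classifier` and `classify_segments` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: the card title is also the Hub repository id.
MODEL_ID = "raccord/scenAIrio-classification"


def load_classifier():
    """Build a text-classification pipeline (downloads weights on first call)."""
    # Imported lazily so classify_segments can be used with any callable.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_segments(
    segments: List[str],
    classifier: Callable[[List[str]], List[Dict]],
) -> List[Tuple[str, str, float]]:
    """Return (segment, label, score) for each non-empty script segment."""
    # Drop blank lines, which are common in script exports.
    cleaned = [s.strip() for s in segments if s.strip()]
    if not cleaned:
        return []
    predictions = classifier(cleaned)
    return [
        (segment, prediction["label"], float(prediction["score"]))
        for segment, prediction in zip(cleaned, predictions)
    ]
```

A call such as `classify_segments(script_lines, load_classifier())` would then return one labeled tuple per segment. Depending on how the checkpoint was exported, the labels may come back as generic ids (for example `LABEL_0`) rather than NOTES, DIALOGUE, or SEQUENCE.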

Training Data

The model was trained on a dataset consisting of annotated movie scripts. Each part of the script was labeled as NOTES, DIALOGUE, or SEQUENCE.

Training Procedure

The model was trained using the following training arguments:

  • Output Directory: ./scenAIrio-modal
  • Training: Enabled
  • Evaluation: Enabled
  • Epochs: 3
  • Training Batch Size per Device: 16
  • Evaluation Batch Size per Device: 32
  • Warmup Steps: 100
  • Weight Decay: 0.01
  • Logging: Every 50 steps to ./multi-class-logs
  • Evaluation Strategy: Every 50 steps
  • Save Strategy: Save checkpoints every 50 steps
  • Best Model Loading: At the end of training, the best performing model is loaded
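The settings listed above map fairly directly onto the keyword arguments of `transformers.TrainingArguments`. The sketch below is a reconstruction under that assumption, not the author's actual training script; in particular, the "steps" strategy values are inferred from the "every 50 steps" wording, and `build_training_arguments` is a hypothetical helper.

```python
import inspect

# Reconstructed from the list above; key names follow TrainingArguments.
TRAINING_KWARGS = {
    "output_dir": "./scenAIrio-modal",
    "do_train": True,
    "do_eval": True,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "logging_dir": "./multi-class-logs",
    "logging_steps": 50,
    "eval_strategy": "steps",  # assumed from "every 50 steps"
    "eval_steps": 50,
    "save_strategy": "steps",  # assumed from "every 50 steps"
    "save_steps": 50,
    "load_best_model_at_end": True,
}


def build_training_arguments(kwargs=TRAINING_KWARGS):
    """Instantiate TrainingArguments, adapting to the installed release."""
    from transformers import TrainingArguments  # lazy import

    accepted = inspect.signature(TrainingArguments.__init__).parameters
    resolved = dict(kwargs)
    # Older transformers releases call this argument `evaluation_strategy`.
    if "eval_strategy" not in accepted and "evaluation_strategy" in accepted:
        resolved["evaluation_strategy"] = resolved.pop("eval_strategy")
    return TrainingArguments(**resolved)
```

Loading the best model at the end requires the save and evaluation schedules to line up, which holds here because both run every 50 steps. The card does not say which metric selected the best checkpoint, so none is set in this sketch.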

Model Architecture

The model is based on a BERT transformer adapted for multi-class classification over the three labels NOTES, DIALOGUE, and SEQUENCE.
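To make the multi-class step concrete, the sketch below shows how a three-way classification head's raw scores (logits) are typically turned into a single label: softmax converts them to probabilities, then the most probable class is chosen. The label order in `LABELS` is an assumption for illustration; the card does not document the checkpoint's actual index-to-label mapping.

```python
import math
from typing import List, Tuple

# Assumption: this index order is illustrative, not taken from the checkpoint.
LABELS = ["NOTES", "DIALOGUE", "SEQUENCE"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```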

Evaluation Results

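| Phase | Loss    | Accuracy | F1-Score | Precision | Recall |
|-------|---------|----------|----------|-----------|--------|
| Val   | 0.21253 | 93.73%   | 95.37%   | 95.53%    | 95.24% |
| Train | 0.08378 | 97.94%   | 98.47%   | 98.56%    | 98.39% |
| Test  | 0.26723 | 91.59%   | 93.49%   | 93.17%    | 93.84% |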
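For readers who want to compute the same kinds of metrics on their own labeled segments, here is a minimal pure-Python sketch. The card does not state how precision, recall, and F1 were averaged across the three classes; macro averaging is assumed here, so results may not be directly comparable with the figures above. The function name `classification_metrics` is hypothetical.

```python
from typing import Dict, List


def classification_metrics(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```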

Limitations

  • The model is specifically trained on French-language scripts and may not perform well with scripts in other languages.
  • Performance can vary significantly depending on the specific characteristics and formatting of the input scripts.

Conclusion

The scenAIrio-classification model categorizes parts of French-language movie scripts as NOTES, DIALOGUE, or SEQUENCE, reaching 91.59% accuracy and 93.17% precision on the test set. It is intended to support script breakdown work in film and television production.