---
license: mit
datasets:
  - mozilla-foundation/common_voice_17_0
language:
  - en
  - ta
metrics:
  - wer
base_model:
  - openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - language-identification
  - speech-to-text
---

# whisper-small-ta

A fine-tuned Whisper-small model for language identification and transcription in Tamil and Sinhala.

## Model Overview

This model is fine-tuned from openai/whisper-small using the Mozilla Common Voice 17.0 dataset for language identification and transcription in Tamil and Sinhala. The model is designed to accurately transcribe spoken audio into text and identify whether the language is Tamil or Sinhala.

**Key Features:**

- **Languages:** Tamil, Sinhala
- **Base Model:** Whisper-small from OpenAI
- **Dataset:** Mozilla Common Voice 17.0
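
Because the model is meant to identify the language as well as transcribe it, the minimal sketch below shows one way to read the predicted language token back out of the generated sequence. It assumes a 16 kHz mono recording loaded with librosa; `your_model_name` and the audio path are placeholders, and a checkpoint whose generation config forces a language will simply show the forced token.

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("your_model_name")
processor = WhisperProcessor.from_pretrained("your_model_name")

# Load the recording at the 16 kHz sampling rate Whisper expects.
speech, _ = librosa.load("path_to_audio_file", sr=16000)
input_features = processor(speech, sampling_rate=16000, return_tensors="pt").input_features

# Generate without forcing a language so the model predicts one itself.
with torch.no_grad():
    predicted_ids = model.generate(input_features)

# The first few generated tokens are special tokens; one of them is the
# predicted language token, e.g. "<|ta|>" for Tamil or "<|si|>" for Sinhala.
print(processor.tokenizer.convert_ids_to_tokens(predicted_ids[0].tolist())[:4])
```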

## Intended Use

The model is designed for automatic speech recognition (ASR) in Tamil and Sinhala, making it suitable for transcription and language identification in real-time applications.
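
For quick experiments, the checkpoint can also be driven through the `automatic-speech-recognition` pipeline, as sketched below. The model id and audio path are placeholders; `chunk_length_s` is just one way to keep per-chunk latency bounded on long recordings.

```python
from transformers import pipeline

# Placeholder model id; replace with this repository's id.
asr = pipeline("automatic-speech-recognition", model="your_model_name")

# chunk_length_s splits long recordings into 30-second windows, which keeps
# per-chunk latency bounded for near-real-time use.
result = asr("path_to_audio_file", chunk_length_s=30)
print(result["text"])
```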

## Training Details

This model was fine-tuned using a subset of the Mozilla Common Voice dataset. The dataset contains X samples of Tamil and Y samples of Sinhala.

**Fine-tuning Process:**

- Fine-tuning was performed on Whisper-small, the compact variant of OpenAI's Whisper model, chosen for its reduced latency and strong accuracy on low-resource languages.
- The model was trained for Z epochs in a Google Colab Pro environment; a condensed sketch of the training setup is shown below.
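
The sketch below condenses the standard Whisper fine-tuning recipe with `Seq2SeqTrainer`, shown for the Tamil Common Voice split. The split fraction, hyperparameters, and output directory are illustrative, not the exact values used for this model; downloading Common Voice 17.0 requires a Hugging Face login (and, depending on your `datasets` version, `trust_remote_code=True`).

```python
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="Tamil", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load a small Tamil split and resample the audio to the 16 kHz rate Whisper expects.
cv = load_dataset("mozilla-foundation/common_voice_17_0", "ta", split="train[:1%]")
cv = cv.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel features for the encoder, token ids for the decoder labels.
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

cv = cv.map(prepare, remove_columns=cv.column_names)

class DataCollator:
    """Pads audio features and label ids separately, masking padding out of the loss."""

    def __call__(self, features):
        audio = [{"input_features": f["input_features"]} for f in features]
        batch = processor.feature_extractor.pad(audio, return_tensors="pt")
        labels = processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        label_ids = labels["input_ids"].masked_fill(labels.attention_mask.ne(1), -100)
        # Drop the leading start token: the model re-adds it via decoder_start_token_id.
        if (label_ids[:, 0] == model.config.decoder_start_token_id).all():
            label_ids = label_ids[:, 1:]
        batch["labels"] = label_ids
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ta-finetuned",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=3,            # stands in for the Z epochs above
    fp16=True,                     # fits a single Colab Pro GPU
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=cv,
    data_collator=DataCollator(),
    tokenizer=processor.feature_extractor,
)
trainer.train()
```

The collator pads audio features and label ids separately because Whisper's encoder consumes fixed-size spectrogram features while the decoder consumes variable-length token sequences.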

## Performance

The model achieved a Word Error Rate (WER) of 32% on Tamil and 28% on Sinhala, using a validation dataset with X hours of audio. We expect further improvements with continued training.
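
For reference, WER can be computed with the `evaluate` library as sketched below; the reference and prediction strings are purely illustrative.

```python
import evaluate

wer_metric = evaluate.load("wer")

# WER = (substitutions + insertions + deletions) / number of reference words.
references = ["இது ஒரு எடுத்துக்காட்டு வாக்கியம்"]
predictions = ["இது எடுத்துக்காட்டு வாக்கியம்"]

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")
```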

## Usage

You can use this model for transcription with the following code; audio loading here uses librosa to resample the recording to the 16 kHz rate Whisper expects:

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("your_model_name")
processor = WhisperProcessor.from_pretrained("your_model_name")

# Example audio input, loaded at 16 kHz
speech, sampling_rate = librosa.load("path_to_audio_file", sr=16000)

inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```