license: mit
datasets:
- mozilla-foundation/common_voice_17_0
language:
- en
- ta
metrics:
- wer
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- language-identification
- speech-to-text
Model Name
A brief description of the model and its purpose.
Model Overview
This model is fine-tuned from openai/whisper-small using the Mozilla Common Voice 17.0 dataset for language identification and transcription in Tamil and Sinhala. It is designed to transcribe spoken audio into text and to identify whether the spoken language is Tamil or Sinhala.
Key Features:
- Languages: Tamil, Sinhala
- Base Model: Whisper-small from OpenAI
- Dataset: Mozilla Common Voice 17.0
Intended Use
The model is designed for automatic speech recognition (ASR) in Tamil and Sinhala, making it suitable for transcription and language identification in real-time applications.
Training Details
This model was fine-tuned using a subset of the Mozilla Common Voice dataset. The dataset contains X samples of Tamil and Y samples of Sinhala.
Fine-tuning Process:
- Fine-tuning was performed on Whisper-small, a smaller version of OpenAI's Whisper model, chosen for reduced latency and with the aim of higher accuracy on low-resource languages.
- The model was trained for Z epochs in a Google Colab Pro environment.
Performance
The model achieved a Word Error Rate (WER) of 32% on Tamil and 28% on Sinhala, using a validation dataset with X hours of audio.
We expect further improvements with continued training.
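For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. It is not the evaluation script used for the figures above, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("down") over 4 reference words.
print(word_error_rate("the cat sat down", "the cat sit"))  # 0.5
```

Reported WER values are usually computed over the whole validation set (total edits divided by total reference words) after text normalization, so per-sentence values from this sketch will not average to the same number.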
Usage
You can use this model with the following code:
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa
import torch

model = WhisperForConditionalGeneration.from_pretrained("your_model_name")
processor = WhisperProcessor.from_pretrained("your_model_name")

# Load the audio file and resample it to the 16 kHz rate Whisper expects
audio, sampling_rate = librosa.load("path_to_audio_file", sr=16000)

# Convert the raw waveform into log-Mel input features
inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

# Generate token IDs without tracking gradients
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

# Decode the token IDs into text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
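As an alternative, the Transformers `pipeline` API can handle audio decoding and long recordings in one call. The following is a sketch under several assumptions: `your_model_name` is the same placeholder used above, the helper names `build_generate_kwargs` and `transcribe` are hypothetical, `chunk_length_s=30` is an illustrative setting, and passing a file path requires ffmpeg to be installed. Leaving `language` unset lets Whisper detect the language itself, while `"tamil"` or `"sinhala"` forces it.

```python
SUPPORTED_LANGUAGES = {"tamil", "sinhala"}  # the two languages named in this card


def build_generate_kwargs(language=None):
    """Build keyword arguments for generation; language=None means auto-detect."""
    kwargs = {"task": "transcribe"}
    if language is not None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        kwargs["language"] = language
    return kwargs


def transcribe(audio_path, model_id="your_model_name", language=None):
    """Transcribe one audio file with the ASR pipeline (downloads the model)."""
    # Imported here so build_generate_kwargs can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumed setting: split long audio into 30-second chunks
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Calling `transcribe("path_to_audio_file", language="tamil")` would then return the transcription as a single string, assuming the fine-tuned checkpoint keeps Whisper's multilingual generation configuration.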