---

# Model Name
A fine-tuned version of `openai/whisper-small` for transcription and language identification in Tamil and Sinhala.

## Model Overview
This model is fine-tuned from `openai/whisper-small` using the [Mozilla Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) dataset for language identification and transcription in **Tamil** and **Sinhala**. The model is designed to accurately transcribe spoken audio into text and identify whether the language is Tamil or Sinhala.

### Key Features:
- **Languages**: Tamil, Sinhala
- **Base Model**: Whisper-small from OpenAI
- **Dataset**: Mozilla Common Voice 17.0

## Intended Use
The model is designed for automatic speech recognition (ASR) in Tamil and Sinhala, making it suitable for transcription and language identification in real-time applications.
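For quick experimentation, the model can also be wrapped in a high-level `transformers` pipeline. The sketch below is an illustration, not part of this repository: `"your_model_name"` is a placeholder for the actual model ID, and `chunk_length_s` simply enables chunked decoding of longer recordings.

```python
from transformers import pipeline

# High-level ASR pipeline; chunk_length_s splits long audio into 30 s windows
asr = pipeline(
    "automatic-speech-recognition",
    model="your_model_name",  # hypothetical placeholder for the model ID
    chunk_length_s=30,
)

result = asr("path_to_audio_file")  # decoding from a file path requires ffmpeg
print(result["text"])
```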

## Training Details
This model was fine-tuned using a subset of the Mozilla Common Voice dataset. The dataset contains `X` samples of Tamil and `Y` samples of Sinhala.
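As a rough sketch of how such a subset can be obtained (assuming the Hugging Face `datasets` library and the `ta`/`si` Common Voice locale codes; the exact filtering used for this model is not documented here):

```python
from datasets import Audio, load_dataset

# Load the Tamil ("ta") and Sinhala ("si") subsets of Common Voice 17.0
# (the dataset's terms must be accepted on the Hugging Face Hub first)
tamil = load_dataset("mozilla-foundation/common_voice_17_0", "ta", split="train")
sinhala = load_dataset("mozilla-foundation/common_voice_17_0", "si", split="train")

# Whisper expects 16 kHz audio, so resample both subsets on the fly
tamil = tamil.cast_column("audio", Audio(sampling_rate=16000))
sinhala = sinhala.cast_column("audio", Audio(sampling_rate=16000))
```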

### Fine-tuning Process:
- The fine-tuning was performed on `Whisper-small`, a smaller version of OpenAI's Whisper model, for reduced latency and improved accuracy on low-resource languages.
- The model was trained for `Z` epochs in a `Google Colab Pro` environment.

## Performance
The model achieved a **Word Error Rate (WER)** of `32%` on Tamil and `28%` on Sinhala, measured on a validation set containing `X` hours of audio.
We expect further improvements with continued training.
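For reference, WER can be computed with the `evaluate` library. This is a generic sketch with hypothetical strings, not the actual validation transcripts:

```python
import evaluate

# WER = (substitutions + insertions + deletions) / number of reference words
wer_metric = evaluate.load("wer")

predictions = ["transcription produced by the model"]  # hypothetical example
references = ["reference transcription from the dataset"]  # hypothetical example
print(wer_metric.compute(predictions=predictions, references=references))
```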

## Usage
You can use this model with the following code:

```python
import torch
import librosa  # assumed here for loading and resampling the audio file
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned model and its processor
model = WhisperForConditionalGeneration.from_pretrained("your_model_name")
processor = WhisperProcessor.from_pretrained("your_model_name")

# Example audio input: load the waveform and resample to 16 kHz, the rate Whisper expects
audio, _ = librosa.load("path_to_audio_file", sr=16000)

# Convert the waveform into log-Mel input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Whisper generates from audio features, not token IDs
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```
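
Because the model also performs language identification, the detected language can be read from Whisper's special tokens, or a language can be forced at decode time. A sketch under the same assumptions as above (`"ta"` and `"si"` are Whisper's codes for Tamil and Sinhala):

```python
# Decode WITHOUT skipping special tokens: the output begins with a language
# token such as <|ta|> (Tamil) or <|si|> (Sinhala) when Whisper auto-detects.
raw = processor.batch_decode(predicted_ids, skip_special_tokens=False)
print(raw[0][:80])

# Alternatively, force transcription in a specific language (here Tamil):
forced_ids = processor.get_decoder_prompt_ids(language="ta", task="transcribe")
forced_pred = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)
print(processor.batch_decode(forced_pred, skip_special_tokens=True))
```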