
Distilled Small Whisper ASR Model for Thai

Model Description

This is a distilled Automatic Speech Recognition (ASR) model based on the Whisper architecture and tailored to Thai speech recognition. It was distilled from a larger teacher model and uses 4 decoder layers, compared with 12 in the teacher. The goal of the distillation is a smaller, more efficient decoder that maintains or improves recognition accuracy.

Distillation Details

  • Teacher Model: Small Whisper ASR model
  • Datasets Used for Distillation:
    • Common Voice v13
    • Gowajee
    • Thai Elderly Speech Corpus
    • Custom Scraped Data
    • Thai-Central Dialect from SLSCU Thai Dialect Corpus

Model Performance

  • DeepCut Tokenized WER on Common Voice 13 Test Set:
    • Distilled Model: 11.23%
    • Teacher Model: 13.14%

On this test set, the distilled model's Word Error Rate (WER) is 1.91 percentage points lower than the teacher's (11.23% vs. 13.14%), indicating better recognition accuracy for Thai on this benchmark.
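As a rough illustration of how a tokenized WER like the one above can be computed, the sketch below counts token-level edit operations over a corpus. It assumes the reference and hypothesis transcripts have already been segmented into word tokens (for example with DeepCut); the function names and the sample sentences are hypothetical, and this is not necessarily the evaluation script used for the reported numbers.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion (token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by total reference tokens."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_tokens = sum(len(r) for r in refs)
    if total_tokens == 0:
        raise ValueError("reference corpus contains no tokens")
    return total_errors / total_tokens


# Hypothetical, pre-tokenized sample data (not taken from Common Voice).
refs = [["ฉัน", "กิน", "ข้าว"], ["สวัสดี", "ครับ"]]
hyps = [["ฉัน", "กิน", "ขาว"], ["สวัสดี"]]
sample_wer = corpus_wer(refs, hyps)  # 2 errors over 5 reference tokens

# Relative WER reduction implied by the figures reported on this card.
relative_reduction = (13.14 - 11.23) / 13.14  # roughly 0.145
```

Because Thai is written without spaces between words, the choice of tokenizer changes the token count and therefore the WER, so figures are only comparable when the same tokenizer is used on both sides.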

Intended Use

This model is intended for use in applications requiring Thai language speech recognition.
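A minimal usage sketch with the Transformers `pipeline` API follows. The model identifier is taken from the citation URL on this card; the helper names, the 30-second chunk length, the default CPU device, and the audio file name are assumptions made for illustration rather than settings recommended by the authors.

```python
MODEL_ID = "biodatlab/distil-whisper-th-small"  # taken from the citation URL


def build_asr_pipeline(model_id: str = MODEL_ID, device: str = "cpu"):
    """Create a speech recognition pipeline for the given checkpoint."""
    # Imported lazily so the heavy dependency loads only when needed.
    from transformers import pipeline

    return pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: split long audio into 30-second chunks
        device=device,      # assumption: CPU by default; e.g. "cuda:0" for a GPU
    )


def transcribe(asr, audio_path: str) -> str:
    """Run the pipeline on one audio file and return the transcript text."""
    result = asr(audio_path)
    return result["text"]


# Example (downloads the model on first use; file name is hypothetical):
# asr = build_asr_pipeline()
# print(transcribe(asr, "sample_thai.wav"))
```

Passing a file path to the pipeline relies on ffmpeg being available to decode the audio.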

Limitations

  • The model is specifically trained for the Thai language and may not perform well with other languages.
  • Performance might vary across different Thai dialects and accents.
  • As with any ASR system, background noise and speech clarity can impact recognition accuracy.

Acknowledgments

This model was developed using resources and datasets provided by the speech and language technology community. Special thanks to the teams behind Common Voice, Gowajee, SLSCU, and the Thai Elderly Speech Corpus for their valuable datasets.

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.2
  • Datasets 2.16.1
  • Tokenizers 0.15.0
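To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the helper name is hypothetical, `torch` is the package name under which PyTorch is installed, and the card does not state whether other versions are incompatible.

```python
from importlib import metadata

# Versions recorded on this model card.
EXPECTED_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "2.1.2",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}


def version_mismatches(expected=EXPECTED_VERSIONS, lookup=metadata.version):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" on PyTorch wheels.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Example:
# for name, (wanted, found) in version_mismatches().items():
#     print(f"{name}: card lists {wanted}, found {found}")
```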

Citation

Cite using BibTeX:

@misc {thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/distil-whisper-th-small },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}

Model size

  • Parameters: 206M
  • Tensor type: FP16
  • Weights format: Safetensors