Monsoon-Whisper-Medium-GigaSpeech2

Monsoon-Whisper-Medium-GigaSpeech2 is a 🇹🇭 Thai Automatic Speech Recognition (ASR) model. It is based on Whisper-Medium and fine-tuned on GigaSpeech2.

The model was originally developed as a scale experiment for research on emergent capabilities in ASR tasks. It performs well in the wild, including on audio sourced from YouTube and recordings made in noisy environments.

More details can be found in our Typhoon-Audio Release Blog.

Model Description

  • Model type: Whisper Medium.
  • Requirement: transformers 4.38.0 or newer (a quick version check is sketched after this list).
  • Primary Language(s): Thai 🇹🇭
  • License: Apache 2.0
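
For convenience, a minimal sketch of the version check implied by the requirement above; packaging ships as a transformers dependency, but treat this as an optional helper rather than part of the card:

# Optional: verify the installed transformers meets the 4.38.0 requirement
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.38.0"), (
    f"found transformers {transformers.__version__}; please upgrade to >= 4.38.0"
)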

Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

model_path = "scb10x/monsoon-whisper-medium-gigaspeech2"
device = "cuda"
filepath = "audio.wav"

processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
)
model.to(device)
model.eval()

# Force Thai transcription output (on recent transformers versions you can instead
# pass language="th", task="transcribe" directly to model.generate)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="th", task="transcribe"
)
# Whisper's feature extractor expects 16 kHz mono audio
array, sr = torchaudio.load(filepath)
if sr != 16000:
    array = torchaudio.functional.resample(array, sr, 16000)
    sr = 16000
array = array.mean(dim=0).numpy()  # downmix to mono and convert to NumPy

input_features = (
    processor(array, sampling_rate=sr, return_tensors="pt")
    .to(device)
    .to(torch.bfloat16)
    .input_features
)
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
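
As an alternative to the manual pre-processing above, the high-level pipeline API can run the same checkpoint. This is a minimal sketch; chunk_length_s is an assumed setting for clips longer than 30 seconds rather than a value from this card, and decoding audio files through the pipeline requires ffmpeg:

from transformers import pipeline
import torch

# Minimal sketch: the same checkpoint through the ASR pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model="scb10x/monsoon-whisper-medium-gigaspeech2",
    torch_dtype=torch.bfloat16,
    device="cuda",  # or "cpu"
    chunk_length_s=30,  # assumed chunking for long audio; tune as needed
)
result = asr("audio.wav", generate_kwargs={"language": "th", "task": "transcribe"})
print(result["text"])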

Evaluation Results

WER and CER are reported in percent (lower is better) on the GigaSpeech2 (GS2) and Common Voice 17 (CV17) Thai test sets.

| Model | WER (GS2) | WER (CV17) | CER (GS2) | CER (CV17) |
|---|---|---|---|---|
| whisper-large-v3 | 37.02 | 22.63 | 24.03 | 8.49 |
| whisper-medium | 55.64 | 43.01 | 37.55 | 16.41 |
| biodatlab-whisper-th-medium-combined | 31.00 | 14.25 | 21.20 | 5.69 |
| biodatlab-whisper-th-large-v3-combined | 29.02 | 15.72 | 19.96 | 6.32 |
| monsoon-whisper-medium-gigaspeech2 | 22.74 | 20.79 | 14.15 | 6.92 |
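
For reference, a sketch of how similar scores could be computed with the jiwer package; jiwer and the example strings are assumptions, since this card does not specify the evaluation script or text normalization. Note that Thai is written without spaces, so WER additionally requires word segmentation (e.g. with a Thai tokenizer), while CER can be computed on raw strings:

import jiwer

# Hypothetical reference/hypothesis pair for illustration only
references = ["สวัสดีครับ"]
hypotheses = ["สวัสดีคับ"]

cer = jiwer.cer(references, hypotheses)  # character error rate over the pair
print(f"CER: {cer * 100:.2f}%")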

Intended Uses & Limitations

This model is experimental and may not always be accurate. Developers should carefully assess potential risks in the context of their specific applications.

Follow us & Support

Typhoon Team

Kunat Pipatanakul, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Krisanapong Jirayoot, Pathomporn Chokchainant, Kasima Tharnpipitchai
