Monsoon-Whisper-Medium-GigaSpeech2

Monsoon-Whisper-Medium-GigaSpeech2 is a 🇹🇭 Thai Automatic Speech Recognition (ASR) model. It is based on Whisper-Medium and fine-tuned on GigaSpeech2.

The model was originally developed as a scale experiment for research on emergent capabilities in ASR tasks. It performs well in the wild, including on audio sourced from YouTube and recordings made in noisy environments.

More details can be found in our Typhoon-Audio Release Blog.

Model Description

  • Model type: Whisper Medium.
  • Requirement: transformers 4.38.0 or newer (a quick version check is sketched after this list).
  • Primary Language(s): Thai 🇹🇭
  • License: Apache 2.0
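
For convenience, a minimal sketch of the version check implied by the requirement above; packaging ships as a transformers dependency, but treat this as an optional helper rather than part of the card:

# Optional: verify the installed transformers meets the 4.38.0 requirement
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.38.0"), (
    f"found transformers {transformers.__version__}; please upgrade to >= 4.38.0"
)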

Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

model_path = "scb10x/monsoon-whisper-medium-gigaspeech2"
device = "cuda"
filepath = "audio.wav"

processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
)
model.to(device)
model.eval()

# Force Thai transcription output (on recent transformers versions you can instead
# pass language="th", task="transcribe" directly to model.generate)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="th", task="transcribe"
)
# Whisper's feature extractor expects 16 kHz mono audio
array, sr = torchaudio.load(filepath)
if sr != 16000:
    array = torchaudio.functional.resample(array, sr, 16000)
    sr = 16000
array = array.mean(dim=0).numpy()  # downmix to mono and convert to NumPy

input_features = (
    processor(array, sampling_rate=sr, return_tensors="pt")
    .to(device)
    .to(torch.bfloat16)
    .input_features
)
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
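
As an alternative to the manual pre-processing above, the high-level pipeline API can run the same checkpoint. This is a minimal sketch; chunk_length_s is an assumed setting for clips longer than 30 seconds rather than a value from this card, and decoding audio files through the pipeline requires ffmpeg:

from transformers import pipeline
import torch

# Minimal sketch: the same checkpoint through the ASR pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model="scb10x/monsoon-whisper-medium-gigaspeech2",
    torch_dtype=torch.bfloat16,
    device="cuda",  # or "cpu"
    chunk_length_s=30,  # assumed chunking for long audio; tune as needed
)
result = asr("audio.wav", generate_kwargs={"language": "th", "task": "transcribe"})
print(result["text"])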

Evaluation Results

WER and CER are reported in percent (lower is better) on the GigaSpeech2 (GS2) and Common Voice 17 (CV17) Thai test sets.

| Model | WER (GS2) | WER (CV17) | CER (GS2) | CER (CV17) |
|---|---|---|---|---|
| whisper-large-v3 | 37.02 | 22.63 | 24.03 | 8.49 |
| whisper-medium | 55.64 | 43.01 | 37.55 | 16.41 |
| biodatlab-whisper-th-medium-combined | 31.00 | 14.25 | 21.20 | 5.69 |
| biodatlab-whisper-th-large-v3-combined | 29.02 | 15.72 | 19.96 | 6.32 |
| monsoon-whisper-medium-gigaspeech2 | 22.74 | 20.79 | 14.15 | 6.92 |
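
For reference, a sketch of how similar scores could be computed with the jiwer package; jiwer and the example strings are assumptions, since this card does not specify the evaluation script or text normalization. Note that Thai is written without spaces, so WER additionally requires word segmentation (e.g. with a Thai tokenizer), while CER can be computed on raw strings:

import jiwer

# Hypothetical reference/hypothesis pair for illustration only
references = ["สวัสดีครับ"]
hypotheses = ["สวัสดีคับ"]

cer = jiwer.cer(references, hypotheses)  # character error rate over the pair
print(f"CER: {cer * 100:.2f}%")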

Intended Uses & Limitations

This model is experimental and may not always be accurate. Developers should carefully assess potential risks in the context of their specific applications.

Follow us & Support

Typhoon Team

Kunat Pipatanakul, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Krisanapong Jirayoot, Pathomporn Chokchainant, Kasima Tharnpipitchai
