A newer version of this model is available: suzii/vi-whisper-large-v3-turbo

Fine-tuned Whisper-V3-Turbo for Vietnamese ASR

This project involves fine-tuning the Whisper-V3-Turbo model to improve its performance for Automatic Speech Recognition (ASR) in the Vietnamese language. The model was trained for 240 hours using a single Nvidia A6000 GPU.

Data Sources

The training data comes from various Vietnamese speech corpora. Below is a list of datasets used for training:

  1. capleaf/viVoice
  2. NhutP/VSV-1100
  3. doof-ferb/fpt_fosd
  4. doof-ferb/infore1_25hours
  5. google/fleurs (vi_vn)
  6. doof-ferb/LSVSC
  7. quocanh34/viet_vlsp
  8. linhtran92/viet_youtube_asr_corpus_v2
  9. doof-ferb/infore2_audiobooks
  10. linhtran92/viet_bud500

Model

The model used in this project is the Whisper-V3-Turbo. Whisper is a multilingual ASR model trained on a large and diverse dataset. The version used here has been fine-tuned specifically for the Vietnamese language.

Training Configuration

  • GPU Used: Nvidia A6000
  • Training Time: 240 hours

Usage

To use the fine-tuned model, follow the steps below:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "suzii/vi-whisper-large-v3-turbo-v1"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)
result = pipe("your-audio.mp3", return_timestamps=True)

Acknowledgements

This project would not be possible without the following datasets:

Downloads last month
4
Safetensors
Model size
809M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for suzii/vi-whisper-large-v3-turbo-v1

Finetuned
(140)
this model

Datasets used to train suzii/vi-whisper-large-v3-turbo-v1