
Model Details

Model Description

Whisper large-v3 fine-tuned on the Hindi portion of the Common Voice 13.0 dataset using LoRA.
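
The sketch below shows how a LoRA adapter can be attached to Whisper large-v3 with PEFT. The rank, alpha, dropout, and target modules are illustrative assumptions, not the exact settings used to train this checkpoint.

import torch
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# Load the base model in fp16 to keep memory usage manageable.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical LoRA configuration: r, lora_alpha, lora_dropout and the target
# projection layers are common defaults, not the values used for this adapter.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable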


Uses

  • Automatic Speech Recognition (ASR)

Direct Use

import torch
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

peft_model_id = "kasunw/whisper-large-v3-hindi"

# Use fp16 on GPU, fp32 on CPU.
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the base Whisper model referenced by the adapter config, then attach the LoRA adapter.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, device_map="auto", torch_dtype=torch_dtype
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.config.use_cache = True  # re-enable KV caching for faster generation

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language="Hindi", task="transcribe"
)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=model.device,
)

path_to_audio = "audio.mp3"

result = pipe(path_to_audio)
print(result["text"])
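
If adapter overhead at inference time matters, the LoRA weights can optionally be folded into the base model. This is a sketch using PEFT's merge_and_unload; the output directory name is a placeholder.

# Optional: merge the LoRA weights into the base model for slightly faster inference.
# merge_and_unload() returns a plain WhisperForConditionalGeneration with the
# adapter deltas baked into the original weight matrices.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("whisper-large-v3-hindi-merged")  # placeholder path
processor.save_pretrained("whisper-large-v3-hindi-merged")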

Training Details

Training Data

The Hindi portion of the Common Voice 13.0 dataset.
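
For reference, the Hindi subset of Common Voice 13.0 can be loaded with the datasets library. The split combination below is an assumption; the exact splits used for training are not stated here.

from datasets import load_dataset, Audio

# Hindi ("hi") subset of Common Voice 13.0; requires accepting the dataset's
# terms on the Hugging Face Hub and being logged in.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_13_0", "hi", split="train+validation"
)

# Whisper's feature extractor expects 16 kHz audio.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))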

Training Procedure

Training followed the instructions given in this notebook.

Training Hyperparameters

  • per_device_train_batch_size=16
  • gradient_accumulation_steps=1
  • learning_rate=1e-5
  • warmup_steps=50
  • fp16=True
  • max_steps=1000
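
The hyperparameters above map onto Seq2SeqTrainingArguments roughly as sketched below; output_dir and any option not listed above is a placeholder, not a value actually used.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-hindi-lora",  # placeholder
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    warmup_steps=50,
    fp16=True,
    max_steps=1000,
    # The two options below are commonly needed when training a PEFT-wrapped model:
    remove_unused_columns=False,  # keep the audio features that the forward pass needs
    label_names=["labels"],       # label names must be set explicitly for the wrapped model
)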

Metrics

  • word error rate (WER)
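
WER can be computed with the evaluate library; the transcripts below are placeholders purely to show the call signature.

import evaluate

wer_metric = evaluate.load("wer")

# Placeholder prediction/reference pairs.
predictions = ["मौसम आज अच्छा है"]
references = ["मौसम आज अच्छा है"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%")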