Model Card for sarahai/uzbek-stt-3

This model is a fine-tuned version of oyqiz/uzbek_stt, trained mainly on a dataset of law- and military-related content.

Model Details

Model Description

  • Developed by: Sara Musaeva
  • Funded by: SSD
  • Model type: Wav2Vec2 with a CTC head (Transformers)
  • Language(s) (NLP): Uzbek
  • Fine-tuned from model: oyqiz/uzbek_stt

Uses

Intended for speech-to-text conversion of Uzbek audio.

How to Get Started with the Model

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import torchaudio

model_name = "sarahai/uzbek-stt-3"
model = Wav2Vec2ForCTC.from_pretrained(model_name)
processor = Wav2Vec2Processor.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

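def load_and_preprocess_audio(file_path):
    # Load the audio as a (channels, samples) tensor.
    speech_array, sampling_rate = torchaudio.load(file_path)
    # Downmix multi-channel audio to mono; the model expects a 1-D waveform.
    if speech_array.shape[0] > 1:
        speech_array = speech_array.mean(dim=0, keepdim=True)
    # Resample to the 16 kHz rate passed to the processor below.
    if sampling_rate != 16000:
        resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate, new_freq=16000)
        speech_array = resampler(speech_array)
    return speech_array.squeeze(0).numpy()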

def replace_unk(transcription):
    # Replace the tokenizer's [UNK] output with the modifier letter apostrophe (ʼ, U+02BC).
    return transcription.replace("[UNK]", "ʼ")

audio_file = "/content/audio_2024-08-13_15-20-53.ogg"
speech_array = load_and_preprocess_audio(audio_file)

input_values = processor(speech_array, sampling_rate=16000, return_tensors="pt").input_values.to(device)

with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

transcription_text = replace_unk(transcription[0])

print("Transcription:", transcription_text)
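
The last two steps of the snippet above (taking the argmax over the logits, then calling batch_decode) amount to greedy CTC decoding: pick the most likely token per frame, collapse consecutive repeats, drop the blank token, and turn word delimiters into spaces. The sketch below illustrates that idea in plain Python. The toy vocabulary, the `<pad>` blank name, the `|` delimiter, and the helper `greedy_ctc_decode` are assumptions made for illustration; they are not taken from this model's tokenizer.

```python
# Toy illustration of greedy CTC decoding. The vocabulary below is hypothetical
# and is NOT the vocabulary of sarahai/uzbek-stt-3.
from itertools import groupby

BLANK = "<pad>"        # assumed name of the CTC blank token
WORD_DELIMITER = "|"   # assumed word-separator token


def greedy_ctc_decode(predicted_ids, id_to_token):
    """Collapse repeated frame predictions, drop blanks, and join into text."""
    tokens = [id_to_token[i] for i in predicted_ids]
    # Consecutive identical predictions count as one emission.
    collapsed = [token for token, _ in groupby(tokens)]
    # Blanks only separate emissions; they carry no text.
    kept = [token for token in collapsed if token != BLANK]
    return "".join(kept).replace(WORD_DELIMITER, " ").strip()


toy_vocab = {0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "l", 5: "o", 6: "m"}
frame_ids = [2, 2, 0, 3, 3, 4, 0, 5, 5, 0, 6, 1, 1]  # one predicted id per frame
print(greedy_ctc_decode(frame_ids, toy_vocab))  # prints "salom"
```

Note that a blank between two identical tokens keeps them distinct, so a doubled letter survives decoding only when the model emits a blank between the two occurrences.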

Model size: 315M params (Safetensors, tensor type F32)