
ASR_yoruba_Azure

This model achieves the following results on the evaluation set:

  • Loss: 0.2229
  • WER: 0.2481

Model description

This model is based on the wav2vec2 architecture, has 965 million parameters, and was fine-tuned on 36 hours of Yoruba audio data from multiple speakers. It currently achieves a Word Error Rate (WER) of approximately 24.8%.
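
For a quick first test, a wav2vec2 CTC checkpoint like this one can be loaded through the transformers ASR pipeline. This is a minimal sketch: the repository ID and audio file name below are assumptions and should be replaced with the actual values.

```python
# Minimal inference sketch (repo ID and file path are assumptions; adjust as needed).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="FarmerlineML/ASR_yoruba_Azure",  # hypothetical repo ID for this card
)

# Transcribe a local Yoruba clip (16 kHz mono audio works best for wav2vec2).
result = asr("yoruba_sample.wav")
print(result["text"])
```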

Intended uses & limitations

  • The model is designed for Automatic Speech Recognition (ASR) in the Yoruba language.
  • It can be used to transcribe spoken Yoruba into text, supporting applications such as voice-activated systems, automated transcription services, and linguistic research.
  • The current Word Error Rate (WER) of approximately 24.8% leaves room for improvement in transcription accuracy. Performance may be degraded by background noise, accents, and variations in speaker pronunciation.
  • It is optimized for short audio clips (up to five minutes) due to GPU memory constraints; longer recordings can be transcribed in chunks, as shown in the sketch below.
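
For recordings longer than a single pass comfortably allows, the transformers ASR pipeline supports chunked inference with overlapping strides. A minimal sketch, with the same assumed repository ID as above; the 30-second chunks and 5-second strides are illustrative choices, not values from this card.

```python
# Chunked inference sketch for long audio (chunk/stride values are illustrative).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="FarmerlineML/ASR_yoruba_Azure",  # hypothetical repo ID for this card
)

# Split the audio into 30 s chunks with 5 s overlapping strides so CTC
# predictions at chunk boundaries can be stitched together cleanly.
result = asr("long_yoruba_recording.wav", chunk_length_s=30, stride_length_s=5)
print(result["text"])
```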

Training and evaluation data

The model was trained on 36 hours of Yoruba audio data from multiple speakers, chosen to capture diverse accents and speech patterns. The data includes conversational speech, read speech, and a range of audio qualities to improve robustness.

The evaluation dataset is a representative sample of Yoruba speech that was held out from training and covers a variety of speech scenarios. The WER of approximately 24.8% reflects the model's accuracy on this set.

Training procedure

The model follows the standard wav2vec2 recipe: pre-training on large-scale unlabeled audio followed by fine-tuning on the labeled Yoruba data. Fine-tuning optimized the model parameters to minimize transcription errors, using data augmentation and regularization to improve generalization. Training ran on high-performance GPUs to accommodate the data volume and the 965M-parameter model, with evaluations every 400 steps (see the results table below) to monitor progress and adjust the training strategy.
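
During those evaluations, WER can be computed with the evaluate library. Below is a minimal sketch of a compute_metrics function for a wav2vec2 CTC run; the repository ID and all names are illustrative assumptions, not code from this card.

```python
# WER metric sketch for a wav2vec2 CTC fine-tuning run (repo ID is hypothetical).
import numpy as np
import evaluate
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("FarmerlineML/ASR_yoruba_Azure")
wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    # Greedy CTC decoding: argmax over the vocabulary at each frame.
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # Restore ignored label positions (-100) to the pad token before decoding.
    label_ids = pred.label_ids.copy()
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(label_ids, group_tokens=False)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}
```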

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 400
  • num_epochs: 64
  • mixed_precision_training: Native AMP
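
These values map directly onto Hugging Face TrainingArguments. A minimal sketch of the equivalent configuration, assuming the standard Trainer setup; the output directory is a placeholder, and the 400-step evaluation cadence is inferred from the results table below.

```python
# TrainingArguments sketch matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="asr_yoruba_out",       # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # effective train batch size: 8 * 4 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=400,
    num_train_epochs=64,
    fp16=True,                         # Native AMP mixed-precision training
    evaluation_strategy="steps",
    eval_steps=400,                    # matches the 400-step cadence in the results table
)
```

The Adam betas of (0.9, 0.999) and epsilon of 1e-08 are the transformers defaults, so they do not need to be set explicitly.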

Training results

| Training Loss | Epoch   | Step  | Validation Loss | WER    |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 0.2635        | 1.5238  | 400   | 0.2428          | 0.2582 |
| 0.2494        | 3.0476  | 800   | 0.2321          | 0.2571 |
| 0.2438        | 4.5714  | 1200  | 0.2315          | 0.2517 |
| 0.2417        | 6.0952  | 1600  | 0.2282          | 0.2591 |
| 0.2349        | 7.6190  | 2000  | 0.2299          | 0.2529 |
| 0.237         | 9.1429  | 2400  | 0.2301          | 0.2545 |
| 0.2355        | 10.6667 | 2800  | 0.2262          | 0.2559 |
| 0.2321        | 12.1905 | 3200  | 0.2290          | 0.2527 |
| 0.235         | 13.7143 | 3600  | 0.2265          | 0.2546 |
| 0.2289        | 15.2381 | 4000  | 0.2260          | 0.2551 |
| 0.2305        | 16.7619 | 4400  | 0.2267          | 0.2519 |
| 0.2314        | 18.2857 | 4800  | 0.2308          | 0.2583 |
| 0.2283        | 19.8095 | 5200  | 0.2243          | 0.2486 |
| 0.2288        | 21.3333 | 5600  | 0.2288          | 0.2563 |
| 0.2303        | 22.8571 | 6000  | 0.2244          | 0.2466 |
| 0.2275        | 24.3810 | 6400  | 0.2266          | 0.2471 |
| 0.2261        | 25.9048 | 6800  | 0.2264          | 0.2509 |
| 0.2271        | 27.4286 | 7200  | 0.2244          | 0.2494 |
| 0.2321        | 28.9524 | 7600  | 0.2257          | 0.2477 |
| 0.2261        | 30.4762 | 8000  | 0.2243          | 0.2533 |
| 0.2247        | 32.0    | 8400  | 0.2255          | 0.2449 |
| 0.2229        | 33.5238 | 8800  | 0.2268          | 0.2471 |
| 0.2242        | 35.0476 | 9200  | 0.2233          | 0.2459 |
| 0.2299        | 36.5714 | 9600  | 0.2268          | 0.2527 |
| 0.2272        | 38.0952 | 10000 | 0.2248          | 0.2471 |
| 0.2242        | 39.6190 | 10400 | 0.2249          | 0.2462 |
| 0.2249        | 41.1429 | 10800 | 0.2245          | 0.2469 |
| 0.2244        | 42.6667 | 11200 | 0.2249          | 0.2534 |
| 0.2264        | 44.1905 | 11600 | 0.2247          | 0.2457 |
| 0.2252        | 45.7143 | 12000 | 0.2237          | 0.2464 |
| 0.2239        | 47.2381 | 12400 | 0.2240          | 0.2495 |
| 0.2268        | 48.7619 | 12800 | 0.2240          | 0.2494 |
| 0.2264        | 50.2857 | 13200 | 0.2243          | 0.2528 |
| 0.2244        | 51.8095 | 13600 | 0.2238          | 0.2495 |
| 0.2236        | 53.3333 | 14000 | 0.2226          | 0.2475 |
| 0.2266        | 54.8571 | 14400 | 0.2230          | 0.2470 |
| 0.225         | 56.3810 | 14800 | 0.2232          | 0.2453 |
| 0.2233        | 57.9048 | 15200 | 0.2227          | 0.2467 |
| 0.223         | 59.4286 | 15600 | 0.2226          | 0.2496 |
| 0.224         | 60.9524 | 16000 | 0.2226          | 0.2472 |
| 0.2225        | 62.4762 | 16400 | 0.2229          | 0.2481 |

Framework versions

  • Transformers 4.44.0.dev0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1