T5-based Audio Transcription Fusion Model

This model takes candidate transcriptions of the same audio from multiple sources, joined by '/', and generates a single optimal transcription. It is fine-tuned on a dataset in which each sample has three candidate transcriptions and one reference transcription.
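As a minimal sketch of the input format described above, the helper below joins candidate transcriptions into one string. The function name `build_fusion_input` is hypothetical, and the exact whitespace around the '/' separator is an assumption, since the card only states that candidates are separated by '/'.

```python
from typing import Sequence


def build_fusion_input(candidates: Sequence[str], separator: str = "/") -> str:
    """Join candidate transcriptions into a single model input string.

    Assumption: candidates are joined by a bare '/' with no surrounding
    spaces. Adjust `separator` if the training data used a different layout.
    """
    cleaned = [c.strip() for c in candidates]
    if not cleaned or any(not c for c in cleaned):
        raise ValueError("every candidate must be a non-empty transcription")
    return separator.join(cleaned)


# Three candidates, matching the three-per-sample layout of the training data.
fused_input = build_fusion_input(
    [" hello world ", "hello word", "hallo world"]
)
print(fused_input)
```

The resulting string would then be tokenized and passed to the model's generation step in whatever framework the checkpoint is loaded with.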

Training Details

The model was fine-tuned from T5-small on 21,000 samples for 10 epochs.

Training Loss: 0.004994123708456755

Evaluation Details

Test Loss: 0.011637951454891172

Word Error Rate (WER): 0.0726561850095666
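For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and its reference, divided by the number of reference words. The sketch below computes it for a single pair; the function name `word_error_rate` is hypothetical, and how the reported figure was aggregated across the test set (and whether text was normalized first) is not stated in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion of a reference word
                    curr[j - 1] + 1,  # insertion of a hypothesis word
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One dropped word out of six reference words.
score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(score)
```

Under this definition, the reported value of about 0.073 corresponds to roughly 7 word errors per 100 reference words.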

Model size: 60.5M parameters (Safetensors, F32 tensors)