license: mit

This is the GGUF version of a whisper-small Tamil fine-tune by vasista22.

It is intended for use with whisper.cpp.

The vanilla OpenAI Whisper model is pretty bad at transcribing long stretches of Tamil audio: it tends to skip large portions of the speech. This model has the same problem, but to a lesser extent.

One way around this is to split your audio into 15-second chunks and transcribe each chunk separately. You can do the splitting with ffmpeg like so:

ffmpeg -i input.wav -f segment -segment_time 15 -c copy output_%03d.wav

This creates files named output_000.wav, output_001.wav, and so on in the current working directory. Change the output path as necessary.
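The command above uses -c copy, so the chunks keep whatever format the input has. As far as I know, whisper.cpp expects 16-bit WAV at 16 kHz, so if your source is an MP3 or a WAV at a different sample rate, a sketch like the one below may help: it converts and splits in one pass. The function name segment_audio and the FFMPEG_BIN override variable are made up for this example; the ffmpeg options themselves (-ar, -ac, -c:a pcm_s16le, -f segment, -segment_time) are standard.

```shell
# Sketch: convert to 16 kHz mono 16-bit PCM and split into fixed-length chunks.
# segment_audio and FFMPEG_BIN are hypothetical names used only in this example.
segment_audio() {
  input="$1"                 # source audio file (any format ffmpeg can read)
  seconds="${2:-15}"         # chunk length in seconds, 15 by default
  prefix="${3:-output}"      # chunks are written as <prefix>_000.wav, <prefix>_001.wav, ...
  "${FFMPEG_BIN:-ffmpeg}" -i "$input" \
    -ar 16000 -ac 1 -c:a pcm_s16le \
    -f segment -segment_time "$seconds" \
    "${prefix}_%03d.wav"
}
```

For example, segment_audio interview.mp3 would write output_000.wav, output_001.wav, and so on, while segment_audio interview.mp3 20 part would write 20-second chunks named part_000.wav onwards. Because the audio is re-encoded rather than copied, this is slightly slower than the -c copy version.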

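When using whisper.cpp with fine-tuned models, you might want to add the --no-fallback flag to speed things up. It turns off the temperature fallback that whisper.cpp otherwise uses to re-decode segments it considers low quality. See this issue.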

You can line up multiple files to be transcribed one after another in whisper.cpp by repeating the -f option, once per chunk:

./main -m ggml-tamil-small-vasista22.bin -t 4 -osrt --no-fallback -f output_000.wav -f output_001.wav
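Typing one -f per chunk gets tedious for long recordings. A minimal sketch of building the list from the chunk files is below. The function name transcribe_chunks and the WHISPER_BIN and MODEL override variables are made up for this example; the flags passed to ./main are the ones from the command above.

```shell
# Sketch: pass every <prefix>_*.wav chunk to whisper.cpp as a repeated -f option.
# transcribe_chunks, WHISPER_BIN and MODEL are hypothetical names for this example.
transcribe_chunks() {
  prefix="${1:-output}"
  set --                          # clear this function's arguments to collect "-f file" pairs
  for f in "${prefix}"_*.wav; do  # the glob expands in sorted order: _000, _001, ...
    [ -e "$f" ] || continue       # skip the unexpanded pattern when nothing matches
    set -- "$@" -f "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no chunks found for prefix: $prefix" >&2
    return 1
  fi
  "${WHISPER_BIN:-./main}" -m "${MODEL:-ggml-tamil-small-vasista22.bin}" \
    -t 4 -osrt --no-fallback "$@"
}
```

Run transcribe_chunks output from the folder that holds the chunks. It assumes ./main and the model file are in that same folder; set WHISPER_BIN or MODEL if they live elsewhere. Since each chunk is transcribed as its own file, the timestamps in each resulting .srt start from zero for that chunk.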