Is it possible to add faster-whisper as a backend?
Faster-whisper is around 4x faster on GPU. https://github.com/guillaumekln/faster-whisper
Is it possible to add this as a backend?
Perhaps a drop-down/CLI flag where users can choose between the default whisper and faster-whisper?
That's quite an optimization - I was able to run the large-v2 model on my RTX 2080 Super 8GB, unlike with the default Whisper implementation. The lower memory requirements alone can really help make Whisper easier to deploy. It also appears to be a lot faster; the 4x figure is probably not far off.
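For context, this is roughly what using faster-whisper standalone looks like (a minimal sketch based on the library's documented API; the audio file name is just a placeholder):

from faster_whisper import WhisperModel

# Load the converted CTranslate2 model; float16 keeps GPU memory usage low,
# which is what lets large-v2 fit on an 8GB card.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcribe() returns a generator of segments plus detected-language info
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))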
So I spent some time adding it as a backend to the WebUI, and it is now done. To run it, though, it is recommended that you create a new virtual environment, install CUDA and cuDNN, and then install the requirements for faster-whisper:
pip install -r requirements-fasterWhisper.txt
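For example, on Linux that setup could look like this (the environment name is arbitrary; CUDA and cuDNN are assumed to already be installed system-wide):

python -m venv whisper-webui-env
source whisper-webui-env/bin/activate
pip install -r requirements-fasterWhisper.txt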
Then you can switch to faster-whisper in the UI/CLI using a command line argument:
python app.py --whisper_implementation faster-whisper --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True
You can also use the environment variable WHISPER_IMPLEMENTATION, or change the field whisper_implementation in config.json5.
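For example, either of these should select the faster-whisper backend (the config excerpt below shows only the one relevant field; the rest of the file is unchanged):

WHISPER_IMPLEMENTATION=faster-whisper python app.py

// config.json5
{
  // ...other options unchanged...
  "whisper_implementation": "faster-whisper",
}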
Finally, I've also published this as a Docker container at registry.gitlab.com/aadnk/whisper-webui:latest-fastest:
sudo docker run -d --gpus all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--mount type=bind,source=/home/administrator/.cache/huggingface,target=/root/.cache/huggingface \
--restart=on-failure:15 registry.gitlab.com/aadnk/whisper-webui:latest-fastest \
app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
--default_vad silero-vad --default_model_name large-v2
EDIT: Changed to faster-whisper
It looks great! Thanks for implementing!
No problem.
I've also made a separate space for Faster Whisper, so people can try it out directly.
The only difference is that I've set "whisper_implementation" to "faster-whisper" in the config, and also updated the README and requirements.txt.
Sorry, is it also possible to add float32 to --compute_type?
float32 is the default, and int8 is the less precise version for CPU.
I just added float32 to the CLI options - try updating your Git repository.
It's set to "auto" by default, however, so it should pick the correct compute type depending on the hardware. But yeah, it will likely downgrade to int8 when running on CPU.
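So forcing full precision explicitly should just be a matter of combining the flags mentioned above:

python app.py --whisper_implementation faster-whisper --compute_type float32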
python app.py --whisper_implementation fast-whisper --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True
I think it has to be --whisper_implementation faster-whisper, instead of fast-whisper, right?
Trying it out now! Thanks for implementing so quickly.
EDIT:
I'm not sure if I did something incorrectly, but I got this:
Repository Not Found for url: https://huggingface.co/api/models/guillaumekln/faster-whisper-large/revision/main.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
EDIT 2:
Using large-v2, it works. (SUPER FAST!)
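For anyone else hitting the "Repository Not Found" error above: the plain "large" model apparently has no converted faster-whisper repository, so pin a model that does exist, e.g. with the flag used in the Docker example earlier:

python app.py --whisper_implementation faster-whisper --default_model_name large-v2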
Trying it out now! Thanks for implementing so quickly.
No problem!
I think it has to be --whisper_implementation faster-whisper, instead of fast-whisper, right?
Ah, sorry, I initially called it "fast-whisper" by mistake, but I've since renamed it to "faster-whisper". I must have forgotten to update the command line in my comment.