Faster inference
XTTS v2 is good in terms of voice quality, but the inference speed is a little slow. I am developing speech-translation software and need TTS inference under 500 ms on a T4 GPU. Could you provide a half-precision version, or a faster inference engine such as ONNX or CTranslate2?
Latency to first audio is ~0.2 seconds if you use it with DeepSpeed (which is faster than ONNX); see the sketch below.
https://huggingface.co/spaces/coqui/xtts
- Latency to first audio chunk: 212 milliseconds
- Real-time factor (RTF): 0.25
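For reference, here is a minimal sketch of loading XTTS v2 with DeepSpeed enabled and streaming the output to measure time to first chunk, following the usage shown in the Coqui TTS docs. The config/checkpoint paths and `reference.wav` are placeholders for your own files, and `deepspeed` must be installed:

```python
import time

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model with DeepSpeed inference enabled (requires `pip install deepspeed`).
# The config/checkpoint paths are placeholders; point them at your local XTTS v2 download.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()

# Compute speaker conditioning latents once; "reference.wav" is a placeholder speaker sample.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# Stream audio chunks so the first chunk arrives well before full synthesis finishes.
t0 = time.time()
chunks = model.inference_stream(
    "This is a test of streaming XTTS inference.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
wav_chunks = []
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"Time to first chunk: {time.time() - t0:.3f} s")
    wav_chunks.append(chunk)
wav = torch.cat(wav_chunks, dim=0)  # full waveform at 24 kHz
```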
You can squeeze out maybe 2-4% more speed by wrapping your inference code in torch.float16 autocasting, but that will slightly affect output quality; you may or may not notice it, depending on your needs.
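A hedged sketch of what that wrapping could look like, reusing the `model` and conditioning latents from the snippet above; `torch.autocast` runs eligible CUDA ops in float16 while keeping the weights in float32, so no model re-export is needed:

```python
import torch

# Autocast eligible CUDA ops to float16; weights stay fp32.
# `model`, `gpt_cond_latent`, and `speaker_embedding` come from the loading sketch above.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model.inference(
        "This is a test of half-precision autocast inference.",
        "en",
        gpt_cond_latent,
        speaker_embedding,
    )
wav = torch.tensor(out["wav"])  # synthesized waveform at 24 kHz
```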
It would be greatly appreciated if you could provide the source code; I need both DeepSpeed and half precision.
Could you please provide an ONNX or TensorRT export to speed up model inference? It would be greatly appreciated.