Faster inference
XTTS v2 is good in terms of voice quality, but the inference speed is a little slow. I am developing speech-translation software and need TTS inference under 500 ms on a T4 GPU. Could you provide a half-precision version, or a faster inference engine such as ONNX or CTranslate2?
Latency to first audio is ~0.2 seconds if you use it with DeepSpeed (which is faster than ONNX); see the sketch below.
https://huggingface.co/spaces/coqui/xtts
- Latency to first audio chunk: 212 milliseconds
- Real-time factor (RTF): 0.25
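For reference, here is a minimal sketch of loading XTTS v2 with DeepSpeed enabled and streaming the output to measure time to first chunk, following the usage shown in the Coqui TTS docs. The config/checkpoint paths and `reference.wav` are placeholders for your own files, and `deepspeed` must be installed:

```python
import time

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model with DeepSpeed inference enabled (requires `pip install deepspeed`).
# The config/checkpoint paths are placeholders; point them at your local XTTS v2 download.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()

# Compute speaker conditioning latents once; "reference.wav" is a placeholder speaker sample.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# Stream audio chunks so the first chunk arrives well before full synthesis finishes.
t0 = time.time()
chunks = model.inference_stream(
    "This is a test of streaming XTTS inference.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
wav_chunks = []
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"Time to first chunk: {time.time() - t0:.3f} s")
    wav_chunks.append(chunk)
wav = torch.cat(wav_chunks, dim=0)  # full waveform at 24 kHz
```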
You can squeeze out maybe 2-4% more speed by wrapping your inference code in torch.float16 autocasting, but that will slightly affect output quality; you may or may not notice it, depending on your needs.
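A hedged sketch of what that wrapping could look like, reusing the `model` and conditioning latents from the snippet above; `torch.autocast` runs eligible CUDA ops in float16 while keeping the weights in float32, so no model re-export is needed:

```python
import torch

# Autocast eligible CUDA ops to float16; weights stay fp32.
# `model`, `gpt_cond_latent`, and `speaker_embedding` come from the loading sketch above.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model.inference(
        "This is a test of half-precision autocast inference.",
        "en",
        gpt_cond_latent,
        speaker_embedding,
    )
wav = torch.tensor(out["wav"])  # synthesized waveform at 24 kHz
```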
It would be greatly appreciated if you could provide the source code; I need both DeepSpeed and half precision.
Could you please provide an ONNX or TensorRT export to speed up model inference? It would be greatly appreciated.