@hexgrad on Hugging Face: "`self.brag():` Kokoro finally got 300 votes in…"

hexgrad

posted an update 2 days ago

Post

2553

self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Discounting the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top 3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
- MeloTTS has 52M params
- F5 TTS has 330M params
- XTTSv2 has 467M params

Pendrokar

2 days ago

The original Arena's threshold is at 700 votes. But I am sure Kokoro will hold the position. The voice quality actually sounds close to ElevenLabs.

But StyleTTS usually is not very emotional. So it will fail where Edge TTS does. The phrases where the voice has to be sad or angry. For example Parler Expresso was overly jolly.

hexgrad

2 days ago

The voice quality actually sounds close to ElevenLabs.

I might've mentioned this elsewhere, but if you plug Kokoro outputs for named ElevenLabs voices into https://elevenlabs.io/ai-speech-classifier you should get very reliable positives (98% confident generated by ElevenLabs).

By ear, I think Kokoro is indeed close to ElevenLabs, especially on certain voices. For Nicole, they are indistinguishable to me. Michael is pretty close; Adam is still somewhat weak.

But StyleTTS usually is not very emotional.

I agree. Kokoro also has 2 specific issues in this area: (1) little to no emotional audio seen during training, and (2) even if there was, the stock voices are average style vectors over 10-100 samples, creating an average/neutral style anyway.

Delta-Vector

2 days ago

Kokoro TTS is quite possibly the best TTS model i've used thus far, I hope weights are released soon!

abhinandan1111

about 23 hours ago

bro can you suggest some lightweight models for Natural TTS

Join the conversation