-
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 54
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper • 2311.00945 • Published • 14
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 11
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 2
Collections including paper arxiv:2402.08093
-
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Paper • 2210.06774 • Published • 2
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Paper • 2305.19118 • Published
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 53
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 54
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper • 2403.03100 • Published • 34
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Paper • 2406.05370 • Published • 14
-
metavoiceio/metavoice-1B-v0.1
Text-to-Speech • Updated • 2.21k • 763
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 54
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 188
SWivid/F5-TTS
Text-to-Speech • Updated • 430k • 701