- MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
  Paper • 2409.00750 • Published • 2
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 40
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
  Paper • 2409.10058 • Published • 1
- E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
  Paper • 2406.18009 • Published • 18
Collections
Collections including paper arxiv:2410.06885
- Movie Gen: A Cast of Media Foundation Models
  Paper • 2410.13720 • Published • 86
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 40
- Flow Matching for Generative Modeling
  Paper • 2210.02747 • Published • 1
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 11

- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 62
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 33
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 14
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 56

- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
  Paper • 1712.05884 • Published • 2
- VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
  Paper • 2403.16973 • Published • 2
- High Fidelity Neural Audio Compression
  Paper • 2210.13438 • Published • 3
- RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
  Paper • 2404.03204 • Published • 7

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 183
- StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
  Paper • 2401.11053 • Published • 9
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 40