Collections

Collections including paper arxiv:2410.06885

Zero-Shot Voice Cloning

TTS models that support zero-shot voice cloning

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Paper • 2409.00750 • Published Sep 1 • 2
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Paper • 2409.10058 • Published Sep 16 • 1
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26 • 18

Papers - Flow Matching

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published 21 days ago • 86
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40
Flow Matching for Generative Modeling

Paper • 2210.02747 • Published Oct 6, 2022 • 1
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 11

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26 • 18

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 62
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22 • 33
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22 • 14
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56

Papers - Audio - TTS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 2
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Paper • 2403.16973 • Published Mar 25 • 2
High Fidelity Neural Audio Compression

Paper • 2210.13438 • Published Oct 24, 2022 • 3
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 183
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Paper • 2401.11053 • Published Jan 19 • 9
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published 29 days ago • 40
