- FlashSpeech: Efficient Zero-Shot Speech Synthesis
  Paper • 2404.14700 • Published • 29
- Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
  Paper • 2306.15687 • Published
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
  Paper • 2403.03100 • Published • 34
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 11
Collections
Collections including paper arxiv:2403.03100
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 75
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
  Paper • 2403.03100 • Published • 34
- Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
  Paper • 2403.09704 • Published • 31
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 30

- FastPitch: Parallel Text-to-speech with Pitch Prediction
  Paper • 2006.06873 • Published
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
  Paper • 2010.05646 • Published
- Tacotron: Towards End-to-End Speech Synthesis
  Paper • 1703.10135 • Published
- Parallel Tacotron: Non-Autoregressive and Controllable TTS
  Paper • 2010.11439 • Published

- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
  Paper • 2402.14797 • Published • 19
- Subobject-level Image Tokenization
  Paper • 2402.14327 • Published • 17
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 126
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 19