Text-to-Video - a gary109 Collection

gary109 's Collections

video segmentation

LLM

Representations

Robot

Vision Transformers

Diffusion Model

ML

RLHF

Image Completion

Others

Auto

Vision-Language

Cost

Semantic Segmentation

Video Generation

Code Generation

ASR

Whisper

AGI

Funny

music

SVC

yolo

生成式AI導論 2024

Text-to-Embedding

RAG

OCR

Audio

Text-to-Video

updated Aug 26

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

Paper • 2309.03549 • Published Sep 7, 2023 • 5
CCEdit: Creative and Controllable Video Editing via Diffusion Models

Paper • 2309.16496 • Published Sep 28, 2023 • 9
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Paper • 2310.11440 • Published Oct 17, 2023 • 15
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

Paper • 2310.10769 • Published Oct 16, 2023 • 8
Video Language Planning

Paper • 2310.10625 • Published Oct 16, 2023 • 9
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling

Paper • 2310.15169 • Published Oct 23, 2023 • 9
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Paper • 2310.20700 • Published Oct 31, 2023 • 9
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 13
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Paper • 2311.13073 • Published Nov 22, 2023 • 56
Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14 • 27
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19
Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 16
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

Paper • 2406.08845 • Published Jun 13 • 8

Note 本論文提出的T2VHE協議大大提高了T2V模型評估的可靠性、重現性和實用性，並承諾開源所有評估流程和代碼，以促進社群內的模型評估和改進。
Vivid-ZOO: Multi-View Video Generation with Diffusion Model

Paper • 2406.08659 • Published Jun 12 • 8

Note 研究者採用了擴散模型，將T2MVid生成問題分解為視角空間和時間組件，並利用預訓練的多視角圖像和2D視頻擴散模型層來確保視頻的多視角一致性和時間連續性。引入對齊模塊解決了由於2D和多視角數據之間的領域差異引起的層不兼容問題。此外，還貢獻了一個新的多視角視頻數據集。
MotionBooth: Motion-Aware Customized Text-to-Video Generation

Paper • 2406.17758 • Published Jun 25 • 18
Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 26
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35
T3M: Text Guided 3D Human Motion Synthesis from Speech

Paper • 2408.12885 • Published Aug 23 • 10