Learning and Leveraging World Models in Visual Representation Learning Paper • 2403.00504 • Published Mar 1 • 31
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies Paper • 2403.01422 • Published Mar 3 • 26
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Paper • 2403.05438 • Published Mar 8 • 18
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 27
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Paper • 2403.08268 • Published Mar 13 • 15
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 13
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation Paper • 2403.12962 • Published Mar 19 • 7
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation Paper • 2403.12365 • Published Mar 19 • 10
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Paper • 2403.13248 • Published Mar 20 • 77
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 18
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Paper • 2403.14773 • Published Mar 21 • 10
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 22
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25 • 13
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7 • 53
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8 • 20
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published Apr 5 • 16
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15 • 20
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 35
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published Apr 30 • 24
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models Paper • 2403.06098 • Published Mar 10 • 15
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published May 24 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published May 30 • 10
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4 • 15
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published Jun 10 • 50
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11 • 32
Hierarchical Patch Diffusion Models for High-Resolution Video Generation Paper • 2406.07792 • Published Jun 12 • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation Paper • 2406.07686 • Published Jun 11 • 14
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published Jun 12 • 8
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18 • 29
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published Jul 1 • 22
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency Paper • 2407.02398 • Published Jul 2 • 14
VIMI: Grounding Video Generation through Multi-modal Instruction Paper • 2407.06304 • Published Jul 8 • 9
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10 • 12
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11 • 22
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Paper • 2407.08701 • Published Jul 11 • 10
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22 • 39
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 27
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29 • 47
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Paper • 2408.00762 • Published Aug 1 • 9