I will be giving a tutorial at AAAI 2025! Quite excited to share the recent advancements in the field and my contributions to it! Stay tuned for more updates. Link: https://x.com/EzgiKorkmazAI/status/1854525141897671111
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Paper • 2411.04989 • Published 23 days ago • 14
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation Paper • 2411.04999 • Published 23 days ago • 16
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models Paper • 2411.04075 • Published 24 days ago • 15
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published 23 days ago • 27
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models Paper • 2411.05005 • Published 23 days ago • 13
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published 23 days ago • 48
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 23 days ago • 48
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published 25 days ago • 25
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published 23 days ago • 69
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 23 days ago • 109