Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published 3 days ago • 21
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published 3 days ago • 30
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published 8 days ago • 44
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published 15 days ago • 29
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation Paper • 2410.18565 • Published 15 days ago • 42
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published 16 days ago • 48
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models Paper • 2410.13370 • Published 22 days ago • 36
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 17 days ago • 65
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors Paper • 2410.16271 • Published 17 days ago • 80
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes Paper • 2410.17249 • Published 16 days ago • 39
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published 16 days ago • 43
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published 16 days ago • 86
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published 28 days ago • 44
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published Sep 5 • 25
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30 • 35
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Paper • 2408.14819 • Published Aug 27 • 19
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 116