ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 10
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 66
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation Paper • 2304.09801 • Published Apr 19, 2023
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations Paper • 2202.07800 • Published Feb 16, 2022
Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training Paper • 2309.13942 • Published Sep 25, 2023 • 1
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7, 2024 • 39
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation Paper • 2312.04557 • Published Dec 7, 2023 • 12
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing Paper • 2310.05922 • Published Oct 9, 2023 • 4
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest Paper • 2307.03601 • Published Jul 7, 2023 • 12
InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4