Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 14
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 42
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models Paper • 2404.02747 • Published Apr 3 • 11
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3 • 20
Prompt-to-Prompt Image Editing with Cross Attention Control Paper • 2208.01626 • Published Aug 2, 2022 • 2
HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising Paper • 2404.09697 • Published Apr 15 • 1
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models Paper • 2404.09204 • Published Apr 14 • 10
MultiBooth: Towards Generating All Your Concepts in an Image from Text Paper • 2404.14239 • Published Apr 22 • 8
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference Paper • 2404.15420 • Published Apr 23 • 7
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 71
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30 • 22