Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction Paper • 2411.14762 • Published 6 days ago • 10
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published 15 days ago • 25
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 15 days ago • 26
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published 16 days ago • 21
Scaling Properties of Diffusion Models for Perceptual Tasks Paper • 2411.08034 • Published 15 days ago • 13
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings Paper • 2411.08017 • Published 15 days ago • 11
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published about 1 month ago • 75
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated about 15 hours ago • 181