Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17 • 29
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published 1 day ago • 46
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published 5 days ago • 38
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published 11 days ago • 46
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 6 days ago • 36
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published 12 days ago • 60
Article AIGS: Generating Science from AI-Powered Automated Falsification By mikelabs • 5 days ago • 2
LLäMmlein Chat Preview 🐑 Collection https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/ • 8 items • Updated 5 days ago • 9
Article Unlock the Power of AI in Your Browser with Transformers.js By luigi12345 • 9 days ago • 2
Article Understanding the Algorithm of Thoughts: A Heuristic Approach Beyond LLMs By TuringsSolutions • 8 days ago • 2
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • 8 days ago • 83
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 12 days ago • 102
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published 13 days ago • 66
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published 15 days ago • 59
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated about 10 hours ago • 181
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88