Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.18869

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 10 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1 • 21
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1 • 79
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 141
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30 • 25

This collection is for Transformer Articles

about 5 hours ago

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Paper • 2307.03712 • Published Jul 7, 2023 • 1
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Paper • 2408.04093 • Published Aug 7 • 4
Arcee's MergeKit: A Toolkit for Merging Large Language Models

Paper • 2403.13257 • Published Mar 20 • 19
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

about 2 hours ago

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 178
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1 • 14
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 45
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40

about 11 hours ago

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 5 days ago • 62

about 14 hours ago

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 5 days ago • 62

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published 6 days ago • 42
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 5 days ago • 62

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 5 days ago • 62

random interest papers

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published 6 days ago • 42
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

Paper • 2409.14683 • Published 9 days ago • 8
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Paper • 2409.17422 • Published 6 days ago • 21
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 5 days ago • 62

about 14 hours ago

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published 19 days ago • 10
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 20 days ago • 42
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 14 days ago • 68
LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published 14 days ago • 30

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 36
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 15 days ago • 26

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs