- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 38
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 118
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 47
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve
  Paper • 2407.18219 • Published • 3
Collections
Collections including paper arxiv:2409.18869
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
  Paper • 2406.16860 • Published • 55
- PaliGemma: A versatile 3B VLM for transfer
  Paper • 2407.07726 • Published • 65
- E5-V: Universal Embeddings with Multimodal Large Language Models
  Paper • 2407.12580 • Published • 38
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 62

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 11
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 85
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 30