- Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
  Paper • 1802.08802 • Published
- Mapping Natural Language Commands to Web Elements
  Paper • 1808.09132 • Published
- Learning to Navigate the Web
  Paper • 1812.09195 • Published
- Interactive Task and Concept Learning from Natural Language Instructions and GUI Demonstrations
  Paper • 1909.00031 • Published
Collections
Collections including paper arxiv:2410.23218
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19

- A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
  Paper • 2410.22476 • Published • 24
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 43
- Training-free Regional Prompting for Diffusion Transformers
  Paper • 2411.02395 • Published • 20

- MobA: A Two-Level Agent System for Efficient Mobile Task Automation
  Paper • 2410.13757 • Published • 30
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
  Paper • 2410.08164 • Published • 24
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
  Paper • 2408.15978 • Published
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
  Paper • 2409.17140 • Published

- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 38
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 117
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 47
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 41

- Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
  Paper • 2407.15762 • Published • 9
- HuggingFaceTB/SmolLM-135M
  Text Generation • Updated • 40.6k • 170
- MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model
  Paper • 2408.10198 • Published • 32
- fishaudio/fish-speech-1.4
  Text-to-Speech • Updated • 9.09k • 410

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 12
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 85
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 30

- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
  Paper • 2311.09278 • Published • 7
- Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
  Paper • 2406.11736 • Published • 4
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 43

- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
  Paper • 2402.15506 • Published • 13
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
  Paper • 2404.03648 • Published • 24
- Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
  Paper • 2405.19893 • Published • 29
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  Paper • 2405.19888 • Published • 5