TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 16 days ago • 47
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 22
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents Paper • 2310.11667 • Published Oct 18, 2023 • 2
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest Paper • 2311.10614 • Published Nov 17, 2023
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1, 2024 • 10
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents Paper • 2403.08715 • Published Mar 13, 2024 • 20
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 24
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue Paper • 2210.04443 • Published Oct 10, 2022
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements Paper • 2306.01985 • Published Jun 3, 2023 • 1
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation Paper • 1810.10147 • Published Oct 24, 2018
FewRel 2.0: Towards More Challenging Few-Shot Relation Classification Paper • 1910.07124 • Published Oct 16, 2019