LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published 8 days ago • 31
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 9 days ago • 44
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 11 days ago • 9