BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games • Paper • arXiv:2411.13543 • Published 8 days ago