Running on CPU Upgrade 11.9k 🏆 Open LLM Leaderboard 2 Track, rank and evaluate open LLMs and chatbots
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21 • 58
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 105
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
InternVL 2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 18 items • Updated 1 day ago • 82
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20 • 34
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20 • 32