CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution • Paper • 2410.16256 • Published Oct 21, 2024 • 58
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs • Paper • 2410.12405 • Published Oct 16, 2024 • 13
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models • Paper • 2409.16191 • Published Sep 24, 2024 • 41
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models • Paper • 2407.15415 • Published Jul 22, 2024 • 1
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? • Paper • 2407.11963 • Published Jul 16, 2024 • 43
Improving Pixel-based MIM by Reducing Wasted Modeling Capability • Paper • 2308.00261 • Published Aug 1, 2023 • 2