LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content Paper • 2410.10783 • Published Oct 14 • 25
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification Paper • 2410.05057 • Published Oct 7 • 7
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community Paper • 2408.08291 • Published Aug 15 • 10
Data Contamination Report from the 2024 CONDA Shared Task Paper • 2407.21530 • Published Jul 31 • 10
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation Paper • 2407.13696 • Published Jul 18 • 5
Lots-of-LoRAs/task717_mmmlu_answer_generation_logical_fallacies Viewer • Updated Jul 16 • 170 • 184