Papers-Benchmarks - a sugatoray Collection

updated 7 days ago

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12 • 15

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13 • 24

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26 • 32

Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10 • 25

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published 16 days ago • 20

opendatalab/OmniDocBench

Viewer • Updated 1 day ago • 984 • 2.63k • 19

🥇 OmniEval

Sleeping • 3

RUC-NLPIR/OmniEval-AutoGen-Dataset

Updated 8 days ago • 49 • 1