princeton-nlp/SWE-bench
SWE-bench is a benchmark for evaluating Language Models and AI Systems on their ability to resolve real-world GitHub Issues.