SWE-bench is a benchmark for evaluating language models and AI systems on their ability to resolve real-world GitHub issues.
Princeton NLP group
princeton-nlp
Models (259)
princeton-nlp/Llama-3-8B-ProLong-512k-Instruct • 24k downloads • 17 likes
princeton-nlp/Llama-3-8B-ProLong-512k-Base • 181 downloads • 6 likes
princeton-nlp/Llama-3-8B-ProLong-64k-Instruct • Text Generation • 2.8k downloads • 13 likes
princeton-nlp/Llama-3-8B-ProLong-64k-Base • Text Generation • 3.08k downloads • 5 likes
princeton-nlp/Mistral-7B-Base-SFT-CPO • Text Generation • 3k downloads • 1 like
princeton-nlp/Mistral-7B-Base-SFT-RRHF • Text Generation • 2.99k downloads
princeton-nlp/gemma-2-9b-it-SimPO • Text Generation • 17.5k downloads • 126 likes
princeton-nlp/gemma-2-9b-it-DPO • Text Generation • 2.62k downloads • 5 likes
princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2 • Text Generation • 3.06k downloads • 5 likes
princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2 • Text Generation • 2.6k downloads • 1 like
Datasets (44)
princeton-nlp/SWE-bench • Viewer • 21.5k rows • 31.9k downloads • 81 likes
princeton-nlp/prolong-ultrachat-64K • Preview • 101 downloads
princeton-nlp/HELMET • Viewer • 516 rows • 90 downloads • 4 likes
princeton-nlp/SWE-bench_Multimodal • Viewer • 619 rows • 261 downloads • 7 likes
princeton-nlp/prolong-data-64K • 18.6k downloads • 10 likes
princeton-nlp/prolong-data-512K • 8.45k downloads • 3 likes
princeton-nlp/CharXiv • Viewer • 2.32k rows • 638 downloads • 31 likes
princeton-nlp/SWE-bench_Verified • Viewer • 500 rows • 46.6k downloads • 115 likes
princeton-nlp/gemma2-ultrafeedback-armorm • Viewer • 61.5k rows • 531 downloads • 35 likes
princeton-nlp/llama3-ultrafeedback-armorm • Viewer • 61.8k rows • 821 downloads • 15 likes