Update src/about.py
src/about.py  +2 -1
@@ -54,7 +54,8 @@ The main evaluation metric used is Accuracy (ACC). Submit a model for automated
 
 
 The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
-
+
+The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a> and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM-2 does not report zero-shot accuracy, its 5-shot accuracy is used for comparison. All results presented are zero-shot, except for Med-PaLM-2, which uses 5-shot accuracy. Gemini results are taken from the recent Clinical-NLP <a href="https://arxiv.org/abs/2402.07023">(NAACL 24) paper</a>.
 
 """
 
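The hunk above notes that the leaderboard backend runs the EleutherAI Language Model Evaluation Harness and reports Accuracy (ACC) as its main metric. For context, here is a minimal sketch of what a zero-shot run with the harness's Python API might look like, assuming lm-eval 0.4's `simple_evaluate`; the model name, task names, and metric key below are illustrative assumptions, not the leaderboard's actual configuration.

```python
# Illustrative sketch only: zero-shot evaluation with the EleutherAI
# lm-evaluation-harness (lm-eval >= 0.4). Task names and the metric key
# are assumptions, not the leaderboard's real pipeline.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=epfl-llm/meditron-7b",  # hypothetical submission
    tasks=["pubmedqa", "medmcqa"],  # assumed names for two medical benchmarks
    num_fewshot=0,  # leaderboard results are reported zero-shot
)

# Print per-task accuracy (ACC), the leaderboard's main metric.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```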