Update src/about.py

src/about.py (+3 -0)
@@ -52,7 +52,10 @@ The datasets cover various aspects of medicine such as general medical knowledge
 The main evaluation metric used is Accuracy (ACC). Submit a model for automated evaluation on the "Submit" page. If you have comments or suggestions on additional medical datasets to include, please reach out to us in our discussion forum.
 
 
+
 The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
+The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a> and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM-2 does not report zero-shot accuracy, we use the 5-shot accuracy from its paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2, which uses 5-shot accuracy.
+
 """
 
 LLM_BENCHMARKS_TEXT = f"""
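For context, the backend referenced in the diff is the open-source EleutherAI lm-evaluation-harness. Below is a minimal sketch of how such a backend might score a model through the harness's Python API; the model id and task names are illustrative assumptions, not the leaderboard's actual configuration.

# Minimal sketch: scoring a Hugging Face model with the EleutherAI
# lm-evaluation-harness (pip install lm-eval). The model id and task
# names below are illustrative assumptions, not the leaderboard's
# actual configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                        # transformers-based model backend
    model_args="pretrained=gpt2",      # any Hugging Face model id
    tasks=["pubmedqa", "medmcqa"],     # medical QA tasks shipped with the harness
    num_fewshot=0,                     # zero-shot, matching the leaderboard setting
)

# Accuracy (ACC) is reported per task under results["results"];
# the metric key format varies slightly across harness versions.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none", metrics.get("acc")))

Setting num_fewshot=5 instead would approximate a 5-shot run, the setting under which the quoted Med-PaLM-2 numbers were reported in its paper.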