Spaces: Running on CPU Upgrade
Update src/about.py

src/about.py CHANGED (+2 -1)
@@ -54,7 +54,8 @@ The main evaluation metric used is Accuracy (ACC). Submit a model for automated
 
 
 The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
-
+
+The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a> and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM-2 does not report zero-shot accuracy, we use the 5-shot accuracy from its paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2, which uses 5-shot accuracy. Gemini results are taken from the recent Clinical-NLP <a href="https://arxiv.org/abs/2402.07023">(NAACL 24) paper</a>.
 
 """
 
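The accuracy (ACC) metric mentioned in the hunk header reduces to exact-match counting over the benchmark's multiple-choice answers. A minimal sketch of that computation (function and variable names are illustrative, not the harness's actual API):

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the gold answers."""
    if not references:
        raise ValueError("empty reference list")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Example: a model's answer letters for four MedQA-style questions.
preds = ["B", "C", "A", "D"]
golds = ["B", "C", "A", "A"]
print(accuracy(preds, golds))  # 0.75
```

The zero-shot vs. 5-shot distinction discussed above only changes how the prompt is built; the scoring step itself is the same exact-match count in both settings.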