Update src/about.py

src/about.py (+3 -0)
@@ -52,7 +52,10 @@ The datasets cover various aspects of medicine such as general medical knowledge
 The main evaluation metric used is Accuracy (ACC). Submit a model for automated evaluation on the "Submit" page. If you have comments or suggestions on additional medical datasets to include, please reach out to us in our discussion forum.
 
 
+
 The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
+The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a> and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM-2 does not report zero-shot accuracy, we use the 5-shot accuracy from its paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2, which uses 5-shot accuracy.
+
 """
 
 LLM_BENCHMARKS_TEXT = f"""
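For context, the backend referenced in the diff is the open-source EleutherAI lm-evaluation-harness. Below is a minimal sketch of how such a backend might score a model through the harness's Python API; the model id and task names are illustrative assumptions, not the leaderboard's actual configuration.

# Minimal sketch: scoring a Hugging Face model with the EleutherAI
# lm-evaluation-harness (pip install lm-eval). The model id and task
# names below are illustrative assumptions, not the leaderboard's
# actual configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                        # transformers-based model backend
    model_args="pretrained=gpt2",      # any Hugging Face model id
    tasks=["pubmedqa", "medmcqa"],     # medical QA tasks shipped with the harness
    num_fewshot=0,                     # zero-shot, matching the leaderboard setting
)

# Accuracy (ACC) is reported per task under results["results"];
# the metric key format varies slightly across harness versions.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none", metrics.get("acc")))

Setting num_fewshot=5 instead would approximate a 5-shot run, the setting under which the quoted Med-PaLM-2 numbers were reported in its paper.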