aaditya committed on
Commit
12d33c4
1 Parent(s): 0202f10

Update src/about.py

Files changed (1)
  1. src/about.py +2 -1
src/about.py CHANGED
@@ -54,7 +54,8 @@ The main evaluation metric used is Accuracy (ACC). Submit a model for automated
 
 
  The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
- The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a>, and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM doesn't provide zero-shot accuracy, we are using 5-shot accuracy from their paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2 which use 5-shot accuracy.
+
+ The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a>, and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM doesn't provide zero-shot accuracy, we are using 5-shot accuracy from their paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2 which use 5-shot accuracy. Gemini results are taken from recent Clinical-NLP <a href="https://arxiv.org/abs/2402.07023">(NAACL 24) Paper</a>
 
  """
 