Update README.md
README.md
@@ -20,7 +20,12 @@ And then a DPO finetune using:
 - [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
 - [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
 
-#
-
-
-
+# Evaluations
+Evaluations done using mlabonne's useful [Colab notebook llm-autoeval](https://github.com/mlabonne/llm-autoeval).
+Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard)
+| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)| **33.29**| 71.39| 49.9| 37.14| **47.93**|
+|[phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)| 30.39| **71.68**| **50.75**| 34.9| 46.93|
+|[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39| **37.2**| 46.89|
+|[phi-2](https://huggingface.co/microsoft/phi-2)| 27.98| 70.8| 44.43| 35.21| 44.61|
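
The Average column in the table above appears to be the unweighted mean of the four benchmark scores; the README does not state this, so treat it as an assumption. A minimal sketch to check it, with the scores copied from the table (the helper names `mean_score` and `check_averages` are hypothetical, not part of llm-autoeval):

```python
# Scores copied from the table: [AGIEval, GPT4All, TruthfulQA, Bigbench], reported Average.
scores = {
    "phi-2-orange": ([33.29, 71.39, 49.9, 37.14], 47.93),
    "phi-2-dpo": ([30.39, 71.68, 50.75, 34.9], 46.93),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.2], 46.89),
    "phi-2": ([27.98, 70.8, 44.43, 35.21], 44.61),
}


def mean_score(values):
    """Unweighted arithmetic mean of a list of benchmark scores."""
    return sum(values) / len(values)


def check_averages(table, tolerance=0.01):
    """Return rows whose computed mean differs from the reported Average by more than tolerance."""
    mismatches = {}
    for model, (values, reported) in table.items():
        computed = mean_score(values)
        if abs(computed - reported) > tolerance:
            mismatches[model] = (computed, reported)
    return mismatches


if __name__ == "__main__":
    for model, (values, reported) in scores.items():
        print(f"{model}: computed {mean_score(values):.3f}, reported {reported}")
    print("mismatches:", check_averages(scores))
```

The tolerance of 0.01 allows for the table values being rounded to two decimal places.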