---
license: mit
datasets:
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
---
# Phi-2 Orange
A two-step finetune of Phi-2, with a bit of zest.
The first step is a supervised finetune on a collection of broad training data:
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA
The second step is a DPO finetune (sketched below) using:
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
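The card does not include the training code, so the following is only a minimal sketch of what an SFT-then-DPO recipe looks like with Hugging Face TRL. The dataset column handling, hyperparameters, and trainer arguments are illustrative assumptions (and depend on the TRL version); they are not the actual configuration used for Phi-2 Orange.

```python
# Minimal sketch of a two-step SFT-then-DPO recipe with Hugging Face TRL.
# Column names, hyperparameters, and trainer arguments are illustrative
# assumptions, not the actual Phi-2 Orange training configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer, SFTTrainer

base = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

# Step 1: supervised finetune on one of the broad instruction datasets.
# SlimOrca-Dedup stores ShareGPT-style "conversations"; flatten them to text.
def to_text(example):
    turns = example["conversations"]
    example["text"] = "\n".join(f'{t["from"]}: {t["value"]}' for t in turns)
    return example

sft_data = load_dataset("Open-Orca/SlimOrca-Dedup", split="train").map(to_text)
sft_trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=sft_data,
    dataset_text_field="text",
    max_seq_length=2048,
)
sft_trainer.train()

# Step 2: DPO finetune on a preference dataset with prompt/chosen/rejected columns.
dpo_data = load_dataset("Intel/orca_dpo_pairs", split="train").rename_column(
    "question", "prompt"  # assumed source column name for the prompt
)
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    tokenizer=tokenizer,
    train_dataset=dpo_data,
    beta=0.1,  # illustrative DPO temperature
)
dpo_trainer.train()
```

The DPO stage starts from the SFT checkpoint, which is the usual ordering for this kind of two-step preference finetune.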
## Evaluations
Evaluations were done using mlabonne's useful Colab notebook llm-autoeval. Also check out the alternative leaderboard at Yet_Another_LLM_Leaderboard.
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| phi-2-orange | 33.37 | 71.33 | 49.87 | 37.3 | 47.97 |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |