	---
	license: mit
	datasets:
	- Open-Orca/SlimOrca-Dedup
	- migtissera/Synthia-v1.3
	- LDJnr/Verified-Camel
	- LDJnr/Pure-Dove
	- LDJnr/Capybara
	- meta-math/MetaMathQA
	- Intel/orca_dpo_pairs
	- argilla/ultrafeedback-binarized-preferences-cleaned
	---
	![Phi-2 Orange](https://huggingface.co/rhysjones/phi-2-orange/resolve/main/phi-2-orange.jpg)

	# Phi-2 Orange

	A two-step finetune of Phi-2, with a bit of zest.

	The first step uses a broad collection of training data:

	- [Open-Orca/SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup)
	- [migtissera/Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3)
	- [LDJnr/Verified-Camel](https://huggingface.co/datasets/LDJnr/Verified-Camel)
	- [LDJnr/Pure-Dove](https://huggingface.co/datasets/LDJnr/Pure-Dove)
	- [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
	- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)

	The second step is a DPO finetune using:

	- [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
	- [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
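
For readers unfamiliar with DPO (Direct Preference Optimization), here is a minimal sketch of the standard per-pair loss that such a finetune minimizes on chosen/rejected response pairs. It is illustrative only: this README does not state which training library or hyperparameters were used, so the function name `dpo_loss`, its argument names, and the value `beta=0.1` are all assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a response under
    either the model being trained (policy) or the frozen reference model.
    beta=0.1 is an assumed value, not one taken from this model's training.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # split into two branches so exp() never overflows.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model. The pairs in the two datasets listed above supply those chosen and rejected responses.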

	# Evaluations
	Evaluations were run using mlabonne's useful [Colab notebook llm-autoeval](https://github.com/mlabonne/llm-autoeval).
	Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

	| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
	|----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
	| [phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange) | 33.37 | 71.33 | 49.87 | 37.3 | 47.97 |
	| [phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo) | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
	| [dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2) | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
	| [phi-2](https://huggingface.co/microsoft/phi-2) | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
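
The Average column appears to be the unweighted mean of the four benchmark scores. The README does not say so explicitly, so treat that as an assumption. The sketch below copies the numbers from the table and checks the assumption to within rounding; the helper name `mean_score` is hypothetical.

```python
# Scores copied from the table above, in column order:
# AGIEval, GPT4All, TruthfulQA, Bigbench, followed by the reported Average.
rows = {
    "phi-2-orange": ([33.37, 71.33, 49.87, 37.3], 47.97),
    "phi-2-dpo": ([30.39, 71.68, 50.75, 34.9], 46.93),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.2], 46.89),
    "phi-2": ([27.98, 70.8, 44.43, 35.21], 44.61),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the benchmark scores."""
    return sum(scores) / len(scores)


for name, (scores, reported) in rows.items():
    computed = mean_score(scores)
    # The table is rounded to two decimals, so allow a small tolerance.
    assert abs(computed - reported) < 0.01, name
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Every row passes the check, so the reported averages are consistent with a plain mean of the four columns.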