reciprocate committed
Commit f224fe9 • 1 parent: 03b5bc9
update mt bench plot & clean up minor typos
README.md CHANGED
```diff
@@ -80,29 +80,27 @@ The dataset is comprised of a mixture of open large-scale datasets available
 1. SFT Datasets
     - HuggingFaceH4/ultrachat_200k
     - meta-math/MetaMathQA
-
+    - WizardLM/WizardLM_evol_instruct_V2_196k
     - Open-Orca/SlimOrca
 2. Preference Datasets:
     - HuggingFaceH4/ultrafeedback_binarized
     - Intel/orca_dpo_pairs
 
-
-### Training Procedure
-
 ## Performance
 
 ### MT-Bench and Alpaca Bench
 
-
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6310474ca119d49bc1eb0d80/8WIZS6dAlu5kSH-382pMl.png" alt="mt_bench_plot" width="600"/>
 
 | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
 |-------------|-----|----|---------------|--------------|
 | **StableLM Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
-
+| StableLM Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
 | Capybara v1.9 | 3B | dSFT | 5.94 | - |
 | MPT-Chat | 7B | dSFT | 5.42 | - |
-| Xwin-
-| Mistral-
+| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
+| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
 | Zephyr-7b-α | 7B | dDPO | 6.88 | - |
 | Zephyr-7b-β | 7B | dDPO | 7.34 | 90.60 |
 | Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
```
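For reference, the data sources named in the updated card can be pulled straight from the Hugging Face Hub. The sketch below inspects the SFT mixture with the `datasets` library; the split names are assumptions taken from each source dataset's own card, and the card above does not state how the authors weighted or filtered the mixture.

```python
# Minimal sketch: inspect the SFT data sources named in the card.
# Assumptions: split names follow each dataset's own card; no claim is
# made about the mixing ratios or filtering used for the actual model.
from datasets import load_dataset

SFT_SOURCES = [
    ("HuggingFaceH4/ultrachat_200k", "train_sft"),
    ("meta-math/MetaMathQA", "train"),
    ("WizardLM/WizardLM_evol_instruct_V2_196k", "train"),
    ("Open-Orca/SlimOrca", "train"),
]

for repo_id, split in SFT_SOURCES:
    ds = load_dataset(repo_id, split=split)
    # Schemas differ across sources, so a real mixture needs a
    # per-dataset conversion to a common chat format first.
    print(f"{repo_id}: {len(ds):,} examples, columns={ds.column_names}")
```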
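The preference datasets feed the DPO stage that the table's "Alignment" column refers to. Below is a rough sketch of that stage with TRL's `DPOTrainer`; the SFT checkpoint path is a hypothetical placeholder, the hyperparameters are illustrative rather than the authors' recipe, and the trainer's keyword names vary somewhat across `trl` releases.

```python
# Rough DPO sketch on one of the listed preference datasets.
# Assumptions: trl >= 0.12 (DPOConfig / processing_class API); the
# checkpoint path is hypothetical; beta and batch size are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_path = "path/to/sft-checkpoint"  # hypothetical: your own SFT model
model = AutoModelForCausalLM.from_pretrained(sft_path)
tokenizer = AutoTokenizer.from_pretrained(sft_path)

# train_prefs provides prompt/chosen/rejected columns, the schema DPOTrainer expects.
prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(model=model, args=args, train_dataset=prefs, processing_class=tokenizer)
trainer.train()
```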
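And since the table reports chat benchmarks, here is a quick way to query the aligned model itself. Assumptions: the Hub id is `stabilityai/stablelm-zephyr-3b`, and your `transformers` release includes the StableLM architecture (older releases may need `trust_remote_code=True`).

```python
# Minimal chat inference against the model benchmarked in the table above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-zephyr-3b"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
    device_map="auto",
)

# The chat template applies the prompt format the model was aligned with.
messages = [{"role": "user", "content": "Give two uses for a 3B-parameter chat model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```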