reciprocate committed on
Commit f224fe9 (1 parent: 03b5bc9)

update mt bench plot & clean up minor typos

Files changed (1)
  1. README.md +6 -8
README.md CHANGED
@@ -80,29 +80,27 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 1. SFT Datasets
 - HuggingFaceH4/ultrachat_200k
 - meta-math/MetaMathQA
-- Wizard Dataset
+- WizardLM/WizardLM_evol_instruct_V2_196k
 - Open-Orca/SlimOrca
 2. Preference Datasets:
 - HuggingFaceH4/ultrafeedback_binarized
 - Intel/orca_dpo_pairs
 
-
-### Training Procedure
-
 ## Performance
 
 ### MT-Bench and Alpaca Bench
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6310474ca119d49bc1eb0d80/jwpbBHzdCkHm0rMvPUVxC.png" alt="mt_bench_plot" width="600"/>
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6310474ca119d49bc1eb0d80/8WIZS6dAlu5kSH-382pMl.png" alt="mt_bench_plot" width="600"/>
 
 | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
 |-------------|-----|----|---------------|--------------|
 | **StableLM Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
-| Stable Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
+| StableLM Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
 | Capybara v1.9 | 3B | dSFT | 5.94 | - |
 | MPT-Chat | 7B |dSFT |5.42| -|
-| Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83|
-| Mistral-Instructv0.1 | 7B| - | 6.84 |-|
+| Xwin-LM v0.1 | 7B| dPPO| 6.19| 87.83|
+| Mistral-Instruct v0.1 | 7B| - | 6.84 |-|
 | Zephyr-7b-α |7B| dDPO| 6.88| -|
 | Zephyr-7b-β| 7B | dDPO | 7.34 | 90.60 |
 | Falcon-Instruct | 40B |dSFT |5.17 |45.71|