killawhale2 committed
Commit 2b079b2 · Parent(s): 6625d6d

add fine-tuning dataset details

README.md CHANGED
@@ -1,5 +1,12 @@
 ---
 license: apache-2.0
+datasets:
+- c-s-ale/alpaca-gpt4-data
+- Open-Orca/OpenOrca
+- Intel/orca_dpo_pairs
+- allenai/ultrafeedback_binarized_cleaned
+language:
+- en
 ---
 
 # **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**
@@ -19,11 +26,35 @@ Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness an
 # **Instruction Fine-Tuning Strategy**
 
 We utilize state-of-the-art instruction fine-tuning methods including supervised fine-tuning (SFT) and direct preference optimization (DPO) [1].
-Using open source datasets with Alpaca- and OpenOrca-style and generated synthetic datasets, we apply iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
 
+We used a mixture of the following datasets:
+- c-s-ale/alpaca-gpt4-data (SFT)
+- Open-Orca/OpenOrca (SFT)
+- in-house generated data utilizing MetaMath [2] (SFT, DPO)
+- Intel/orca_dpo_pairs (DPO)
+- allenai/ultrafeedback_binarized_cleaned (DPO)
 
+To avoid data contamination, we did not use GSM8K samples when generating data and, where applicable, filtered out tasks via the following list:
+```python
+filtering_task_list = [
+    'task228_arc_answer_generation_easy',
+    'ai2_arc/ARC-Challenge:1.0.0',
+    'ai2_arc/ARC-Easy:1.0.0',
+    'task229_arc_answer_generation_hard',
+    'hellaswag:1.1.0',
+    'task1389_hellaswag_completion',
+    'cot_gsm8k',
+    'cot_gsm8k_ii',
+    'drop:2.0.0',
+    'winogrande:1.1.0'
+]
+```
+
+Using the datasets mentioned above, we apply SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
+
+[1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
+
+[2] Yu, L., Jiang, W., Shi, H., Yu, J., Liu, Z., Zhang, Y., Kwok, J.T., Li, Z., Weller, A. and Liu, W., 2023. MetaMath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284.
 
 # **Evaluation Results**
 
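For reference, the DPO objective cited as [1] in the diff above can be summarized in a few lines. This is a minimal illustrative sketch, not the training code behind this commit; the function name `dpo_loss`, the default β value, and the batched log-probability inputs are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss of Rafailov et al. [1] on per-example summed log-probs.

    Each input is a 1-D tensor holding, for one batch, the summed token
    log-probabilities of the chosen or rejected completion under the
    trainable policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the implicit reward of the preferred answer above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Iterative DPO, as described in the commit, would repeat preference collection and optimization of this objective over multiple rounds; the exact schedule is proprietary and not shown here.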
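Similarly, a small sketch of how the `filtering_task_list` added in this commit might be applied for decontamination. The `task_name` column and the toy rows are hypothetical, and the list is abbreviated; the actual filtering pipeline is not part of the model card.

```python
from datasets import Dataset

# Abbreviated from the full filtering_task_list in the README diff above.
filtering_task_list = [
    'cot_gsm8k',
    'hellaswag:1.1.0',
    'winogrande:1.1.0',
]

# Toy task-labelled corpus standing in for the real SFT/DPO source data;
# only the task identifiers matter for this illustration.
corpus = Dataset.from_dict({
    "task_name": ["cot_gsm8k", "task1388_cb_entailment", "hellaswag:1.1.0"],
    "text": ["...", "...", "..."],
})

# Drop every sample whose task overlaps with an evaluation benchmark.
clean = corpus.filter(lambda ex: ex["task_name"] not in filtering_task_list)
print(clean["task_name"])  # -> ['task1388_cb_entailment']
```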