killawhale2 committed Update README.md • commit 9442df0 (parent: 2b079b2)

README.md CHANGED
@@ -50,7 +50,7 @@ filtering_task_list = [
 ]
 ```
 
-Using the datasets mentioned above, we
+Using the datasets mentioned above, we applied SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
 
 [1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
 
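The DPO training step mentioned in the updated README follows the objective from [1]. As a rough illustration (not the authors' actual training code, whose details are proprietary), the per-pair loss can be sketched as follows; the argument names and the `beta=0.1` default are illustrative assumptions:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, per Rafailov et al. [1].

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # -log sigmoid(beta * margin difference); minimized when the policy
    # prefers the chosen response more strongly than the reference does.
    logits = beta * (chosen_margin - rejected_margin)
    return math.log(1.0 + math.exp(-logits))

# When policy and reference agree exactly, the loss is -log(0.5) = log 2.
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

In the "iterative" variant the README alludes to, this step would be repeated over successive rounds of freshly collected preference pairs, with each round's policy serving as the next round's starting point.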