chujiezheng committed
Commit • 00293b3
1 Parent(s): 3683640
Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ The extrapolated (ExPO) model based on [`shenzhi-wang/Llama3-8B-Chinese-Chat`](h
 
 Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT and DPO/RLHF checkpoints, achieving superior alignment with human preference.
 
-**Note:** This is an experimental model, as I have not comprehensively evaluated its Chinese ability.
+**Note:** This is an experimental model, as I have not comprehensively evaluated its Chinese ability. **Unexpected issues may occur when we apply extrapolation to the DPO/RLHF alignment training for new languages (e.g., Chinese).**
 
 ## Evaluation Results
 
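The README text in this diff says the model is obtained by extrapolating with alpha = 0.3 from the SFT and DPO/RLHF checkpoints. As a rough illustration only, the sketch below assumes the extrapolation takes the form `aligned + alpha * (aligned - sft)`, applied parameter by parameter. The function name `extrapolate`, the `Weights` alias, and the toy checkpoints are hypothetical and not taken from the repository; real checkpoints would hold tensors rather than plain lists.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def extrapolate(sft: Weights, aligned: Weights, alpha: float = 0.3) -> Weights:
    """Return aligned + alpha * (aligned - sft) for every parameter.

    Assumed form of the extrapolation; `sft` is the SFT checkpoint and
    `aligned` is the DPO/RLHF checkpoint.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must have identical parameter names")
    out: Weights = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Step past the aligned weights, along the direction SFT -> aligned.
        out[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return out


# Toy checkpoints with a single two-element parameter (illustrative values).
sft_ckpt = {"layer.weight": [1.0, 2.0]}
dpo_ckpt = {"layer.weight": [2.0, 2.0]}
expo_ckpt = extrapolate(sft_ckpt, dpo_ckpt, alpha=0.3)
```

With alpha = 0 the function returns the DPO/RLHF weights unchanged; larger values move further along the direction in which alignment training changed the SFT weights, which is where the unexpected issues mentioned in the note could arise.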