Step-Controlled DPO
This model is a fine-tuned version of InternLM2-20B, trained with supervised fine-tuning (SFT) and Step-Controlled DPO (SCDPO). It is intended for mathematical problem-solving. It achieves the following results on the evaluation sets:
Model | gsm8k | math | ape | cmath | mgsm_zh |
---|---|---|---|---|---|
InternLM2-SFT | 86.4 | 55.8 | 77.1 | 88.4 | 74.8 |
InternLM2-SFT-DPO | 87 | 57.6 | 78.7 | 89.9 | 76 |
InternLM2-SFT-DPO (data-equal) | 88.2 | 57.5 | 78.8 | 89.3 | 76 |
InternLM2-SFT-SCDPO | 88.5 | 58.1 | 79.3 | 90.3 | 80.4 |
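
To make the comparison in the table easier to read, the following minimal sketch computes the per-benchmark gain of the SCDPO model over the SFT baseline. The scores are copied from the table above; the variable names (`scores`, `gains`) are illustrative and not part of any released code.

```python
# Scores copied from the results table above, as reported.
scores = {
    "InternLM2-SFT": {
        "gsm8k": 86.4, "math": 55.8, "ape": 77.1, "cmath": 88.4, "mgsm_zh": 74.8,
    },
    "InternLM2-SFT-SCDPO": {
        "gsm8k": 88.5, "math": 58.1, "ape": 79.3, "cmath": 90.3, "mgsm_zh": 80.4,
    },
}

baseline = scores["InternLM2-SFT"]
scdpo = scores["InternLM2-SFT-SCDPO"]

# Point difference per benchmark, rounded to one decimal to avoid float noise.
gains = {name: round(scdpo[name] - baseline[name], 1) for name in baseline}

for name, gain in gains.items():
    print(f"{name:8s} +{gain:.1f}")
```

On these numbers, the largest gain is on mgsm_zh (5.6 points); the other four benchmarks improve by roughly 2 points each.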
The following hyperparameters were used during training: