Collection: Step-Controlled DPO (models and datasets of Step-Controlled DPO).
This model is a fine-tuned version of Mistral-7B-v0.1. It achieves the following results on the evaluation set:
The model is fine-tuned for mathematical problem-solving and intended for solving math problems.
The following hyperparameters were used during training: