SELM-Zephyr
See our paper, Self-Exploring Language Models: Active Preference Elicitation for Online Alignment, at https://huggingface.co/papers/2405.19332.
This model is a fine-tuned version of HuggingFaceH4/mistral-7b-sft-beta, trained on synthetic data derived from the HuggingFaceH4/ultrafeedback_binarized dataset.
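Because the base model is HuggingFaceH4/mistral-7b-sft-beta, prompts are expected to follow the Zephyr chat format. Below is a minimal sketch of that format, assuming the standard Zephyr template (`<|system|>` / `<|user|>` / `<|assistant|>` role tags, each turn closed with `</s>`); the helper `build_zephyr_prompt` is illustrative, not part of the released code.

```python
def build_zephyr_prompt(messages):
    """Assemble a Zephyr-style chat prompt from role/content dicts.

    Assumes the standard Zephyr template: each turn is wrapped in a role
    tag (<|system|>, <|user|>, <|assistant|>) and terminated with </s>;
    the prompt ends with an open <|assistant|> tag so the model continues
    generating as the assistant.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append("<|assistant|>\n")  # generation prompt for the model's reply
    return "\n".join(parts)

prompt = build_zephyr_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is preference optimization?"},
])
print(prompt)
```

In practice, prefer `AutoTokenizer.from_pretrained(...).apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses the chat template shipped with the checkpoint rather than a hand-written one.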
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Zephyr-7B-iter-3 | 24.00 | 7.48 |
| SELM-Zephyr-7B-iter-2 | 23.40 | 7.72 |
| SELM-Zephyr-7B-iter-1 | 20.28 | 7.42 |
| DPO-Zephyr-7B | 14.45 | 7.28 |
The following hyperparameters were used during training: