FactAlign

Models and datasets of our EMNLP 2024 paper "FactAlign: Long-form Factuality Alignment of Large Language Models"
This model is aligned with our FactAlign framework for improved long-form factuality; it was trained starting from meta-llama/Meta-Llama-3-8B-Instruct.
For more information, please refer to our paper: FactAlign: Long-form Factuality Alignment of Large Language Models.
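The model follows the standard Llama 3 chat workflow in transformers. A minimal usage sketch (the repo id below is a placeholder for this checkpoint's Hub name, and `device_map="auto"` assumes accelerate is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo-id>"  # placeholder: substitute this checkpoint's Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# A long-form factual prompt, formatted with the Llama 3 chat template
messages = [{"role": "user", "content": "Give me a detailed biography of Marie Curie."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```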
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the trl-lib/kto-mix-14k and chaoweihuang/lf-response-llama3-f1_100_0.8-fg0.5 datasets. Its evaluation-set results are reported in the training table below.
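FactAlign's objective combines a response-level KTO loss with the fine-grained, sentence-level signals reflected in the `Fg *` metrics below; see the paper for the full objective. As a rough sketch of the response-level KTO stage only, here is how training on trl-lib/kto-mix-14k could be set up with TRL's `KTOTrainer` (hyperparameter values are illustrative, not the settings used for this model):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# prompt/completion/label triples in TRL's unpaired KTO format
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

args = KTOConfig(
    output_dir="factalign-kto",  # illustrative name
    beta=0.1,                    # illustrative; not necessarily this model's setting
    per_device_train_batch_size=4,
)
trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
)
trainer.train()
```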
The following results were logged on the evaluation set during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | KL | Fg Rewards/chosen Sum | Fg Logps/policy Chosen | Fg Logps/reference Chosen | Count/fg Chosen | Fg Rewards/rejected Sum | Fg Logps/policy Rejected | Fg Logps/reference Rejected | Count/fg Rejected | Fg Logps/policy KL | Fg Logps/reference KL | Fg KL | Fg Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4478 | 0.4103 | 400 | 0.4325 | 1.3169 | -340.2313 | -1.7364 | -400.8539 | 3.0534 | 0.0280 | -1.3939 | -6.6287 | -6.0419 | 30.1832 | -0.6768 | -8.3632 | -7.5807 | 6.9239 | -13.6783 | -11.4736 | nan | 0.7654 |
| 0.4043 | 0.8205 | 800 | 0.4110 | 1.7360 | -336.0412 | -2.2628 | -406.1173 | 3.9987 | 0.0141 | -1.5560 | -6.7332 | -6.0419 | 30.1832 | -0.9033 | -8.6269 | -7.5807 | 6.9239 | -14.7946 | -11.4736 | nan | 0.7625 |
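One consistency check on these columns: in KTO-style logging, Rewards/margins is the gap between Rewards/chosen and Rewards/rejected, which both rows above satisfy up to rounding:

```python
# Step-400 row: margin = chosen reward - rejected reward (up to rounding)
chosen, rejected, margin = 1.3169, -1.7364, 3.0534
assert abs((chosen - rejected) - margin) < 1e-3  # 3.0533 vs 3.0534

# Step-800 row
chosen, rejected, margin = 1.7360, -2.2628, 3.9987
assert abs((chosen - rejected) - margin) < 1e-3  # 3.9988 vs 3.9987
```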