chaoweihuang
/

FactAlign-LLaMA-3-8B

Text Generation

alignment-handbook

Generated from Trainer

text-generation-inference

Inference Endpoints

Model card Files Files and versions Metrics Training metrics Community

FactAlign-LLaMA-3-8B / README.md

chaoweihuang's picture

Update README.md

78a4bb6 verified about 1 month ago

|

3.87 kB

	---
	license: llama3
	base_model: meta-llama/Meta-Llama-3-8B-Instruct
	tags:
	- alignment-handbook
	- generated_from_trainer
	datasets:
	- trl-lib/kto-mix-14k
	- chaoweihuang/lf-response-llama3-f1_100_0.8-fg0.5
	model-index:
	- name: kto-mix-14k-lf-response-llama3-f1_100_0.8-fg0.5-fgudw4.0-kto-fg
	results: []
	---

	# FactAlign-LLaMA-3-8B
	This model is aligned with our FactAlign framework for improved long-form factuality, from [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

	For more information, please refer to our paper: [FactAlign: Long-form Factuality Alignment of Large Language Models](https://huggingface.co/papers/2410.01691).


	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data


	This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the trl-lib/kto-mix-14k and the chaoweihuang/lf-response-llama3-f1_100_0.8-fg0.5 datasets.
	It achieves the following results on the evaluation set:
	- Loss: 0.4110
	- Rewards/chosen: 1.7360
	- Logps/chosen: -336.0412
	- Rewards/rejected: -2.2628
	- Logps/rejected: -406.1173
	- Rewards/margins: 3.9987
	- Kl: 0.0141
	- Fg Rewards/chosen Sum: -1.5560
	- Fg Logps/policy Chosen: -6.7332
	- Fg Logps/reference Chosen: -6.0419
	- Count/fg Chosen: 30.1832
	- Fg Rewards/rejected Sum: -0.9033
	- Fg Logps/policy Rejected: -8.6269
	- Fg Logps/reference Rejected: -7.5807
	- Count/fg Rejected: 6.9239
	- Fg Logps/policy Kl: -14.7946
	- Fg Logps/reference Kl: -11.4736
	- Fg Kl: nan
	- Fg Loss: 0.7625


	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1.0

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \| Rewards/chosen \| Logps/chosen \| Rewards/rejected \| Logps/rejected \| Rewards/margins \| Kl \| Fg Rewards/chosen Sum \| Fg Logps/policy Chosen \| Fg Logps/reference Chosen \| Count/fg Chosen \| Fg Rewards/rejected Sum \| Fg Logps/policy Rejected \| Fg Logps/reference Rejected \| Count/fg Rejected \| Fg Logps/policy Kl \| Fg Logps/reference Kl \| Fg Kl \| Fg Loss \|
	\|:-------------:\|:------:\|:----:\|:---------------:\|:--------------:\|:------------:\|:----------------:\|:--------------:\|:---------------:\|:------:\|:---------------------:\|:----------------------:\|:-------------------------:\|:---------------:\|:-----------------------:\|:------------------------:\|:---------------------------:\|:-----------------:\|:------------------:\|:---------------------:\|:-----:\|:-------:\|
	\| 0.4478 \| 0.4103 \| 400 \| 0.4325 \| 1.3169 \| -340.2313 \| -1.7364 \| -400.8539 \| 3.0534 \| 0.0280 \| -1.3939 \| -6.6287 \| -6.0419 \| 30.1832 \| -0.6768 \| -8.3632 \| -7.5807 \| 6.9239 \| -13.6783 \| -11.4736 \| nan \| 0.7654 \|
	\| 0.4043 \| 0.8205 \| 800 \| 0.4110 \| 1.7360 \| -336.0412 \| -2.2628 \| -406.1173 \| 3.9987 \| 0.0141 \| -1.5560 \| -6.7332 \| -6.0419 \| 30.1832 \| -0.9033 \| -8.6269 \| -7.5807 \| 6.9239 \| -14.7946 \| -11.4736 \| nan \| 0.7625 \|


	### Framework versions

	- Transformers 4.41.1
	- Pytorch 2.3.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1