	---
	library_name: peft
	tags:
	- alignment-handbook
	- generated_from_trainer
	datasets:
	- David-Xu/astronomy-stack-dpo-20-percent
	base_model: meta-llama/Llama-2-7b-chat-hf
	model-index:
	- name: cira-7b-dpo-lora-merge
	results: []
	license: mit
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# cira-7b-dpo-lora-merge

	This model is a fine-tuned version of [David-Xu/llama-2-7b-cira-sft-v0.1-merge](https://huggingface.co/David-Xu/llama-2-7b-cira-sft-v0.1-merge) on the David-Xu/astronomy-stack-dpo-20-percent dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6183
	- Rewards/chosen: 0.5535
	- Rewards/rejected: 0.3385
	- Rewards/accuracies: 0.6784
	- Rewards/margins: 0.2150
	- Logps/rejected: -652.2422
	- Logps/chosen: -795.1126
	- Logits/rejected: -1.1812
	- Logits/chosen: -1.0305
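
The reward figures above are related to one another, and a quick consistency check makes that explicit. The sketch below assumes the standard sigmoid DPO loss, where the per-example loss is `-log(sigmoid(chosen_reward - rejected_reward))`; the card does not state which loss variant was used, so treat that as an assumption. Under it, the reported margin should equal chosen minus rejected up to rounding, and the loss evaluated at the mean margin should be a lower bound on the reported mean loss (Jensen's inequality, since the loss is convex in the margin).

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = 0.5535
rewards_rejected = 0.3385
rewards_margin = 0.2150
eval_loss = 0.6183

# The margin should equal chosen minus rejected, up to rounding.
derived_margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(margin: float) -> float:
    """-log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


# Assumption: sigmoid DPO loss (not confirmed by the card).
# The loss at the mean margin lower-bounds the mean of per-example losses.
loss_at_mean_margin = sigmoid_dpo_loss(rewards_margin)

print(f"derived margin:      {derived_margin:.4f} (reported: {rewards_margin:.4f})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss: {eval_loss:.4f})")
```

A margin of zero gives a loss of `log(2)`, about 0.693, so a reported loss of 0.6183 indicates a modest but real preference for the chosen responses.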

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
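
Two of the values above are derived from the others, and the arithmetic can be sketched as follows. The effective batch size is the per-device batch size times the gradient accumulation steps times the number of processes, and the learning rate follows a linear warmup and then a cosine decay. `TOTAL_STEPS = 900` is an assumption inferred from the results table (step 100 falls at epoch 0.11); the card does not report the exact step count, and `lr_at` is a hypothetical helper that mirrors the usual warmup-plus-cosine formula rather than code taken from the training run.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 4
total_train_batch_size = 4
learning_rate = 5e-06
warmup_ratio = 0.1

# Assumption: roughly 900 optimizer steps in the single epoch.
TOTAL_STEPS = 900

# total = per-device batch * accumulation steps * number of processes,
# so the listed numbers imply this many processes.
implied_processes = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("implied processes:", implied_processes)
for step in (0, 45, 90, 495, 900):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With these assumptions the peak learning rate of 5e-06 is reached after 90 steps and decays to zero at the final step.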

	### Training results

	| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
	|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
	| 0.6618 | 0.11 | 100 | -0.8082 | -1.0029 | -823.6102 | -665.3923 | 0.6664 | 0.6432 | 0.2685 | 0.0615 | 0.2070 |
	| 0.6079 | 0.22 | 200 | -1.0530 | -1.2188 | -794.3279 | -642.6389 | 0.6463 | 0.6508 | 0.5613 | 0.1268 | 0.4345 |
	| 0.6029 | 0.33 | 300 | -1.0367 | -1.1965 | -793.2078 | -644.8513 | 0.6360 | 0.6558 | 0.5725 | 0.1601 | 0.4124 |
	| 0.6123 | 0.45 | 400 | -1.1220 | -1.2658 | -787.7750 | -641.9633 | 0.6291 | 0.6608 | 0.6269 | 0.1856 | 0.4413 |
	| 0.5596 | 0.56 | 500 | -1.0852 | -1.2330 | -790.7928 | -646.7930 | 0.6230 | 0.6683 | 0.5967 | 0.2037 | 0.3930 |
	| 0.5382 | 0.67 | 600 | -1.0547 | -1.2034 | -793.2486 | -650.0926 | 0.6199 | 0.6709 | 0.5721 | 0.2121 | 0.3600 |
	| 0.5952 | 0.78 | 700 | -1.0324 | -1.1827 | -794.9604 | -652.0420 | 0.6186 | 0.6784 | 0.5550 | 0.2145 | 0.3405 |
	| 0.5792 | 0.89 | 800 | -1.0308 | -1.1812 | -795.125 | -652.2705 | 0.6182 | 0.6784 | 0.5534 | 0.2151 | 0.3382 |
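
The trend in the table can be checked programmatically. The sketch below copies a subset of columns from each row (step, validation loss, reward accuracy, chosen reward, margin, rejected reward) and confirms that the validation loss falls at every evaluation and that each margin matches chosen minus rejected to within rounding. The tuple layout and variable names are illustrative choices, not part of the training code.

```python
# (step, validation_loss, reward_accuracy, chosen, margin, rejected),
# copied from the training results table.
rows = [
    (100, 0.6664, 0.6432, 0.2685, 0.0615, 0.2070),
    (200, 0.6463, 0.6508, 0.5613, 0.1268, 0.4345),
    (300, 0.6360, 0.6558, 0.5725, 0.1601, 0.4124),
    (400, 0.6291, 0.6608, 0.6269, 0.1856, 0.4413),
    (500, 0.6230, 0.6683, 0.5967, 0.2037, 0.3930),
    (600, 0.6199, 0.6709, 0.5721, 0.2121, 0.3600),
    (700, 0.6186, 0.6784, 0.5550, 0.2145, 0.3405),
    (800, 0.6182, 0.6784, 0.5534, 0.2151, 0.3382),
]

# Step with the lowest validation loss.
best_step = min(rows, key=lambda row: row[1])[0]

# Validation loss should decrease from one evaluation to the next.
val_losses = [row[1] for row in rows]
loss_strictly_decreasing = all(
    later < earlier for earlier, later in zip(val_losses, val_losses[1:])
)

# Largest gap between the logged margin and (chosen - rejected).
max_margin_gap = max(abs(row[4] - (row[3] - row[5])) for row in rows)

print("best step:", best_step)
print("validation loss strictly decreasing:", loss_strictly_decreasing)
print(f"largest margin discrepancy: {max_margin_gap:.4f}")
```

The improvement between steps 700 and 800 is small (0.6186 to 0.6182), which suggests the run was close to a plateau by the end of the epoch.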


	### Framework versions

	- PEFT 0.9.0
	- Transformers 4.36.2
	- Pytorch 2.1.0+cu121
	- Datasets 2.14.6
	- Tokenizers 0.15.2
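
To compare a local environment against the versions listed above, a minimal sketch using only the standard library is shown below. The PyPI distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks in the list, and `compare_versions` is a hypothetical helper written for this example.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI package name.
PINNED = {
    "peft": "0.9.0",
    "transformers": "4.36.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def compare_versions(pinned: dict) -> dict:
    """Map each package to (wanted, installed, matches).

    installed is None when the package is not present locally.
    """
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(PINNED).items():
    status = "ok" if matches else "differs"
    print(f"{package:<13} wanted {wanted:<12} installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one recorded at training time.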