tianlinliu0121/zephyr-7b-dpo-full-debug-regression

	---
	license: mit
	base_model: HuggingFaceH4/mistral-7b-sft-beta
	tags:
	- generated_from_trainer
	model-index:
	- name: zephyr-7b-dpo-full-debug-regression
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-full-debug-regression

	This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta). The training dataset was not recorded by the Trainer.
	It achieves the following results on the evaluation set:
	- Loss: 0.7240
	- Rewards/chosen: -4.3843
	- Rewards/rejected: -7.9101
	- Rewards/accuracies: 0.7640
	- Rewards/margins: 3.5258
	- Logps/rejected: -311.4621
	- Logps/chosen: -319.5667
	- Logits/rejected: -2.4790
	- Logits/chosen: -2.5088
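
The metric names above (`Rewards/chosen`, `Rewards/rejected`, `Rewards/margins`) resemble those logged by preference-optimization trainers such as TRL's `DPOTrainer`, although the card does not state which trainer was used. Assuming the margin is simply the chosen reward minus the rejected reward, the reported numbers can be cross-checked with a minimal sketch like this one. The helper name `reward_margin` is illustrative and not part of any library.

```python
# Sanity check on the reported evaluation metrics.
# Assumption (not stated in the card): Rewards/margins is defined as
# Rewards/chosen - Rewards/rejected.

def reward_margin(chosen: float, rejected: float) -> float:
    """Return the gap between the chosen and rejected rewards."""
    return chosen - rejected


# Final evaluation values copied from the list above.
final_eval = {
    "rewards_chosen": -4.3843,
    "rewards_rejected": -7.9101,
    "rewards_margins": 3.5258,
}

margin = reward_margin(final_eval["rewards_chosen"], final_eval["rewards_rejected"])

# The recomputed margin should agree with the reported one up to rounding.
matches_report = abs(margin - final_eval["rewards_margins"]) < 1e-4
print(f"recomputed margin: {margin:.4f}, matches report: {matches_report}")
```

Both rewards are negative here, so the positive margin comes from the rejected reward dropping faster than the chosen one, not from the chosen reward rising.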

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 8
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 32
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
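
The totals in the list follow from the per-device values: 8 × 4 devices = 32 for training and 4 × 4 = 16 for evaluation, which suggests no gradient accumulation was used (an inference, since the card does not list it). The sketch below reproduces that arithmetic and illustrates a linear schedule with a 10% warmup. The `TOTAL_STEPS` value and both function names are hypothetical, and the schedule is a plain-Python approximation of a linear warmup and decay, not the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-07
TRAIN_BATCH_SIZE = 8   # per device
EVAL_BATCH_SIZE = 4    # per device
NUM_DEVICES = 4
WARMUP_RATIO = 0.1

# Hypothetical step count, chosen only to illustrate the schedule shape.
TOTAL_STEPS = 1000


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size across devices (gradient accumulation assumed to be 1)."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(total_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES))  # total_train_batch_size
print(total_batch_size(EVAL_BATCH_SIZE, NUM_DEVICES))   # total_eval_batch_size
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```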

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.533 | 0.26 | 500 | 0.5084 | -0.1902 | -1.3680 | 0.7780 | 1.1778 | -246.0413 | -277.6251 | -2.9319 | -2.9487 |
	| 0.4907 | 0.52 | 1000 | 0.5234 | -0.3346 | -1.8153 | 0.7620 | 1.4807 | -250.5139 | -279.0693 | -2.8401 | -2.8442 |
	| 0.4388 | 0.77 | 1500 | 0.5202 | -0.7856 | -2.2720 | 0.7920 | 1.4864 | -255.0812 | -283.5798 | -2.7420 | -2.7444 |
	| 0.0651 | 1.03 | 2000 | 0.5049 | -1.0044 | -2.8702 | 0.7860 | 1.8658 | -261.0635 | -285.7675 | -2.7335 | -2.7412 |
	| 0.0887 | 1.29 | 2500 | 0.5946 | -1.9888 | -3.9256 | 0.7480 | 1.9368 | -271.6175 | -295.6113 | -2.5940 | -2.6173 |
	| 0.0747 | 1.55 | 3000 | 0.5748 | -1.9590 | -4.0271 | 0.7560 | 2.0681 | -272.6327 | -295.3135 | -2.4969 | -2.5205 |
	| 0.101 | 1.81 | 3500 | 0.5783 | -1.9521 | -4.1853 | 0.7680 | 2.2332 | -274.2144 | -295.2442 | -2.5069 | -2.5278 |
	| 0.0195 | 2.07 | 4000 | 0.6253 | -2.9322 | -5.7633 | 0.7600 | 2.8310 | -289.9938 | -305.0455 | -2.4935 | -2.5158 |
	| 0.0191 | 2.32 | 4500 | 0.7215 | -4.2183 | -7.6216 | 0.7620 | 3.4034 | -308.5774 | -317.9060 | -2.4756 | -2.5036 |
	| 0.0105 | 2.58 | 5000 | 0.7341 | -4.2607 | -7.7440 | 0.7600 | 3.4833 | -309.8016 | -318.3306 | -2.5156 | -2.5437 |
	| 0.0092 | 2.84 | 5500 | 0.7330 | -4.3756 | -7.9435 | 0.7600 | 3.5679 | -311.7966 | -319.4794 | -2.4856 | -2.5149 |
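
One way to read the log above is to locate the checkpoints with the lowest validation loss and the highest reward accuracy. The sketch below does that over a subset of the columns, copied from the table. The `EvalRow` type and its field names are illustrative choices, not names used by the Trainer.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """A subset of the columns from the training results table."""
    step: int
    epoch: float
    training_loss: float
    validation_loss: float
    reward_accuracy: float
    reward_margin: float


# Values copied from the training results table.
EVAL_LOG = [
    EvalRow(500, 0.26, 0.533, 0.5084, 0.7780, 1.1778),
    EvalRow(1000, 0.52, 0.4907, 0.5234, 0.7620, 1.4807),
    EvalRow(1500, 0.77, 0.4388, 0.5202, 0.7920, 1.4864),
    EvalRow(2000, 1.03, 0.0651, 0.5049, 0.7860, 1.8658),
    EvalRow(2500, 1.29, 0.0887, 0.5946, 0.7480, 1.9368),
    EvalRow(3000, 1.55, 0.0747, 0.5748, 0.7560, 2.0681),
    EvalRow(3500, 1.81, 0.101, 0.5783, 0.7680, 2.2332),
    EvalRow(4000, 2.07, 0.0195, 0.6253, 0.7600, 2.8310),
    EvalRow(4500, 2.32, 0.0191, 0.7215, 0.7620, 3.4034),
    EvalRow(5000, 2.58, 0.0105, 0.7341, 0.7600, 3.4833),
    EvalRow(5500, 2.84, 0.0092, 0.7330, 0.7600, 3.5679),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(EVAL_LOG, key=lambda row: row.validation_loss)

# Checkpoint with the highest reward accuracy.
best_by_accuracy = max(EVAL_LOG, key=lambda row: row.reward_accuracy)

print(f"lowest validation loss: {best_by_loss.validation_loss} at step {best_by_loss.step}")
print(f"highest reward accuracy: {best_by_accuracy.reward_accuracy} at step {best_by_accuracy.step}")
```

On these numbers, validation loss is lowest at step 2000 and reward accuracy peaks at step 1500. After the first epoch the training loss falls below 0.1 while validation loss climbs, a pattern that is consistent with overfitting, even though the reward margin keeps growing.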


	### Framework versions

	- Transformers 4.35.0
	- PyTorch 2.1.0+cu118
	- Datasets 2.14.6
	- Tokenizers 0.14.1
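
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch under two assumptions: that PyTorch is installed under the pip distribution name `torch`, and that an exact string match is the desired comparison. The helper names `installed_versions` and `version_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above. Mapping "PyTorch" to the pip
# distribution name "torch" is an assumption.
PINNED = {
    "transformers": "4.35.0",
    "torch": "2.1.0+cu118",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def installed_versions(packages):
    """Look up the installed version of each package (None if not installed)."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(pinned, installed):
    """Return {package: (pinned, installed)} for every package that differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


for name, (wanted, found) in version_mismatches(PINNED, installed_versions(PINNED)).items():
    print(f"{name}: card lists {wanted}, environment has {found}")
```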