	---
	license: other
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: taide/TAIDE-LX-7B-Chat
	model-index:
	- name: ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V5
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V5

	This model is a fine-tuned version of [taide/TAIDE-LX-7B-Chat](https://huggingface.co/taide/TAIDE-LX-7B-Chat) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.3726

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
	- mixed_precision_training: Native AMP
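The batch-size and scheduler settings listed above can be sanity-checked with a small sketch. It assumes a single training device (consistent with 1 × 4 = 4) and a linear warmup-then-decay schedule of the shape commonly used with `lr_scheduler_type: linear`; the function names and the `total_steps` value in the usage note are illustrative, not taken from the training run.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0.

    total_steps is an assumption supplied by the caller; the card does not
    state the planned number of optimizer steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# train_batch_size=1 with gradient_accumulation_steps=4 on one device
print(effective_batch_size(1, 4))
```

For example, with a hypothetical `total_steps=1000`, `linear_lr(50, 1000)` is half of the peak rate and `linear_lr(100, 1000)` reaches the full `5e-05`.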

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 4.8629 | 0.0321 | 100 | 3.4698 |
	| 4.6474 | 0.0643 | 200 | 3.2343 |
	| 3.8261 | 0.0964 | 300 | 2.8205 |
	| 3.2992 | 0.1285 | 400 | 2.5034 |
	| 2.9369 | 0.1607 | 500 | 2.2639 |
	| 2.2674 | 0.1928 | 600 | 1.9493 |
	| 2.1393 | 0.2249 | 700 | 1.7888 |
	| 1.8195 | 0.2571 | 800 | 1.6285 |
	| 1.678 | 0.2892 | 900 | 1.4983 |
	| 1.6242 | 0.3213 | 1000 | 1.4013 |
	| 1.344 | 0.3535 | 1100 | 1.1522 |
	| 1.0894 | 0.3856 | 1200 | 1.0704 |
	| 1.2033 | 0.4177 | 1300 | 1.0537 |
	| 0.9503 | 0.4499 | 1400 | 0.9160 |
	| 0.9901 | 0.4820 | 1500 | 0.8751 |
	| 1.0363 | 0.5141 | 1600 | 0.7942 |
	| 0.9986 | 0.5463 | 1700 | 0.7668 |
	| 0.9407 | 0.5784 | 1800 | 0.6912 |
	| 0.9347 | 0.6105 | 1900 | 0.6543 |
	| 0.8109 | 0.6427 | 2000 | 0.6498 |
	| 0.8848 | 0.6748 | 2100 | 0.6077 |
	| 0.8937 | 0.7069 | 2200 | 0.5865 |
	| 0.7895 | 0.7391 | 2300 | 0.5780 |
	| 0.8044 | 0.7712 | 2400 | 0.5646 |
	| 0.8317 | 0.8033 | 2500 | 0.5449 |
	| 0.858 | 0.8355 | 2600 | 0.5132 |
	| 0.8519 | 0.8676 | 2700 | 0.4940 |
	| 0.7554 | 0.8997 | 2800 | 0.4972 |
	| 0.758 | 0.9319 | 2900 | 0.4809 |
	| 0.8866 | 0.9640 | 3000 | 0.4714 |
	| 0.7028 | 0.9961 | 3100 | 0.4608 |
	| 0.7031 | 1.0283 | 3200 | 0.4458 |
	| 0.6623 | 1.0604 | 3300 | 0.4427 |
	| 0.671 | 1.0925 | 3400 | 0.4366 |
	| 0.6588 | 1.1247 | 3500 | 0.4327 |
	| 0.6422 | 1.1568 | 3600 | 0.4239 |
	| 0.643 | 1.1889 | 3700 | 0.4235 |
	| 0.6747 | 1.2211 | 3800 | 0.4204 |
	| 0.6911 | 1.2532 | 3900 | 0.4130 |
	| 0.7354 | 1.2853 | 4000 | 0.4092 |
	| 0.6233 | 1.3175 | 4100 | 0.4070 |
	| 0.6005 | 1.3496 | 4200 | 0.4055 |
	| 0.624 | 1.3817 | 4300 | 0.4033 |
	| 0.623 | 1.4139 | 4400 | 0.3976 |
	| 0.6419 | 1.4460 | 4500 | 0.3966 |
	| 0.6329 | 1.4781 | 4600 | 0.3914 |
	| 0.6395 | 1.5103 | 4700 | 0.3934 |
	| 0.6541 | 1.5424 | 4800 | 0.3916 |
	| 0.6538 | 1.5746 | 4900 | 0.3917 |
	| 0.6214 | 1.6067 | 5000 | 0.3840 |
	| 0.6303 | 1.6388 | 5100 | 0.3844 |
	| 0.6547 | 1.6710 | 5200 | 0.3816 |
	| 0.6264 | 1.7031 | 5300 | 0.3844 |
	| 0.5896 | 1.7352 | 5400 | 0.3801 |
	| 0.6082 | 1.7674 | 5500 | 0.3786 |
	| 0.5772 | 1.7995 | 5600 | 0.3737 |
	| 0.5839 | 1.8316 | 5700 | 0.3745 |
	| 0.6201 | 1.8638 | 5800 | 0.3737 |
	| 0.5643 | 1.8959 | 5900 | 0.3717 |
	| 0.6258 | 1.9280 | 6000 | 0.3726 |
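The Epoch and Step columns allow a rough back-of-the-envelope estimate of the training set size, and the final validation loss can be converted to a perplexity. The sketch below is only an approximation: it assumes the reported loss is a mean token-level cross-entropy in nats, that one optimizer step consumes 4 examples (the `total_train_batch_size` above), and that the rounded epoch values are accurate enough for the division. The helper names are illustrative.

```python
import math


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


# Last logged row of the results table
final_step, final_epoch, final_val_loss = 6000, 1.9280, 0.3726

spe = steps_per_epoch(final_step, final_epoch)
approx_examples = round(spe) * 4  # 4 examples per optimizer step (assumed)

print(f"steps per epoch ~ {spe:.0f}")
print(f"training examples ~ {approx_examples}")
print(f"validation perplexity ~ {perplexity(final_val_loss):.3f}")
```

Note that the last logged row is at epoch 1.9280, while `num_epochs` is listed as 5; the table does not say why logging ends there, so the estimate above says nothing about the planned total number of steps.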


	### Framework versions

	- PEFT 0.12.1.dev0
	- Transformers 4.44.2
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1
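Since the metadata lists `library_name: peft` and `base_model: taide/TAIDE-LX-7B-Chat`, the weights are presumably a PEFT adapter that is loaded on top of the base model. A minimal loading sketch follows. The adapter repository id is a placeholder (the hosting namespace is not stated in this card), the function name is illustrative, and access to the base model may be subject to its own license terms.

```python
def load_adapter_model(
    base_id: str = "taide/TAIDE-LX-7B-Chat",
    # Placeholder: replace <namespace> with the account that hosts the adapter.
    adapter_id: str = "<namespace>/ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V5",
):
    """Load the base model and attach the fine-tuned PEFT adapter to it."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the full 7B base model, so it needs substantial disk space and memory. Prompt formatting for generation is not documented in this card and is left out here.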
