---
tags:
- generated_from_trainer
model-index:
- name: myBit-Llama2-jp-127M-4
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

# myBit-Llama2-jp-127M-4

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set (the same value as the step-122000 evaluation in the training results below):
- Loss: 3.0920
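
If the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language models trained with the Hugging Face `Trainer`, although this card does not state it), it can be converted to perplexity with `exp(loss)`. A minimal sketch under that assumption:

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity.

    Assumes the loss is averaged over tokens and measured in nats.
    """
    return math.exp(mean_ce_loss)


final_eval_loss = 3.0920  # evaluation loss reported on this card
print(f"perplexity: {perplexity(final_eval_loss):.2f}")  # perplexity: 22.02
```

Perplexity is measured per token, so it is only comparable between models that share a tokenizer.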

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0024
- train_batch_size: 96
- eval_batch_size: 96
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 5000
- num_epochs: 1
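
With `lr_scheduler_type: polynomial`, the `Trainer` uses Transformers' `get_polynomial_decay_schedule_with_warmup`, whose defaults are `lr_end=1e-7` and `power=1.0`. Assuming those defaults were not overridden, the learning rate ramps linearly from 0 to the peak over the 5000 warmup steps and then decays linearly toward `lr_end`. The stdlib-only sketch below reproduces that shape; it is not the library code, and `TOTAL_STEPS = 123_000` is a hypothetical value, because the card does not state the total step count (the results table ends at step 122000, epoch 0.99).

```python
# Sketch of a polynomial-decay-with-warmup schedule, mirroring the shape of
# transformers' get_polynomial_decay_schedule_with_warmup with its default
# lr_end and power (ASSUMED not overridden for this run).

PEAK_LR = 0.0024       # learning_rate from the card
WARMUP_STEPS = 5000    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 123_000  # HYPOTHETICAL: total steps are not stated on the card
LR_END = 1e-7          # assumed library default
POWER = 1.0            # assumed library default (linear decay)


def polynomial_lr(step: int,
                  peak_lr: float = PEAK_LR,
                  warmup_steps: int = WARMUP_STEPS,
                  total_steps: int = TOTAL_STEPS,
                  lr_end: float = LR_END,
                  power: float = POWER) -> float:
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    if step > total_steps:
        # Once the schedule has finished, hold at lr_end.
        return lr_end
    # Polynomial decay from peak_lr down to lr_end.
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (peak_lr - lr_end) * pct_remaining ** power + lr_end


for s in (0, 2500, 5000, 64_000, 122_000):
    print(s, f"{polynomial_lr(s):.6g}")
```

Under these assumptions the post-warmup decay is linear, so the rate is about half its peak (roughly 0.0012) at step 64000, midway between the end of warmup and the assumed final step.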

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 4.6932 | 0.02 | 2000 | 3.3504 |
| 3.252 | 0.03 | 4000 | 3.1987 |
| 3.1379 | 0.05 | 6000 | 3.0873 |
| 3.0466 | 0.06 | 8000 | 3.0233 |
| 2.9925 | 0.08 | 10000 | 2.9819 |
| 2.9553 | 0.1 | 12000 | 2.9471 |
| 2.9292 | 0.11 | 14000 | 2.9278 |
| 2.9158 | 0.13 | 16000 | 2.9159 |
| 2.907 | 0.15 | 18000 | 2.9084 |
| 2.9018 | 0.16 | 20000 | 2.9015 |
| 2.8945 | 0.18 | 22000 | 2.8971 |
| 2.8901 | 0.19 | 24000 | 2.9014 |
| 2.8906 | 0.21 | 26000 | 2.8980 |
| 2.8943 | 0.23 | 28000 | 2.9010 |
| 2.8985 | 0.24 | 30000 | 2.9165 |
| 3.0191 | 0.26 | 32000 | 3.3484 |
| 3.5616 | 0.28 | 34000 | 3.4516 |
| 3.2849 | 0.29 | 36000 | 3.0454 |
| 3.2425 | 0.31 | 38000 | 3.7183 |
| 3.655 | 0.32 | 40000 | 3.8947 |
| 3.3151 | 0.34 | 42000 | 3.6150 |
| 3.3482 | 0.36 | 44000 | 3.1714 |
| 3.1433 | 0.37 | 46000 | 3.1073 |
| 3.0462 | 0.39 | 48000 | 2.9786 |
| 3.0889 | 0.41 | 50000 | 3.3002 |
| 3.4652 | 0.42 | 52000 | 3.3920 |
| 3.3726 | 0.44 | 54000 | 3.1293 |
| 3.2314 | 0.45 | 56000 | 3.3841 |
| 3.5303 | 0.47 | 58000 | 3.3865 |
| 3.2828 | 0.49 | 60000 | 3.2591 |
| 3.0219 | 0.5 | 62000 | 2.9431 |
| 3.0714 | 0.52 | 64000 | 3.2328 |
| 3.1354 | 0.54 | 66000 | 3.0794 |
| 3.2194 | 0.55 | 68000 | 3.1326 |
| 3.394 | 0.57 | 70000 | 3.5974 |
| 3.2692 | 0.58 | 72000 | 3.1522 |
| 3.1513 | 0.6 | 74000 | 3.1398 |
| 3.2473 | 0.62 | 76000 | 3.1921 |
| 3.1717 | 0.63 | 78000 | 3.1827 |
| 3.211 | 0.65 | 80000 | 3.0845 |
| 2.9955 | 0.67 | 82000 | 3.0229 |
| 3.3145 | 0.68 | 84000 | 3.3382 |
| 3.0703 | 0.7 | 86000 | 3.5395 |
| 3.234 | 0.71 | 88000 | 2.9486 |
| 3.1077 | 0.73 | 90000 | 2.9488 |
| 3.1097 | 0.75 | 92000 | 2.9597 |
| 2.8979 | 0.76 | 94000 | 3.0215 |
| 3.236 | 0.78 | 96000 | 3.1758 |
| 3.1365 | 0.8 | 98000 | 3.4841 |
| 3.1954 | 0.81 | 100000 | 2.9520 |
| 3.2054 | 0.83 | 102000 | 3.6384 |
| 3.2957 | 0.84 | 104000 | 2.9212 |
| 2.9358 | 0.86 | 106000 | 3.0166 |
| 3.221 | 0.88 | 108000 | 3.3753 |
| 3.2241 | 0.89 | 110000 | 3.0858 |
| 3.1497 | 0.91 | 112000 | 2.9252 |
| 3.198 | 0.93 | 114000 | 3.8514 |
| 3.1427 | 0.94 | 116000 | 4.1130 |
| 3.2371 | 0.96 | 118000 | 2.8639 |
| 3.2576 | 0.97 | 120000 | 2.9192 |
| 3.3229 | 0.99 | 122000 | 3.0920 |
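
The validation loss in this table is not monotone: it falls to about 2.90 by step 20000 and stays there through step 30000, then fluctuates with large spikes (up to 4.1130 at step 116000). Its lowest value, 2.8639 at step 118000, is below the final reported 3.0920. A minimal stdlib-only sketch for summarizing the table; only the last six rows are embedded here (paste the full table to scan every checkpoint), and the 0.3 spike threshold is an arbitrary, illustrative choice:

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_results(markdown: str) -> list[EvalRow]:
    """Parse the data rows of a 4-column markdown results table."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        try:
            train, epoch, step, val = cells
            rows.append(EvalRow(float(train), float(epoch), int(step), float(val)))
        except ValueError:
            # Header ("Training Loss") and separator (":---:") rows end up here.
            continue
    return rows


def spikes(rows: list[EvalRow], threshold: float = 0.3) -> list[EvalRow]:
    """Rows whose validation loss rose by more than `threshold` since the previous eval."""
    return [cur for prev, cur in zip(rows, rows[1:])
            if cur.val_loss - prev.val_loss > threshold]


# Last six rows of the "Training results" table above.
EXCERPT = """
| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 3.1497        | 0.91  | 112000 | 2.9252          |
| 3.198         | 0.93  | 114000 | 3.8514          |
| 3.1427        | 0.94  | 116000 | 4.1130          |
| 3.2371        | 0.96  | 118000 | 2.8639          |
| 3.2576        | 0.97  | 120000 | 2.9192          |
| 3.3229        | 0.99  | 122000 | 3.0920          |
"""

rows = parse_results(EXCERPT)
best = min(rows, key=lambda r: r.val_loss)
print("lowest validation loss:", best.val_loss, "at step", best.step)  # 2.8639 at step 118000
print("spikes:", [r.step for r in spikes(rows)])  # spikes: [114000]
```

If checkpoints were saved at each evaluation, the lowest-loss step is a natural candidate to compare against the final one; whether those checkpoints exist is not stated on this card.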


### Framework versions

- Transformers 4.38.2
- PyTorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.15.2
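
To check whether a local environment matches the versions listed above, a small stdlib sketch using `importlib.metadata` can compare installed distributions against them. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`; the `compare_versions` helper is hypothetical and only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions from "Framework versions" above, keyed by the assumed PyPI
# distribution names (PyTorch is distributed as "torch").
EXPECTED: Dict[str, str] = {
    "transformers": "4.38.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.15.2",
}


def compare_versions(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed or None, exact string match)."""
    report: Dict[str, Tuple[str, Optional[str], bool]] = {}
    for pkg, want in expected.items():
        try:
            have: Optional[str] = get_version(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[pkg] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for pkg, (want, have, ok) in compare_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{pkg:<12} expected {want:<12} installed {have or '-':<12} {status}")
```

Because the comparison is an exact string match, a PyTorch 2.3.0 install without the `+cu121` local-version suffix is reported as a mismatch; relax the check if only the public version matters.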
