	---
	library_name: transformers
	license: apache-2.0
	base_model: amd/AMD-Llama-135m
	tags:
	- generated_from_trainer
	model-index:
	- name: amdchess-v7
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# amdchess-v7

	This model is a fine-tuned version of [amd/AMD-Llama-135m](https://huggingface.co/amd/AMD-Llama-135m) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7964
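
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language models trained with the Trainer but is not stated in this card. The helper name `perplexity` is illustrative.

```python
import math

# Final evaluation loss reported in this card.
EVAL_LOSS = 0.7964


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity.

    Assumption: the loss is averaged per token and uses the natural log.
    """
    return math.exp(loss_nats)


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity(EVAL_LOSS):.3f}")
```

Under that assumption the final loss corresponds to a perplexity of roughly 2.2. The figure is only comparable across models that share the same tokenizer and evaluation set.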

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: OptimizerNames.GROKADAMW with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
	- lr_scheduler_type: cosine
	- num_epochs: 0.25
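
The hyperparameters above map roughly onto `transformers.TrainingArguments` as sketched below. This is a reconstruction, not the original training script. It assumes a single device (so `train_batch_size` equals the per-device batch size) and that `"grokadamw"` is the string value of `OptimizerNames.GROKADAMW`. The evaluation and logging cadence is inferred from the results table, and the `output_dir` default and the helper name `build_training_args` are hypothetical.

```python
# Keyword arguments mirroring the hyperparameters listed above.
TRAINING_KWARGS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 8,  # assumes single-device training
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "grokadamw",              # assumed string form of OptimizerNames.GROKADAMW
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 0.25,
    # Inferred from the results table (one row every 5 steps), not listed above.
    "eval_strategy": "steps",
    "eval_steps": 5,
    "logging_steps": 5,
}


def build_training_args(output_dir: str = "amdchess-v7"):
    """Build TrainingArguments from the kwargs above.

    The import is deferred so the dictionary can be inspected
    without transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `build_training_args()` and passing the result to a `Trainer` would approximate this configuration. The GrokAdamW optimizer may require an additional package to be installed.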

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 6.2092 | 0.0030 | 5 | 5.2057 |
	| 1.9922 | 0.0059 | 10 | 1.8157 |
	| 1.7403 | 0.0089 | 15 | 1.6004 |
	| 1.3742 | 0.0118 | 20 | 1.3543 |
	| 1.3517 | 0.0148 | 25 | 1.2096 |
	| 1.215 | 0.0177 | 30 | 1.1421 |
	| 1.2121 | 0.0207 | 35 | 1.1437 |
	| 1.097 | 0.0236 | 40 | 1.0869 |
	| 1.1186 | 0.0266 | 45 | 1.0722 |
	| 1.0991 | 0.0295 | 50 | 1.0526 |
	| 0.9758 | 0.0325 | 55 | 1.0194 |
	| 0.9827 | 0.0354 | 60 | 1.0219 |
	| 1.0428 | 0.0384 | 65 | 0.9899 |
	| 0.9846 | 0.0413 | 70 | 1.0065 |
	| 0.996 | 0.0443 | 75 | 0.9968 |
	| 0.9908 | 0.0472 | 80 | 0.9694 |
	| 1.0172 | 0.0502 | 85 | 0.9688 |
	| 0.9956 | 0.0531 | 90 | 0.9557 |
	| 0.9629 | 0.0561 | 95 | 0.9466 |
	| 1.0187 | 0.0590 | 100 | 0.9421 |
	| 0.9079 | 0.0620 | 105 | 0.9248 |
	| 0.8152 | 0.0649 | 110 | 0.9273 |
	| 0.953 | 0.0679 | 115 | 0.9179 |
	| 0.9545 | 0.0708 | 120 | 0.9109 |
	| 0.8649 | 0.0738 | 125 | 0.9023 |
	| 0.9308 | 0.0767 | 130 | 0.8915 |
	| 0.9197 | 0.0797 | 135 | 0.8992 |
	| 0.9684 | 0.0826 | 140 | 0.8931 |
	| 0.9329 | 0.0856 | 145 | 0.8973 |
	| 0.8679 | 0.0885 | 150 | 0.8864 |
	| 0.8754 | 0.0915 | 155 | 0.8890 |
	| 0.8532 | 0.0945 | 160 | 0.8793 |
	| 0.8818 | 0.0974 | 165 | 0.8777 |
	| 0.9161 | 0.1004 | 170 | 0.8765 |
	| 0.7303 | 0.1033 | 175 | 0.8744 |
	| 0.9087 | 0.1063 | 180 | 0.8697 |
	| 0.884 | 0.1092 | 185 | 0.8648 |
	| 0.9259 | 0.1122 | 190 | 0.8589 |
	| 0.866 | 0.1151 | 195 | 0.8574 |
	| 0.8716 | 0.1181 | 200 | 0.8517 |
	| 0.8068 | 0.1210 | 205 | 0.8488 |
	| 0.8382 | 0.1240 | 210 | 0.8478 |
	| 0.8372 | 0.1269 | 215 | 0.8462 |
	| 0.8477 | 0.1299 | 220 | 0.8433 |
	| 0.838 | 0.1328 | 225 | 0.8425 |
	| 0.8585 | 0.1358 | 230 | 0.8403 |
	| 0.892 | 0.1387 | 235 | 0.8378 |
	| 0.8794 | 0.1417 | 240 | 0.8360 |
	| 0.8468 | 0.1446 | 245 | 0.8321 |
	| 0.8417 | 0.1476 | 250 | 0.8305 |
	| 0.8785 | 0.1505 | 255 | 0.8267 |
	| 0.9016 | 0.1535 | 260 | 0.8258 |
	| 0.86 | 0.1564 | 265 | 0.8243 |
	| 0.8777 | 0.1594 | 270 | 0.8214 |
	| 0.6465 | 0.1623 | 275 | 0.8210 |
	| 0.7967 | 0.1653 | 280 | 0.8186 |
	| 0.774 | 0.1682 | 285 | 0.8173 |
	| 0.7545 | 0.1712 | 290 | 0.8162 |
	| 0.8684 | 0.1741 | 295 | 0.8147 |
	| 0.7596 | 0.1771 | 300 | 0.8132 |
	| 0.8279 | 0.1800 | 305 | 0.8108 |
	| 0.7538 | 0.1830 | 310 | 0.8087 |
	| 0.848 | 0.1860 | 315 | 0.8075 |
	| 0.8526 | 0.1889 | 320 | 0.8064 |
	| 0.8053 | 0.1919 | 325 | 0.8057 |
	| 0.8598 | 0.1948 | 330 | 0.8040 |
	| 0.8076 | 0.1978 | 335 | 0.8026 |
	| 0.7292 | 0.2007 | 340 | 0.8028 |
	| 0.8058 | 0.2037 | 345 | 0.8015 |
	| 0.8 | 0.2066 | 350 | 0.8003 |
	| 0.8038 | 0.2096 | 355 | 0.8002 |
	| 0.7639 | 0.2125 | 360 | 0.7998 |
	| 0.7838 | 0.2155 | 365 | 0.7991 |
	| 0.8139 | 0.2184 | 370 | 0.7986 |
	| 0.844 | 0.2214 | 375 | 0.7982 |
	| 0.7417 | 0.2243 | 380 | 0.7978 |
	| 0.7987 | 0.2273 | 385 | 0.7975 |
	| 0.8319 | 0.2302 | 390 | 0.7971 |
	| 0.7383 | 0.2332 | 395 | 0.7968 |
	| 0.7886 | 0.2361 | 400 | 0.7966 |
	| 0.8127 | 0.2391 | 405 | 0.7965 |
	| 0.8213 | 0.2420 | 410 | 0.7964 |
	| 0.7952 | 0.2450 | 415 | 0.7964 |
	| 0.8518 | 0.2479 | 420 | 0.7964 |
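
The Epoch and Step columns also allow a rough estimate of the training-set size, which the card does not report. The sketch below assumes one optimizer step consumes exactly `train_batch_size` examples (single device, no gradient accumulation). Because the Epoch column is rounded to four decimals, the results are approximate. All variable names are illustrative.

```python
import math

# (step, epoch) copied from the last row of the results table.
LAST_STEP, LAST_EPOCH = 420, 0.2479
TRAIN_BATCH_SIZE = 8  # from the hyperparameters
NUM_EPOCHS = 0.25     # from the hyperparameters


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps in one full epoch from a logged row."""
    return step / epoch


est_steps_per_epoch = steps_per_epoch(LAST_STEP, LAST_EPOCH)
# Assumption: each step sees TRAIN_BATCH_SIZE examples.
est_train_examples = est_steps_per_epoch * TRAIN_BATCH_SIZE
# Steps needed to reach the configured fraction of an epoch.
est_total_steps = math.ceil(est_steps_per_epoch * NUM_EPOCHS)

if __name__ == "__main__":
    print(f"steps per epoch  ~ {est_steps_per_epoch:.0f}")
    print(f"train examples   ~ {est_train_examples:.0f}")
    print(f"total train steps ~ {est_total_steps}")
```

This suggests on the order of 1,700 steps per epoch and about 13,500 training examples. The run ends only a few steps after the last logged row, which is consistent with the validation loss flattening at 0.7964 as the cosine schedule decays the learning rate.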


	### Framework versions

	- Transformers 4.46.0
	- Pytorch 2.4.0+cu121
	- Datasets 3.0.2
	- Tokenizers 0.20.1
