	---
	library_name: peft
	license: apache-2.0
	base_model: answerdotai/ModernBERT-base
	tags:
	- generated_from_trainer
	model-index:
	- name: whimsical-crane-88
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# whimsical-crane-88

	This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unspecified dataset (the Trainer recorded no dataset name).
	It achieves the following results on the evaluation set:
	- Loss: 0.3017
	- Hamming Loss: 0.1008
	- Zero One Loss: 0.8638
	- Jaccard Score: 0.8619
	- Hamming Loss Optimised: 0.1005
	- Hamming Loss Threshold: 0.4669
	- Zero One Loss Optimised: 0.7662
	- Zero One Loss Threshold: 0.2371
	- Jaccard Score Optimised: 0.7019
	- Jaccard Score Threshold: 0.1424
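The metric names above match the usual multi-label definitions, and the "Optimised"/"Threshold" pairs suggest a search over the decision threshold applied to the sigmoid outputs. The card does not say how any of this was computed, so the following is only a minimal pure-Python sketch under those assumptions. The toy `y_true` and `probs` values, the helper names, and the 101-point threshold grid are all hypothetical and are not taken from the training code.

```python
from typing import Callable, List, Sequence, Tuple

Rows = Sequence[Sequence[int]]


def binarise(probs: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def hamming_loss(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    total = sum(len(tr) for tr in y_true)
    return wrong / total


def zero_one_loss(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of samples whose full label set is not matched exactly."""
    misses = sum(list(tr) != list(pr) for tr, pr in zip(y_true, y_pred))
    return misses / len(y_true)


def jaccard_score(y_true: Rows, y_pred: Rows) -> float:
    """Mean per-sample intersection-over-union of the positive labels."""
    scores = []
    for tr, pr in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(tr, pr) if t and p)
        union = sum(1 for t, p in zip(tr, pr) if t or p)
        # Assumption: an empty truth set matched by an empty prediction scores 1.0.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def best_threshold(
    y_true: Rows,
    probs: Sequence[Sequence[float]],
    loss_fn: Callable[[Rows, Rows], float],
    steps: int = 101,
) -> Tuple[float, float]:
    """Grid-search the threshold in [0, 1] that minimises loss_fn.

    Returns (loss, threshold); ties resolve to the lowest threshold.
    """
    candidates = [i / (steps - 1) for i in range(steps)]
    return min((loss_fn(y_true, binarise(probs, t)), t) for t in candidates)


# Toy data: 4 samples, 3 labels (hypothetical, for illustration only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
probs = [[0.9, 0.2, 0.4], [0.1, 0.6, 0.3], [0.7, 0.45, 0.2], [0.3, 0.1, 0.8]]

default_pred = binarise(probs, 0.5)
default_hamming = hamming_loss(y_true, default_pred)
default_zero_one = zero_one_loss(y_true, default_pred)
default_jaccard = jaccard_score(y_true, default_pred)
tuned_loss, tuned_threshold = best_threshold(y_true, probs, hamming_loss)
```

On the toy data, lowering the threshold below 0.5 recovers the two positives that the default cut-off misses, which is the kind of effect the optimised columns appear to measure.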

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
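Because the front matter lists `library_name: peft` and a `base_model`, the checkpoint is presumably a PEFT adapter that is applied on top of ModernBERT-base. The sketch below shows one way such an adapter could be loaded and its logits decoded for multi-label prediction. `ADAPTER_PATH`, `NUM_LABELS`, and both helper functions are hypothetical placeholders: the card does not state the adapter location, the number of tags, or whether the classification head is stored with the adapter.

```python
import math
from typing import List, Sequence

BASE_MODEL = "answerdotai/ModernBERT-base"   # from the card's front matter
ADAPTER_PATH = "path/to/whimsical-crane-88"  # hypothetical: local dir or Hub repo id
NUM_LABELS = 10                              # hypothetical: the card does not state it


def load_model(adapter_path: str = ADAPTER_PATH, num_labels: int = NUM_LABELS):
    """Load the base classifier and attach the adapter (needs transformers and peft)."""
    # Imported lazily so the decoding helper below works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL,
        num_labels=num_labels,
        problem_type="multi_label_classification",
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logits_to_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of labels whose sigmoid probability reaches the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]
```

For one sample, `logits_to_labels(model_logits, threshold=0.4669)` would apply the Hamming-optimal threshold reported above, assuming those thresholds refer to sigmoid probabilities.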

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5.6029632367762424e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 2024
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.8378996868939299, 0.8949526992950398), epsilon=1e-07, and no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 4
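The hyperparameters above map fairly directly onto `transformers.TrainingArguments`. The sketch below restates them as keyword arguments and adds a small helper that mirrors a linear learning-rate decay. Several details are assumptions rather than facts from the card: that the batch size of 32 is per device, that no warmup was used, that the schedule ran for the 400 steps shown in the results table, and the `output_dir` value.

```python
TRAINING_KWARGS = {
    "learning_rate": 5.6029632367762424e-05,
    "per_device_train_batch_size": 32,  # assumption: single device, so total == per-device
    "per_device_eval_batch_size": 32,
    "seed": 2024,
    "optim": "adamw_torch",
    "adam_beta1": 0.8378996868939299,
    "adam_beta2": 0.8949526992950398,
    "adam_epsilon": 1e-07,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 4,
}


def build_training_arguments(output_dir: str = "outputs/whimsical-crane-88"):
    """Create TrainingArguments from the values above (needs transformers installed).

    The output_dir default is a hypothetical placeholder.
    """
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step: int, total_steps: int = 400, warmup_steps: int = 0) -> float:
    """Learning rate at a given step under linear warmup followed by linear decay.

    total_steps=400 matches the last row of the results table; warmup_steps=0
    is an assumption, since the card lists no warmup setting.
    """
    base_lr = TRAINING_KWARGS["learning_rate"]
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions, `linear_lr(200)` is half the initial learning rate and `linear_lr(400)` is zero.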

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Hamming Loss | Zero One Loss | Jaccard Score | Hamming Loss Optimised | Hamming Loss Threshold | Zero One Loss Optimised | Zero One Loss Threshold | Jaccard Score Optimised | Jaccard Score Threshold |
	|:-------------:|:-----:|:----:|:---------------:|:------------:|:-------------:|:-------------:|:----------------------:|:----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|
	| No log | 1.0 | 100 | 0.3418 | 0.1123 | 0.99 | 0.99 | 0.1121 | 0.5223 | 0.8287 | 0.2215 | 0.7749 | 0.1699 |
	| No log | 2.0 | 200 | 0.3238 | 0.1098 | 0.965 | 0.9644 | 0.1069 | 0.4149 | 0.7863 | 0.2162 | 0.7400 | 0.1647 |
	| No log | 3.0 | 300 | 0.3082 | 0.1021 | 0.8775 | 0.8762 | 0.1014 | 0.4735 | 0.7688 | 0.2369 | 0.7141 | 0.1721 |
	| No log | 4.0 | 400 | 0.3017 | 0.1008 | 0.8638 | 0.8619 | 0.1005 | 0.4669 | 0.7662 | 0.2371 | 0.7019 | 0.1424 |


	### Framework versions

	- PEFT 0.13.2
	- Transformers 4.48.0.dev0
	- Pytorch 2.5.1+cu124
	- Datasets 3.1.0
	- Tokenizers 0.21.0