# distilhubert-finetuned-gtzan
This model is a fine-tuned version of [ntu-spml/distilhubert](https://huggingface.co/ntu-spml/distilhubert) on the GTZAN dataset. It achieves the following results on the evaluation set at the best epoch:
- Loss: 0.7305
- Accuracy: 0.9
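For reference, a minimal inference sketch using the `transformers` audio-classification pipeline; the Hub repository id below is a placeholder, not the actual location of this checkpoint:

```python
from transformers import pipeline

# Placeholder Hub id: substitute the actual namespace of this checkpoint.
classifier = pipeline(
    "audio-classification",
    model="your-username/distilhubert-finetuned-gtzan",
)

# Any local audio file works; the pipeline resamples it to the model's
# expected 16 kHz sampling rate before classification.
predictions = classifier("song.wav")
print(predictions[0])  # e.g. {"score": ..., "label": "disco"}
```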
## Model description
DistilHuBERT is a distilled version of HuBERT, pretrained on audio sampled at 16 kHz.
The model is an encoder-only transformer, the kind of architecture with which CTC (Connectionist Temporal Classification) is typically used.
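Because of the 16 kHz pretraining, inputs should be resampled to that rate. A quick way to check the expected rate, using the base checkpoint's feature extractor:

```python
from transformers import AutoFeatureExtractor

# The feature extractor carries the sampling rate the model was
# pretrained with, so inputs can be resampled to match.
feature_extractor = AutoFeatureExtractor.from_pretrained("ntu-spml/distilhubert")
print(feature_extractor.sampling_rate)  # 16000
```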
## Training and evaluation data
The training and evaluation data come from GTZAN, a popular dataset of 999 songs for music genre classification.
Each song is a 30-second clip from one of 10 genres of music, spanning disco to metal.
The training set contains 899 songs; the remaining 100 songs form the evaluation set.
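A sketch of how such a split can be reproduced with the `datasets` library, assuming the `marsyas/gtzan` Hub repository and a seeded 90/10 split (the exact split procedure is an assumption, not stated in this card):

```python
from datasets import load_dataset

# Load GTZAN from the Hub (repository id is an assumption).
gtzan = load_dataset("marsyas/gtzan", "all")

# A seeded 90/10 split yields 899 training songs and 100 evaluation songs.
gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
print(gtzan["train"].num_rows, gtzan["test"].num_rows)  # 899 100
```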
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a matching `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 35
- mixed_precision_training: Native AMP
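These values map directly onto `transformers.TrainingArguments`; a sketch reconstructing the configuration (the output directory and evaluation strategy are assumptions, not listed above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-gtzan",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=35,
    fp16=True,                    # Native AMP mixed precision
    evaluation_strategy="epoch",  # assumed; matches the per-epoch results below
)
```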
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.1728 | 1.0 | 225 | 2.0896 | 0.42 |
| 1.4211 | 2.0 | 450 | 1.4951 | 0.55 |
| 1.2155 | 3.0 | 675 | 1.0669 | 0.72 |
| 1.0175 | 4.0 | 900 | 0.8862 | 0.69 |
| 0.3516 | 5.0 | 1125 | 0.6265 | 0.83 |
| 0.6135 | 6.0 | 1350 | 0.6485 | 0.78 |
| 0.0807 | 7.0 | 1575 | 0.6567 | 0.78 |
| 0.0303 | 8.0 | 1800 | 0.7615 | 0.83 |
| 0.2663 | 9.0 | 2025 | 0.6612 | 0.86 |
| 0.0026 | 10.0 | 2250 | 0.8354 | 0.85 |
| 0.0337 | 11.0 | 2475 | 0.6768 | 0.87 |
| 0.0013 | 12.0 | 2700 | 0.7718 | 0.87 |
| 0.001 | 13.0 | 2925 | 0.7570 | 0.88 |
| 0.0008 | 14.0 | 3150 | 0.8170 | 0.89 |
| 0.0006 | 15.0 | 3375 | 0.7920 | 0.89 |
| 0.0005 | 16.0 | 3600 | 0.9859 | 0.83 |
| 0.0004 | 17.0 | 3825 | 0.8190 | 0.9 |
| 0.0003 | 18.0 | 4050 | 0.7305 | 0.9 |
| 0.0003 | 19.0 | 4275 | 0.8025 | 0.88 |
| 0.0002 | 20.0 | 4500 | 0.8208 | 0.87 |
| 0.0003 | 21.0 | 4725 | 0.7358 | 0.88 |
| 0.0002 | 22.0 | 4950 | 0.8681 | 0.87 |
| 0.0002 | 23.0 | 5175 | 0.7831 | 0.9 |
| 0.0003 | 24.0 | 5400 | 0.8583 | 0.88 |
| 0.0002 | 25.0 | 5625 | 0.8138 | 0.88 |
| 0.0002 | 26.0 | 5850 | 0.7871 | 0.89 |
| 0.0002 | 27.0 | 6075 | 0.8893 | 0.88 |
| 0.0002 | 28.0 | 6300 | 0.8284 | 0.89 |
| 0.0001 | 29.0 | 6525 | 0.8388 | 0.89 |
| 0.0001 | 30.0 | 6750 | 0.8305 | 0.9 |
| 0.0001 | 31.0 | 6975 | 0.8377 | 0.88 |
| 0.0153 | 32.0 | 7200 | 0.8496 | 0.88 |
| 0.0001 | 33.0 | 7425 | 0.8381 | 0.88 |
| 0.0001 | 34.0 | 7650 | 0.8440 | 0.88 |
| 0.0001 | 35.0 | 7875 | 0.8458 | 0.88 |
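The Accuracy column is plain classification accuracy on the 100-song evaluation set; a typical `compute_metrics` hook for `Trainer`, assuming the `evaluate` library (not confirmed by this card), looks like:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Pick the highest-scoring genre per clip, then compare to labels.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(
        predictions=predictions, references=eval_pred.label_ids
    )
```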
### Framework versions
- Transformers 4.29.2
- Pytorch 1.13.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3