# distilhubert-finetuned-gtzan
This model is a fine-tuned version of [ntu-spml/distilhubert](https://huggingface.co/ntu-spml/distilhubert) on the GTZAN dataset. It achieves the following results on the evaluation set at the best epoch:
- Loss: 0.7305
- Accuracy: 0.9
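For reference, a minimal inference sketch using the `transformers` audio-classification pipeline; the Hub repository id below is a placeholder, not the actual location of this checkpoint:

```python
from transformers import pipeline

# Placeholder Hub id: substitute the actual namespace of this checkpoint.
classifier = pipeline(
    "audio-classification",
    model="your-username/distilhubert-finetuned-gtzan",
)

# Any local audio file works; the pipeline resamples it to the model's
# expected 16 kHz sampling rate before classification.
predictions = classifier("song.wav")
print(predictions[0])  # e.g. {"score": ..., "label": "disco"}
```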
## Model description
DistilHuBERT is a distilled version of HuBERT, pretrained on audio sampled at 16 kHz.
The model is an encoder-only transformer, the kind of architecture with which CTC (Connectionist Temporal Classification) is typically used.
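Because of the 16 kHz pretraining, inputs should be resampled to that rate. A quick way to check the expected rate, using the base checkpoint's feature extractor:

```python
from transformers import AutoFeatureExtractor

# The feature extractor carries the sampling rate the model was
# pretrained with, so inputs can be resampled to match.
feature_extractor = AutoFeatureExtractor.from_pretrained("ntu-spml/distilhubert")
print(feature_extractor.sampling_rate)  # 16000
```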
## Training and evaluation data
The training and evaluation data come from GTZAN, a popular dataset of 999 songs for music genre classification.
Each song is a 30-second clip from one of 10 genres of music, spanning disco to metal.
The training set contains 899 songs; the remaining 100 songs form the evaluation set.
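A sketch of how such a split can be reproduced with the `datasets` library, assuming the `marsyas/gtzan` Hub repository and a seeded 90/10 split (the exact split procedure is an assumption, not stated in this card):

```python
from datasets import load_dataset

# Load GTZAN from the Hub (repository id is an assumption).
gtzan = load_dataset("marsyas/gtzan", "all")

# A seeded 90/10 split yields 899 training songs and 100 evaluation songs.
gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
print(gtzan["train"].num_rows, gtzan["test"].num_rows)  # 899 100
```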
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a matching `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 35
- mixed_precision_training: Native AMP
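These values map directly onto `transformers.TrainingArguments`; a sketch reconstructing the configuration (the output directory and evaluation strategy are assumptions, not listed above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-gtzan",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=35,
    fp16=True,                    # Native AMP mixed precision
    evaluation_strategy="epoch",  # assumed; matches the per-epoch results below
)
```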
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.1728 | 1.0 | 225 | 2.0896 | 0.42 |
| 1.4211 | 2.0 | 450 | 1.4951 | 0.55 |
| 1.2155 | 3.0 | 675 | 1.0669 | 0.72 |
| 1.0175 | 4.0 | 900 | 0.8862 | 0.69 |
| 0.3516 | 5.0 | 1125 | 0.6265 | 0.83 |
| 0.6135 | 6.0 | 1350 | 0.6485 | 0.78 |
| 0.0807 | 7.0 | 1575 | 0.6567 | 0.78 |
| 0.0303 | 8.0 | 1800 | 0.7615 | 0.83 |
| 0.2663 | 9.0 | 2025 | 0.6612 | 0.86 |
| 0.0026 | 10.0 | 2250 | 0.8354 | 0.85 |
| 0.0337 | 11.0 | 2475 | 0.6768 | 0.87 |
| 0.0013 | 12.0 | 2700 | 0.7718 | 0.87 |
| 0.001 | 13.0 | 2925 | 0.7570 | 0.88 |
| 0.0008 | 14.0 | 3150 | 0.8170 | 0.89 |
| 0.0006 | 15.0 | 3375 | 0.7920 | 0.89 |
| 0.0005 | 16.0 | 3600 | 0.9859 | 0.83 |
| 0.0004 | 17.0 | 3825 | 0.8190 | 0.9 |
| 0.0003 | 18.0 | 4050 | 0.7305 | 0.9 |
| 0.0003 | 19.0 | 4275 | 0.8025 | 0.88 |
| 0.0002 | 20.0 | 4500 | 0.8208 | 0.87 |
| 0.0003 | 21.0 | 4725 | 0.7358 | 0.88 |
| 0.0002 | 22.0 | 4950 | 0.8681 | 0.87 |
| 0.0002 | 23.0 | 5175 | 0.7831 | 0.9 |
| 0.0003 | 24.0 | 5400 | 0.8583 | 0.88 |
| 0.0002 | 25.0 | 5625 | 0.8138 | 0.88 |
| 0.0002 | 26.0 | 5850 | 0.7871 | 0.89 |
| 0.0002 | 27.0 | 6075 | 0.8893 | 0.88 |
| 0.0002 | 28.0 | 6300 | 0.8284 | 0.89 |
| 0.0001 | 29.0 | 6525 | 0.8388 | 0.89 |
| 0.0001 | 30.0 | 6750 | 0.8305 | 0.9 |
| 0.0001 | 31.0 | 6975 | 0.8377 | 0.88 |
| 0.0153 | 32.0 | 7200 | 0.8496 | 0.88 |
| 0.0001 | 33.0 | 7425 | 0.8381 | 0.88 |
| 0.0001 | 34.0 | 7650 | 0.8440 | 0.88 |
| 0.0001 | 35.0 | 7875 | 0.8458 | 0.88 |
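The Accuracy column is plain classification accuracy on the 100-song evaluation set; a typical `compute_metrics` hook for `Trainer`, assuming the `evaluate` library (not confirmed by this card), looks like:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Pick the highest-scoring genre per clip, then compare to labels.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(
        predictions=predictions, references=eval_pred.label_ids
    )
```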
### Framework versions
- Transformers 4.29.2
- Pytorch 1.13.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3