ViT distilled to MobileNet

This model was trained by knowledge distillation. The teacher is merve/beans-vit-224, a google/vit-base-patch16-224-in21k checkpoint fine-tuned on the beans dataset. The student is a randomly initialized MobileNetV2. It achieves the following results on the evaluation set:

  • Loss: 0.5922
  • Accuracy: 0.7266
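
The card does not include the training code, so the following is only a minimal, framework-free sketch of a commonly used distillation objective: a weighted sum of the student's cross-entropy on the true label and the KL divergence between temperature-softened teacher and student distributions. The function names and the `temperature` and `lambda_param` values are illustrative assumptions, not settings reported for this model. The three-element logit vectors in the usage note mirror the three classes of the beans dataset.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits, label):
    """Negative log-probability the model assigns to the true class."""
    return -math.log(softmax(logits)[label])


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=5.0, lambda_param=0.5):
    """Blend hard-label loss with a soft-target loss from the teacher.

    temperature and lambda_param are assumed values for illustration only.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft-target term comparable in
    # magnitude to the hard-label term as the temperature changes.
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_term = cross_entropy(student_logits, label)
    return (1.0 - lambda_param) * hard_term + lambda_param * soft_term
```

For example, `distillation_loss([0.2, 0.1, -0.3], [3.0, 0.5, -2.0], label=0)` scores a student that is less confident than its teacher. When the student's logits equal the teacher's, the soft-target term vanishes and only the weighted cross-entropy remains.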

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 25
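
To make the `linear` scheduler setting concrete, here is a small sketch of how the learning rate would decay from 5e-05 to zero over the run. It assumes no warmup steps (the card lists none) and takes the total of 3250 optimizer steps from the training results. The function name `linear_lr` is hypothetical and is not a library API.

```python
def linear_lr(step, base_lr=5e-05, total_steps=3250, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay.

    warmup_steps=0 is an assumption; the card does not report warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of a few epochs (130 steps per epoch).
for epoch in (1, 12, 25):
    print(epoch, linear_lr(epoch * 130))
```

Under these assumptions the rate is halved by the midpoint of training (step 1625) and reaches zero at the final step.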

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy
0.9217         1.0    130   1.0079           0.3835
0.8973         2.0    260   0.8349           0.4286
0.7912         3.0    390   0.8905           0.5414
0.7151         4.0    520   1.1400           0.4887
0.6797         5.0    650   4.5343           0.4135
0.6471         6.0    780   2.1551           0.3985
0.5989         7.0    910   0.8552           0.6090
0.6252         8.0    1040  1.7453           0.5489
0.6025         9.0    1170  0.7852           0.6466
0.5643         10.0   1300  1.4728           0.6090
0.5505         11.0   1430  1.1570           0.6015
0.5207         12.0   1560  3.2526           0.4436
0.4957         13.0   1690  0.6617           0.6541
0.4935         14.0   1820  0.7502           0.6241
0.4836         15.0   1950  1.2039           0.5338
0.4648         16.0   2080  1.0283           0.5338
0.4662         17.0   2210  0.6695           0.7293
0.4351         18.0   2340  0.8694           0.5940
0.4286         19.0   2470  1.2751           0.4737
0.4166         20.0   2600  0.8719           0.6241
0.4263         21.0   2730  0.8767           0.6015
0.4261         22.0   2860  1.2780           0.5564
0.4124         23.0   2990  1.4095           0.5940
0.4082         24.0   3120  0.9104           0.6015
0.3923         25.0   3250  0.6430           0.7068
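
Validation metrics fluctuate considerably from epoch to epoch, so it can help to scan the log programmatically instead of reading only the last row. The sketch below copies the epoch, validation loss, and accuracy columns from the table above and reports the best epoch under each criterion. The helper names are hypothetical, and nothing here implies which checkpoint was actually published.

```python
# (epoch, validation_loss, accuracy), copied from the training results table.
LOG = [
    (1, 1.0079, 0.3835), (2, 0.8349, 0.4286), (3, 0.8905, 0.5414),
    (4, 1.1400, 0.4887), (5, 4.5343, 0.4135), (6, 2.1551, 0.3985),
    (7, 0.8552, 0.6090), (8, 1.7453, 0.5489), (9, 0.7852, 0.6466),
    (10, 1.4728, 0.6090), (11, 1.1570, 0.6015), (12, 3.2526, 0.4436),
    (13, 0.6617, 0.6541), (14, 0.7502, 0.6241), (15, 1.2039, 0.5338),
    (16, 1.0283, 0.5338), (17, 0.6695, 0.7293), (18, 0.8694, 0.5940),
    (19, 1.2751, 0.4737), (20, 0.8719, 0.6241), (21, 0.8767, 0.6015),
    (22, 1.2780, 0.5564), (23, 1.4095, 0.5940), (24, 0.9104, 0.6015),
    (25, 0.6430, 0.7068),
]


def best_by_accuracy(log):
    """Row with the highest validation accuracy."""
    return max(log, key=lambda row: row[2])


def best_by_val_loss(log):
    """Row with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


print("best accuracy:", best_by_accuracy(LOG))
print("lowest validation loss:", best_by_val_loss(LOG))
```

On this log the two criteria disagree: accuracy peaks at epoch 17 (0.7293), while validation loss is lowest at epoch 25 (0.6430).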

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.14.1

Dataset used to train merve/vit-mobilenet-beans-224: beans