
Model Details

This model is a Vision Transformer (ViT) based on the 'vit_base_patch16_224' configuration, fine-tuned for satellite image classification on the EuroSAT dataset.

Model type: Vision Transformer (ViT)

Fine-tuned from model: "timm/vit_base_patch16_224.augreg2_in21k_ft_in1k"

Model Sources

Repository: https://github.com/chathumal93/EuroSat-RGB-Classifiers

Training Details

Training Data

The dataset consists of JPEG composite chips extracted from Sentinel-2 satellite imagery, containing the Red, Green, and Blue bands. It comprises 27,000 labeled and geo-referenced images across 10 Land Use and Land Cover (LULC) classes.

Training Procedure

Preprocessing: Standard image preprocessing, including resizing, center cropping, and normalization, plus data augmentation with RandomHorizontalFlip and RandomVerticalFlip.

Training Hyperparameters

  • Learning rate: 3e-5
  • Batch size: 64
  • Optimizer: AdamW
  • Scheduler: PolynomialLR
  • Loss: CrossEntropyLoss
  • Betas: (0.9, 0.999)
  • Weight decay: 0.01
  • Epochs: 20
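The hyperparameters listed above map onto PyTorch roughly as in the sketch below. Several details are assumptions because the card does not state them: the scheduler's `total_iters` (set to the epoch count), its `power` (1.0, i.e. linear decay), and stepping the scheduler once per epoch. The tiny linear model and random batch are stand-ins so the example runs quickly; `build_training_setup` is a hypothetical helper name.

```python
import torch
from torch import nn


def build_training_setup(model: nn.Module, epochs: int = 20):
    """Optimizer, scheduler, and loss using the values listed in this card."""
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=3e-5, betas=(0.9, 0.999), weight_decay=0.01
    )
    # Assumption: decay over all epochs with power=1.0 (linear).
    scheduler = torch.optim.lr_scheduler.PolynomialLR(
        optimizer, total_iters=epochs, power=1.0
    )
    criterion = nn.CrossEntropyLoss()
    return optimizer, scheduler, criterion


# Stand-in model and data (not the real ViT or EuroSAT).
model = nn.Linear(16, 10)
optimizer, scheduler, criterion = build_training_setup(model, epochs=20)
inputs = torch.randn(64, 16)            # batch size 64, as in the card
targets = torch.randint(0, 10, (64,))   # 10 classes

learning_rates = []
for epoch in range(20):
    learning_rates.append(optimizer.param_groups[0]["lr"])
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()  # assumption: one scheduler step per epoch
```

Under these assumptions the learning rate starts at 3e-5 and decays linearly toward zero by the end of the 20th epoch.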

Evaluation

Results

Results at the 8th epoch on the train, validation, and test splits.

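| Model | Phase | Avg Loss | Accuracy |
|---|---|---|---|
| vit-base-patch16-224-eurosat | Train | 0.012038 | 99.61% |
| vit-base-patch16-224-eurosat | Validation | 0.023757 | 99.04% |
| vit-base-patch16-224-eurosat | Test | 0.040557 | 98.67% |

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| vit-base-patch16-224-eurosat | 98.67% | 0.98673 | 0.98667 | 0.98668 |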
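The card does not say how precision, recall, and F1 were averaged across the 10 classes. The sketch below assumes support-weighted averaging, which is one common choice; with that choice the averaged recall always equals accuracy, which is consistent with the reported 0.98667 and 98.67% but does not prove that mode was used. The function name `weighted_metrics` is hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    support = Counter(y_true)    # true samples per class
    predicted = Counter(y_pred)  # predicted samples per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n  # class weight proportional to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(true_pos.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Calling `weighted_metrics(labels, predictions)` with two equal-length lists of class indices returns the four values as a dictionary.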

Model size: 85.8M params (Safetensors, tensor type F32)