metadata
license: apache-2.0
ViT Fine-tuned on Stanford Car Dataset
Base model: https://huggingface.co/google/vit-base-patch16-224
This achieves around 86% on the testing set, you can use it as a baseline for further tuning.
Dataset Description
The Stanford car dataset contains 16,185 images of 196 classes of cars. Classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe. The data is split into 8144 training images, 6,041 testing images, and 2000 validation images in this case.
** Please note: this dataset does not contain newer car models **
Using the Model in the Transformer Library
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
extractor = AutoFeatureExtractor.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
model = AutoModelForImageClassification.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
Citations
3D Object Representations for Fine-Grained Categorization Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.