Model Card
Model Details
- Architecture: ViT-Base with patch size 32
- Training Data: Oxford-IIIT Pet dataset (oxford-iiit-pet)
Training Details
The model was trained with the Adam optimizer at a constant learning rate of 1e-5 for 4000 steps (batch size 32). Only the vision encoder is fine-tuned.
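The setup described above can be sketched roughly as follows. This is a minimal illustration, not the author's training script: the helper name `make_vision_only_optimizer` is hypothetical, and the sketch assumes that the visual projection layer stays frozen together with the text encoder, since the published checkpoint contains only the vision encoder weights.

```python
import torch
from transformers import CLIPModel


def make_vision_only_optimizer(clip_model: CLIPModel, lr: float = 1e-5) -> torch.optim.Adam:
    """Hypothetical helper: train only the vision encoder of a CLIP model."""
    # Freeze every parameter first (text encoder, projections, logit scale).
    for param in clip_model.parameters():
        param.requires_grad = False
    # Unfreeze the vision encoder only.
    # Assumption: the visual projection layer is left frozen.
    for param in clip_model.vision_model.parameters():
        param.requires_grad = True
    # No scheduler is attached, so the learning rate stays constant.
    return torch.optim.Adam(clip_model.vision_model.parameters(), lr=lr)
```

With the real checkpoint, this would be called as `make_vision_only_optimizer(CLIPModel.from_pretrained("openai/clip-vit-base-patch32"))` and stepped for 4000 iterations on batches of 32 images.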
Evaluation Results
- pre-trained: 0.8317149877548218
- fine-tuned: 0.9084667563438416
Usage
Load the fine-tuned vision model:
from transformers import CLIPVisionModel
vision_model = CLIPVisionModel.from_pretrained('tanganke/clip-vit-base-patch32_oxford-iiit-pet')
Substitute it for the vision encoder of the original CLIP model:
from transformers import CLIPModel
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
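Once the vision encoder has been substituted, the combined model can be used like any other CLIP model. The sketch below shows one possible use, zero-shot classification of images against a set of text prompts. The card does not say how the evaluation numbers were obtained, so this is an illustration rather than the evaluation procedure; the function names `swap_vision_encoder` and `zero_shot_probs` are hypothetical.

```python
import torch
from transformers import CLIPModel, CLIPVisionModel


def swap_vision_encoder(clip_model: CLIPModel, vision_model: CLIPVisionModel) -> CLIPModel:
    """Copy the fine-tuned vision encoder weights into a full CLIP model."""
    clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
    return clip_model


@torch.no_grad()
def zero_shot_probs(clip_model, pixel_values, input_ids, attention_mask=None):
    """Return one probability distribution over the text prompts per image."""
    outputs = clip_model(
        pixel_values=pixel_values,
        input_ids=input_ids,
        attention_mask=attention_mask,
    )
    # logits_per_image has shape (num_images, num_prompts).
    return outputs.logits_per_image.softmax(dim=-1)
```

In practice, `pixel_values`, `input_ids` and `attention_mask` would come from `CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")` applied to the images and to one prompt per class. The prompt wording (for example "a photo of a ..., a type of pet") is an assumption and is not specified by this card.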