---
tags:
- vision
library_name: transformers
---


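## Model Details

This CLIP model is based on the pretrained [`openai/clip-vit-base-patch32`](https://huggingface.co/openai/clip-vit-base-patch32) checkpoint. CLIP was developed to learn about what contributes to robustness in computer vision tasks. It can generalize to arbitrary image classification tasks in a zero-shot manner: the candidate labels are supplied as text at inference time, so no task-specific training is required.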
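Zero-shot classification with CLIP compares one image embedding against one text embedding per candidate label. The sketch below illustrates that scoring step in plain Python, assuming the embeddings have already been computed (in Transformers they would come from `CLIPModel.get_image_features` and `CLIPModel.get_text_features`). Each vector is L2-normalized, each cosine similarity is multiplied by a logit scale, and a softmax over the labels turns the scores into probabilities. The toy 3-dimensional vectors, the hypothetical `zero_shot_probs` helper, and the default `logit_scale=100.0` are illustrative assumptions, not values read from this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_embed, text_embeds, logit_scale=100.0):
    """Score one image against several text prompts, CLIP-style (hypothetical helper).

    logit_i = logit_scale * cosine_similarity(image, text_i), then a softmax over i.
    """
    img = l2_normalize(image_embed)
    logits = []
    for text_embed in text_embeds:
        txt = l2_normalize(text_embed)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Toy embeddings (hypothetical, not real CLIP outputs):
# the image vector points closest to the first prompt.
image_embed = [0.9, 0.1, 0.2]
text_embeds = [
    [1.0, 0.0, 0.1],  # stands in for "a photo of a saree"
    [0.0, 1.0, 0.0],  # stands in for "a photo of a blouse"
]
probs = zero_shot_probs(image_embed, text_embeds)
print([round(p, 4) for p in probs])
```

A large logit scale sharpens the softmax, so small differences in cosine similarity translate into confident probabilities.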


Example top predictions:

| Label | Probability |
|---|---|
| Saree | 64.89% |
| Dupatta | 25.81% |
| Lehenga | 7.51% |
| Leggings and Salwar | 0.84% |
| Women Kurta | 0.44% |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/660bc03b5294ca0aada80fb9/Kl8Yd8fwFLtmeDbBLi4Fz.png)
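A ranked list like the one above is presumably a softmax over the candidate labels, sorted by probability. A minimal sketch of producing such a list from raw per-label logits follows; the `top_predictions` helper and the logit values are hypothetical (with the Transformers example in the next section, the logits would be the row `outputs.logits_per_image[0]`).

```python
import math


def top_predictions(labels, logits, k=5):
    """Return the k most probable (label, probability) pairs, highest first.

    Probabilities come from a softmax over all supplied labels, so they are
    relative to this candidate set only.
    """
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for the clothing labels in the table above.
labels = ["Women Kurta", "Saree", "Lehenga", "Dupatta", "Leggings and Salwar"]
logits = [20.0, 25.0, 23.0, 24.0, 21.0]

for label, prob in top_predictions(labels, logits, k=3):
    print(f"{label}: {prob:.2%}")  # same "Label: 12.34%" style as the table
```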




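### Use with Transformers

```python
from PIL import Image
import requests
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("samim2024/clip")
processor = CLIPProcessor.from_pretrained("samim2024/clip")

# Image.open() needs the bytes of an image file. A web page such as
# https://www.istockphoto.com/photo/indian-saris-gm93355119-10451468 returns HTML,
# so replace this placeholder with a direct link to a .jpg or .png image.
url = "https://example.com/saree.jpg"
response = requests.get(url, stream=True)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")

labels = ["a photo of a saree", "a photo of a blouse"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, len(labels))
probs = logits_per_image.softmax(dim=1)  # softmax over the labels gives label probabilities

for label, prob in zip(labels, probs[0].tolist()):
    print(f"{label}: {prob:.2%}")
```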
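Because `softmax(dim=1)` normalizes over the prompts passed to the processor, each probability is relative to that candidate list: adding or removing a label changes every probability, while the ratio between any two existing labels stays the same (it depends only on their logit difference). The plain-Python sketch below illustrates this with hypothetical logit values; it does not download or run the model.

```python
import math


def softmax(logits):
    """Softmax over a list of logits (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits_per_image values for two prompts: saree, blouse.
two_labels = softmax([25.0, 23.0])
# The same two logits plus a hypothetical third prompt ("a photo of a lehenga").
three_labels = softmax([25.0, 23.0, 24.0])

print(f"saree, two candidates:   {two_labels[0]:.2%}")
print(f"saree, three candidates: {three_labels[0]:.2%}")
# The saree/blouse ratio is identical in both cases.
print(two_labels[0] / two_labels[1], three_labels[0] / three_labels[1])
```

In practice this means a reported percentage such as "Saree: 64.89%" should be read against the specific label set it was computed over.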