---
license: apache-2.0
pipeline_tag: zero-shot-image-classification
library_name: openclip
---

# LongCLIP model

This repository contains the weights of the LongCLIP model.

Paper: https://huggingface.co/papers/2403.15378

GitHub repository: https://github.com/beichenzbc/long-clip

## Installation

```bash
git clone https://github.com/beichenzbc/Long-CLIP.git
cd Long-CLIP
```
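
The usage code imports `longclip` from the `model` package inside the cloned repository, so that directory has to be on Python's import path. When running from a different working directory, one option is to add the clone to `sys.path` first. This is a minimal sketch: the helper name `add_repo_to_path` and the `~/Long-CLIP` location in the comments are illustrative assumptions, not part of Long-CLIP.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put a local clone at the front of sys.path and return its absolute path.

    Illustrative helper only; it is not provided by Long-CLIP.
    """
    repo = str(Path(repo_dir).expanduser().resolve())
    if repo not in sys.path:
        sys.path.insert(0, repo)
    return repo


# Hypothetical location; adjust to wherever the repository was cloned.
# add_repo_to_path("~/Long-CLIP")
# from model import longclip
```

Running scripts directly from the root of the clone, as the `cd Long-CLIP` step sets up, avoids the need for this.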

## Usage

```python
from model import longclip  # the `model` package lives in the cloned Long-CLIP repository
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hugging Face Hub and load it.
filepath = hf_hub_download(repo_id="BeichenZhang/LongCLIP-L-336px", filename="longclip-L@336px.pt")
model, preprocess = longclip.load(filepath, device=device)

# Tokenize the candidate captions and preprocess the image.
text = longclip.tokenize(["A man is crossing the street with a red car parked nearby.", "A man is driving a car in an urban scene."]).to(device)
image = preprocess(Image.open("./img/demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Image-text similarity scores, converted to one probability per caption.
    logits_per_image = image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
```
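
The last two lines inside the `torch.no_grad()` block turn image-text similarity scores into one probability per caption with a softmax. The sketch below reproduces that step without any dependencies. The scores are made-up placeholders, not real LongCLIP output, and `softmax` and `best_caption` are illustrative helper names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_caption(captions, scores):
    """Return the caption with the highest probability, and that probability."""
    probs = softmax(scores)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return captions[idx], probs[idx]


captions = [
    "A man is crossing the street with a red car parked nearby.",
    "A man is driving a car in an urban scene.",
]
# Placeholder similarity scores, NOT real LongCLIP output.
scores = [2.0, 0.0]

caption, prob = best_caption(captions, scores)
print(f"{caption!r} -> {prob:.3f}")
```

With the real model, `probs[0]` plays the role of `softmax(scores)` here, with one entry per caption passed to `longclip.tokenize`.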