---
license: apache-2.0
pipeline_tag: zero-shot-image-classification
library_name: openclip
---

# LongCLIP model

This repository contains the weights of the LongCLIP model, a fine-tuned variant of CLIP that extends the text input limit from 77 to 248 tokens.

Paper: https://huggingface.co/papers/2403.15378

GitHub repository: https://github.com/beichenzbc/long-clip

## Installation

```bash
git clone https://github.com/beichenzbc/Long-CLIP.git
cd Long-CLIP
```

## Usage

```python
from model import longclip
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hub and load the model and its preprocessing transform.
filepath = hf_hub_download(repo_id="BeichenZhang/LongCLIP-L-336px", filename="longclip-L@336px.pt")
model, preprocess = longclip.load(filepath, device=device)

# Tokenize the candidate captions and preprocess the input image.
text = longclip.tokenize(
    [
        "A man is crossing the street with a red car parked nearby.",
        "A man is driving a car in an urban scene.",
    ]
).to(device)
image = preprocess(Image.open("./img/demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Image-text similarity logits, turned into probabilities over the captions.
    logits_per_image = image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
```
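For intuition, the final scoring step above can be reproduced without the model: classification probabilities come from a softmax over image-text similarity scores. The sketch below uses random NumPy vectors as hypothetical stand-ins for the encoder outputs (note that CLIP-style models additionally scale the logits by a learned temperature before the softmax, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for model.encode_image / model.encode_text outputs:
# 1 image and 2 candidate captions, each a 768-dim embedding.
image_features = rng.normal(size=(1, 768))
text_features = rng.normal(size=(2, 768))

# L2-normalize so the dot product is a cosine similarity.
image_features /= np.linalg.norm(image_features, axis=-1, keepdims=True)
text_features /= np.linalg.norm(text_features, axis=-1, keepdims=True)

# Similarity logits, then a numerically stable softmax over the captions.
logits_per_image = image_features @ text_features.T
exp = np.exp(logits_per_image - logits_per_image.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print("Label probs:", probs)
```

The resulting `probs` row sums to 1 and ranks the captions by similarity to the image, mirroring what `logits_per_image.softmax(dim=-1)` computes in the usage example.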