---
license: apache-2.0
language:
- en
---

# =====CLIP-ViT-L-14-448px-MedICaT-ROCO=====

## Pretrained Biomed CLIP model with higher resolution, suitable for many medical downstream tasks.

**Dataset**: MedICaT-200k, ROCO-80k

**Base model**: [ryanyip7777/pmc_vit_l_14](https://huggingface.co/ryanyip7777/pmc_vit_l_14)

**Training config**:
- img-size: 448
- lr: 1.024e-6
- epochs: 6
- batch size: 16

**Benchmark**: ROCO validation set (8,785 samples)

| model | clip_val_loss | image_to_text_mean_rank | image_to_text_R@10 | text_to_image_mean_rank | text_to_image_R@10 |
|-----------------------------|---------------|-------------------------|--------------------|-------------------------|--------------------|
| pmc_vit_l_14 | 0.6886 | 41.4641 | 0.6263 | 54.4236 | 0.6410 |
| CLIP-ViT-L-14-448px-MedICaT-ROCO | 0.3266 | 34.4018 | 0.6748 | 42.0458 | 0.6791 |

We use the codebase from [open_clip](https://github.com/mlfoundations/open_clip). To load this model from a local checkout, add a custom model config under **./open_clip-main/src/open_clip/model_configs**.

```
import torch
from PIL import Image
import open_clip

# Load the model, preprocessing transforms, and tokenizer from the Hugging Face Hub
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
tokenizer = open_clip.get_tokenizer('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')

# Preprocess one image and tokenize the candidate labels
image = preprocess(Image.open("xray.png")).unsqueeze(0)
text = tokenizer(["xray", "CT", "MRI"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # L2-normalize the embeddings before computing cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Similarity scaled by 100 and softmaxed over the candidate labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
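
If you are working from a local open_clip checkout rather than pulling directly from the Hub, the custom config mentioned above is a JSON file placed in **model_configs**. The sketch below writes such a file; it assumes the stock open_clip ViT-L-14 hyperparameters with `image_size` raised to 448, and the file name and exact values are assumptions, not taken from this card.

```
import json
import os

# Assumed config: stock open_clip ViT-L-14 hyperparameters with image_size
# raised to 448 to match this checkpoint. Verify against the released weights
# before relying on it.
config = {
    "embed_dim": 768,
    "vision_cfg": {
        "image_size": 448,
        "layers": 24,
        "width": 1024,
        "patch_size": 14
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 768,
        "heads": 12,
        "layers": 12
    }
}

# Hypothetical file name; open_clip picks up any JSON placed in model_configs.
config_dir = "./open_clip-main/src/open_clip/model_configs"
os.makedirs(config_dir, exist_ok=True)
with open(os.path.join(config_dir, "CLIP-ViT-L-14-448px-MedICaT-ROCO.json"), "w") as f:
    json.dump(config, f, indent=4)
```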