---
license: apache-2.0
language:
- en
---

# =====CLIP-ViT-L-14-448px-MedICaT-ROCO=====

## Pretrained Biomed CLIP model with higher resolution, suitable for many medical downstream tasks.

**Dataset**: MedICaT-200k, ROCO-80k

**Base model**: [ryanyip7777/pmc_vit_l_14](https://huggingface.co/ryanyip7777/pmc_vit_l_14)

**Training config**:
- img-size: 448
- lr: 1.024e-6
- epochs: 6
- batch size: 16

**Benchmark**: ROCO validation set (8,785 samples)

| model | clip_val_loss | image_to_text_mean_rank | image_to_text_R@10 | text_to_image_mean_rank | text_to_image_R@10 |
|-----------------------------|---------------|-------------------------|--------------------|-------------------------|--------------------|
| pmc_vit_l_14 | 0.6886 | 41.4641 | 0.6263 | 54.4236 | 0.6410 |
| CLIP-ViT-L-14-448px-MedICaT-ROCO | 0.3266 | 34.4018 | 0.6748 | 42.0458 | 0.6791 |

We use the codebase from [open_clip](https://github.com/mlfoundations/open_clip). To load this model from a local checkout, add a custom model config under **./open_clip-main/src/open_clip/model_configs**.

```
import torch
from PIL import Image
import open_clip

# Load the model, preprocessing transforms, and tokenizer from the Hugging Face Hub
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
tokenizer = open_clip.get_tokenizer('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')

# Preprocess one image and tokenize the candidate labels
image = preprocess(Image.open("xray.png")).unsqueeze(0)
text = tokenizer(["xray", "CT", "MRI"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # L2-normalize the embeddings before computing cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Similarity scaled by 100 and softmaxed over the candidate labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
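
If you are working from a local open_clip checkout rather than pulling directly from the Hub, the custom config mentioned above is a JSON file placed in **model_configs**. The sketch below writes such a file; it assumes the stock open_clip ViT-L-14 hyperparameters with `image_size` raised to 448, and the file name and exact values are assumptions, not taken from this card.

```
import json
import os

# Assumed config: stock open_clip ViT-L-14 hyperparameters with image_size
# raised to 448 to match this checkpoint. Verify against the released weights
# before relying on it.
config = {
    "embed_dim": 768,
    "vision_cfg": {
        "image_size": 448,
        "layers": 24,
        "width": 1024,
        "patch_size": 14
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 768,
        "heads": 12,
        "layers": 12
    }
}

# Hypothetical file name; open_clip picks up any JSON placed in model_configs.
config_dir = "./open_clip-main/src/open_clip/model_configs"
os.makedirs(config_dir, exist_ok=True)
with open(os.path.join(config_dir, "CLIP-ViT-L-14-448px-MedICaT-ROCO.json"), "w") as f:
    json.dump(config, f, indent=4)
```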