Using SDXL IP-Adapters with SDXL-Turbo? Not working?
#21 opened by Zubi401
I'm trying to use IP-Adapters with SDXL-Turbo (both seem to have an SDXL 1.0 checkpoint as their base). I'm doing the following:
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

adapter_id = "ip-adapter-plus-face_sdxl_vit-h.bin"
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=adapter_id)
This gives me the following error:
KeyError Traceback (most recent call last)
Cell In[14], line 2
1 adapter_id = "ip-adapter-plus-face_sdxl_vit-h.bin"
----> 2 pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=adapter_id)
File ~/miniconda3/envs/sd_diff/lib/python3.10/site-packages/diffusers/loaders/ip_adapter.py:152, in IPAdapterMixin.load_ip_adapter(self, pretrained_model_name_or_path_or_dict, subfolder, weight_name, **kwargs)
149 self.feature_extractor = CLIPImageProcessor()
151 # load ip-adapter into unet
--> 152 self.unet._load_ip_adapter_weights(state_dict)
File ~/miniconda3/envs/sd_diff/lib/python3.10/site-packages/diffusers/loaders/unet.py:711, in UNet2DConditionLoadersMixin._load_ip_adapter_weights(self, state_dict)
708 self.set_attn_processor(attn_procs)
710 # create image projection layers.
--> 711 clip_embeddings_dim = state_dict["image_proj"]["proj.weight"].shape[-1]
712 cross_attention_dim = state_dict["image_proj"]["proj.weight"].shape[0] // 4
714 image_projection = ImageProjection(
715 cross_attention_dim=cross_attention_dim, image_embed_dim=clip_embeddings_dim, num_image_text_embeds=4
716 )
KeyError: 'proj.weight'
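(For context, once the adapter loads, the plan is roughly the following; the image path and prompt are just placeholders:)

from diffusers.utils import load_image

face_image = load_image("face.png")  # placeholder reference image
image = pipe(
    prompt="a portrait photo",  # placeholder prompt
    ip_adapter_image=face_image,
    num_inference_steps=1,
    guidance_scale=0.0,  # sdxl-turbo is usually run without CFG
).images[0]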
I've also tried setting the image encoder to the one mentioned in the IP-Adapter repo, but it still gives the same error. Any idea how to fix this? Or is SDXL-Turbo not supported yet?
e.g.
from transformers import CLIPVisionModelWithProjection

# I've also tried the non-large version but still get the same result...
pipe.image_encoder = CLIPVisionModelWithProjection.from_pretrained("laion/CLIP-ViT-bigG-14-laion2B-39B-b160k", ignore_mismatched_sizes=True)
Use the CLIP image encoder from the models folder; it is ViT-H: https://huggingface.co/h94/IP-Adapter/tree/main/models/image_encoder . It should work with all the ViT-H SDXL adapters. You need to do the same in ComfyUI as well.
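Roughly, that looks like this (a sketch only; the weight name is just the one from your post, and it assumes a diffusers version whose IP-Adapter loader supports the "plus" projection layout):

from transformers import CLIPVisionModelWithProjection

# ViT-H image encoder from the "models" folder of h94/IP-Adapter
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)
pipe.image_encoder = image_encoder

# then load the ViT-H SDXL adapter weights as before
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus-face_sdxl_vit-h.bin")

If the KeyError persists after that, it may be that the installed diffusers version predates support for the "plus"/"plus-face" adapters, which use a different image projection than the base ones; upgrading diffusers would be worth trying.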