
[SD2.1] - Input shapes for unet model

#40
by lalith-mcw - opened

I'm trying to run SD2.1 via OpenVINO IR, and the inference currently produces a pixelated image.

Input nodes for SD2.1:
sample - [2,4,64,64], timestep - [-1], and encoder_hidden_states - [2,77,1024]

Still, the inferred image comes out as 512x512, since vae_decoder decodes these latents to a 512x512 output, and the result is pixelated. What shapes should be used for the above three nodes for correct inference?

Input nodes for SD1.4:
sample - [2,4,64,64], timestep - [-1], and encoder_hidden_states - [2,77,768]

With these inputs the output was correct for SD1.4 models. I also tried using the DPMSolverMultistepScheduler for SD2.1, but the output is still the same.

I saw somewhere that the encoder_hidden_states blob shape was updated for SD2.x. What are the right dimensions to use?
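For what it's worth, here is a minimal sketch of how the UNet input shapes relate to the target resolution and the text encoder's hidden size, assuming the standard 8x VAE downsampling factor, 77 text tokens, and classifier-free guidance doubling the batch. The helper name `unet_input_shapes` is hypothetical, not part of any library:

```python
def unet_input_shapes(height, width, text_hidden_size,
                      batch_size=1, vae_scale=8, max_tokens=77):
    """Compute UNet input shapes for Stable Diffusion inference.

    The UNet runs in latent space, so its spatial dims are the image dims
    divided by the VAE downsampling factor (8 for SD). With classifier-free
    guidance the batch is doubled (unconditional + conditional prompt).
    """
    cfg_batch = 2 * batch_size
    sample = (cfg_batch, 4, height // vae_scale, width // vae_scale)
    timestep = (cfg_batch,)  # some exports instead take a scalar timestep
    encoder_hidden_states = (cfg_batch, max_tokens, text_hidden_size)
    return sample, timestep, encoder_hidden_states

# SD1.x (CLIP ViT-L/14 text encoder, hidden size 768) at 512x512:
print(unet_input_shapes(512, 512, 768))
# -> ((2, 4, 64, 64), (2,), (2, 77, 768))

# SD2.1 (OpenCLIP ViT-H/14 text encoder, hidden size 1024) at its native 768x768:
print(unet_input_shapes(768, 768, 1024))
# -> ((2, 4, 96, 96), (2,), (2, 77, 1024))
```

Under these assumptions, a 64x64 `sample` can only decode to a 512x512 image; for SD2.1's native 768x768 output the latents would need to be 96x96.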
