
Using stable-diffusion-2 for img2img with a prompt

#41, opened by FelixDeMan

Hey guys!

I've been trying to load the stable-diffusion-2 model in the img2img pipeline.

I've tried:

model_path = "stabilityai/stable-diffusion-2"

pipe_img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_path,
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)

but this gives the error:

ValueError: Pipeline <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline'> expected {'unet', 'scheduler', 'feature_extractor', 'vae', 'safety_checker', 'text_encoder', 'tokenizer'}, but only {'unet', 'scheduler', 'vae', 'text_encoder', 'tokenizer'} were passed.

which makes me believe there is a separate model for this. Is one available?
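
For context: stabilityai/stable-diffusion-2 ships without a safety checker, which is why the img2img pipeline complains about missing components on some diffusers releases. One possible workaround, sketched under the assumption that your diffusers version accepts component overrides in from_pretrained (note this disables the safety checker):

import torch
from diffusers import StableDiffusionImg2ImgPipeline

# Assumption: this diffusers release accepts component overrides as
# from_pretrained kwargs; None fills in the missing components.
pipe_img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    revision="fp16",
    torch_dtype=torch.float16,
    safety_checker=None,       # SD2 does not ship this component
    feature_extractor=None,
    use_auth_token=True,
)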

I too am having this issue.

Where did you even figure out how to get that far?

By doing something similar to the Reddit post above I am able to run it without errors, but something is not set up right: the images are altered so far from the original as to be unrecognizable.

pipestart = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionImg2ImgPipeline(**pipestart.components).to("cuda")
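
If the outputs barely resemble the input, one knob worth checking (my assumption; the thread doesn't confirm the cause) is strength, which controls how far img2img drifts from the init image. A minimal sketch:

init_image = Image.open("test1.jpg").convert("RGB")

# Assumption: lowering strength keeps the output closer to the init image.
# strength is a standard StableDiffusionImg2ImgPipeline argument:
# values near 0.0 preserve the init image, 1.0 essentially ignores it.
images = pipe(
    prompt="a lovely sunset",
    init_image=init_image,    # named `image` on newer diffusers releases
    strength=0.4,
    guidance_scale=7.5,
).images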


It is further up in the code. I think this is all of it. Upload test1.jpg into the root directory.

!pip install -Uq diffusers transformers fastcore

import logging
from pathlib import Path

import matplotlib.pyplot as plt
import torch
from diffusers import StableDiffusionPipeline
from fastcore.all import concat
from huggingface_hub import notebook_login
from PIL import Image

logging.disable(logging.WARNING)

torch.manual_seed(1)
if not (Path.home()/'.huggingface'/'token').exists(): notebook_login()

from diffusers import StableDiffusionImg2ImgPipeline
from fastdownload import FastDownload

def image_grid(imgs, rows, cols):
    # Paste the images into a single rows x cols contact sheet.
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    for i, img in enumerate(imgs): grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

pipestart = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda") #making this have .to("cuda") also made no difference.
pipe = StableDiffusionImg2ImgPipeline(**pipestart.components).to("cuda")

init_image = Image.open("test1.jpg").convert("RGB")

torch.manual_seed(1000)
prompt = "a lovely sunset"
images = pipe(prompt=prompt, num_images_per_prompt=3, init_image=init_image).images
image_grid(images, rows=1, cols=3)
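
Two version-dependent details may matter here (assumptions about the installed diffusers release, not confirmed in this thread): newer releases renamed the img2img argument init_image to image, and stable-diffusion-2 was trained at 768x768, so the init image may need resizing. A sketch under those assumptions:

# Assumptions: a diffusers release that takes `image` rather than
# `init_image`, and SD2's native 768x768 resolution.
init_image = Image.open("test1.jpg").convert("RGB").resize((768, 768))
torch.manual_seed(1000)
images = pipe(prompt="a lovely sunset", num_images_per_prompt=3, image=init_image).images
image_grid(images, rows=1, cols=3)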

OK. I mean, don't use StableDiffusionImg2ImgPipeline.from_pretrained; use StableDiffusionPipeline.from_pretrained instead:

pipestart = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda") #making this have .to("cuda") also made no difference.
pipe = StableDiffusionImg2ImgPipeline(**pipestart.components).to("cuda")

Or try the Stable Diffusion Mega community pipeline: https://github.com/huggingface/diffusers/tree/main/examples/community#stable-diffusion-mega
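
For reference, that README loads Mega through DiffusionPipeline's custom_pipeline argument and exposes img2img as a method. A minimal sketch along those lines (argument names follow the linked README; whether Mega works with the SD2 checkpoint is an assumption, since the README's examples use v1 models):

import torch
from PIL import Image
from diffusers import DiffusionPipeline

# The Mega community pipeline bundles text2img, img2img, and inpaint.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",   # assumption: SD2 works here
    custom_pipeline="stable_diffusion_mega",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("test1.jpg").convert("RGB")
images = pipe.img2img(prompt="a lovely sunset", image=init_image,
                      strength=0.75, guidance_scale=7.5).images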

@curibe it made no difference.
