Areas masked as NOT for inpainting are nonetheless altered. Why is this?
Take this image I send for inpainting (or in this case, outpainting):
Using this mask:
This inpainting pipeline, instantiated as demonstrated in the model card with `pipe = AutoPipelineForInpainting.from_pretrained("diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16").to("cuda")`, produces the following:
which is a great outpainting. However, I notice that the colors in the output that correspond to the masked area (the area that, as I understand it, is not meant to be altered) are different from the original. The colors aren't as deep, and in some of my tests small artefacts appear where none existed in the original. You can see the difference by comparing the images above; in addition, you can see a difference in the RGB color levels of the area corresponding to the masked region as viewed in a photo editor (original first, outpainted output second):
I've been using a guidance scale of 7, 40 inference steps, and a strength of 1. I thought strength=1 might be causing the problem, but I see the same degradation at lower strengths as well.
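For reference, here's a minimal sketch of the call I'm making (the prompt and file names are placeholders; in this pipeline, white mask pixels mark the region to paint):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Placeholder file names; white mask pixels = region to outpaint
image = load_image("original.png")
mask = load_image("mask.png")

result = pipe(
    prompt="...",              # actual prompt elided
    image=image,
    mask_image=mask,
    guidance_scale=7.0,
    num_inference_steps=40,
    strength=1.0,
).images[0]
result.save("outpainted.png")
```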
So... why? Why does this happen? Is there some kind of preprocessing of the image that degrades it? Is it possible to avoid degradation of the masked area with this pipeline?
Oh
From the model card:
"When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investing this and working on the next version."
I'll check again what I was seeing with strength < 1. My mistake. Thank you for the disclaimer.
Next version anytime soon? :)
NEVERMIND I TAKE IT BACK
After some more tests, the degradation from running the pipeline at strength 0.99 is indistinguishable from the degradation at strength 1. Lower strengths also noticeably degrade the original picture, though to a lesser extent than higher strengths. I'm not sure how to reverse the degradation, so for now this fine-tune is, at least for me, unusable.
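In case anyone wants to reproduce the comparison, here's a sketch of one way to sweep over strengths, measuring the mean absolute difference in the kept region (prompt and file names are placeholders; `pipe` as instantiated above):

```python
import numpy as np
from PIL import Image

image = Image.open("original.png").convert("RGB")   # placeholder paths
mask_img = Image.open("mask.png").convert("L")      # white = paint, black = keep
orig = np.asarray(image, dtype=np.int16)
keep = np.asarray(mask_img) < 128                   # True where the original should survive

for strength in (0.5, 0.75, 0.99, 1.0):
    out = pipe(                                     # `pipe` as instantiated above
        prompt="...",                               # placeholder prompt
        image=image,
        mask_image=mask_img,
        guidance_scale=7.0,
        num_inference_steps=40,
        strength=strength,
    ).images[0].resize(image.size)
    diff = np.abs(np.asarray(out, dtype=np.int16) - orig)
    print(f"strength={strength}: mean abs diff in kept region = {diff[keep].mean():.2f}")
```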
What I do is finalize the edit of the image using two layers (one for the input image, one for the output image) in software like GIMP.
I use the eraser tool on the top layer and, thanks to transparency, get the result I want. This way I keep the sharpness of the input image and benefit from the inpainting in the output image.
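The same idea can be scripted with Pillow if you prefer. A minimal sketch, assuming the mask is white where the model painted and that both images share the same canvas (file names are placeholders):

```python
from PIL import Image

original = Image.open("original.png").convert("RGB")                      # placeholder paths
outpainted = Image.open("outpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L")                                # white = painted, black = keep

# Take outpainted pixels where the mask is white, original pixels where it is black
merged = Image.composite(outpainted, original, mask)
merged.save("merged.png")
```

This restores the kept region pixel-for-pixel, though a seam can remain visible at the mask boundary.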
Updates with more observations:
The spikes in the second RGB visualization I referenced earlier arose from a problem in my workflow (the image had been converted from RGB to palette ("P") mode). The true distribution is smoother (but still very different). Here's a better visualization, where red is the original image and blue is the outpainted output:
And here's a visualization of the map of pixel-by-pixel differences:
It's worth noting that no pixel in the outpainted output has a lower R, G, or B value than the corresponding pixel in the original. The values either stay the same or strictly increase, which makes for a more washed-out image with no deep colors.
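Both the difference map and the "values never decrease" claim can be checked with a few lines of numpy and Pillow. A sketch, assuming both images share the same canvas and the mask is white where the model painted (file names are placeholders):

```python
import numpy as np
from PIL import Image

orig = np.asarray(Image.open("original.png").convert("RGB"), dtype=np.int16)     # placeholder paths
outp = np.asarray(Image.open("outpainted.png").convert("RGB"), dtype=np.int16)
keep = np.asarray(Image.open("mask.png").convert("L")) < 128                     # kept (non-painted) region

diff = outp - orig                                    # positive where the output is brighter
print("min diff in kept region:", diff[keep].min())   # >= 0 means values never decrease
print("max diff in kept region:", diff[keep].max())

# Grayscale map of per-pixel difference magnitude
Image.fromarray(np.abs(diff).max(axis=-1).astype(np.uint8)).save("diff_map.png")
```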
> What I do is finalize the edit of the image using two layers (one for the input image, one for the output image) in software like GIMP.
> I use the eraser tool on the top layer and, thanks to transparency, get the result I want. This way I keep the sharpness of the input image and benefit from the inpainting in the output image.
This is a fine workflow, thank you Wok. My use case is a little more difficult: since I am outpainting, if I simply layer the original image on top of the output, I get a sharp, noticeable border where the color palette changes. Maybe it can be smoothed, but even then I prefer the original colors; they are deeper and richer. And the outpainted portion is large, not just one small piece of the image.
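For what it's worth, if someone does want to try smoothing that border, one option is to feather the mask before compositing, e.g. by blurring it. A sketch with Pillow (paths are placeholders and the radius is just a guess to tune):

```python
from PIL import Image, ImageFilter

original = Image.open("original.png").convert("RGB")                      # placeholder paths
outpainted = Image.open("outpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L")                                # white = painted, black = keep

# Blurring the mask turns the hard keep/paint boundary into a gradual blend
feathered = mask.filter(ImageFilter.GaussianBlur(radius=15))
merged = Image.composite(outpainted, original, feathered)
merged.save("merged_feathered.png")
```

Even so, this only hides the seam; it doesn't restore the original palette in the outpainted region.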