Areas masked as NOT for inpainting are nonetheless altered. Why is this?
Take this image I send for inpainting (or in this case, outpainting):
Using this mask:
This inpainting pipeline, instantiated as demonstrated in the model card with `pipe = AutoPipelineForInpainting.from_pretrained("diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16").to("cuda")`, produces the following:
which is a great outpainting. However, I notice that the colors in the output that correspond to the masked area (the area that, as I understand it, is not meant to be altered) are different from the original. The colors aren't as deep, and in some of my tests small artefacts appear where none existed in the original. You can see the difference by comparing the images above; in addition, you can see a difference in the RGB color levels of the area corresponding to the masked region as viewed in a photo editor (original first, outpainted output second):
I've been using a guidance scale of 7, 40 inference steps, and a strength of 1. I thought strength=1 might be causing the problem, but I see the same degradation at lower strengths as well.
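For reference, here's a minimal sketch of the call I'm making (the prompt and file names are placeholders; in this pipeline, white mask pixels mark the region to paint):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Placeholder file names; white mask pixels = region to outpaint
image = load_image("original.png")
mask = load_image("mask.png")

result = pipe(
    prompt="...",              # actual prompt elided
    image=image,
    mask_image=mask,
    guidance_scale=7.0,
    num_inference_steps=40,
    strength=1.0,
).images[0]
result.save("outpainted.png")
```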
So... why? Why does this happen? Is there some kind of preprocessing of the image that degrades it? Is it possible to avoid degradation of the masked area with this pipeline?
Oh
From the model card:
"When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investing this and working on the next version."
I'll check again what I was seeing with strength < 1. My mistake. Thank you for the disclaimer.
Next version anytime soon? :)
NEVERMIND I TAKE IT BACK
After some more tests, the degradation from running the pipeline at strength 0.99 is indistinguishable from the degradation at strength 1. Lower strengths also noticeably degrade the original picture, though to a lesser extent than higher strengths. I'm not sure how to reverse the degradation, so for now this fine-tune is, at least for me, unusable.
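In case anyone wants to reproduce the comparison, here's a sketch of one way to sweep over strengths, measuring the mean absolute difference in the kept region (prompt and file names are placeholders; `pipe` as instantiated above):

```python
import numpy as np
from PIL import Image

image = Image.open("original.png").convert("RGB")   # placeholder paths
mask_img = Image.open("mask.png").convert("L")      # white = paint, black = keep
orig = np.asarray(image, dtype=np.int16)
keep = np.asarray(mask_img) < 128                   # True where the original should survive

for strength in (0.5, 0.75, 0.99, 1.0):
    out = pipe(                                     # `pipe` as instantiated above
        prompt="...",                               # placeholder prompt
        image=image,
        mask_image=mask_img,
        guidance_scale=7.0,
        num_inference_steps=40,
        strength=strength,
    ).images[0].resize(image.size)
    diff = np.abs(np.asarray(out, dtype=np.int16) - orig)
    print(f"strength={strength}: mean abs diff in kept region = {diff[keep].mean():.2f}")
```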
What I do is finalize the edit of the image using two layers (one for the input image, one for the output image) in software like GIMP.
I use the eraser tool on the top layer and, thanks to transparency, get the result I want. This way I keep the sharpness of the input image and benefit from the inpainting in the output image.
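The same idea can be scripted with Pillow if you prefer. A minimal sketch, assuming the mask is white where the model painted and that both images share the same canvas (file names are placeholders):

```python
from PIL import Image

original = Image.open("original.png").convert("RGB")                      # placeholder paths
outpainted = Image.open("outpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L")                                # white = painted, black = keep

# Take outpainted pixels where the mask is white, original pixels where it is black
merged = Image.composite(outpainted, original, mask)
merged.save("merged.png")
```

This restores the kept region pixel-for-pixel, though a seam can remain visible at the mask boundary.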
Updates with more observations:
The spikes in the second RGB visualization I referenced earlier arose from a problem in my workflow (the image had been converted from RGB to palette ("P") mode). The true distribution is smoother (but still very different). Here's a better visualization, where red is the original image and blue is the outpainted output:
And here's a visualization of the map of pixel-by-pixel differences:
It's worth noting that no pixel in the outpainted output has a lower R, G, or B value than the corresponding pixel in the original. The values either stay the same or strictly increase, which makes for a more washed-out image with no deep colors.
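Both the difference map and the "values never decrease" claim can be checked with a few lines of numpy and Pillow. A sketch, assuming both images share the same canvas and the mask is white where the model painted (file names are placeholders):

```python
import numpy as np
from PIL import Image

orig = np.asarray(Image.open("original.png").convert("RGB"), dtype=np.int16)     # placeholder paths
outp = np.asarray(Image.open("outpainted.png").convert("RGB"), dtype=np.int16)
keep = np.asarray(Image.open("mask.png").convert("L")) < 128                     # kept (non-painted) region

diff = outp - orig                                    # positive where the output is brighter
print("min diff in kept region:", diff[keep].min())   # >= 0 means values never decrease
print("max diff in kept region:", diff[keep].max())

# Grayscale map of per-pixel difference magnitude
Image.fromarray(np.abs(diff).max(axis=-1).astype(np.uint8)).save("diff_map.png")
```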
> What I do is finalize the edit of the image using two layers (one for the input image, one for the output image) in software like GIMP.
> I use the eraser tool on the top layer and, thanks to transparency, get the result I want. This way I keep the sharpness of the input image and benefit from the inpainting in the output image.
This is a fine workflow, thank you Wok. My use case is a little more difficult: since I am outpainting, if I simply layer the original image on top of the output, I get a sharp, noticeable border where the color palette changes. Maybe it can be smoothed, but even then I prefer the original colors; they are deeper and richer. And the outpainted portion is large, not just one small piece of the image.
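For what it's worth, if someone does want to try smoothing that border, one option is to feather the mask before compositing, e.g. by blurring it. A sketch with Pillow (paths are placeholders and the radius is just a guess to tune):

```python
from PIL import Image, ImageFilter

original = Image.open("original.png").convert("RGB")                      # placeholder paths
outpainted = Image.open("outpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L")                                # white = painted, black = keep

# Blurring the mask turns the hard keep/paint boundary into a gradual blend
feathered = mask.filter(ImageFilter.GaussianBlur(radius=15))
merged = Image.composite(outpainted, original, feathered)
merged.save("merged_feathered.png")
```

Even so, this only hides the seam; it doesn't restore the original palette in the outpainted region.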