---
license: openrail
base_model: runwayml/stable-diffusion-v1-5
tags:
- art
- controlnet
- stable-diffusion
---

# ControlNet

ControlNet is an auxiliary model that augments pre-trained diffusion models with an additional conditioning input.

ControlNet comes with multiple auxiliary models, each of which allows a different type of conditioning.

ControlNet's auxiliary models are trained with Stable Diffusion 1.5. Experimentally, the auxiliary models can also be used with other diffusion models, such as DreamBoothed Stable Diffusion checkpoints.

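
As a minimal sketch of that, the normal ControlNet used on this card can be loaded next to a different Stable Diffusion 1.5-based checkpoint by swapping the base model when building the pipeline. The DreamBooth repo id below is only illustrative; any SD 1.5-compatible checkpoint should slot in the same way.

```py
# Illustrative sketch only: swap the base model for a DreamBoothed SD 1.5 checkpoint.
# "sd-dreambooth-library/herge-style" is an example repo id, not part of this card.
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal", torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "sd-dreambooth-library/herge-style", controlnet=controlnet, torch_dtype=torch.float16
)
```
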
The auxiliary conditioning image is passed directly to the diffusers pipeline. If you want to process an image to create the auxiliary conditioning, external dependencies are required.

Some of the additional conditionings can be extracted from images via additional models. We extracted these additional models from the original ControlNet repo into a separate package that can be found on [GitHub](https://github.com/patrickvonplaten/controlnet_aux.git).

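
As a quick sketch of how that package is used (the exact detector classes depend on the controlnet_aux version, and the normal-map conditioning on this card is instead computed from a depth estimate as in the example below):

```py
# Sketch assuming `pip install controlnet_aux`; HEDdetector is shown only as an example
# of the package's detectors, not as part of this card's normal-map workflow.
from PIL import Image
from controlnet_aux import HEDdetector

hed = HEDdetector.from_pretrained("lllyasviel/ControlNet")

image = Image.open("images/toy.png").convert("RGB")

# Returns a PIL image with the soft-edge (HED) conditioning
conditioning = hed(image)
```
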
## Normal map

### Diffusers

```py
from PIL import Image
from transformers import pipeline
import numpy as np
import cv2
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch

image = Image.open("images/toy.png").convert("RGB")

# Estimate a depth map with MiDaS (DPT-Hybrid)
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")

image = depth_estimator(image)['predicted_depth'][0]

image = image.numpy()

# Normalize the depth map to [0, 1] so it can be used as a background mask
image_depth = image.copy()
image_depth -= np.min(image_depth)
image_depth /= np.max(image_depth)

bg_threshold = 0.4

# Compute depth gradients with Sobel filters and zero out the background
x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
x[image_depth < bg_threshold] = 0

y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
y[image_depth < bg_threshold] = 0

z = np.ones_like(x) * np.pi * 2.0

# Stack the gradients into a normal map and rescale it to an 8-bit RGB image
image = np.stack([x, y, z], axis=2)
image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal", torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Remove if you do not have xformers installed
# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
# for installation instructions
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

image = pipe("cute toy", image, num_inference_steps=20).images[0]

image.save('images/toy_normal_out.png')
```

![toy](./images/toy.png)

![toy_normal](./images/toy_normal.png)

![toy_normal_out](./images/toy_normal_out.png)

### Training

The normal model was trained in two stages: an initial model and an extended model.

The initial normal model was trained on 25,452 normal-image/caption pairs from DIODE. The image captions were generated by BLIP. The model was trained for 100 GPU-hours on Nvidia A100 80G, using Stable Diffusion 1.5 as the base model.

The extended normal model further trained the initial normal model on "coarse" normal maps. The coarse normal maps were generated by using MiDaS to compute a depth map and then performing normal-from-distance. The model was trained for 200 GPU-hours on Nvidia A100 80G, using the initial normal model as the base model.
|