add app.py
README.md CHANGED
@@ -1,127 +1,10 @@

Removed (previous README):
> **OminiControl: Minimal and Universal Control for Diffusion Transformer**
> <br>
> Zhenxiong Tan,
> [Songhua Liu](http://121.37.94.87/),
> [Xingyi Yang](https://adamdad.github.io/),
> Qiaochu Xue,
> and
> [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
> <br>
> [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore
> <br>

## Features

OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like [FLUX](https://github.com/black-forest-labs/flux).

* **Universal Control**: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
* **Minimal Design**: Injects control signals while preserving the original model structure, adding only 0.1% extra parameters to the base model.
## Quick Start

### Setup (Optional)

1. **Environment setup**
   ```bash
   conda create -n omini python=3.10
   conda activate omini
   ```
2. **Requirements installation**
   ```bash
   pip install -r requirements.txt
   ```

### Usage example

1. Subject-driven generation: `examples/subject.ipynb`
2. In-painting: `examples/inpainting.ipynb`
3. Canny edge to image, depth to image, colorization, deblurring: `examples/spatial.ipynb`
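For readers who want a plain script rather than the notebooks, the `app.py` added in this commit suggests the following minimal sketch of subject-driven generation. The `Condition` and `generate` helpers and the LoRA weight name are taken from that file; the input image path and prompt here are placeholders.

```python
# Minimal sketch of subject-driven generation, mirroring the app.py added below.
# Assumes the repository's src/ package and a CUDA GPU are available.
import torch
from PIL import Image
from diffusers.pipelines import FluxPipeline

from src.condition import Condition
from src.generate import generate

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "Yuanshi/OminiControl",
    weight_name="omini/subject_512.safetensors",
    adapter_name="subject",
)

# The condition image is expected square at 512x512, as in the demo app.
condition = Condition("subject", Image.open("assets/oranges.jpg").resize((512, 512)))

result = generate(
    pipe,
    prompt="A close up view of this item. It is placed on a wooden table.",
    conditions=[condition],
    num_inference_steps=8,
    height=512,
    width=512,
).images[0]
result.save("subject_result.jpg")
```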
## Generated samples

### Subject-driven generation

**Demos** (Left: condition image; Right: generated image)

<div float="left">
  <img src='./assets/demo/oranges_omini.jpg' width='48%'/>
  <img src='./assets/demo/rc_car_omini.jpg' width='48%' />
  <img src='./assets/demo/clock_omini.jpg' width='48%' />
  <img src='./assets/demo/shirt_omini.jpg' width='48%' />
</div>

<details>
<summary>Text Prompts</summary>

- Prompt1: *A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!'*
- Prompt2: *A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.*
- Prompt3: *In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.*
- Prompt4: *In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.*
</details>

<details>
<summary>More results</summary>

* Try on:
  <img src='./assets/demo/try_on.jpg'/>
* Scene variations:
  <img src='./assets/demo/scene_variation.jpg'/>
* Dreambooth dataset:
  <img src='./assets/demo/dreambooth_res.jpg'/>
</details>

### Spatially aligned control

1. **Image Inpainting** (Left: original image; Center: masked image; Right: filled image)
   - Prompt: *The Mona Lisa is wearing a white VR headset with 'Omini' written on it.*
     </br>
     <img src='./assets/demo/monalisa_omini.jpg' width='700px' />
   - Prompt: *A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.*
     </br>
     <img src='./assets/demo/book_omini.jpg' width='700px' />
2. **Other spatially aligned tasks** (Canny edge to image, depth to image, colorization, deblurring)
   </br>
   <details>
   <summary>Click to show</summary>
   <div float="left">
     <img src='./assets/demo/room_corner_canny.jpg' width='48%'/>
     <img src='./assets/demo/room_corner_depth.jpg' width='48%' />
     <img src='./assets/demo/room_corner_coloring.jpg' width='48%' />
     <img src='./assets/demo/room_corner_deblurring.jpg' width='48%' />
   </div>

   Prompt: *A light gray sofa stands against a white wall, featuring a black and white geometric patterned pillow. A white side table sits next to the sofa, topped with a white adjustable desk lamp and some books. Dark hardwood flooring contrasts with the pale walls and furniture.*
   </details>
## Models

**Subject-driven control:**

| Model | Base model | Description | Resolution |
| ----- | ---------- | ----------- | ---------- |
| [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `subject` | FLUX.1-schnell | The model used in the paper. | (512, 512) |
| [`omini`](https://huggingface.co/Yuanshi/OminiControl/tree/main/omini) / `subject_512` | FLUX.1-schnell | Fine-tuned on a larger dataset. | (512, 512) |
| [`omini`](https://huggingface.co/Yuanshi/OminiControl/tree/main/omini) / `subject_1024` | FLUX.1-schnell | Fine-tuned on a larger dataset; supports higher resolution. (To be released) | (1024, 1024) |

**Spatially aligned control:**

| Model | Base model | Description | Resolution |
| ----- | ---------- | ----------- | ---------- |
| [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `<task_name>` | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
| [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `<task_name>_1024` | FLUX.1 | Supports higher resolution. (To be released) | (1024, 1024) |
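Judging from how `app.py` below loads the `subject_512` weights, a spatially aligned checkpoint from the table above would presumably be loaded the same way. The sketch below is an assumption: the exact safetensors filename for `<task_name>` (here `canny`) and the use of FLUX.1-schnell as the base model are not confirmed by this commit.

```python
# Hypothetical sketch of loading a spatially aligned control LoRA.
# The weight_name follows the `experimental/<task_name>` pattern from the table above;
# the real filename and the intended FLUX.1 base checkpoint may differ.
import torch
from diffusers.pipelines import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "Yuanshi/OminiControl",
    weight_name="experimental/canny.safetensors",  # assumed filename for <task_name> = canny
    adapter_name="canny",
)
```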
## Citation

```
@article{tan2024omini,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}
```
Added (new Space metadata):

---
title: OminiControl
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
---
app.py ADDED
@@ -0,0 +1,93 @@
```python
import gradio as gr
import torch
from PIL import Image, ImageDraw, ImageFont
from src.condition import Condition
from diffusers.pipelines import FluxPipeline
import numpy as np

from src.generate import seed_everything, generate

pipe = None


def init_pipeline():
    global pipe
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    pipe.load_lora_weights(
        "Yuanshi/OminiControl",
        weight_name="omini/subject_512.safetensors",
        adapter_name="subject",
    )


def process_image_and_text(image, text):
    # center crop image
    w, h, min_size = image.size[0], image.size[1], min(image.size)
    image = image.crop(
        (
            (w - min_size) // 2,
            (h - min_size) // 2,
            (w + min_size) // 2,
            (h + min_size) // 2,
        )
    )
    image = image.resize((512, 512))

    condition = Condition("subject", image)

    if pipe is None:
        init_pipeline()

    result_img = generate(
        pipe,
        prompt=text.strip(),
        conditions=[condition],
        num_inference_steps=8,
        height=512,
        width=512,
    ).images[0]

    return result_img


def get_samples():
    sample_list = [
        {
            "image": "assets/oranges.jpg",
            "text": "A very close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!'",
        },
        {
            "image": "assets/penguin.jpg",
            "text": "On Christmas evening, on a crowded sidewalk, this item sits on the road, covered in snow and wearing a Christmas hat, holding a sign that reads 'Omini Control!'",
        },
        {
            "image": "assets/rc_car.jpg",
            "text": "A film style shot. On the moon, this item drives across the moon surface. The background is that Earth looms large in the foreground.",
        },
        {
            "image": "assets/clock.jpg",
            "text": "In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.",
        },
    ]
    return [[Image.open(sample["image"]), sample["text"]] for sample in sample_list]


demo = gr.Interface(
    fn=process_image_and_text,
    inputs=[
        gr.Image(type="pil"),
        gr.Textbox(lines=2),
    ],
    outputs=gr.Image(type="pil"),
    title="OminiControl / Subject driven generation",
    examples=get_samples(),
)

if __name__ == "__main__":
    init_pipeline()
    demo.launch(
        debug=True,
    )
```
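Because the Space metadata above points `app_file` at this script with `sdk: gradio`, the Space runtime imports and serves it directly. As a rough local smoke test (assuming a CUDA GPU, the repository's `src/` package, and the sample images under `assets/`), one might exercise the same entry points like this; this snippet is not part of the commit:

```python
# Hypothetical local check of the Space's entry points.
from PIL import Image

import app  # the file added above

app.init_pipeline()  # loads FLUX.1-schnell and the subject_512 LoRA once
sample = Image.open("assets/rc_car.jpg")
result = app.process_image_and_text(
    sample,
    "A film style shot. On the moon, this item drives across the moon surface.",
)
result.save("rc_car_on_moon.jpg")
```

The lazy `if pipe is None: init_pipeline()` check inside `process_image_and_text` means the heavyweight pipeline is still loaded on first use even when the eager call under `__main__` never runs, for example when the module is imported by a server rather than executed directly.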