wanghaofan committed
Commit e45bee7 • 1 Parent(s): 8bef12f

Update README.md

  1. README.md +281 -277
README.md CHANGED
@@ -1,278 +1,282 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
- <div align="center">
5
-
6
- [//]: # (<h1>CSGO: Content-Style Composition in Text-to-Image Generation</h1>)
7
-
8
- [//]: # ()
9
- [//]: # ([**Peng Xing**]&#40;https://github.com/xingp-ng&#41;<sup>12*</sup> · [**Haofan Wang**]&#40;https://haofanwang.github.io/&#41;<sup>1*</sup> · [**Yanpeng Sun**]&#40;https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN&oi=ao/&#41;<sup>2</sup> · [**Qixun Wang**]&#40;https://github.com/wangqixun&#41;<sup>1</sup> · [**Xu Bai**]&#40;https://huggingface.co/baymin0220&#41;<sup>1</sup> · [**Hao Ai**]&#40;https://github.com/aihao2000&#41;<sup>13</sup> · [**Renyuan Huang**]&#40;https://github.com/DannHuang&#41;<sup>14</sup> · [**Zechao Li**]&#40;https://zechao-li.github.io/&#41;<sup>2✉</sup>)
10
-
11
- [//]: # ()
12
- [//]: # (<sup>1</sup>InstantX Team · <sup>2</sup>Nanjing University of Science and Technology · <sup>3</sup>Beihang University · <sup>4</sup>Peking University)
13
-
14
- [//]: # (<sup>*</sup>equal contributions, <sup>✉</sup>corresponding authors)
15
-
16
- <a href='https://csgo-gen.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
17
- <a href='https://arxiv.org/abs/2408.16766'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
18
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-App-red)](https://huggingface.co/spaces/xingpng/CSGO/)
19
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/spaces/InstantX/CSGO)
20
-
21
-
22
- </div>
23
-
24
-
25
- [//]: # (## Updates 🔥)
26
-
27
- [//]: # ()
28
- [//]: # ([//]: # &#40;- **`2024/07/19`**: ✨ We support 🎞️ portrait video editing &#40;aka v2v&#41;! More to see [here]&#40;assets/docs/changelog/2024-07-19.md&#41;.&#41;)
29
- [//]: # ()
30
- [//]: # ([//]: # &#40;- **`2024/07/17`**: 🍎 We support macOS with Apple Silicon, modified from [jeethu]&#40;https://github.com/jeethu&#41;'s PR [#143]&#40;https://github.com/KwaiVGI/LivePortrait/pull/143&#41;.&#41;)
31
- [//]: # ()
32
- [//]: # ([//]: # &#40;- **`2024/07/10`**: 💪 We support audio and video concatenating, driving video auto-cropping, and template making to protect privacy. More to see [here]&#40;assets/docs/changelog/2024-07-10.md&#41;.&#41;)
33
- [//]: # ()
34
- [//]: # ([//]: # &#40;- **`2024/07/09`**: 🤗 We released the [HuggingFace Space]&#40;https://huggingface.co/spaces/KwaiVGI/liveportrait&#41;, thanks to the HF team and [Gradio]&#40;https://github.com/gradio-app/gradio&#41;!&#41;)
35
- [//]: # ([//]: # &#40;Continuous updates, stay tuned!&#41;)
36
- [//]: # (- **`2024/08/30`**: 😊 We released the initial version of the inference code.)
37
-
38
- [//]: # (- **`2024/08/30`**: 😊 We released the technical report on [arXiv]&#40;https://arxiv.org/pdf/2408.16766&#41;)
39
-
40
- [//]: # (- **`2024/07/15`**: 🔥 We released the [homepage]&#40;https://csgo-gen.github.io&#41;.)
41
-
42
- [//]: # ()
43
- [//]: # (## Plan 💪)
44
-
45
- [//]: # (- [x] technical report)
46
-
47
- [//]: # (- [x] inference code)
48
-
49
- [//]: # (- [ ] pre-trained weight)
50
-
51
- [//]: # (- [ ] IMAGStyle dataset)
52
-
53
- [//]: # (- [ ] training code)
54
-
55
- ## Introduction 📖
56
- This repo, named **CSGO**, contains the official PyTorch implementation of our paper [CSGO: Content-Style Composition in Text-to-Image Generation](https://arxiv.org/pdf/).
57
- We are actively updating and improving this repository. If you find any bugs or have suggestions, feel free to open an issue or submit a pull request (PR) 💖.
58
-
59
- ## Detail ✨
60
- We currently release two model weights; a third is coming soon.
61
-
62
- | Model | Content tokens | Style tokens | Other |
63
- |:----------------:|:-----------:|:-----------:|:---------------------------------:|
64
- | csgo.bin |4|16| - |
65
- | csgo_4_32.bin |4|32| DeepSpeed ZeRO-2 |
66
- | csgo_4_32_v2.bin |4|32| DeepSpeed ZeRO-2 + more (coming soon) |
67
-
68
-
69
- ## Pipeline 💻
70
- <p align="center">
71
- <img src="assets/image3_1.jpg">
72
- </p>
73
-
74
- ## Capabilities 🚅
75
-
76
- 🔥 CSGO supports **image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis**.
77
-
78
- 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
79
-
80
- <p align="center">
81
- <img src="assets/vis.jpg">
82
- </p>
83
-
84
-
85
- ## Getting Started 🏁
86
- ### 1. Clone the code and prepare the environment
87
- ```bash
88
- git clone https://github.com/instantX-research/CSGO
89
- cd CSGO
90
-
91
- # create env using conda
92
- conda create -n CSGO python=3.9
93
- conda activate CSGO
94
-
95
- # install dependencies with pip
96
- # for Linux and Windows users
97
- pip install -r requirements.txt
98
- ```
99
-
100
- ### 2. Download pretrained weights (coming soon)
101
-
102
- The easiest way to download the pretrained weights is from HuggingFace:
103
- ```bash
104
- # first, ensure git-lfs is installed, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
105
- git lfs install
106
- # clone and move the weights
107
- git clone https://huggingface.co/InstantX/CSGO
108
- ```
109
- Our method is fully compatible with [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix), [ControlNet](https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic), and [Image Encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder).
110
- Please download them and place them in the ./base_models folder.
111
-
112
- Tip: to load the ControlNet directly with `ControlNetModel.from_pretrained`, as the inference example does, rename the weights file first:
113
- ```bash
114
- git clone https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic
115
- mv TTPLanet_SDXL_Controlnet_Tile_Realistic/TTPLANET_Controlnet_Tile_realistic_v2_fp16.safetensors TTPLanet_SDXL_Controlnet_Tile_Realistic/diffusion_pytorch_model.safetensors
116
- ```
117
- ### 3. Inference 🚀
118
-
119
- ```python
120
- import torch
121
- from ip_adapter.utils import resize_content
122
- import numpy as np
123
- from ip_adapter.utils import BLOCKS as BLOCKS
124
- from ip_adapter.utils import controlnet_BLOCKS as controlnet_BLOCKS
125
- from PIL import Image
126
- from diffusers import (
127
- AutoencoderKL,
128
- ControlNetModel,
129
- StableDiffusionXLControlNetPipeline,
130
-
131
- )
132
- from ip_adapter import CSGO
133
-
134
-
135
- device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
136
-
137
- base_model_path = "./base_models/stable-diffusion-xl-base-1.0"
138
- image_encoder_path = "./base_models/IP-Adapter/sdxl_models/image_encoder"
139
- csgo_ckpt = "./CSGO/csgo_4_32.bin"  # matches num_style_tokens=32 below (see the weights table)
140
- pretrained_vae_name_or_path ='./base_models/sdxl-vae-fp16-fix'
141
- controlnet_path = "./base_models/TTPLanet_SDXL_Controlnet_Tile_Realistic"
142
- weight_dtype = torch.float16
143
-
144
-
145
- vae = AutoencoderKL.from_pretrained(pretrained_vae_name_or_path,torch_dtype=torch.float16)
146
- controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16,use_safetensors=True)
147
- pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
148
- base_model_path,
149
- controlnet=controlnet,
150
- torch_dtype=torch.float16,
151
- add_watermarker=False,
152
- vae=vae
153
- )
154
- pipe.enable_vae_tiling()
155
-
156
-
157
- target_content_blocks = BLOCKS['content']
158
- target_style_blocks = BLOCKS['style']
159
- controlnet_target_content_blocks = controlnet_BLOCKS['content']
160
- controlnet_target_style_blocks = controlnet_BLOCKS['style']
161
-
162
- csgo = CSGO(pipe, image_encoder_path, csgo_ckpt, device, num_content_tokens=4,num_style_tokens=32,
163
- target_content_blocks=target_content_blocks, target_style_blocks=target_style_blocks,controlnet_adapter=True,
164
- controlnet_target_content_blocks=controlnet_target_content_blocks,
165
- controlnet_target_style_blocks=controlnet_target_style_blocks,
166
- content_model_resampler=True,
167
- style_model_resampler=True,
168
-
169
- )
170
-
171
- style_name = 'img_1.png'
172
- content_name = 'img_0.png'
173
- style_image = Image.open("../assets/{}".format(style_name)).convert('RGB')
174
- content_image = Image.open('../assets/{}'.format(content_name)).convert('RGB')
175
-
176
- caption ='a small house with a sheep statue on top of it'
177
-
178
- num_sample=4
179
-
180
- #image-driven style transfer
181
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
182
- prompt=caption,
183
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
184
- content_scale=1.0,
185
- style_scale=1.0,
186
- guidance_scale=10,
187
- num_images_per_prompt=num_sample,
188
- num_samples=1,
189
- num_inference_steps=50,
190
- seed=42,
191
- image=content_image.convert('RGB'),
192
- controlnet_conditioning_scale=0.6,
193
- )
194
-
195
- #text editing-driven stylized synthesis
196
- caption='a small house'
197
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
198
- prompt=caption,
199
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
200
- content_scale=1.0,
201
- style_scale=1.0,
202
- guidance_scale=10,
203
- num_images_per_prompt=num_sample,
204
- num_samples=1,
205
- num_inference_steps=50,
206
- seed=42,
207
- image=content_image.convert('RGB'),
208
- controlnet_conditioning_scale=0.4,
209
- )
210
-
211
- #text-driven stylized synthesis
212
- caption='a cat'
213
- #If the content image still interferes with the generated results, set the content image to an empty image.
214
- # content_image = Image.fromarray(np.zeros((content_image.size[1], content_image.size[0], 3), dtype=np.uint8)).convert('RGB')  # PIL size is (width, height); the array shape is (height, width, 3)
215
-
216
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
217
- prompt=caption,
218
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
219
- content_scale=1.0,
220
- style_scale=1.0,
221
- guidance_scale=10,
222
- num_images_per_prompt=num_sample,
223
- num_samples=1,
224
- num_inference_steps=50,
225
- seed=42,
226
- image=content_image.convert('RGB'),
227
- controlnet_conditioning_scale=0.01,
228
- )
229
- ```
230
-
231
- ## Demos
232
- <p align="center">
233
- <br>
234
- 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
235
- </p>
236
-
237
- ### Content-Style Composition
238
- <p align="center">
239
- <img src="assets/page1.png">
240
- </p>
241
-
242
- <p align="center">
243
- <img src="assets/page4.png">
244
- </p>
245
-
246
- ### Cycle Translation
247
- <p align="center">
248
- <img src="assets/page8.png">
249
- </p>
250
-
251
- ### Text-Driven Style Synthesis
252
- <p align="center">
253
- <img src="assets/page10.png">
254
- </p>
255
-
256
- ### Text Editing-Driven Style Synthesis
257
- <p align="center">
258
- <img src="assets/page11.jpg">
259
- </p>
260
-
261
- ## Star History
262
- [![Star History Chart](https://api.star-history.com/svg?repos=instantX-research/CSGO&type=Date)](https://star-history.com/#instantX-research/CSGO&Date)
263
-
264
-
265
-
266
- ## Acknowledgements
267
- This project is developed by the InstantX Team. All rights reserved.
268
-
269
- ## Citation 💖
270
- If you find CSGO useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
271
- ```bibtex
272
- @article{xing2024csgo,
273
- title={CSGO: Content-Style Composition in Text-to-Image Generation},
274
- author={Peng Xing and Haofan Wang and Yanpeng Sun and Qixun Wang and Xu Bai and Hao Ai and Renyuan Huang and Zechao Li},
275
- year={2024},
276
- journal = {arXiv 2408.16766},
277
- }
 
 
 
 
278
  ```
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: diffusers
6
+ pipeline_tag: text-to-image
7
+ ---
8
+ <div align="center">
9
+
10
+ [//]: # (<h1>CSGO: Content-Style Composition in Text-to-Image Generation</h1>)
11
+
12
+ [//]: # ()
13
+ [//]: # ([**Peng Xing**]&#40;https://github.com/xingp-ng&#41;<sup>12*</sup> · [**Haofan Wang**]&#40;https://haofanwang.github.io/&#41;<sup>1*</sup> · [**Yanpeng Sun**]&#40;https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN&oi=ao/&#41;<sup>2</sup> · [**Qixun Wang**]&#40;https://github.com/wangqixun&#41;<sup>1</sup> · [**Xu Bai**]&#40;https://huggingface.co/baymin0220&#41;<sup>1</sup> · [**Hao Ai**]&#40;https://github.com/aihao2000&#41;<sup>13</sup> · [**Renyuan Huang**]&#40;https://github.com/DannHuang&#41;<sup>14</sup> · [**Zechao Li**]&#40;https://zechao-li.github.io/&#41;<sup>2✉</sup>)
14
+
15
+ [//]: # ()
16
+ [//]: # (<sup>1</sup>InstantX Team · <sup>2</sup>Nanjing University of Science and Technology · <sup>3</sup>Beihang University · <sup>4</sup>Peking University)
17
+
18
+ [//]: # (<sup>*</sup>equal contributions, <sup>✉</sup>corresponding authors)
19
+
20
+ <a href='https://csgo-gen.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
21
+ <a href='https://arxiv.org/abs/2408.16766'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
22
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-App-red)](https://huggingface.co/spaces/xingpng/CSGO/)
23
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/spaces/InstantX/CSGO)
24
+
25
+
26
+ </div>
27
+
28
+
29
+ [//]: # (## Updates 🔥)
30
+
31
+ [//]: # ()
32
+ [//]: # ([//]: # &#40;- **`2024/07/19`**: ✨ We support 🎞️ portrait video editing &#40;aka v2v&#41;! More to see [here]&#40;assets/docs/changelog/2024-07-19.md&#41;.&#41;)
33
+ [//]: # ()
34
+ [//]: # ([//]: # &#40;- **`2024/07/17`**: 🍎 We support macOS with Apple Silicon, modified from [jeethu]&#40;https://github.com/jeethu&#41;'s PR [#143]&#40;https://github.com/KwaiVGI/LivePortrait/pull/143&#41;.&#41;)
35
+ [//]: # ()
36
+ [//]: # ([//]: # &#40;- **`2024/07/10`**: 💪 We support audio and video concatenating, driving video auto-cropping, and template making to protect privacy. More to see [here]&#40;assets/docs/changelog/2024-07-10.md&#41;.&#41;)
37
+ [//]: # ()
38
+ [//]: # ([//]: # &#40;- **`2024/07/09`**: 🤗 We released the [HuggingFace Space]&#40;https://huggingface.co/spaces/KwaiVGI/liveportrait&#41;, thanks to the HF team and [Gradio]&#40;https://github.com/gradio-app/gradio&#41;!&#41;)
39
+ [//]: # ([//]: # &#40;Continuous updates, stay tuned!&#41;)
40
+ [//]: # (- **`2024/08/30`**: 😊 We released the initial version of the inference code.)
41
+
42
+ [//]: # (- **`2024/08/30`**: 😊 We released the technical report on [arXiv]&#40;https://arxiv.org/pdf/2408.16766&#41;)
43
+
44
+ [//]: # (- **`2024/07/15`**: 🔥 We released the [homepage]&#40;https://csgo-gen.github.io&#41;.)
45
+
46
+ [//]: # ()
47
+ [//]: # (## Plan 💪)
48
+
49
+ [//]: # (- [x] technical report)
50
+
51
+ [//]: # (- [x] inference code)
52
+
53
+ [//]: # (- [ ] pre-trained weight)
54
+
55
+ [//]: # (- [ ] IMAGStyle dataset)
56
+
57
+ [//]: # (- [ ] training code)
58
+
59
+ ## Introduction 📖
60
+ This repo, named **CSGO**, contains the official PyTorch implementation of our paper [CSGO: Content-Style Composition in Text-to-Image Generation](https://arxiv.org/pdf/).
61
+ We are actively updating and improving this repository. If you find any bugs or have suggestions, feel free to open an issue or submit a pull request (PR) 💖.
62
+
63
+ ## Detail ✨
64
+ We currently release two model weights; a third is coming soon.
65
+
66
+ | Model | Content tokens | Style tokens | Other |
67
+ |:----------------:|:-----------:|:-----------:|:---------------------------------:|
68
+ | csgo.bin |4|16| - |
69
+ | csgo_4_32.bin |4|32| DeepSpeed ZeRO-2 |
70
+ | csgo_4_32_v2.bin |4|32| DeepSpeed ZeRO-2 + more (coming soon) |
71
+
72
+
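The token counts in the table presumably have to match the `num_content_tokens` and `num_style_tokens` arguments passed when a checkpoint is loaded (see the inference example in this README). A minimal sketch of keeping the two together, assuming the table is accurate; `CHECKPOINT_TOKENS` and `token_kwargs` are illustrative names, not part of the CSGO codebase:

```python
# Token counts per released checkpoint, copied from the table above.
# CHECKPOINT_TOKENS and token_kwargs are hypothetical helpers for illustration.
CHECKPOINT_TOKENS = {
    "csgo.bin": {"num_content_tokens": 4, "num_style_tokens": 16},
    "csgo_4_32.bin": {"num_content_tokens": 4, "num_style_tokens": 32},
}


def token_kwargs(ckpt_path: str) -> dict:
    """Return the token-count keyword arguments for a checkpoint path."""
    # Keep only the file name so both "./CSGO/csgo.bin" and "csgo.bin" work.
    name = ckpt_path.replace("\\", "/").rsplit("/", 1)[-1]
    if name not in CHECKPOINT_TOKENS:
        raise KeyError(f"unknown checkpoint: {name}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(CHECKPOINT_TOKENS[name])
```

The returned dictionary can be unpacked into the model constructor, for example `CSGO(pipe, image_encoder_path, csgo_ckpt, device, **token_kwargs(csgo_ckpt), ...)`.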
73
+ ## Pipeline 💻
74
+ <p align="center">
75
+ <img src="assets/image3_1.jpg">
76
+ </p>
77
+
78
+ ## Capabilities 🚅
79
+
80
+ 🔥 CSGO supports **image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis**.
81
+
82
+ 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
83
+
84
+ <p align="center">
85
+ <img src="assets/vis.jpg">
86
+ </p>
87
+
88
+
89
+ ## Getting Started 🏁
90
+ ### 1. Clone the code and prepare the environment
91
+ ```bash
92
+ git clone https://github.com/instantX-research/CSGO
93
+ cd CSGO
94
+
95
+ # create env using conda
96
+ conda create -n CSGO python=3.9
97
+ conda activate CSGO
98
+
99
+ # install dependencies with pip
100
+ # for Linux and Windows users
101
+ pip install -r requirements.txt
102
+ ```
103
+
104
+ ### 2. Download pretrained weights (coming soon)
105
+
106
+ The easiest way to download the pretrained weights is from HuggingFace:
107
+ ```bash
108
+ # first, ensure git-lfs is installed, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
109
+ git lfs install
110
+ # clone and move the weights
111
+ git clone https://huggingface.co/InstantX/CSGO
112
+ ```
113
+ Our method is fully compatible with [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix), [ControlNet](https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic), and [Image Encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder).
114
+ Please download them and place them in the ./base_models folder.
115
+
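Before running inference, it can help to confirm that the base models are where the example expects them. A small sketch, assuming the sub-folder names used by the paths in the inference example of this README; `EXPECTED` and `missing_base_models` are illustrative names, not part of the repository:

```python
from pathlib import Path

# Sub-folders taken from the paths in the inference example; adjust them if
# your local layout differs.
EXPECTED = [
    "stable-diffusion-xl-base-1.0",
    "sdxl-vae-fp16-fix",
    "TTPLanet_SDXL_Controlnet_Tile_Realistic",
    "IP-Adapter/sdxl_models/image_encoder",
]


def missing_base_models(root="./base_models"):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [sub for sub in EXPECTED if not (root / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_base_models()
    if missing:
        print("Missing under ./base_models:", ", ".join(missing))
    else:
        print("All base models found.")
```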
116
+ Tip: to load the ControlNet directly with `ControlNetModel.from_pretrained`, as the inference example does, rename the weights file first:
117
+ ```bash
118
+ git clone https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic
119
+ mv TTPLanet_SDXL_Controlnet_Tile_Realistic/TTPLANET_Controlnet_Tile_realistic_v2_fp16.safetensors TTPLanet_SDXL_Controlnet_Tile_Realistic/diffusion_pytorch_model.safetensors
120
+ ```
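The rename is needed because diffusers looks for a weights file named `diffusion_pytorch_model.safetensors` when loading a model folder with `use_safetensors=True`. For platforms without `mv`, the same step can be sketched in Python; `rename_controlnet_weights` is an illustrative helper, not part of the repository:

```python
from pathlib import Path

SRC_NAME = "TTPLANET_Controlnet_Tile_realistic_v2_fp16.safetensors"
DST_NAME = "diffusion_pytorch_model.safetensors"


def rename_controlnet_weights(repo_dir):
    """Rename the downloaded ControlNet weights to the name diffusers expects."""
    repo = Path(repo_dir)
    src = repo / SRC_NAME
    dst = repo / DST_NAME
    if dst.exists():
        # Already renamed; nothing to do.
        return dst
    if not src.exists():
        raise FileNotFoundError(src)
    src.rename(dst)
    return dst
```

Calling `rename_controlnet_weights("TTPLanet_SDXL_Controlnet_Tile_Realistic")` from the directory that holds the clone mirrors the `mv` command above, and calling it a second time is harmless.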
121
+ ### 3. Inference 🚀
122
+
123
+ ```python
124
+ import torch
125
+ from ip_adapter.utils import resize_content
126
+ import numpy as np
127
+ from ip_adapter.utils import BLOCKS as BLOCKS
128
+ from ip_adapter.utils import controlnet_BLOCKS as controlnet_BLOCKS
129
+ from PIL import Image
130
+ from diffusers import (
131
+ AutoencoderKL,
132
+ ControlNetModel,
133
+ StableDiffusionXLControlNetPipeline,
134
+
135
+ )
136
+ from ip_adapter import CSGO
137
+
138
+
139
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
140
+
141
+ base_model_path = "./base_models/stable-diffusion-xl-base-1.0"
142
+ image_encoder_path = "./base_models/IP-Adapter/sdxl_models/image_encoder"
143
+ csgo_ckpt = "./CSGO/csgo_4_32.bin"  # matches num_style_tokens=32 below (see the weights table)
144
+ pretrained_vae_name_or_path ='./base_models/sdxl-vae-fp16-fix'
145
+ controlnet_path = "./base_models/TTPLanet_SDXL_Controlnet_Tile_Realistic"
146
+ weight_dtype = torch.float16
147
+
148
+
149
+ vae = AutoencoderKL.from_pretrained(pretrained_vae_name_or_path,torch_dtype=torch.float16)
150
+ controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16,use_safetensors=True)
151
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
152
+ base_model_path,
153
+ controlnet=controlnet,
154
+ torch_dtype=torch.float16,
155
+ add_watermarker=False,
156
+ vae=vae
157
+ )
158
+ pipe.enable_vae_tiling()
159
+
160
+
161
+ target_content_blocks = BLOCKS['content']
162
+ target_style_blocks = BLOCKS['style']
163
+ controlnet_target_content_blocks = controlnet_BLOCKS['content']
164
+ controlnet_target_style_blocks = controlnet_BLOCKS['style']
165
+
166
+ csgo = CSGO(pipe, image_encoder_path, csgo_ckpt, device, num_content_tokens=4,num_style_tokens=32,
167
+ target_content_blocks=target_content_blocks, target_style_blocks=target_style_blocks,controlnet_adapter=True,
168
+ controlnet_target_content_blocks=controlnet_target_content_blocks,
169
+ controlnet_target_style_blocks=controlnet_target_style_blocks,
170
+ content_model_resampler=True,
171
+ style_model_resampler=True,
172
+
173
+ )
174
+
175
+ style_name = 'img_1.png'
176
+ content_name = 'img_0.png'
177
+ style_image = Image.open("../assets/{}".format(style_name)).convert('RGB')
178
+ content_image = Image.open('../assets/{}'.format(content_name)).convert('RGB')
179
+
180
+ caption ='a small house with a sheep statue on top of it'
181
+
182
+ num_sample=4
183
+
184
+ #image-driven style transfer
185
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
186
+ prompt=caption,
187
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
188
+ content_scale=1.0,
189
+ style_scale=1.0,
190
+ guidance_scale=10,
191
+ num_images_per_prompt=num_sample,
192
+ num_samples=1,
193
+ num_inference_steps=50,
194
+ seed=42,
195
+ image=content_image.convert('RGB'),
196
+ controlnet_conditioning_scale=0.6,
197
+ )
198
+
199
+ #text editing-driven stylized synthesis
200
+ caption='a small house'
201
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
202
+ prompt=caption,
203
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
204
+ content_scale=1.0,
205
+ style_scale=1.0,
206
+ guidance_scale=10,
207
+ num_images_per_prompt=num_sample,
208
+ num_samples=1,
209
+ num_inference_steps=50,
210
+ seed=42,
211
+ image=content_image.convert('RGB'),
212
+ controlnet_conditioning_scale=0.4,
213
+ )
214
+
215
+ #text-driven stylized synthesis
216
+ caption='a cat'
217
+ #If the content image still interferes with the generated results, set the content image to an empty image.
218
+ # content_image = Image.fromarray(np.zeros((content_image.size[1], content_image.size[0], 3), dtype=np.uint8)).convert('RGB')  # PIL size is (width, height); the array shape is (height, width, 3)
219
+
220
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
221
+ prompt=caption,
222
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
223
+ content_scale=1.0,
224
+ style_scale=1.0,
225
+ guidance_scale=10,
226
+ num_images_per_prompt=num_sample,
227
+ num_samples=1,
228
+ num_inference_steps=50,
229
+ seed=42,
230
+ image=content_image.convert('RGB'),
231
+ controlnet_conditioning_scale=0.01,
232
+ )
233
+ ```
234
+
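The three calls in the example above differ only in the prompt and in `controlnet_conditioning_scale` (0.6, 0.4, and 0.01). The pattern suggests that a lower scale lets the content image constrain the result less. A sketch that factors out the shared arguments, with values copied from the example; `COMMON_KWARGS`, `CONTROLNET_SCALE`, and `build_generate_kwargs` are illustrative names, not part of the CSGO API:

```python
# Arguments shared by all three calls in the inference example.
COMMON_KWARGS = {
    "negative_prompt": (
        "text, watermark, lowres, low quality, worst quality, deformed, "
        "glitch, low contrast, noisy, saturation, blurry"
    ),
    "content_scale": 1.0,
    "style_scale": 1.0,
    "guidance_scale": 10,
    "num_samples": 1,
    "num_inference_steps": 50,
    "seed": 42,
}

# ControlNet strength used for each task in the inference example.
CONTROLNET_SCALE = {
    "style_transfer": 0.6,  # image-driven style transfer
    "text_editing": 0.4,    # text editing-driven stylized synthesis
    "text_driven": 0.01,    # text-driven stylized synthesis
}


def build_generate_kwargs(task, prompt, num_images=4):
    """Return keyword arguments for one of the three tasks."""
    if task not in CONTROLNET_SCALE:
        raise ValueError(f"unknown task: {task}")
    kwargs = dict(COMMON_KWARGS)  # copy so the shared defaults stay untouched
    kwargs.update(
        prompt=prompt,
        num_images_per_prompt=num_images,
        controlnet_conditioning_scale=CONTROLNET_SCALE[task],
    )
    return kwargs
```

With the objects from the example, a call would then read `csgo.generate(pil_content_image=content_image, pil_style_image=style_image, image=content_image, **build_generate_kwargs("style_transfer", caption))`.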
235
+ ## Demos
236
+ <p align="center">
237
+ <br>
238
+ 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
239
+ </p>
240
+
241
+ ### Content-Style Composition
242
+ <p align="center">
243
+ <img src="assets/page1.png">
244
+ </p>
245
+
246
+ <p align="center">
247
+ <img src="assets/page4.png">
248
+ </p>
249
+
250
+ ### Cycle Translation
251
+ <p align="center">
252
+ <img src="assets/page8.png">
253
+ </p>
254
+
255
+ ### Text-Driven Style Synthesis
256
+ <p align="center">
257
+ <img src="assets/page10.png">
258
+ </p>
259
+
260
+ ### Text Editing-Driven Style Synthesis
261
+ <p align="center">
262
+ <img src="assets/page11.jpg">
263
+ </p>
264
+
265
+ ## Star History
266
+ [![Star History Chart](https://api.star-history.com/svg?repos=instantX-research/CSGO&type=Date)](https://star-history.com/#instantX-research/CSGO&Date)
267
+
268
+
269
+
270
+ ## Acknowledgements
271
+ This project is developed by the InstantX Team. All rights reserved.
272
+
273
+ ## Citation 💖
274
+ If you find CSGO useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
275
+ ```bibtex
276
+ @article{xing2024csgo,
277
+ title={CSGO: Content-Style Composition in Text-to-Image Generation},
278
+ author={Peng Xing and Haofan Wang and Yanpeng Sun and Qixun Wang and Xu Bai and Hao Ai and Renyuan Huang and Zechao Li},
279
+ year={2024},
280
+ journal = {arXiv 2408.16766},
281
+ }
282
  ```