---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
base_model: stabilityai/stable-diffusion-xl-base-1.0
---

# Target-Driven Distillation

<div align="center">

[**Project Page**](https://redaigc.github.io/TDD/) **|** [**Paper**](https://arxiv.org/abs/2409.01347) **|** [**Code**](https://github.com/RedAIGC/Target-Driven-Distillation) **|** [**Model**](https://huggingface.co/RED-AIGC/TDD) **|** [🤗 **TDD-SDXL Demo**](https://huggingface.co/spaces/RED-AIGC/TDD) **|** [🤗 **SVD-TDD Demo**](https://huggingface.co/spaces/RED-AIGC/SVD-TDD) **|** [🤗 **FLUX-TDD Demo**](https://huggingface.co/spaces/RED-AIGC/FLUX-TDD-BETA)

</div>

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

<div align="center">
  <img src="assets/teaser.jpg" alt="teaser" style="zoom:80%;" />

  Samples generated by TDD-distilled SDXL, with only 4--8 steps.
</div>

## Usage FLUX
```python
import torch
from huggingface_hub import hf_hub_download
from diffusers import FluxPipeline

# Load the FLUX.1-dev base pipeline, then fuse the TDD LoRA into its weights.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights(hf_hub_download("RED-AIGC/TDD", "TDD-FLUX.1-dev-lora-beta.safetensors"))
pipe.fuse_lora(lora_scale=0.125)
pipe.to("cuda")

prompt = "A photo of a cat made of water."  # example prompt; replace with your own

image_flux = pipe(
    prompt=[prompt],
    generator=torch.Generator().manual_seed(3413),
    num_inference_steps=8,
    guidance_scale=2.0,
    height=1024,
    width=1024,
    max_sequence_length=256
).images[0]
```

## Usage SDXL
You can download the LoRA weights directly from this repository, or fetch them from a Python script:

```python
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="RED-AIGC/TDD", filename="sdxl_tdd_lora_weights.safetensors", local_dir="./tdd_lora")
```

```python
# !pip install opencv-python transformers accelerate
import torch
from diffusers import StableDiffusionXLPipeline
# tdd_scheduler is not part of diffusers; this assumes tdd_scheduler.py
# (see the Code link above) is on the import path.
from tdd_scheduler import TDDScheduler

device = "cuda"
tdd_lora_path = "tdd_lora/sdxl_tdd_lora_weights.safetensors"

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16").to(device)

# Swap in the TDD scheduler, then load and fuse the TDD LoRA weights.
pipe.scheduler = TDDScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(tdd_lora_path, adapter_name="accelerate")
pipe.fuse_lora()

prompt = "A photo of a cat made of water."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1.7,
    eta=0.2, 
    generator=torch.Generator(device=device).manual_seed(546237),
).images[0]

image.save("tdd.png")
```

## Update
- [2024.09.20]: Uploaded the TDD LoRA weights of FLUX-TDD-BETA (4-8 steps).
- [2024.08.25]: Uploaded the TDD LoRA weights of SVD.
- [2024.08.22]: Uploaded the TDD LoRA weights of Stable Diffusion XL, YamerMIX, and RealVisXL-V4.0 for fast text-to-image generation:
  - sdxl_tdd_lora_weights.safetensors
  - yamermix_tdd_lora_weights.safetensors
  - realvis_tdd_sdxl_lora_weights.safetensors

Thanks to [Yamer](https://civitai.com/user/Yamer) and [SG_161222](https://civitai.com/user/SG_161222) for developing [YamerMIX](https://civitai.com/models/84040?modelVersionId=395107) and [RealVisXL V4.0](https://civitai.com/models/139562/realvisxl-v40) respectively.


## Introduction

Target-Driven Distillation (TDD) features three key designs that differ from previous consistency distillation methods.
1. **TDD adopts a careful selection strategy for target timesteps, which increases training efficiency.** Specifically, it first chooses from a predefined set of equidistant denoising schedules (*e.g.* 4--8 steps), then adds a stochastic offset to accommodate non-deterministic sampling (*e.g.* $\gamma$-sampling).
2. **TDD uses decoupled guidance during training, so the guidance scale can still be tuned at inference time.** Specifically, it replaces a portion of the text conditions with unconditional (*i.e.* empty) prompts, in order to align with the standard training process using CFG.
3. **TDD can optionally be equipped with non-equidistant sampling and x0 clipping, enabling more flexible and accurate image sampling.**
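
The first two designs can be illustrated with a minimal sketch. It is not the authors' training code: the function names, the `max_offset` and `drop_prob` values, the 1000-timestep range, and the way the offset shifts the target toward zero are all assumptions made for illustration.

```python
import random


def equidistant_schedule(num_steps, num_train_timesteps=1000):
    """Descending timesteps of an equidistant schedule with num_steps steps."""
    stride = num_train_timesteps // num_steps
    return [num_train_timesteps - 1 - i * stride for i in range(num_steps)]


def select_target_timestep(t, rng, min_steps=4, max_steps=8,
                           max_offset=0.3, num_train_timesteps=1000):
    """Pick a target timestep below the current timestep t (assumes t > 0).

    1. Sample one of the predefined equidistant schedules (4 to 8 steps here).
    2. Take the closest schedule timestep strictly below t (0 if none exists).
    3. Shift it toward 0 by a random fraction (hypothetical stochastic offset).
    """
    num_steps = rng.randint(min_steps, max_steps)
    schedule = equidistant_schedule(num_steps, num_train_timesteps)
    candidates = [s for s in schedule if s < t]
    target = max(candidates) if candidates else 0
    offset = rng.uniform(0.0, max_offset)
    return int(round((1.0 - offset) * target))


def drop_prompts(prompts, rng, drop_prob=0.2):
    """Replace a random portion of the prompts with the empty prompt."""
    return ["" if rng.random() < drop_prob else p for p in prompts]


rng = random.Random(0)
print(equidistant_schedule(4))
print(select_target_timestep(600, rng))
print(drop_prompts(["a cat", "a dog", "a bird"], rng))
```

Restricting targets to timesteps that few-step schedules actually visit is what focuses training on the steps used at inference, while the empty-prompt replacement mirrors the prompt dropout used in standard CFG training.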

<div align="center">
  <img src="assets/tdd_overview.jpg" alt="overview"/>

  An overview of TDD. (a) The training process features target timestep selection and decoupled guidance. (b) The inference process can optionally adopt non-equidistant denoising schedules.
</div>
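
As a rough illustration of the optional inference-time components, the sketch below builds a non-equidistant schedule and clips a predicted x0. The power-law spacing, the `power` and `bound` parameters, and the function names are hypothetical; the text above does not specify how TDD spaces its timesteps or what range it clips to.

```python
def non_equidistant_schedule(num_steps, power=2.0, num_train_timesteps=1000):
    """Descending timesteps whose gaps shrink toward t=0 when power > 1.

    power=1.0 gives (approximately) equidistant spacing; the power-law form
    is an illustrative choice, not the schedule used by TDD.
    """
    t_max = num_train_timesteps - 1
    return [
        round(t_max * ((num_steps - i) / num_steps) ** power)
        for i in range(num_steps)
    ]


def clip_x0(x0, bound=1.0):
    """Clamp every element of a predicted x0 (a flat list here) to [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in x0]


print(non_equidistant_schedule(4))
print(clip_x0([-2.0, 0.3, 1.5]))
```

In a real pipeline the clipping would be applied to the model's x0 prediction tensor at each step, before the scheduler computes the next sample.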

<div align="center">
  <img src="assets/compare.png" alt="comparison" style="zoom:80%;" />

  Samples generated from the same seeds by SDXL models distilled with mainstream consistency distillation methods (LCM, PCM, TCD) and with our TDD. Our method demonstrates advantages in both image complexity and clarity.
</div>

<div align="center">
  <img src="assets/other_1.jpg" alt="other"/>

  Samples generated by different TDD-distilled base models, and by SDXL with different LoRA adapters or ControlNets.
</div>


<div align="center">
  <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/668767587bc84f3fe658fada/y7_uIXQOWK8AywASxcvbU.mp4"></video>
  
  Video samples generated by AnimateLCM-distilled (top) and TDD-distilled (bottom) SVD-xt 1.1, also with 4--8 steps.
</div>