madebyollin committed
Commit 81a553d · 1 Parent(s): 7a79e2c
Update README.md
README.md CHANGED
@@ -5,26 +5,78 @@ library_name: diffusers
# Stage-A-ft-HQ

-`stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to
`stage-a-ft-hq` works with any Würstchen-derived model (including [Stable Cascade](https://huggingface.co/stabilityai/stable-cascade)).

-

## 🧨 Diffusers Usage
```py
-
```

## Explanation

-Image generators like Würstchen and Stable Cascade create images via a multi-stage process.
Stage A is the ultimate stage, responsible for rendering out full-resolution, human-interpretable images (based on the output from prior stages).

-The original Stage A tends to render slightly-smoothed-out images with a distinctive

-`stage-a-ft-hq` was finetuned on a high-quality dataset in order to

-##

-To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which improves mid-level detail).
# Stage-A-ft-HQ

+`stage-a-ft-hq` is a version of [Würstchen](https://huggingface.co/warp-ai/wuerstchen)'s **Stage A** that was finetuned to have slightly-nicer-looking textures.
+
`stage-a-ft-hq` works with any Würstchen-derived model (including [Stable Cascade](https://huggingface.co/stabilityai/stable-cascade)).

+## Example comparison
+
+| Stable Cascade                    | Stable Cascade + `stage-a-ft-hq`   |
+| --------------------------------- | ---------------------------------- |
+| ![](example_baseline.png)         | ![](example_finetuned.png)         |
+| ![](example_baseline_closeup.png) | ![](example_finetuned_closeup.png) |
+

## 🧨 Diffusers Usage

+⚠️ As of 2024-02-17, Stable Cascade's [PR](https://github.com/huggingface/diffusers/pull/6487) is still under review.
+I've only confirmed Stable Cascade working with this particular version of the PR:
+```bash
+pip install --upgrade --force-reinstall https://github.com/kashif/diffusers/archive/a3dc21385b7386beb3dab3a9845962ede6765887.zip
+```
+
```py
+import torch
+
+# Load the Stage-A-ft-HQ model
+from diffusers.pipelines.wuerstchen import PaellaVQModel
+stage_a_ft_hq = PaellaVQModel.from_pretrained("madebyollin/stage_a_ft_hq", torch_dtype=torch.float16)
+
+# Load the normal Stable Cascade pipeline
+from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
+
+device = "cuda"
+num_images_per_prompt = 2
+
+prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
+decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)
+
+# Swap in the Stage-A-ft-HQ model (moved to the same device as the rest of the decoder)
+decoder.vqgan = stage_a_ft_hq.to(device)
+
+prompt = "Anthropomorphic cat dressed as a pilot"
+negative_prompt = ""
+
+prior_output = prior(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    negative_prompt=negative_prompt,
+    guidance_scale=4.0,
+    num_images_per_prompt=num_images_per_prompt,
+    num_inference_steps=20
+)
+decoder_output = decoder(
+    image_embeddings=prior_output.image_embeddings.half(),
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    guidance_scale=0.0,
+    output_type="pil",
+    num_inference_steps=10
+).images
+
+display(decoder_output[0])  # display() is provided by notebook environments such as Jupyter
```
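
The final `display` call in the snippet above only exists in notebook environments. When running it as a plain script, a minimal sketch for writing the results to disk could look like the following. It assumes `decoder_output` is the list of PIL images returned with `output_type="pil"`; the helper name `save_images`, the output directory, and the file-name pattern are hypothetical.

```py
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="stage_a_ft_hq"):
    """Save each image to out_dir as <prefix>_<index>.png and return the paths.

    Hypothetical helper: `images` is assumed to be a list of objects with a
    PIL-style `.save(path)` method, such as `decoder_output` above.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(images):
        target = out_path / f"{prefix}_{index:02d}.png"
        image.save(target)
        saved.append(target)
    return saved
```

With the variables from the usage example, `save_images(decoder_output)` would write one PNG per generated image.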

## Explanation

+Image generators like Würstchen and Stable Cascade create images via a multi-stage process.
Stage A is the ultimate stage, responsible for rendering out full-resolution, human-interpretable images (based on the output from prior stages).
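
To make the hand-off between stages concrete, here is a rough sketch of how the spatial size changes on the way to the final image. The scale factors are assumptions, not values stated in this README: they follow what I understand to be the defaults of the diffusers Stable Cascade pipelines (a prior `resolution_multiple` of 42.67 and a decoder `latent_dim_scale` of 10.67) plus a 4x upsampling in Stage A. The function name `stage_sizes` is hypothetical, and other Würstchen-derived models may use different numbers.

```py
import math

# Assumed defaults; not stated in this README.
PRIOR_RESOLUTION_MULTIPLE = 42.67  # image pixels per Stage C latent pixel
DECODER_LATENT_DIM_SCALE = 10.67   # Stage B latent pixels per Stage C latent pixel
STAGE_A_UPSCALE = 4                # image pixels per Stage B latent pixel


def stage_sizes(height, width):
    """Return the (height, width) produced by each stage for a requested image size."""
    stage_c = (
        math.ceil(height / PRIOR_RESOLUTION_MULTIPLE),
        math.ceil(width / PRIOR_RESOLUTION_MULTIPLE),
    )
    stage_b = tuple(int(side * DECODER_LATENT_DIM_SCALE) for side in stage_c)
    stage_a = tuple(side * STAGE_A_UPSCALE for side in stage_b)
    return {"stage_c": stage_c, "stage_b": stage_b, "stage_a": stage_a}


print(stage_sizes(1024, 1024))
# {'stage_c': (24, 24), 'stage_b': (256, 256), 'stage_a': (1024, 1024)}
```

Under these assumptions, Stage A is the only step that works at full image resolution, which is why swapping it affects the finest textures.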

+The original Stage A tends to render slightly-smoothed-out images with a distinctive noise pattern on top.

+`stage-a-ft-hq` was finetuned briefly on a high-quality dataset in order to reduce these artifacts.

+## Suggested Settings

+To generate highly detailed images, you probably want to use `stage-a-ft-hq` (which improves very fine detail) in combination with a large Stage B step count (which [improves mid-level detail](https://old.reddit.com/r/StableDiffusion/comments/1ar359h/cascade_can_generate_directly_at_1536x1536_and/kqhjtk5/)).
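
As a sketch of what this could look like in code, the decoder call from the usage example can be given a larger step count. This assumes that the Stage B step count corresponds to `num_inference_steps` on the decoder pipeline; the value 20 and the helper name `decoder_settings` are illustrative assumptions, not recommendations from this README.

```py
def decoder_settings(stage_b_steps=20):
    """Build keyword arguments for the decoder call with a chosen Stage B step count.

    Hypothetical helper: mirrors the decoder arguments from the usage example,
    changing only num_inference_steps (10 in that example).
    """
    if stage_b_steps < 1:
        raise ValueError("stage_b_steps must be at least 1")
    return {
        "guidance_scale": 0.0,
        "output_type": "pil",
        "num_inference_steps": stage_b_steps,
    }


settings = decoder_settings(stage_b_steps=20)
# With the pipeline objects from the usage example, the call would then be:
# decoder_output = decoder(
#     image_embeddings=prior_output.image_embeddings.half(),
#     prompt=prompt,
#     negative_prompt=negative_prompt,
#     **settings,
# ).images
```

More Stage B steps cost proportionally more decoder time, so it may be worth comparing a few values on the same prompt before settling on one.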