pipeline_tag: text-to-image
license: other
license_name: faipl-1.0-sd
license_link: LICENSE
base_model: stabilityai/stable-cascade
tags:
- text-to-image
- anime
library_name: diffusers
language: en
inference: false
decoder: Disty0/sotediffusion-wuerstchen3-decoder
new_version: Disty0/sotediffusion-v2
A new version is available: https://huggingface.co/Disty0/sotediffusion-v2
SoteDiffusion Wuerstchen3
Anime finetune of Würstchen V3.
Release Notes
- This release is sponsored by fal.ai/grants
- Trained on 6M images for 3 epochs using 8x A100 80G GPUs.
API Usage
This model can be used through the Fal.AI API.
For more details: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion
UI Guide
SD.Next
URL: https://github.com/vladmandic/automatic/
Go to Models -> Huggingface, type Disty0/sotediffusion-wuerstchen3-decoder into the model name field, and press download.
Load Disty0/sotediffusion-wuerstchen3-decoder after the download process is complete.
Prompt:
newest, extremely aesthetic, best quality,
Negative Prompt:
very displeasing, worst quality, monochrome, realistic, oldest, loli,
Parameters:
Sampler: Default
Steps: 30 or 40
Refiner Steps: 10
CFG: 7
Secondary CFG: 2 or 1
Resolution: 1024x1536, 2048x1152
Anything works as long as each side is a multiple of 128.
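As a small illustration of the 128-pixel rule, the sketch below snaps an arbitrary width and height to the nearest valid size. The helper names `snap_to_128` and `snap_resolution` are hypothetical and are not part of SD.Next or diffusers.

```python
def snap_to_128(value: int) -> int:
    """Round a pixel dimension to the nearest multiple of 128 (minimum 128)."""
    return max(128, (value + 64) // 128 * 128)


def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap both sides of a resolution to multiples of 128."""
    return snap_to_128(width), snap_to_128(height)


print(snap_resolution(1000, 1500))  # (1024, 1536)
print(snap_resolution(1920, 1080))  # (1920, 1024)
```

The recommended resolutions already satisfy the rule, so they pass through unchanged.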
ComfyUI
Please refer to CivitAI: https://civitai.com/models/353284
Code Example
pip install diffusers
import torch
from diffusers import StableCascadeCombinedPipeline
device = "cuda"
dtype = torch.bfloat16 # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-decoder"
pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)
# send everything to the gpu:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
# or enable model offload to save vram:
# pipe.enable_model_cpu_offload()
prompt = "newest, extremely aesthetic, best quality, 1girl, solo, cat ears, pink hair, orange eyes, long hair, bare shoulders, looking at viewer, smile, indoors, casual, living room, playing guitar,"
negative_prompt = "very displeasing, worst quality, monochrome, realistic, oldest, loli,"
output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=2.0,    # Secondary CFG in the UI guide
    prior_guidance_scale=7.0,      # CFG in the UI guide
    prior_num_inference_steps=30,  # Steps in the UI guide
    output_type="pil",
    num_inference_steps=10,        # decoder steps (Refiner Steps in the UI guide)
).images[0]
# do something with the output image
Training:
Software used: Kohya SD-Scripts with Stable Cascade branch.
https://github.com/kohya-ss/sd-scripts/tree/stable-cascade
GPU used: 8x Nvidia A100 80GB
GPU Hours: 220
Base
parameter | value |
---|---|
amp | bf16 |
weights | fp32 |
save weights | fp16 |
resolution | 1024x1024 |
effective batch size | 128 |
unet learning rate | 1e-5 |
te learning rate | 4e-6 |
optimizer | Adafactor |
images | 6M |
epochs | 3 |
Final
parameter | value |
---|---|
amp | bf16 |
weights | fp32 |
save weights | fp16 |
resolution | 1024x1024 |
effective batch size | 128 |
unet learning rate | 4e-6 |
te learning rate | none |
optimizer | Adafactor |
images | 120K |
epochs | 16 |
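For a rough sense of scale, the two tables above imply the optimizer step counts sketched below. This is a back-of-the-envelope estimate, not a figure reported by the author: it assumes every image is seen once per epoch and uses the rounded image counts (6M and 120K) from the tables. The function name `optimizer_steps` is hypothetical.

```python
def optimizer_steps(images: int, epochs: int, effective_batch_size: int) -> int:
    """Estimate optimizer steps as images * epochs / effective batch size."""
    return images * epochs // effective_batch_size


# Base stage: 6M images, 3 epochs, effective batch size 128.
base_steps = optimizer_steps(6_000_000, 3, 128)
# Final stage: 120K images, 16 epochs, effective batch size 128.
final_steps = optimizer_steps(120_000, 16, 128)

print(base_steps)   # 140625
print(final_steps)  # 15000
```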
Dataset:
GPU used for captioning: 1x Intel ARC A770 16GB
GPU Hours: 350
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Model used for text: llava-hf/llava-1.5-7b-hf
Command:
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
dataset name | total images |
---|---|
newest | 1.848.331 |
recent | 1.380.630 |
mid | 993.227 |
early | 566.152 |
oldest | 160.397 |
pixiv | 343.614 |
visual novel cg | 231.358 |
anime wallpaper | 104.790 |
Total | 5.628.499 |
Note:
- Smallest size is 1280x600 (768.000 pixels)
- Deduped based on image similarity using czkawka-cli
- Around 120K very high quality images were intentionally duplicated 5 times, bringing the total image count to 6.2M
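The totals above can be checked with a few lines of arithmetic. The sketch assumes that "duplicated 5 times" means 5 extra copies of roughly 120K images, which is the reading that matches the stated 6.2M.

```python
# Image counts copied from the dataset table above.
dataset_counts = {
    "newest": 1_848_331,
    "recent": 1_380_630,
    "mid": 993_227,
    "early": 566_152,
    "oldest": 160_397,
    "pixiv": 343_614,
    "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}

unique_total = sum(dataset_counts.values())

# Assumption: 5 extra copies of about 120K high quality images.
duplicated = 120_000 * 5
training_total = unique_total + duplicated

print(unique_total)    # 5628499
print(training_total)  # 6228499
```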
Tags:
The model is trained with random tag order, but for reference this is the order in the dataset:
aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
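As an illustration, a caption in the dataset order could be assembled as in the sketch below. The function `build_caption` and its parameter names are hypothetical; the training scripts may build captions differently. Because training used random tag order, the order is shown for reference only.

```python
from typing import Sequence


def build_caption(
    aesthetic: str,
    quality: str,
    date: str,
    rating: str,
    custom: Sequence[str] = (),
    characters: Sequence[str] = (),
    series: Sequence[str] = (),
    rest: Sequence[str] = (),
) -> str:
    """Join tags in the dataset order listed above, separated by ', '."""
    ordered = [aesthetic, quality, date, *custom, rating, *characters, *series, *rest]
    # Skip empty entries so missing tag groups leave no stray separators.
    return ", ".join(tag for tag in ordered if tag)


caption = build_caption(
    aesthetic="extremely aesthetic",
    quality="best quality",
    date="newest",
    rating="general",
    rest=["1girl", "solo", "cat ears"],
)
print(caption)
# extremely aesthetic, best quality, newest, general, 1girl, solo, cat ears
```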
Date:
tag | date |
---|---|
newest | 2022 to 2024 |
recent | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
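The date table can be expressed as a simple lookup. The sketch below is a minimal version; how images outside 2005 to 2024 were handled is not stated here, so raising an error for them is an assumption. The names `DATE_TAGS` and `date_tag` are hypothetical.

```python
# (tag, first year, last year), copied from the date table above.
DATE_TAGS = [
    ("newest", 2022, 2024),
    ("recent", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> str:
    """Return the date tag whose year range contains the given year."""
    for tag, first, last in DATE_TAGS:
        if first <= year <= last:
            return tag
    # Assumption: years outside the table are treated as an error.
    raise ValueError(f"no date tag defined for year {year}")


print(date_tag(2023))  # newest
print(date_tag(2012))  # early
```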
Aesthetic Tags:
Model used: shadowlilac/aesthetic-shadow-v2
score greater than | tag | count |
---|---|---|
0.90 | extremely aesthetic | 125.451 |
0.80 | very aesthetic | 887.382 |
0.70 | aesthetic | 1.049.857 |
0.50 | slightly aesthetic | 1.643.091 |
0.40 | not displeasing | 569.543 |
0.30 | not aesthetic | 445.188 |
0.20 | slightly displeasing | 341.424 |
0.10 | displeasing | 237.660 |
rest of them | very displeasing | 328.712 |
Quality Tags:
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag | count |
---|---|---|
0.980 | best quality | 1.270.447 |
0.900 | high quality | 498.244 |
0.750 | great quality | 351.006 |
0.500 | medium quality | 366.448 |
0.250 | normal quality | 368.380 |
0.125 | bad quality | 279.050 |
0.025 | low quality | 538.958 |
rest of them | worst quality | 1.955.966 |
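Both score tables above follow the same pattern: the first threshold that the score exceeds decides the tag, and anything below the last threshold gets the fallback tag. A minimal sketch of that mapping follows. The names `tag_for_score`, `AESTHETIC_THRESHOLDS`, and `QUALITY_THRESHOLDS` are hypothetical, and the strict "greater than" comparison is taken from the table headers.

```python
# (threshold, tag) pairs in descending order, copied from the tables above.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]
QUALITY_THRESHOLDS = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def tag_for_score(score: float, thresholds: list, fallback: str) -> str:
    """Return the tag of the first threshold the score is greater than."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


print(tag_for_score(0.85, AESTHETIC_THRESHOLDS, "very displeasing"))  # very aesthetic
print(tag_for_score(0.01, QUALITY_THRESHOLDS, "worst quality"))       # worst quality
```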
Rating Tags:
tag | count |
---|---|
general | 1.416.451 |
sensitive | 3.447.664 |
nsfw | 427.459 |
explicit nsfw | 336.925 |
Custom Tags:
dataset name | custom tag |
---|---|
image boards | date, |
text | The text says "text", |
characters | character, series |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
Limitations and Bias
Bias
- This model is intended for anime illustrations.
Realistic capabilities are not tested at all.
Limitations
- Can fall back to realistic.
Add "realistic" tag to the negatives when this happens.
- Far shot eyes and hands can be bad.
License
SoteDiffusion models fall under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points:
- Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
- Source Code Accessibility: If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
- Distribution Terms: Any distribution must be under this license or another with similar rules.
- Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
Note: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, which is named LICENSE_INHERIT.