Capabilities
This model is "adventure" and "fantasy" focused.
With certain inference configurations, it is capable of producing very high quality results.
This model functions better without negative prompts than most fine-tunes.
Inference parameters
Diffusers should "Just Work" with the config in this repository.
For A1111 users:
Scheduler: DDIM, 15-50 steps
Generally acceptable resolutions:
- 768x768
- 1024x1024
- 1152x768
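As a rough illustration of the settings above, here is a minimal Diffusers sketch. It is not taken from this repository. The names `check_settings` and `generate`, the default `guidance_scale` of 7.5, and reading each resolution as width x height are all assumptions made for the example. The `repo_id` argument is a placeholder for this repository's Hub id, and `revision` is optional. The scheduler is deliberately left untouched so that the config shipped with the repository is used.

```python
# Resolutions from the list above, read as (width, height). That reading is an assumption.
ACCEPTABLE_RESOLUTIONS = {(768, 768), (1024, 1024), (1152, 768)}
# Step range quoted for DDIM in the A1111 notes above.
STEP_RANGE = (15, 50)


def check_settings(width, height, steps):
    """Return True when the settings fall inside the ranges listed above."""
    in_resolutions = (width, height) in ACCEPTABLE_RESOLUTIONS
    in_steps = STEP_RANGE[0] <= steps <= STEP_RANGE[1]
    return in_resolutions and in_steps


def generate(repo_id, prompt, width=768, height=768, steps=30,
             guidance_scale=7.5, revision=None):
    """Hypothetical helper: load the pipeline and render one image."""
    # Validate first, so bad settings fail before any heavy import or download.
    if not check_settings(width, height, steps):
        raise ValueError(f"unsupported settings: {width}x{height}, {steps} steps")

    # Imported lazily so check_settings works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # The scheduler is not overridden: the repository config is used as-is.
    pipe = DiffusionPipeline.from_pretrained(
        repo_id, revision=revision, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available
    result = pipe(
        prompt,
        width=width,
        height=height,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,  # assumed default; lower it if outputs look "burnt"
    )
    return result.images[0]
```

Calling `generate("<this-repo-id>", "a ruined castle in a misty valley")` should return a PIL image. Passing one of the hashes from the Checkpoints section as `revision` may select an older checkpoint, assuming the Hub resolves the abbreviated hash.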
Limitations
This model contains a heavily tuned text encoder that has lost many original Stable Diffusion 2.1 concepts.
This model is even less reliable at producing real people than the base 2.1-v model.
Training data included only 768x768 downsampled images at a 1:1 ratio; all other aspect ratios were discarded. As a result, this model struggles with native high-resolution generation.
This model may produce "burnt" outputs at higher CFG values.
Checkpoints
This model contains multiple revisions:
02b28ff
(latest/main checkpoint)
30000 steps (approx. 4 epochs) with terminal SNR on 22k Midjourney 5.1 images plus 7200 real photographs as balance data, with complete BLIP captions on all data. Batch size 4, learning rate 4e-7 to 1e-8.
6d3949c
(retrained from ptx0/pseudo-journey)
Retrained from ptx0/pseudo-journey (itself 4000 steps from the stable-diffusion-2-1 baseline on 3300 images), plus 9500 steps on 22,400 images: polynomial learning rate scheduler, batch size 4, 64 gradient accumulations, frozen text encoder, 8-bit Adam, zero PLW (no regularization data). Followed by 550 steps with the text encoder unfrozen and a constant LR of 1e-8.
9135a79
(original ckpt test)
13000 steps: trained from ptx0/pseudo-journey, polynomial learning rate scheduler, batch size 3, text encoder, 8bit ADAM, ZERO PLW (no regularization data)
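The "approx 4 epochs" figure quoted for 02b28ff can be sanity-checked with a little arithmetic. The sketch below assumes that "22k" means exactly 22,000 images, that each step consumes one batch of 4 with no gradient accumulation, and that the balance photographs count toward the dataset size. The function name `approx_epochs` is made up for this example.

```python
def approx_epochs(steps, batch_size, dataset_size, grad_accum=1):
    """Samples seen during training divided by dataset size."""
    return steps * batch_size * grad_accum / dataset_size


# 22k Midjourney images (assumed to be exactly 22,000) plus 7200 photographs.
dataset_size = 22_000 + 7_200

epochs = approx_epochs(steps=30_000, batch_size=4, dataset_size=dataset_size)
print(f"{epochs:.2f}")  # 4.11
```

Under those assumptions the run covers a little over four passes through the data, which agrees with the stated figure.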