adds shizuko LoRA
- .gitignore +1 -0
- shizuko/README.md +49 -0
- shizuko/chara-shizuko.png +3 -0
- shizuko/chara-shizuko.safetensors +3 -0
- shizuko/example-001-7thA.png +3 -0
- shizuko/example-001-AOM2NMM.png +3 -0
- shizuko/example-001-NAI.png +3 -0
- shizuko/example-001-NMM.png +3 -0
- shizuko/example-002-7thA.png +3 -0
- shizuko/example-002-AOM2NMM.png +3 -0
- shizuko/example-002-NAI.png +3 -0
- shizuko/example-002-NMM.png +3 -0
- shizuko/example-003-7thA.png +3 -0
- shizuko/example-003-AOM2NMM.png +3 -0
- shizuko/example-003-NAI.png +3 -0
- shizuko/example-003-NMM.png +3 -0
- sora/lora_chara_sora_v3_128i12r.json → shizuko/lora_chara_shizuko_v9_8r-11r.json +18 -14
- shizuko/useless training notes.md +82 -0
.gitignore
CHANGED
@@ -1 +1,2 @@
 **/dataset/
+**/ignore/
shizuko/README.md
ADDED
@@ -0,0 +1,49 @@
# Kawawa Shizuko

## Table of Contents
- [Preview](#preview)
- [Usage](#usage)
- [Training](#training)

## Preview

![Shizuko Portrait](chara-shizuko.png)
![Shizuko Normal Outfit](example-002-AOM2NMM.png)
![Shizuko Swimsuit Outfit](example-001-AOM2NMM.png)

I spent a lot longer than usual on this one trying out different batch sizes, learning rates, and network dimension/alpha values. I didn't come away with any conclusive findings and ended up going back to settings similar to the Koharu LoRA's.

## Usage

Use any or all of the following tags to summon Shizuko: `shizuko, halo, 1girl, small breasts, purple eyes, brown hair`
- Hair and eye tags are optional.

For her normal outfit: `two side up, wa maid, japanese clothes, pink kimono, apron, black skirt, white thighhighs, hair ribbon`

For her swimsuit outfit: `twintails, swimsuit, pink bikini, frilled bikini, frills, hair flower, fake animal ears`

Not all tags may be necessary.

Her summer alt's cat ears tend to leak into other outfits, but this can usually be fixed by putting `fake animal ears` in the negative prompt.

Her shotgun sling tends to show up on her normal outfit and is a little hard to get rid of, unfortunately. You can try `gun sling` or `strap` in the negative prompt. If I were retraining this, I would go back and tag the sling in all images and see if that makes it easier to remove. Maybe another time.
## Training

*Exact parameters are provided in the accompanying JSON files.*
- Trained on a set of 128 images; 88 swimsuit, 37 normal.
- Dataset included a mixture of SFW/NSFW.
- 11 repeats for the normal outfit
- 8 repeats for the swimsuit outfit
- 3 batch size, 4 epochs
- `(88*8 + 37*11) / 3 * 4` = 1482 steps
- Due to a change in the kohya GUI script, a few of my previous LoRAs (Mari, Michiru, Reisa, Sora, Chise) were accidentally trained without my painstakingly pruned tags. This is probably why they seem overfit to the characters' outfits, though the results were surprisingly good considering there were literally no tags.
- Once I found this issue, I figured that since I'd have to "re-learn" some of my settings anyway to account for proper captions, I may as well experiment with batch/dim/alpha/LR values. Unfortunately, no conclusive results; I ended up going back to tried-and-true settings from several LoRAs ago.
- `constant_with_warmup` scheduler instead of `cosine`, since it seems to train in fewer steps at the cost of being more finicky
- 1.5e-5 text encoder LR
- 1.5e-4 unet LR
- 1.5e-5 optimizer LR, though in my experience this makes very little difference if the above two are already set
- Initially tagged with the WD1.4 swinv2 model. Tags minimally pruned/edited.
- Removed `blue archive` from tags. I think it just adds noise.
- `keep_tokens` accidentally set to 3. This means it likely kept `shizuko, 1girl` plus one other random tag.
- Used network dimension 128 (same as usual) / network alpha 128 (default)
- Trained without VAE.
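The step count quoted in the bullets above can be reproduced with a quick back-of-the-envelope calculation (a sketch; kohya's exact handling of partial batches may shift the total by a few steps, and the README's 1482 appears to include some rounding):

```python
# Step estimate from the Training section: images * repeats summed per folder,
# divided by batch size, times epochs.
folders = [(88, 8), (37, 11)]  # (image count, repeats): swimsuit, normal
batch_size = 3
epochs = 4

repeats_per_epoch = sum(images * repeats for images, repeats in folders)
approx_steps = repeats_per_epoch / batch_size * epochs

print(repeats_per_epoch)  # 1111 image-repeats per epoch
print(approx_steps)       # ~1481.3, which the README rounds to 1482
```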
shizuko/chara-shizuko.png
ADDED
Git LFS Details

shizuko/chara-shizuko.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:95cc1165921fd4c24087acf18043a795ebee6ccace70ea983452883791bf8d31
size 151122159

shizuko/example-001-7thA.png
ADDED
Git LFS Details

shizuko/example-001-AOM2NMM.png
ADDED
Git LFS Details

shizuko/example-001-NAI.png
ADDED
Git LFS Details

shizuko/example-001-NMM.png
ADDED
Git LFS Details

shizuko/example-002-7thA.png
ADDED
Git LFS Details

shizuko/example-002-AOM2NMM.png
ADDED
Git LFS Details

shizuko/example-002-NAI.png
ADDED
Git LFS Details

shizuko/example-002-NMM.png
ADDED
Git LFS Details

shizuko/example-003-7thA.png
ADDED
Git LFS Details

shizuko/example-003-AOM2NMM.png
ADDED
Git LFS Details

shizuko/example-003-NAI.png
ADDED
Git LFS Details

shizuko/example-003-NMM.png
ADDED
Git LFS Details
sora/lora_chara_sora_v3_128i12r.json → shizuko/lora_chara_shizuko_v9_8r-11r.json
RENAMED
@@ -3,22 +3,22 @@
     "v2": false,
     "v_parameterization": false,
     "logging_dir": "",
-    "train_data_dir": "G:/sd/training/datasets/…
+    "train_data_dir": "G:/sd/training/datasets/shizuko/dataset",
-    "reg_data_dir": "…
+    "reg_data_dir": "",
-    "output_dir": "G:/sd/lora/trained/chara/…
+    "output_dir": "G:/sd/lora/trained/chara/shizuko",
-    "max_resolution": "…
+    "max_resolution": "768,768",
-    "learning_rate": "…
+    "learning_rate": "1e-5",
-    "lr_scheduler": "…
+    "lr_scheduler": "constant_with_warmup",
     "lr_warmup": "5",
     "train_batch_size": 3,
-    "epoch": "…
+    "epoch": "4",
-    "save_every_n_epochs": "…
+    "save_every_n_epochs": "",
     "mixed_precision": "fp16",
     "save_precision": "fp16",
     "seed": "31337",
     "num_cpu_threads_per_process": 32,
     "cache_latents": true,
-    "caption_extension": "",
+    "caption_extension": ".txt",
     "enable_bucket": true,
     "gradient_checkpointing": false,
     "full_fp16": false,
@@ -31,8 +31,8 @@
     "save_state": false,
     "resume": "",
     "prior_loss_weight": 1.0,
-    "text_encoder_lr": "1.5e-…
+    "text_encoder_lr": "1.5e-5",
-    "unet_lr": "1.5e-…
+    "unet_lr": "1.5e-4",
     "network_dim": 128,
     "lora_network_weights": "",
     "color_aug": false,
@@ -40,11 +40,15 @@
     "clip_skip": 2,
     "gradient_accumulation_steps": 1.0,
     "mem_eff_attn": false,
-    "output_name": "chara-…
+    "output_name": "chara-shizuko-v9",
     "model_list": "",
     "max_token_length": "150",
     "max_train_epochs": "",
     "max_data_loader_n_workers": "",
-    "network_alpha": …
+    "network_alpha": 128,
-    "training_comment": ""
+    "training_comment": "",
+    "keep_tokens": 3,
+    "lr_scheduler_num_cycles": "",
+    "lr_scheduler_power": "",
+    "persistent_data_loader_workers": true
 }
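One quirk visible in this config format: the kohya GUI serializes most numeric fields as strings (`"epoch": "4"`, `"unet_lr": "1.5e-4"`), so anything consuming the file needs explicit conversions. A minimal sketch, with the values transcribed from the new (shizuko) side of the diff rather than read from the actual file:

```python
# Values copied from the new side of the JSON diff above.
config = {
    "learning_rate": "1e-5",
    "text_encoder_lr": "1.5e-5",
    "unet_lr": "1.5e-4",
    "train_batch_size": 3,   # already numeric
    "epoch": "4",            # string-typed
    "network_dim": 128,
    "network_alpha": 128,
    "keep_tokens": 3,
}

# String-typed fields need explicit conversion before use.
text_lr = float(config["text_encoder_lr"])
unet_lr = float(config["unet_lr"])
epochs = int(config["epoch"])

# The unet LR is set an order of magnitude above the text encoder LR.
print(unet_lr / text_lr)
```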
shizuko/useless training notes.md
ADDED
@@ -0,0 +1,82 @@
# Revisions

I wouldn't regard any of the notes below as especially useful. I was mostly throwing shit at the wall, recording what happened, and likely drawing some incorrect conclusions along the way.

### v6
- 1343 steps total
- 90 images * 20 repeats, 38 images * 26 repeats
- 5 batch size
- 3 epochs
- 4e-4 adamw lr / unet lr (8e-5 * batch size)
- 7.5e-5 text encoder lr (1.5e-5 * batch size)
- 768x768 training resolution
- network rank 64, alpha 64

Result: 22m13s, 0.0976 loss.
Pretty good coherency; adapts well to the prompt.
Very slight color fringing and noise at high cfg. Epoch 2 eliminates this, but loses a little of the character's face and causes occasional body horror as coherency is lost.
The optimal result would probably be around epoch 2.5.
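The parentheticals above describe a linear batch-size scaling rule for the learning rates. As a sketch (the function name is mine, not from any trainer API):

```python
def scale_lr(base_lr: float, batch_size: int) -> float:
    # Linear scaling heuristic: each optimizer step averages over more images
    # at larger batch sizes, so the per-step LR is scaled up to compensate.
    return base_lr * batch_size

# v6's unet LR: 8e-5 base at batch size 5 gives 4e-4 (up to float rounding).
unet_lr = scale_lr(8e-5, 5)
```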

### v6b
- Exactly the same as v6, but at 512x512 resolution instead of 768x768.

Result: 8m49s, 0.115 loss.
Substantially improved image quality -- noise and color fringing are almost completely gone. However, coherency seems noticeably worse; certain fine character details are lost (perhaps because they become too small to represent at 512x512) and body horror/extra limbs are more common. It almost seems undertrained, but I would have expected the opposite given the step count is the same as v6 at a lower resolution.
Not sure how to keep v6b's improved image quality while retaining v6's coherency.

### v7
Went back to the old Koharu hyperparameters to re-establish a baseline, since we were changing too many things at once and it was hard to tell what was causing what. The Koharu params are slow to train and produce large models, but otherwise seem to work well.
- 1138 steps total
- 90 images * 8 repeats, 38 images * 11 repeats
- 3 batch size
- 3 epochs
- 2e-4 adamw lr / unet lr
- 5e-5 text encoder lr
- 832x832 training resolution
- network rank 128, alpha 128

Result: 18m46s, 0.102 loss.
Still very slight fringing and noise, but not as bad as v6. Coherency is good, though marginally worse than v6. The second-to-last epoch is totally undertrained and misses most character details, so while the last epoch is better, I suspect it is on the verge of underfitting; some of the character's subtler details and expressions don't quite make it through.
I think increasing the unet LR to improve fit while slightly reducing the step count may be the best option here to reduce color fringing and training time.

Omitting several not-very-good revisions here.

### v8b
- 1138 steps total
- 90 images * 8 repeats, 38 images * 11 repeats
- 3 batch size
- 3 epochs
- 1.5e-4 adamw lr / unet lr
- 1.5e-5 text encoder lr
- 768x768 training resolution
- network rank 128, alpha 128
- Modified image captions to always include the character's name as the first word

Result: 15m40s, 0.0952 loss.
Image quality is good. Coherency is good. Probably the most usable result so far, but it is slightly undertrained and needs its weight boosted during inference to get the best results. Close to what I would consider a final result, but I think I can do better.

### v8c
- 1236 steps total
- 88 images * 9 repeats, 38 images * 12 repeats
- 3 batch size
- 3 epochs
- Removed some images that were too similar to each other
- Added three new images that demonstrate a wider range of expressions
- Removed the series name from image captions since it probably just adds noise
- 3e-4 adamw lr / unet lr
- 2e-5 text encoder lr
- Boosted learning rates in an attempt to improve fit
- 768x768 training resolution
- network rank 128, alpha 128
- keep_tokens revised to 2; it was incorrectly set to 3 in v8b, though it probably makes no difference

Result: 18m02s, 0.0934 loss.
It's immediately clear that the combination of boosted learning rates and an 8% increase in step count is too much. The model is overfitting and coherence is lost; extra limbs and weird deformations are too common. However, it doesn't exhibit the color fringing and noise issues of earlier overtrained models, so I think the problem is that the learning rates are too high, not the step count.
Setting the model's weight to 0.8 during inference gives quite good results, nailing the character's face and expressions while keeping a good amount of coherency. Ideally it would work this well at 1.0 weight, but if I can't find the right parameters for that, I'll just release this one with a note that the weight should be set to 0.8.

### v8d
Same params as v8c except for the text encoder LR, reduced from 2e-5 to 1e-5.
Result: didn't work as well as I'd hoped. Just seems messier. Not sure if that's due to the 50% reduction in text encoder LR or just random variance. The RNG seed was the same, but I believe the token shuffler doesn't use the seed, so it introduces additional randomness.

### v8e
Same params as v8c except for the text encoder LR, reduced from 2e-5 to 1.6e-5 (80% of v8c's rate, to try to reproduce the effect of v8c at 0.8 weight).
Result: shit's fucked; going back to v8b and just increasing the step count.

### v9
v8b with an extra epoch and the slightly tweaked dataset from v8c.
Result: looks pretty good, going with this.