
D-Adaptation Experiment Notes

Using `decouple=True` and `weight_decay=0.6` in the optimizer args.
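
For reference, a minimal sketch of what these optimizer args correspond to when calling the standalone `dadaptation` package directly (this assumes the training script forwards optimizer args as keyword arguments; `model` is a placeholder for the network being trained):

```python
import dadaptation

# Sketch of the settings above. With d-adaptation, lr acts as a multiplier
# on the estimated rate d rather than an absolute learning rate.
optimizer = dadaptation.DAdaptAdam(
    model.parameters(),   # placeholder: the LoRA parameters being trained
    lr=1.0,
    decouple=True,        # decoupled (AdamW-style) weight decay
    weight_decay=0.6,
)
```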

Learning rates

Unet 1.0, text encoder 0.5, as seen in this thread: https://twitter.com/kohya_tech/status/1627194651034943490?cxt=HHwWhIDUtb66-pQtAAAA. This rate did not work well for 768 training, and at 640 it required scheduler adjustments to work better. 0.5/0.25 unet/text worked at both 512 and 768 resolution.
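
A sketch of how separate unet/text-encoder rates can be expressed as per-group lr multipliers with the same `dadaptation` package (`unet_params` and `text_encoder_params` are placeholders for the actual parameter lists):

```python
import dadaptation

# Each group's lr multiplies the adapted rate d for that group.
optimizer = dadaptation.DAdaptAdam(
    [
        {"params": unet_params, "lr": 0.5},           # unet
        {"params": text_encoder_params, "lr": 0.25},  # text encoder
    ],
    decouple=True,
    weight_decay=0.6,
)
```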

Alpha

Alpha=dim was recommended in the github thread https://github.com/kohya-ss/sd-scripts/issues/181. I have tried dim 8 alpha 1 with success as well as failure. Both Amber and Castoria are alpha=1 and seem to work fine. UMP ends up with image generations that look like a single brown square; still testing whether alpha has a relationship to this issue. As noted in the same github issue, alpha/rank scaling makes the gradient update smaller, which causes d-adaptation to boost the learning rate. This could be the reason why it goes bad.
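
For context, the alpha/rank scaling enters the LoRA forward pass roughly like this (a generic sketch, not the exact trainer implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    # Generic LoRA layer: the delta is scaled by alpha / rank, so alpha=1 with
    # dim=8 scales the update (and its gradients) by 1/8, which d-adaptation
    # then compensates for by estimating a larger learning rate.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```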

UMP redone at dim 8 alpha 8 showed a recognizable character but still significantly degraded aesthetics and prompt coherence. After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16 -> 9), the results are much better. Cosine restarts likely affect how much time is spent at a high learning rate, which could be the reason for blowing the model apart. Dim 8 alpha 1 retrained at the lower restart count succeeded as well. Supposedly alpha scales the gradient down, which causes the LR to go up, but the relationship is clearly not linear if 1/8x alpha did not produce garbage results here. So the base LR is far more sensitive than the alpha choice.
This was further confirmed by running dim 8 alpha 1 with a constant learning rate scheduler; the results were similar to the high restart count with cosine.
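
A sketch of the restart-count comparison, assuming the cosine-with-restarts scheduler from `diffusers` (which I believe is what the trainer's cosine_with_restarts option maps to), reusing the optimizer from the sketch above:

```python
from diffusers.optimization import get_cosine_with_hard_restarts_schedule_with_warmup

# Fewer cycles means each cosine decay phase is longer, so less total time is
# spent near the peak learning rate over the run.
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=800,  # roughly the step counts used in these runs
    num_cycles=9,            # was 16; fewer restarts behaved better here
)
```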

Dim

128 dim shows some local noisy patterns. Reranking the model to a lower dim from 128 doesn't get rid of it. Converting the weights of the last up block in the unet does, but it also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount. Lower dims show good performance; a much larger test is needed to check accuracy between them. Dim 1 worked! Very good accuracy on the details too. See Amber's pattern on her stomach, it is very clean compared to higher dims. The issue seems to be that this doesn't work too well with some style loras. More investigation needed.
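
Reranking to a lower dim is essentially an SVD truncation of the learned delta. A rough, generic sketch (not the actual resize tool; the function name is made up):

```python
import torch

def rerank_lora(up: torch.Tensor, down: torch.Tensor, new_rank: int):
    """Approximate up @ down with a lower-rank factorization via truncated SVD."""
    delta = up @ down                       # (out_features, in_features)
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    new_up = u[:, :new_rank] * s[:new_rank] # fold singular values into the up matrix
    new_down = vh[:new_rank, :]
    return new_up, new_down                 # (out x r), (r x in)
```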

Resolution

512 resolution is good so far at 700 steps. 640 resolution is good at 900 steps; UMP has a 640 resolution sample. 768 resolution has not been doing well with the 1.0/0.5 LR, it melts. 0.5/0.25 LR worked out great at 750 steps. 512 resolution held up well at 0.5/0.25 as well, so this is looking like a better universal rate.

Steps

600 to 800 steps have been the primary focus of experimentation. 350 is too low to fully learn a character. Over 1000 has not shown much improvement.

2.X models

LoRA training so far on wd1.5, Replicant, and subtly has shown poor performance when the LoRA is used on another model. See the sample in Amber. Notably, Replicant is highly stylized, and a LoRA trained on Replicant, when used on Replicant, shows extreme deviation away from Replicant's art style, which suggests that the LoRA learned a lot of style related concepts, the opposite of what we want for a character. The initial set of trained LoRAs showed better usability at lower strengths, which is leading to continued research into training for longer and at lower learning rates. It was noted that v-prediction finetuning required lower learning rates, and that might apply to LoRA training as well.

We can see that the 1.X based models are a lot more similar to one another, allowing LoRAs to transfer well between them. Similarity between models was measured using JosephCheung's tool (a rough sketch of this kind of comparison follows the lists below). Thanks qromaiko for running it and bringing this up.

99.95% - Anything\Anything-V3.0-ema-pruned.safetensors [2ea31c17]
98.25% - Anything\Anything-V3.0-pruned.safetensors [2ea31c17]
97.57% - 7th\7th_anime_v3_C-fp16-fix.safetensors [db1dd94e]
95.41% - Elysium\Elysium_Anime_V3.safetensors [1a97f4ef]
95.36% - Orange\AOM3A1.safetensors [9600da17]
94.79% - Orange\AOM2_Hard-fp16-fix.safetensors [05e43f1e]
94.74% - Orange\AOM2_sfw-fp16.safetensors [9600da17]
94.70% - Orange\AOM3.safetensors [9600da17]

100.00% - wd15-beta1-fp16.safetensors [0b910e4b]
99.90% - Aikimi_dV3.0.safetensors [0b910e4b]
93.14% - subtly-fp16.safetensors [a2fa5a65]
82.11% - Replicant-V1.0_fp16.safetensors [18007027]
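
I don't know the exact internals of that tool, but the gist of this kind of checkpoint comparison can be sketched as an average cosine similarity over shared weight tensors. This is a generic illustration, not JosephCheung's actual tool, and the function name is made up:

```python
import torch
import torch.nn.functional as F
from safetensors.torch import load_file

def model_similarity(path_a: str, path_b: str) -> float:
    # Average cosine similarity over tensors present in both checkpoints.
    a, b = load_file(path_a), load_file(path_b)
    sims = [
        F.cosine_similarity(a[k].flatten().float(), b[k].flatten().float(), dim=0)
        for k in a.keys() & b.keys()
        if a[k].shape == b[k].shape
    ]
    return float(torch.stack(sims).mean()) * 100  # percentage, as in the lists above
```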

Noise offset

toyxyz has noted that using a high noise offset (higher than 0.2, it seems) with d-adaptation creates unusable results. It starts out looking better than with lower learning rates, but even unet 0.5, text 0.25 with a noise offset of 0.75 does not look usable.
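
For reference, noise offset shifts the training noise by a per-sample, per-channel constant, roughly as in the common implementation (the `latents` tensor below is a placeholder batch of SD latents):

```python
import torch

latents = torch.randn(4, 4, 64, 64)  # placeholder batch of SD latents
noise_offset = 0.75                  # the value from the report above

# Add a constant per (sample, channel) so the model can learn overall
# brightness/contrast shifts; higher values push the noise further off-center.
noise = torch.randn_like(latents)
noise = noise + noise_offset * torch.randn(
    latents.shape[0], latents.shape[1], 1, 1, device=latents.device
)
```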