JosephusCheung
/

ACertainModel

StableDiffusionPipeline

stable-diffusion

stable-diffusion-diffusers

Inference Endpoints

Model card Files Files and versions

JosephusCheung commited on Dec 12, 2022

Commit

e2b524a

•

1 Parent(s): 4d306b5

Update README.md

Files changed (1) hide show

README.md +5 -1

README.md CHANGED Viewed

@@ -17,7 +17,11 @@ widget:
 # ACertainModel
-Welcome to ACertainModel - a latent diffusion model for weebs. This model is intended to produce high-quality, highly detailed anime style with just a few prompts. Like other anime-style Stable Diffusion models, it also supports danbooru tags to generate images.
 e.g. **_masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden_**

 # ACertainModel
+Welcome to ACertainModel - a latent diffusion model for weebs. This model is intended to produce high-quality, highly detailed anime style pictures with just a few prompts. Like other anime-style Stable Diffusion models, it also supports danbooru tags, including artists, to generate images.
+Since I noticed that the laion-aesthetics introduced in the Stable-Diffusion-v-1-4 checkpoint hindered the training of the model in the animation illustration generation model, I used Dreambooth to fine-tune some tags separately to make it closer what it was in SD1.2. To avoid overfitting and possible language drift, I added a huge amount of auto-generated pictures from a single word prompt to the training set, using models that are popular in the community such as Anything-3.0, together with partially manual selected full-danbooru images within a year, for further native training. I am also aware of a method of [LoRA](https://arxiv.org/abs/2106.09685), with a similar idea, finetuning attention layer solely, to have better performance on eyes, hands, and other details.
+For copyright compliance and technical experiment, it was trained from few artist images directly. Instead, we do Dreambooth with pictures generated from several popular diffusion models in the community. The checkpoint was initialized with the weights of a Stable Diffusion Model and subsequently fine-tuned for 2K GPU hours on V100 32GB and 600 GPU hours on A100 40GB at 512P dynamic aspect ratio resolution with a certain ratio of unsupervised auto-generated images from several popular diffusion models in the community with some Textual Inversions and Hypernetworks. We do know some tricks on xformers and 8-bit optimization, but we didn't use any of them for better quality and stability. Up to 15 branches are trained simultaneously, cherry-picking about every 20,000 steps.
 e.g. **_masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden_**