Ketengan-Diffusion committed on
Commit
e697237
1 Parent(s): 21c2c7c

Update README.md

  1. README.md +111 -0
README.md CHANGED
@@ -2,4 +2,115 @@
2
  license: other
3
  license_name: stable-cascade-nc-community
4
  license_link: https://huggingface.co/stabilityai/stable-cascade/blob/main/LICENSE
5
+ language:
6
+ - en
7
+ tags:
8
+ - stable-cascade
9
+ - SDXL
10
+ - art
11
+ - artstyle
12
+ - fantasy
13
+ - anime
14
+ - aiart
15
+ - ketengan
16
+ - SomniumSC
17
+ pipeline_tag: text-to-image
18
+ library_name: diffusers
19
  ---
20
+
21
+ # SomniumSC-v1.1 Model Showcase
22
+ <p align="center">
23
+ <img src="01.png" width=70% height=70%>
24
+ </p>
25
+
26
+ `Ketengan-Diffusion/SomniumSC-v1.1` is a fine-tuned Stage C Stable Cascade model based on [stabilityai/stable-cascade](https://huggingface.co/stabilityai/stable-cascade).
27
+
28
+ This model is fine-tuned from Stability AI's new Stable Cascade model (also known as Würstchen v3) for a 2D (cartoonish) style, trained on the Stage C 3.6B model. The text encoder is also trained for the 2D style, so you can prompt with booru tags as well as natural language.
29
+
30
+ The model uses the same dataset size and curation method as AnySomniumXL v2: 33,000+ images curated from hundreds of thousands of images from various sources. The dataset keeps images with an aesthetic score of at least 19 and at most 50 (to keep the model cartoonish and not too realistic; the scale is based on our proprietary aesthetic scoring mechanism) that contain no text or watermarks, such as signatures or comic/manga pages. Images with an aesthetic score below 17 or above 50, as well as images with watermarks or text, are discarded.
31
+
32
+ # Demo
33
+
34
+ # Training Process
35
+
36
+ SomniumSC v1.1 Technical Specifications:
37
+
38
+ Training per 1 epoch, 30 epochs (results from SomniumSC use epoch 40)
39
+
40
+ Captioned by a proprietary multimodal LLM, better than LLaVA
41
+
42
+ Trained with bucket sizes of 1024x1024 and 1536x1536 (multi-resolution)
43
+
44
+ Shuffle Caption: Yes
45
+
46
+ Clip Skip: 0
47
+
48
+ Trained with 1x NVIDIA A100 80GB
49
+
50
+
51
+ # Our Dataset Curation Process
52
+ <p align="center">
53
+ <img src="Curation.png" width=70% height=70%>
54
+ </p>
55
+
56
+ Image source: [Source1](https://danbooru.donmai.us/posts/3143351) [Source2](https://danbooru.donmai.us/posts/3272710) [Source3](https://danbooru.donmai.us/posts/3320417)
57
+
58
+ Our dataset is scored using the pretrained CLIP+MLP aesthetic scoring model from https://github.com/christophschuhmann/improved-aesthetic-predictor, and we adjusted our script to detect any text or watermark using OCR via pytesseract.
59
+
60
+ This scoring method uses a scale from -1 to 100. We take a minimum score threshold of around 17 or 20 and a maximum of 50-75 to retain the 2D style of the dataset. Any image with text returns a score of -1. Any image with a score below 17 or above 65 is deleted.
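
The filtering rule described above can be sketched roughly as follows. This is only an illustration, not the actual curation script: the function names, the `(path, aesthetic_score, has_text)` record layout, and the choice of 17 and 65 as cut-offs (taken from the last sentence above) are assumptions, and the aesthetic score and OCR result are assumed to be computed elsewhere.

```python
# Hypothetical sketch of the score-based filter; names and record layout are assumptions.
MIN_SCORE = 17.0   # images scoring below this are deleted
MAX_SCORE = 65.0   # images scoring above this are deleted
TEXT_SCORE = -1.0  # score assigned to any image where text or a watermark was detected


def final_score(aesthetic_score: float, has_text: bool) -> float:
    """Return -1 for images with detected text, otherwise the aesthetic score."""
    return TEXT_SCORE if has_text else aesthetic_score


def keep_image(score: float, min_score: float = MIN_SCORE, max_score: float = MAX_SCORE) -> bool:
    """Keep an image only if its score lies within [min_score, max_score]."""
    return min_score <= score <= max_score


def curate(records):
    """Filter (path, aesthetic_score, has_text) records down to the kept paths."""
    return [
        path
        for path, aesthetic_score, has_text in records
        if keep_image(final_score(aesthetic_score, has_text))
    ]
```

Because images with text get a score of -1, they fall below the minimum and are dropped by the same range check, so no separate text branch is needed in `keep_image`.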
61
+
62
+ The dataset curation process runs on an NVIDIA T4 16GB machine and takes about 7 days to curate 1,000,000 images.
63
+
64
+ # Captioning process
65
+ We use a combination of a proprietary multimodal LLM and open-source multimodal LLMs such as LLaVA 1.5 for captioning, which produces more detailed captions than plain BLIP2. Details such as clothes, atmosphere, situation, scene, place, gender, skin, and others are generated by the LLM.
66
+
67
+ # Tagging Process
68
+ We simply use booru tags retrieved from booru boards. These tags are applied manually by humans, which makes them more accurate.
69
+
70
+ # Limitations:
71
+
72
+ ✓ Still requires training on a broader dataset for more variation in poses and styles
73
+
74
+ ✓ Text cannot be generated correctly and comes out garbled
75
+
76
+ ✓ The model is optimized for human or mutated-human generation. Non-human subjects such as SCP, ponies, and others may not turn out as you expect
77
+
78
+ ✓ Faces may look compressed. Generating the image at 1536px may give better results
79
+
80
+ Smaller half-size and Stable Cascade lite versions will be released soon.
81
+
82
+ # How to use SomniumSC:
83
+
84
+ Currently, Stable Cascade is only supported by ComfyUI.
85
+
88
+ You can follow the tutorial [here](https://gist.github.com/comfyanonymous/0f09119a342d0dd825bb2d99d19b781c#file-stable_cascade_workflow_test-json) or [here](https://medium.com/@codeandbird/run-new-stable-cascade-model-in-comfyui-now-officially-supported-f66a37e9a8ad).
89
+
90
+ To make it clear which models to download, here is where to get each one directly:
91
+
92
+ For Stage A, download from the [official stabilityai/stable-cascade repo](https://huggingface.co/stabilityai/stable-cascade).
93
+
94
+ For Stage B, download from the [official stabilityai/stable-cascade repo](https://huggingface.co/stabilityai/stable-cascade).
95
+
96
+ For Stage C, download the safetensors file from the Files tab of our Hugging Face repo.
97
+
98
+ For the text encoder, download from the `text_encoder` folder of our Hugging Face repo.
99
+
100
+ # Deploying SomniumSC v1.1 with Diffusers 🧨
101
+
102
+ Coming Soon
103
+
104
+
105
+ # SomniumSC Pro tips:
106
+
107
+ A negative prompt is a must for better quality output. The recommended negative prompt is: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
108
+
109
+ If the model produces pointy ears on the character, just add elf or pointy ears to the negative prompt.
110
+
111
+ If the model produces a "compressed face", use 1536px resolution so the model can render the face clearly.
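
If you drive your workflow from a script, the tips above can be captured in a small helper like the one below. This is a minimal sketch: `NEGATIVE_PROMPT` and `build_negative_prompt` are hypothetical names, and it assumes that the extra terms such as `elf` or `pointy ears` are meant to be appended to the negative prompt.

```python
# Recommended negative prompt, copied from the tip above.
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry, artist name"
)


def build_negative_prompt(extra_terms=()):
    """Return the recommended negative prompt with optional extra terms appended.

    Assumption: extra terms (e.g. "pointy ears") belong in the negative prompt.
    """
    return ", ".join([NEGATIVE_PROMPT, *extra_terms])
```

For example, `build_negative_prompt(["elf", "pointy ears"])` returns the recommended prompt followed by the two extra terms, which you can paste into the negative prompt field of your workflow.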
112
+
113
+
114
+ # Disclaimer:
115
+
116
+ This model is released under the STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE, so it cannot be sold, and derivative works cannot be commercialized. The exception, as far as I know, is that you can buy a StabilityAI membership to commercialize derivative works based on this model. Please support StabilityAI so they can keep providing open-source models for us. You can still merge our model freely.