时情 committed on
Commit
34cb4f8
1 Parent(s): c3ca3d6

upload readme

Files changed (2)
  1. README.md +183 -3
  2. diffusion_pytorch_model.safetensors +0 -3
README.md CHANGED
@@ -1,3 +1,183 @@
1
- ---
2
- license: mit
3
- ---
1
+ <div style="display: flex; justify-content: center; align-items: center;">
2
+ <img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
3
+ <img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
4
+ </div>
5
+
6
+ EcomID aims to generate customized images from a single reference ID image, ensuring strong semantic consistency while being controlled by keypoints.
7
+
8
+ This repository provides the EcomID method and model, combining the strengths of PuLID and InstantID for better background consistency, facial keypoint control, and realistic facial representation with improved similarity.
9
+
10
+ # EcomID Overview
11
+
12
+ ## EcomID Structure
13
+ <img src="./images/overflow.png" alt="EcomID structure overview" style="width: 100%; height: auto; margin-right: 5%;">
14
+
15
+
16
+ - **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, trained with alignment loss.
17
+ This reduces the interference of ID embeddings with text embeddings in the cross-attention layers, minimizing disruption to the underlying model's text-to-image capabilities.
18
+ - **InstantID’s IdentityNet Architecture**: Trained on **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control, improving ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial landmarks are used as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.
19
+
20
+ # Showcases
21
+ ## Comparison with Other Methods
22
+ ### 1. Preserved Text-to-Image Capability
23
+
24
+ <table>
25
+ <tr>
26
+ <th style="width: 28%;">Prompt</th>
27
+ <th style="width: 24%;">Reference Image</th>
28
+ <th style="width: 24%;">EcomID</th>
29
+ <th style="width: 24%;">InstantID</th>
30
+ </tr>
31
+ <tr>
32
+ <td>girl, white skin, black hair, long wavy hair, <span style="color:red"><strong>in European style living room, Retro tone, decorations</strong></span>, depth of field.</td>
33
+ <td><img src="images/show_case/50.png" alt="Reference image" width="100%"></td>
34
+ <td><img src="images/show_case/49.png" alt="EcomID image" width="100%"></td>
35
+ <td><img src="images/show_case/48.png" alt="InstantID image" width="100%"></td>
36
+ </tr>
37
+ </table>
38
+
39
+ As shown above, EcomID ***preserves background generation abilities while minimizing stylization, greatly enhancing realism***.
40
+ The visualizations highlight more authentic portraits with improved background semantic consistency, showcasing EcomID's advantage in generating realistic images.
41
+
42
+ ### 2. Improved Facial Control and Consistency
43
+ <table>
44
+ <tr>
45
+ <th style="width: 24%;">Prompt</th>
46
+ <th style="width: 19%;">Reference Image</th>
47
+ <th style="width: 19%;">EcomID</th>
48
+ <th style="width: 19%;">InstantID</th>
49
+ <th style="width: 19%;">PuLID</th>
50
+ </tr>
51
+ <tr>
52
+ <td>A close-up portrait of a man standing in the library, holding <span style="color:red"><strong>two smiling toddlers</strong></span> next to him.</td>
53
+ <td><img src="images/show_case/20.png" alt="Reference image" width="100%"></td>
54
+ <td><img src="images/show_case/17.png" alt="EcomID image" width="100%"></td>
55
+ <td><img src="images/show_case/18.png" alt="InstantID image" width="100%"></td>
56
+ <td><img src="images/show_case/19.png" alt="PuLID image" width="100%"></td>
57
+ </tr>
58
+ </table>
59
+
60
+ As shown above, EcomID employs keypoints as conditional inputs for training, ***allowing for precise adjustments of facial positions, sizes, and orientations***. This capability ensures that the generated portraits are more controllable while further enhancing facial similarity and the overall quality of the images.
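
The position, size, and orientation adjustments described above can be illustrated on the keypoints themselves. The sketch below is not taken from EcomID's code: it assumes the keypoints are five `(x, y)` facial landmarks in pixel coordinates that would afterwards be rendered into the conditioning image, and `transform_keypoints` and the sample coordinates are hypothetical.

```python
import math


def transform_keypoints(keypoints, scale=1.0, angle_deg=0.0, shift=(0.0, 0.0)):
    """Scale and rotate keypoints about their centroid, then translate them.

    keypoints: list of (x, y) tuples in pixel coordinates (assumed layout).
    """
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))

    moved = []
    for x, y in keypoints:
        # Offset from the centroid, scaled to change the face size.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        # Rotate the offset to change orientation, then shift to change position.
        moved.append((
            cx + dx * cos_a - dy * sin_a + shift[0],
            cy + dx * sin_a + dy * cos_a + shift[1],
        ))
    return moved


# Hypothetical landmarks: left eye, right eye, nose, left and right mouth corners.
face = [(380, 400), (640, 400), (512, 540), (410, 680), (620, 680)]

# Half-size face moved 100 px to the right.
smaller_face = transform_keypoints(face, scale=0.5, shift=(100, 0))
```

Editing the landmarks before rendering the condition is one plausible way to steer where and how large the face appears; the actual preprocessing used by EcomID may differ.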
61
+
62
+ ### More Showcases
63
+ EcomID enhances portrait representation, delivering a more authentic and aesthetically pleasing appearance while ensuring semantic consistency and greater internal ID similarity (i.e., traits that do not vary with age, hairstyle, glasses, or other physical changes).
64
+
65
+ <table>
66
+ <tr>
67
+ <th style="width: 24%;">Prompt</th>
68
+ <th style="width: 19%;">Reference Image</th>
69
+ <th style="width: 19%;">EcomID</th>
70
+ <th style="width: 19%;">InstantID</th>
71
+ <th style="width: 19%;">PuLID</th>
72
+ </tr>
73
+ <tr>
74
+ <td>A close-up portrait of a <span style="color:red"><strong>little girl with double braids</strong></span>, wearing a white dress, standing on the beach during sunset.</td>
75
+ <td><img src="images/show_case/21.png" alt="Reference image" width="100%"></td>
76
+ <td><img src="images/show_case/22.png" alt="EcomID image" width="100%"></td>
77
+ <td><img src="images/show_case/23.png" alt="InstantID image" width="100%"></td>
78
+ <td><img src="images/show_case/24.png" alt="PuLID image" width="100%"></td>
79
+ </tr>
80
+ <tr>
81
+ <td>A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
82
+ <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
83
+ <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
84
+ <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
85
+ <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
86
+ </tr>
87
+ <tr>
88
+ <td>A grizzled detective, <span style="color:red"><strong>fedora</strong></span> casting a shadow over his square jaw, a <span style="color:red"><strong>cigar dangling from his lips</strong></span>, his trench coat evocative of film noir, in a <span style="color:red"><strong>rainy alley</strong></span>.</td>
89
+ <td><img src="images/show_case/25.png" alt="Reference image" width="100%"></td>
90
+ <td><img src="images/show_case/26.png" alt="EcomID image" width="100%"></td>
91
+ <td><img src="images/show_case/27.png" alt="InstantID image" width="100%"></td>
92
+ <td><img src="images/show_case/28.png" alt="PuLID image" width="100%"></td>
93
+ </tr>
94
+ <tr>
95
+ <td>A smiling girl with <span style="color:red"><strong>bangs and long hair</strong></span> in a school uniform stands under cherry trees, holding a book.</td>
96
+ <td><img src="images/show_case/29.png" alt="Reference image" width="100%"></td>
97
+ <td><img src="images/show_case/30.png" alt="EcomID image" width="100%"></td>
98
+ <td><img src="images/show_case/31.png" alt="InstantID image" width="100%"></td>
99
+ <td><img src="images/show_case/32.png" alt="PuLID image" width="100%"></td>
100
+ </tr>
101
+ <tr>
102
+ <td>A <span style="color:red"><strong>very old</strong></span> witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest.</td>
103
+ <td><img src="images/show_case/33.png" alt="Reference image" width="100%"></td>
104
+ <td><img src="images/show_case/34.png" alt="EcomID image" width="100%"></td>
105
+ <td><img src="images/show_case/35.png" alt="InstantID image" width="100%"></td>
106
+ <td><img src="images/show_case/36.png" alt="PuLID image" width="100%"></td>
107
+ </tr>
108
+ <tr>
109
+ <td>A man clad in cyberpunk fashion: <span style="color:red"><strong>neon accents, reflective sunglasses,</strong></span> and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape.</td>
110
+ <td><img src="images/show_case/37.png" alt="Reference image" width="100%"></td>
111
+ <td><img src="images/show_case/38.png" alt="EcomID image" width="100%"></td>
112
+ <td><img src="images/show_case/39.png" alt="InstantID image" width="100%"></td>
113
+ <td><img src="images/show_case/40.png" alt="PuLID image" width="100%"></td>
114
+ </tr>
115
+
116
+ </table>
117
+
118
+ ### More Base Models, Resolutions, and Styles
119
+ <table>
120
+ <tr>
121
+ <th style="width: 12%;">SDXL models</th>
122
+ <th style="width: 24%;">Prompt</th>
123
+ <th style="width: 16%;">Reference Image</th>
124
+ <th style="width: 16%;">EcomID</th>
125
+ <th style="width: 16%;">InstantID</th>
126
+ <th style="width: 16%;">PuLID</th>
127
+ </tr>
128
+ <tr>
129
+ <td>sd-xl-base-1.0</td>
130
+ <td>girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, <span style="color:red"><strong>cartoon style</strong></span>.</td>
131
+ <td><img src="images/show_case/1.png" alt="Reference image" width="100%"></td>
132
+ <td><img src="images/show_case/2.png" alt="EcomID image" width="100%"></td>
133
+ <td><img src="images/show_case/3.png" alt="InstantID image" width="100%"></td>
134
+ <td><img src="images/show_case/4.png" alt="PuLID image" width="100%"></td>
135
+ </tr>
136
+ <tr>
137
+ <td>EcomXL</td>
138
+ <td>A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
139
+ <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
140
+ <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
141
+ <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
142
+ <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
143
+ </tr>
144
+ <tr>
145
+ <td>DreamShaperXL</td>
146
+ <td>solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic</td>
147
+ <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
148
+ <td><img src="images/show_case/6.png" alt="EcomID image" width="100%"></td>
149
+ <td><img src="images/show_case/7.png" alt="InstantID image" width="100%"></td>
150
+ <td><img src="images/show_case/8.png" alt="PuLID image" width="100%"></td>
151
+ </tr>
152
+ <tr>
153
+ <td>leosam_xl_v7</td>
154
+ <td>A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic.</td>
155
+ <td><img src="images/show_case/9.png" alt="Reference image" width="100%"></td>
156
+ <td><img src="images/show_case/15.png" alt="EcomID image" width="100%"></td>
157
+ <td><img src="images/show_case/14.png" alt="InstantID image" width="100%"></td>
158
+ <td><img src="images/show_case/16.png" alt="PuLID image" width="100%"></td>
159
+ </tr>
160
+
161
+ </table>
162
+
163
+ ### Notes
164
+ - Unless otherwise specified, the showcases are generated with EcomXL as the base model. EcomID is also highly compatible with other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
165
+ - It works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint) and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).
166
+
167
+ # How to use
168
+
169
+ ## ComfyUI
170
+
171
+ - The EcomID_ComfyUI node has been released: [click here](https://code.alibaba-inc.com/ruxue.wrx/EcomID_ComfyUI)
172
+
173
+ # Training Details
174
+
175
+ The model is trained on 2M Taobao images in which human faces make up more than 3% of the image. All images have a resolution greater than 800 and an aesthetic score above 5.5.
176
+
177
+ Mixed precision: fp16
178
+
179
+ Learning rate: 1e-4
180
+
181
+ Batch size: 2
182
+
183
+ Image size: 1024x1024
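
The data selection criteria in Training Details can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: `ImageRecord`, its fields, and `keep_for_training` are hypothetical names, and two readings are assumptions, namely that "proportion of human faces" means face area divided by image area and that "resolution greater than 800" refers to the shorter side in pixels.

```python
from dataclasses import dataclass

# Thresholds quoted from the Training Details section.
MIN_FACE_RATIO = 0.03       # face proportion greater than 3%
MIN_RESOLUTION = 800        # resolution greater than 800 (assumed: shorter side, pixels)
MIN_AESTHETIC_SCORE = 5.5   # aesthetic score above 5.5


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata; field names are illustrative only."""
    width: int
    height: int
    face_area: float          # area of the detected face box, in pixels
    aesthetic_score: float


def keep_for_training(rec: ImageRecord) -> bool:
    """Return True if the image passes all three selection criteria."""
    face_ratio = rec.face_area / (rec.width * rec.height)
    return (
        face_ratio > MIN_FACE_RATIO
        and min(rec.width, rec.height) > MIN_RESOLUTION
        and rec.aesthetic_score > MIN_AESTHETIC_SCORE
    )


# A 1024x1024 image whose face box covers roughly 9.5% of the frame.
candidate = ImageRecord(width=1024, height=1024, face_area=100_000, aesthetic_score=6.0)
accepted = keep_for_training(candidate)
```

All comparisons are strict, matching the "greater than" and "above" wording of the criteria.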
diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:0f21e607c9582b1df7a14f6b99597c4a6dda4cb37e4f6675db8985971e4b1cf2
3
- size 5004167864