<div style="display: flex; justify-content: center; align-items: center;">
  <img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
  <img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
</div>
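EcomID generates customized, identity-preserving images from a single ID reference image. Its strengths are strong semantic consistency and controllability through facial keypoints.

This repository provides the EcomID method and model. The method combines the strengths of [PuLID](https://github.com/ToTheBeginning/PuLID) and [InstantID](https://github.com/instantX-research/InstantID) to achieve better background consistency, facial keypoint control, more realistic faces, and higher identity similarity.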

# EcomID Overview
## EcomID Structure
  <img src="./images/overflow.png" alt="alibaba" style="width: 100%; height: auto; margin-right: 5%;">

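- **IP-Adapter from PuLID**: EcomID borrows the ID-Encoder and cross-attention components from PuLID, which were trained with an alignment loss. As a result, the ID embedding interferes far less with the text embedding in the cross-attention layers, which minimizes the impact on the base model's text-to-image capability.

- **IdentityNet architecture from InstantID**: IdentityNet was trained on a dataset of *2 million aesthetically pleasing portrait images*, which strengthens keypoint control and improves ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial keypoints serve as the conditional input, and the face embedding is integrated into IdentityNet through cross-attention.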

# Showcase
## Comparison with Other Methods
### 1. Preserving Text-to-Image Capability
<table>
    <tr>
        <th style="width: 28%;">Prompt</th>
        <th style="width: 24%;">Reference image</th>
        <th style="width: 24%;">EcomID</th>
        <th style="width: 24%;">InstantID</th>
    </tr>
    <tr>
        <td>Girl, fair skin, black hair, long curly hair, <span style="color:red"><strong>in a European-style living room, vintage tones, ornaments</strong></span>, depth of field.</td>
        <td><img src="images/show_case/50.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/49.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/48.png" alt="InstantID image" width="100%"></td>
    </tr>
</table>

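As shown above, EcomID ***preserves background generation ability while minimizing stylization, which greatly enhances realism***.
The comparison shows improved semantic consistency in the background and a particular advantage in generating realistic images.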

### 2. Improved Facial Control and Similarity
<table>
    <tr>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 19%;">Reference image</th>
        <th style="width: 19%;">EcomID</th>
        <th style="width: 19%;">InstantID</th>
        <th style="width: 19%;">PuLID</th>
    </tr>
    <tr>
        <td>Close-up portrait of a man standing in front of a library, <span style="color:red"><strong>holding two smiling toddlers</strong></span>.</td>
        <td><img src="images/show_case/20.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/17.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/18.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/19.png" alt="PuLID image" width="100%"></td>
    </tr>
</table>

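As shown above, EcomID uses keypoints as the conditional input during training, which ***allows precise adjustment of face position, size, and orientation***. This makes the generated portraits more controllable while further improving facial similarity and overall image quality.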
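
The keypoint control described above can be illustrated with a minimal sketch that moves, resizes, and rotates a set of facial keypoints before they are rendered into a conditioning image. This is an illustration only, not the repository's code: the five-point layout (two eyes, nose, two mouth corners), the function name `transform_keypoints`, and the sample coordinates are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def transform_keypoints(
    keypoints: List[Point],
    scale: float = 1.0,
    angle_deg: float = 0.0,
    shift: Point = (0.0, 0.0),
) -> List[Point]:
    """Scale and rotate keypoints about their centroid, then translate them.

    Hypothetical helper: coordinates are image pixels with the y-axis
    pointing down, so a positive angle appears clockwise on screen.
    """
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))

    moved = []
    for x, y in keypoints:
        # Offset from the centroid, resized first.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        # Standard 2D rotation of the resized offset.
        rx = dx * cos_a - dy * sin_a
        ry = dx * sin_a + dy * cos_a
        moved.append((cx + rx + shift[0], cy + ry + shift[1]))
    return moved


# Assumed five-point layout in a 1024x1024 image (illustrative values).
face = [(400.0, 420.0), (600.0, 420.0), (500.0, 520.0), (430.0, 620.0), (570.0, 620.0)]

# Half-size face, tilted by 15 degrees, moved 150 px to the left.
smaller_shifted = transform_keypoints(face, scale=0.5, angle_deg=15.0, shift=(-150.0, 0.0))
```

Scaling and rotating about the centroid keeps the face in place while its size and tilt change, so position is controlled only by `shift`.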

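### More Cases
EcomID improves portrait rendering, producing a more realistic and aesthetically pleasing look while ensuring semantic consistency and better intrinsic ID similarity (that is, features that do not change with age, hairstyle, glasses, or other physical changes).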
<table>
    <tr>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 19%;">Reference image</th>
        <th style="width: 19%;">EcomID</th>
        <th style="width: 19%;">InstantID</th>
        <th style="width: 19%;">PuLID</th>
    </tr>
    <tr>
        <td>Close-up portrait of a <span style="color:red"><strong>little girl with twin braids</strong></span>, wearing a white dress, on the beach in the evening.</td>
        <td><img src="images/show_case/21.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/22.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/23.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/24.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>A <span style="color:red"><strong>very little girl</strong></span> with twin braids, wearing a <span style="color:red"><strong>hat</strong></span> and a white dress, on the beach in the evening.</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>A stubble-faced detective <span style="color:red"><strong>wearing a hat</strong></span>, shadows cast across his square jaw, <span style="color:red"><strong>a cigarette dangling from his mouth</strong></span>, his trench coat evoking film noir, in a <span style="color:red"><strong>rainy alley</strong></span>.</td>
        <td><img src="images/show_case/25.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/26.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/27.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/28.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>A smiling girl with <span style="color:red"><strong>bangs and long hair</strong></span>, wearing a school uniform, standing under a cherry blossom tree and holding a book.</td>
        <td><img src="images/show_case/29.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/30.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/31.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/32.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>A <span style="color:red"><strong>very old</strong></span> witch wearing a black cloak and a pointed hat, holding a wand, against a misty forest backdrop.</td>
        <td><img src="images/show_case/33.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/34.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/35.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/36.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>A man dressed in cyberpunk style: <span style="color:red"><strong>neon accessories, reflective sunglasses,</strong></span> and a leather jacket with glowing circuit patterns. He stands calmly in a wet cityscape.</td>
        <td><img src="images/show_case/37.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/38.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/39.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/40.png" alt="PuLID image" width="100%"></td>
    </tr>
</table>

### More Base Models, Resolutions, and Styles
<table>
    <tr>
        <th style="width: 12%;">SDXL model</th>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 16%;">Reference image</th>
        <th style="width: 16%;">EcomID</th>
        <th style="width: 16%;">InstantID</th>
        <th style="width: 16%;">PuLID</th>
    </tr>
    <tr>
        <td>sd-xl-base-1.0</td>
        <td>Girl, solo, brown hair, holding a small teddy bear, wearing a school uniform, standing in a library, <span style="color:red"><strong>cartoon style</strong></span>.</td>
        <td><img src="images/show_case/1.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/2.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/3.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/4.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>EcomXL</td>
        <td>Close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with twin braids, wearing a <span style="color:red"><strong>hat</strong></span> and a white dress, on the beach in the evening.</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>DreamShaperXL</td>
        <td>Solo, looking at viewer, smile, brown hair, upper body, cardigan, teeth, open coat, black jacket, blurry background, realistic</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/6.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/7.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/8.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>leosam_xl_v7</td>
        <td>A close-up portrait, girl, solo, dress, jewelry, beach and sea, pink dress, realistic.</td>
        <td><img src="images/show_case/9.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/15.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/14.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/16.png" alt="PuLID image" width="100%"></td>
    </tr>
</table>

### Notes
- Unless otherwise noted, most showcase images were generated with the EcomXL base model. EcomID is also highly compatible with other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
- It also works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint), and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).
# How to Use
## ComfyUI
- The EcomID_ComfyUI node has been released: [click here](https://github.com/alimama-creative/SDXL_EcomID_ComfyUI)
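# Training Details
The model was trained on 2 million Taobao images in which the face ratio is greater than 3%. Each image has a resolution greater than 800 and an aesthetic score above 5.5.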
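
The three data criteria above can be sketched as a simple filter. The thresholds come from the text, but their exact definitions do not: this sketch assumes that "face ratio" means face-box area divided by image area and that "resolution greater than 800" refers to the shorter side in pixels. The `ImageRecord` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata; field names are illustrative."""
    path: str
    width: int
    height: int
    face_area: float        # area of the detected face box, in pixels
    aesthetic_score: float  # output of some aesthetic predictor


def keep_for_training(
    rec: ImageRecord,
    min_face_ratio: float = 0.03,  # "face ratio greater than 3%"
    min_side: int = 800,           # assumed to apply to the shorter side
    min_aesthetic: float = 5.5,    # "aesthetic score above 5.5"
) -> bool:
    """Return True if the record passes all three thresholds."""
    area = rec.width * rec.height
    if area <= 0:
        return False
    face_ratio = rec.face_area / area
    return (
        face_ratio > min_face_ratio
        and min(rec.width, rec.height) > min_side
        and rec.aesthetic_score > min_aesthetic
    )


def filter_records(records: Iterable[ImageRecord]) -> List[ImageRecord]:
    """Keep only the records that satisfy the training criteria."""
    return [rec for rec in records if keep_for_training(rec)]
```

All comparisons are strict, matching the "greater than" wording; whether the original pipeline treated boundary values this way is not stated.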

Mixed precision: fp16

Learning rate: 1e-4

Batch size: 2

Image size: 1024x1024
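
For reference, the hyperparameters listed above could be captured in a small configuration object like the one below. This is only a sketch: the class and field names are hypothetical, and the source does not say whether the batch size is per device or global, nor which optimizer or schedule was used.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values listed in this section."""
    mixed_precision: str = "fp16"
    learning_rate: float = 1e-4
    batch_size: int = 2  # per-device vs. global is not stated in the source
    image_size: Tuple[int, int] = (1024, 1024)  # width, height in pixels


config = TrainConfig()
# Convert to a plain dict, e.g. for logging alongside a training run.
config_dict = asdict(config)
```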