# SD-Latent-Interposer
|
A small neural network to provide interoperability between the latents generated by the different Stable Diffusion models.
|
|
|
I wanted to see if it was possible to pass latents generated by the new SDXL model directly into SDv1.5 models without decoding and re-encoding them using a VAE first.
|
|
|
## Installation

To install it, simply clone this repo to your `custom_nodes` folder using the following command:

```
git clone https://github.com/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer
```
|
|
|
Alternatively, you can download the [comfy_latent_interposer.py](https://github.com/city96/SD-Latent-Interposer/raw/main/comfy_latent_interposer.py) file to your `ComfyUI/custom_nodes` folder as well. You may need to install the `huggingface-hub` package inside your venv: `pip install huggingface-hub`.
|
|
|
If you need the model weights for something else, they are [hosted on HF](https://huggingface.co/city96/SD-Latent-Interposer/tree/main) under the same Apache2 license as the rest of the repo. The current files are in the **"v4.0"** subfolder.
|
|
|
## Usage

Simply place it where you would normally place a VAE decode followed by a VAE encode. Set the denoise as appropriate to hide any artifacts while keeping the composition. See the image below.
|
|
|
![LATENT_INTERPOSER_V3 1_TEST](https://github.com/city96/SD-Latent-Interposer/assets/125218114/849574b4-2565-4090-85d3-ae63ab425ee2)

Without the interposer, the two latent spaces are incompatible:

![LATENT_INTERPOSER_V3 1](https://github.com/city96/SD-Latent-Interposer/assets/125218114/13e2c01f-580e-4ecb-af1f-b6b21699127b)
|
|
|
### Local models

The node pulls the required files from Hugging Face Hub by default. If you have a flaky connection or prefer to use it completely offline, you can create a `models` folder and place the model files there; the custom node will prefer local files over HF when available. The path should be: `ComfyUI/custom_nodes/SD-Latent-Interposer/models`
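The local-first lookup described above can be sketched roughly as follows. This is not the node's actual code; the function name and filename pattern are illustrative, and only the fallback path touches the network via `hf_hub_download`:

```python
from pathlib import Path

def resolve_model(filename, local_dir="models"):
    """Hypothetical sketch: prefer a local copy, fall back to HF Hub."""
    local_path = Path(local_dir) / filename
    if local_path.is_file():
        # Offline / flaky-connection case: use the file from ./models.
        return str(local_path)
    # Fall back to pulling the file from the HF repo (needs huggingface-hub).
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id="city96/SD-Latent-Interposer",
        filename=filename,
    )
```

With a file present in the local directory, no network access happens at all.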
|
|
|
Alternatively, just clone the entire HF repo into it:

```
git clone https://huggingface.co/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer/models
```
|
|
|
### Supported Models

Model names:

| code | name                       |
| ---- | -------------------------- |
| `v1` | Stable Diffusion v1.x      |
| `xl` | SDXL                       |
| `v3` | Stable Diffusion 3         |
| `ca` | Stable Cascade (Stage A/B) |
|
|
|
Available models:

| From | to `v1` | to `xl` | to `v3` | to `ca` |
|:----:|:-------:|:-------:|:-------:|:-------:|
| `v1` | -       | v4.0    | v4.0    | No      |
| `xl` | v4.0    | -       | v4.0    | No      |
| `v3` | v4.0    | v4.0    | -       | No      |
| `ca` | v4.0    | v4.0    | v4.0    | -       |
|
|
|
## Training

The training code initializes most training parameters from the provided config file. The dataset should be a single `.bin` file saved with `torch.save` for each latent version. The format should be [batch, channels, height, width], with "batch" spanning the entire dataset (e.g. 88,000 samples).
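As a small sketch of that dataset layout (the tensor sizes and filename here are illustrative; a real dataset would have tens of thousands of samples in the batch dimension, and SD v1/XL latents have 4 channels at 1/8 of the image resolution):

```python
import torch

# One tensor of shape [batch, channels, height, width]; 16 samples here
# stand in for the ~88,000 used in practice.
latents = torch.randn(16, 4, 64, 64)

# Save the whole dataset as a single .bin file, as the training code expects.
torch.save(latents, "v1_latents.bin")
```

The training code can then load the full dataset back in one call with `torch.load`.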
|
|
|
### Interposer v4.0
|
|
|
The training code currently initializes two copies of the model, one mapping in the target direction and one in the opposite direction. The losses are defined in terms of both:

- `p_loss` is the main criterion for the primary model.
- `b_loss` is the main criterion for the secondary one.
- `r_loss` passes the output of the primary model back through the secondary model and checks it against the source latent (basically a round trip through the two models).
- `h_loss` is the same as `r_loss` but for the secondary model.
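The four losses above can be sketched as follows. Tiny 1x1 convolutions stand in for the actual interposer models, MSE stands in for the actual criterion, and the plain sum at the end is an assumed combination, not the repo's actual weighting:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
primary   = nn.Conv2d(4, 4, 1)   # stand-in: source -> target direction
secondary = nn.Conv2d(4, 4, 1)   # stand-in: target -> source direction

src = torch.randn(2, 4, 8, 8)    # source-space latents
tgt = torch.randn(2, 4, 8, 8)    # matching target-space latents

p_loss = criterion(primary(src), tgt)             # primary model vs target
b_loss = criterion(secondary(tgt), src)           # secondary model vs source
r_loss = criterion(secondary(primary(src)), src)  # round trip through both
h_loss = criterion(primary(secondary(tgt)), tgt)  # round trip, secondary first
loss = p_loss + b_loss + r_loss + h_loss          # assumed combination
```

The round-trip terms encourage the two directions to stay mutually consistent rather than each merely fitting its own target.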
|
|
|
All models were trained for 50,000 steps with a batch size of either 128 (xl/v1) or 48 (cascade).
The training was done locally on an RTX 3080 and a Tesla V100S.
|
|
|
![LATENT_INTERPOSER_V4_LOSS](https://github.com/city96/SD-Latent-Interposer/assets/125218114/3a0d8920-ed48-42f0-96c9-897263525efb)
|
|
|
### Older versions

<details><summary>Interposer v3.1</summary>

### Interposer v3.1
|
|
|
This is basically a complete rewrite. The mediocre bunch of conv2d layers was replaced with something that looks more like a proper neural network. No VGG loss, because I still don't have a better GPU.

Training was done on combined Flickr2K + DIV2K, with each image processed into six 1024x1024 segments, padded with some of my random images for a total of 22,000 source images in the dataset.

I think I got rid of most of the XL artifacts, but the color/hue/saturation shift issues are still there. I actually saved the optimizer state this time, so I might be able to do 100K steps with visual loss on my P40s. Hopefully they won't burn up.

v3.0 was 500k steps at a constant LR of 1e-4; v3.1 was 1M steps using a CosineAnnealingLR to drop the learning rate towards the end. Both used AdamW.
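The v3.1 optimizer/scheduler setup can be sketched as below. The model is a placeholder, the step counts are scaled down from the actual 1M, and `eta_min` is an assumption:

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder for the interposer model
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# CosineAnnealingLR decays the LR toward eta_min over T_max scheduler steps,
# matching the "drop the learning rate towards the end" behaviour.
# T_max is scaled down here (1_000_000 in the actual v3.1 run).
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1000, eta_min=1e-6)

for step in range(100):
    opt.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy objective
    loss.backward()
    opt.step()
    sched.step()  # one scheduler step per optimizer step
```

Compared with the constant-LR v3.0 run, the cosine schedule spends most of the run near the base LR and only anneals sharply in the final stretch.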
|
|
|
![INTERPOSER_V3 1](https://github.com/city96/SD-Latent-Interposer/assets/125218114/daff0ae2-4739-4cef-ba54-ac1d156d3388)

</details>

<details><summary>Interposer v1.1</summary>
|
|
|
### Interposer v1.1

This is the second release using the "spaceship" architecture. It was trained on the Flickr2K dataset, continued from the v1.0 checkpoint.

Overall, it seems to perform a lot better, especially on real-life photos. I also investigated the odd v1->xl artifacts, but they seem [inherent to the VAE decoder stage.](https://github.com/comfyanonymous/ComfyUI/issues/1116)

![loss](https://github.com/city96/SD-Latent-Interposer/assets/125218114/e890420f-cebd-4f88-b243-62560b8384e5)

</details>
|
|
|
|
|
<details><summary>Interposer v1.0</summary>

### Interposer v1.0

Not sure why the training loss is so different; it might be due to the """highly curated""" dataset of 1000 random images from my Downloads folder that I used to train it.

I probably should've just grabbed LAION.

I also trained a v1-to-v2 model before realizing that v1 and v2 share the same latent space. Oh well.

![loss](https://github.com/city96/SD-Latent-Interposer/assets/125218114/f92c399b-a823-4521-b09b-8bdc3795f1ea)

![xl-to-v1_interposer](https://github.com/city96/SD-Latent-Interposer/assets/125218114/0d963bc5-570f-4ebe-95db-16e261f05e48)

</details>
|
|
|