ostris/ip-composition-adapter · Question about training details

Hi, thank you for your contributions.

I'd like to check whether I understand the training procedure correctly. Is the following right?

1. Generate 30k images with the parent model (SDXL, perhaps?).
2. Create a depth map for each of these images.
3. Take a reference image and a depth map that does NOT correspond to that reference image, and generate a new image using IP-Adapter together with the depth T2I-Adapter.
4. Use these pseudo-ground-truth pairs (the reference image and the image generated by IP-Adapter) to train the IP-Adapter again.
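
To make step 3 concrete, here is a minimal sketch of how I imagine the mismatched pairing could work, assuming every reference image is paired with the depth map of exactly one other image. The names `TrainingPair` and `make_mismatched_pairs` are hypothetical and not from the repo; the actual IP-Adapter + depth T2I-Adapter generation call for each pair is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    """Hypothetical sample: a reference image plus a depth map taken from a different image."""
    reference_id: int
    depth_id: int


def make_mismatched_pairs(num_images: int, seed: int = 0) -> list[TrainingPair]:
    """Pair each reference image with the depth map of another image.

    Uses a random derangement (no index maps to itself), so a depth map is
    never paired with the image it was extracted from.
    """
    if num_images < 2:
        raise ValueError("need at least two images to form mismatched pairs")
    rng = random.Random(seed)
    ids = list(range(num_images))
    # Rejection sampling: shuffle until no index stays in place (about e tries on average).
    while True:
        shuffled = ids[:]
        rng.shuffle(shuffled)
        if all(i != j for i, j in zip(ids, shuffled)):
            return [TrainingPair(i, j) for i, j in zip(ids, shuffled)]


if __name__ == "__main__":
    pairs = make_mismatched_pairs(30_000)
    # Each pair would then be fed to IP-Adapter (reference) + depth T2I-Adapter (depth map).
    print(pairs[:3])
```

A derangement keeps every image used once as a reference and once as a depth source; random sampling with replacement would also work but could repeat some depth maps.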

Also, what prompts did you use to generate the 30k images?
Did you restrict them to images of people?
I ask because the depth map should have a structure similar to the reference image, so it seems you couldn't mix images from different categories.