arxiv:2311.10093

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Published on Nov 16, 2023

Submitted by akhaliq on Nov 17, 2023

#1 Paper of the day


Authors: Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach. The project page is available at https://omriavrahami.com/the-chosen-one
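The iterative procedure described in the abstract (generate, find a coherent set, extract an identity, repeat) can be illustrated with a toy sketch. It rests on several assumptions that are not stated above: each generated image is represented by a feature embedding (a plain list of floats here); the "coherent set" is found by k-means clustering followed by picking the cluster with the smallest average distance to its centroid; and "extracting a more consistent identity" is stood in for by that cluster's centroid. Presumably the real method feeds the extracted identity back into the image generator for the next round; here a synthetic generator simply samples around it. `find_consistent_identity`, `toy_generate`, and all parameters are hypothetical names for illustration, not the authors' code.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# Hypothetical generator interface: (current identity or None, gallery size) -> embeddings.
Generator = Callable[[Optional[Vector], int], List[Vector]]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cohesion(cluster: List[Vector]) -> float:
    """Average distance to the centroid; lower means a more consistent identity."""
    c = mean(cluster)
    return sum(math.dist(v, c) for v in cluster) / len(cluster)


def kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 20) -> List[List[Vector]]:
    """Plain Lloyd's k-means with randomly sampled initial centroids."""
    centroids = rng.sample(points, k)
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster went empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return clusters


def find_consistent_identity(generate: Generator, k: int = 5, n: int = 50, min_size: int = 5,
                             tol: float = 0.05, max_iters: int = 10,
                             seed: int = 0) -> Tuple[Vector, int]:
    """Toy loop: generate, cluster, keep the most cohesive cluster, extract identity, repeat."""
    rng = random.Random(seed)
    identity: Optional[Vector] = None
    for step in range(1, max_iters + 1):
        gallery = generate(identity, n)
        # min_size <= n / k guarantees at least one cluster qualifies (pigeonhole).
        candidates = [c for c in kmeans(gallery, k, rng) if len(c) >= min_size]
        best = min(candidates, key=cohesion)
        # Stand-in for "extracting a more consistent identity": the cluster centroid.
        identity = mean(best)
        if cohesion(best) < tol:
            return identity, step
    return identity, max_iters


# Synthetic demo: three distinct "characters" in a 2-D embedding space.
CENTERS = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
_demo_rng = random.Random(42)


def toy_generate(identity: Optional[Vector], n: int) -> List[Vector]:
    if identity is None:
        # Prompt only: samples scattered across several distinct identities.
        return [[x + _demo_rng.gauss(0, 1.0) for x in _demo_rng.choice(CENTERS)] for _ in range(n)]
    # Conditioned on an identity: samples stay close to it.
    return [[x + _demo_rng.gauss(0, 0.02) for x in identity] for _ in range(n)]


identity, steps = find_consistent_identity(toy_generate)
print(steps, identity)
```

With this seeded toy generator, the first round scatters samples across three synthetic identities, and the second round, conditioned on the chosen centroid, is tight enough to pass the `tol` check. The `min_size` filter matters: without it, a one-image cluster would trivially have zero spread and always be selected.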


Community

Chilangosta

Nov 18, 2023

Code?

NakedCyborg

Nov 18, 2023


I've had people contact me through Freelancer to do this. Whoever figures this out is going to be rich. For instance, one client wanted a dress on a model, the exact same dress, in different poses. A magazine generator is what she wanted. I told her she would be better served by a photographer. But what if you had a magic wand tool to exclude certain pixels from being manipulated in subsequent generations? A way to keep elements dialed in. Never mind rich, this is the crux of inventing a whole new angle of generative imaging, a big milestone.
What if those pixels could be held fixed not rigidly but dynamically, so that they stay the same for design purposes but can still be manipulated internally for scene continuity?
Life Story is hilarious, btw.