Let’s make a generation of amazing image generation models
The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source and so the community cannot train open models on them. Let’s change that!
The community can contribute image preferences to an open source dataset for building text-to-image models, like the FLUX or Stable Diffusion families. Because the dataset is open, everyone can use it to train models that we can all use.
How to get involved
If you would like to contribute to the dataset, you can add preferences in the annotation app, which uses the Argilla UI. Follow these steps:
- Go to the Argilla Space and log in using your Hugging Face profile.
- Check out the guidelines on how to select the best image, and tips for optimizing your workflow.
- Rank the images according to the annotation guidelines, judging them on their aesthetic appeal and how closely they adhere to the prompt.
- Check out your contribution on the leaderboard.
How to use the dataset
We will share the dataset periodically during the project as it is being labeled, so you will be able to download it to explore or train your own models.
If you would like to use the dataset, you can do so straight away. We will share the image preferences dataset on the Hugging Face Hub in a dataset repo named `data-is-better-together/image-preferences-argilla`.
```python
from datasets import load_dataset

dataset = load_dataset("data-is-better-together/image-preferences")
```
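Once the dataset is loaded, a common next step is to turn each annotated record into a chosen/rejected pair for preference training. The sketch below shows one way to do that. The column names (`prompt`, `image_1`, `image_2`, `preference`), the label values, and the `train` split are all assumptions made for illustration and are not taken from the published schema, so inspect the loaded dataset and adjust the constants before relying on it.

```python
from typing import Iterable, Iterator, List

# Hypothetical column names: inspect the loaded dataset for the real schema.
PROMPT_COL = "prompt"
IMAGE_A_COL = "image_1"
IMAGE_B_COL = "image_2"
PREFERENCE_COL = "preference"  # assumed to hold "image_1", "image_2", or something else for ties


def to_preference_pairs(records: Iterable[dict]) -> Iterator[dict]:
    """Yield {prompt, chosen, rejected} dicts, skipping ties and unlabeled rows."""
    for record in records:
        preference = record.get(PREFERENCE_COL)
        if preference == IMAGE_A_COL:
            chosen, rejected = record[IMAGE_A_COL], record[IMAGE_B_COL]
        elif preference == IMAGE_B_COL:
            chosen, rejected = record[IMAGE_B_COL], record[IMAGE_A_COL]
        else:
            continue  # tie, missing label, or an unexpected value
        yield {"prompt": record[PROMPT_COL], "chosen": chosen, "rejected": rejected}


def load_pairs(repo_id: str, split: str = "train") -> List[dict]:
    """Load one split from the Hub and convert it (needs `datasets` and network access).

    The default split name is an assumption; pass the split the repo actually has.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependencies

    return list(to_preference_pairs(load_dataset(repo_id, split=split)))
```

`to_preference_pairs` works on any iterable of dicts, so it can be tried on a handful of hand-written records before pointing `load_pairs` at the real repo.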
References
- Leaderboard: a dashboard to track the progress of the community in adding preferences to the dataset.
- Argilla Space: the space where the dataset is hosted for the community to contribute to.
- distilabel: a tool for creating synthetic datasets. We used distilabel to evolve prompts and to create the image preferences dataset.
- Hugging Face Spaces: a platform for hosting machine learning applications and demos. We used Spaces to host the Argilla tool for prompt ranking.
- Argilla: an open-source data annotation tool that we used for the prompt ranking. Argilla has the option of using Hugging Face for authentication, which makes it easier for the community to contribute.