ddpo-alignment
This model was finetuned from Stable Diffusion v1-4 using DDPO and a reward function that uses LLaVA to measure prompt-image alignment. See the project website for more details.
The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form: "a(n) <animal> <activity>". The animal and activity were drawn from the lists below, so prompts built from those lists should give the best results. We also observed some generalization to other prompts, but it was limited.
Activities:
- washing dishes
- playing chess
- riding a bike
Animals:
- cat
- dog
- horse
- monkey
- rabbit
- zebra
- spider
- bird
- sheep
- deer
- cow
- goat
- lion
- tiger
- bear
- raccoon
- fox
- wolf
- lizard
- beetle
- ant
- butterfly
- fish
- shark
- whale
- dolphin
- squirrel
- mouse
- rat
- snake
- turtle
- frog
- chicken
- duck
- goose
- bee
- pig
- turkey
- fly
- llama
- camel
- bat
- gorilla
- hedgehog
- kangaroo
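
The prompt template above can be sketched in a few lines of Python. This is only an illustration, not the training code: the helper names `make_prompt` and `all_prompts` are hypothetical, the animal list is a small subset of the one above, and the vowel check for choosing "a" versus "an" is a simple assumption that happens to work for the animals listed here.

```python
import itertools

# Activities from the list above.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]

# A small subset of the animals above, for brevity; extend as needed.
ANIMALS = ["cat", "dog", "horse", "ant", "zebra"]


def make_prompt(animal: str, activity: str) -> str:
    """Build a prompt of the form "a(n) <animal> <activity>".

    Assumption: the article is "an" when the animal name starts with a
    vowel letter, otherwise "a". This is a heuristic, not a full
    English grammar rule.
    """
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


def all_prompts(animals=ANIMALS, activities=ACTIVITIES) -> list[str]:
    """Return one prompt for every (animal, activity) pair."""
    return [make_prompt(a, act) for a, act in itertools.product(animals, activities)]


if __name__ == "__main__":
    for prompt in all_prompts():
        print(prompt)
```

The resulting strings can be passed as the text prompt to whichever Stable Diffusion inference pipeline you use to load this checkpoint.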