alea31415/yama-no-susume

cyber-meow committed on Dec 31, 2022

Commit 4e83663 (1 parent: 5ad3449)

update readme

Files changed (1)

README.md +4 -4

@@ -28,7 +28,7 @@ The ressemblance with a character can be improved by a better description of the
 ### Dataset description
 The dataset contains around 40K images with the following composition
-- 11424 anime screenshots from the four seasons of the anime
+- 11423 anime screenshots from the four seasons of the anime
 - 726 fan arts
 - ~30K customized regularization images
@@ -67,12 +67,12 @@ I tried several things in this model (this is why I trained for so long), but I
 (it can generate like 3~5 people when we prompt 3people).
 - I use some tokens to describe the face position within a 5x5 grid but the model did not learn anything about these tokens.
 I think this is either due to 1) face position being too abstract to learn, 2) data imbalance as I did not balance my training for this, or 3) captions not enough focused on these concepts (it is much longer and contains other information).
-- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of character in the scene.
+- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of characters in the scene.
 Character bleeding is always a hard problem to solve.
 - The model is trained with 5% weight for hand images, but I doubt it helps in any kind.
-Actually, I have a doubt whether the last 22000 steps really improved the models.
-This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090 with ED2.
+Actually, I have a doubt whether the last 22000 steps really improved the model.
+This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090 (and also ED2 will be more efficient).
 ### More Example Generations