cyber-meow committed on
Commit 4e83663
Parent: 5ad3449

update readme

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -28,7 +28,7 @@ The ressemblance with a character can be improved by a better description of the
 ### Dataset description
 
 The dataset contains around 40K images with the following composition
-- 11424 anime screenshots from the four seasons of the anime
+- 11423 anime screenshots from the four seasons of the anime
 - 726 fan arts
 - ~30K customized regularization images
 
@@ -67,12 +67,12 @@ I tried several things in this model (this is why I trained for so long), but I
 (it can generate like 3~5 people when we prompt 3people).
 - I use some tokens to describe the face position within a 5x5 grid but the model did not learn anything about these tokens.
 I think this is either due to 1) face position being too abstract to learn, 2) data imbalance as I did not balance my training for this, or 3) captions not enough focused on these concepts (it is much longer and contains other information).
-- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of character in the scene.
+- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of characters in the scene.
 Character bleeding is always a hard problem to solve.
 - The model is trained with 5% weight for hand images, but I doubt it helps in any kind.
 
-Actually, I have a doubt whether the last 22000 steps really improved the models.
-This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090 with ED2.
+Actually, I have a doubt whether the last 22000 steps really improved the model.
+This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090 (and also ED2 will be more efficient).
 
 
 ### More Example Generations
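The README lines in this diff mention tokens that describe the face position within a 5x5 grid, but the commit does not show how those tokens were derived. The following is a minimal sketch only. It assumes the position comes from the center of a face bounding box given in pixel coordinates, and the token format `face_r{row}_c{col}` and the function name `face_position_token` are hypothetical, not the author's actual vocabulary or code.

```python
# Hypothetical sketch: map a face bounding box to one of 25 position tokens
# on a 5x5 grid. The "face_r{row}_c{col}" token format is an assumption,
# not the README author's actual token vocabulary.

GRID_SIZE = 5


def face_position_token(x_min: float, y_min: float, x_max: float, y_max: float,
                        width: int, height: int, grid: int = GRID_SIZE) -> str:
    """Return a grid-cell token for the center of a face bounding box.

    Coordinates are in pixels, with (0, 0) at the top-left corner of the image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Center of the box, normalized to [0, 1] by the image size.
    cx = (x_min + x_max) / 2 / width
    cy = (y_min + y_max) / 2 / height
    # Clamp to valid cells so a center on the right/bottom edge (or a slightly
    # out-of-bounds box) still maps into the grid.
    col = max(0, min(int(cx * grid), grid - 1))
    row = max(0, min(int(cy * grid), grid - 1))
    return f"face_r{row}_c{col}"


if __name__ == "__main__":
    # A face centered in a 512x512 image falls in the middle cell.
    print(face_position_token(206, 206, 306, 306, 512, 512))  # face_r2_c2
```

Using the box center keeps one token per face. A caption for a multi-character image would then carry one such token per detected face, which is also where the data-imbalance concern raised in the README would show up, since central cells are likely far more common than corner cells.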