Commit 4e83663, committed by cyber-meow
1 Parent(s): 5ad3449

update readme

README.md CHANGED
@@ -28,7 +28,7 @@ The ressemblance with a character can be improved by a better description of the
 ### Dataset description
 
 The dataset contains around 40K images with the following composition
--
+- 11423 anime screenshots from the four seasons of the anime
 - 726 fan arts
 - ~30K customized regularization images
 
@@ -67,12 +67,12 @@ I tried several things in this model (this is why I trained for so long), but I
 (it can generate like 3~5 people when we prompt 3people).
 - I use some tokens to describe the face position within a 5x5 grid but the model did not learn anything about these tokens.
 I think this is either due to 1) face position being too abstract to learn, 2) data imbalance as I did not balance my training for this, or 3) captions not enough focused on these concepts (it is much longer and contains other information).
-- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of
+- As mentioned, the model can generate multi-character scenes but the success rate becomes lower and lower as we increase the number of characters in the scene.
 Character bleeding is always a hard problem to solve.
 - The model is trained with 5% weight for hand images, but I doubt it helps in any kind.
 
-Actually, I have a doubt whether the last 22000 steps really improved the
-This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090
+Actually, I have a doubt whether the last 22000 steps really improved the model.
+This is how I get my 20$ estimate taking into account that we can simply train at resolution 512 on 3090 (and also ED2 will be more efficient).
 
 
 ### More Example Generations
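
The README text in this diff mentions tokens that describe the face position within a 5x5 grid, but does not show how they are produced. A minimal sketch of one way such a token could be derived from a detected face centre is given below. The function name `face_grid_token` and the token format `face_r{row}_c{col}` are hypothetical and chosen only for illustration; the actual token names and the face detector used in training are not given in the source.

```python
def face_grid_token(cx, cy, width, height, grid=5):
    """Map a face centre (in pixels) to a grid-cell caption token.

    The token format is a hypothetical placeholder, not the one used
    by the model described in the README.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    if grid <= 0:
        raise ValueError("grid size must be positive")

    # Scale the centre to grid units, then clamp so that points on the
    # right or bottom edge still fall into the last cell.
    col = min(max(int(cx / width * grid), 0), grid - 1)
    row = min(max(int(cy / height * grid), 0), grid - 1)
    return f"face_r{row}_c{col}"


if __name__ == "__main__":
    # A face centred in a 512x512 image lands in the middle cell.
    print(face_grid_token(256, 256, 512, 512))
```

With 25 possible cells, each token is seen on only a fraction of the images, which is consistent with the data-imbalance explanation the README offers for why the model did not learn them.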