Upload folder using huggingface_hub
- README.md +18 -14
- safety_checker/model.safetensors +3 -0
- sd-v1-5-inpainting.fp16.ckpt +3 -0
- sd-v1-5-inpainting.fp16.safetensors +3 -0
- sd-v1-5-inpainting.safetensors +3 -0
- text_encoder/model.safetensors +3 -0
- unet/diffusion_pytorch_model.safetensors +3 -0
- vae/diffusion_pytorch_model.safetensors +3 -0
README.md
CHANGED
@@ -22,15 +22,25 @@ extra_gated_fields:
   I have read the License and agree with its terms: checkbox
 ---
 
+# Re-upload
+
+This repository is being re-uploaded to HuggingFace in accordance with [The CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license) under which this repository was originally uploaded, specifically **Section II** which grants:
+
+> ...a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
+
+Note that these files did not come from HuggingFace, but instead from [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-inpainting/files). As such, some files that were present in the original repository may not be present. File integrity has been verified via checksum.
+
+# Original Model Card
+
 Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
 
 The **Stable-Diffusion-Inpainting** was initialized with the weights of [Stable-Diffusion-v-1-2](https://huggingface.co/CompVis/stable-diffusion-v-1-2-original): first 595k steps of regular training, then 440k steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598). For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.
 
-[![Open In
-
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
+
 ## Examples:
 
-You can use this
+You can use this with the [🧨Diffusers library](https://github.com/huggingface/diffusers).
 
 ### Diffusers
 
@@ -38,8 +48,8 @@ You can use this both with the [🧨Diffusers library](https://github.com/huggin
 from diffusers import StableDiffusionInpaintPipeline
 
 pipe = StableDiffusionInpaintPipeline.from_pretrained(
-    "
-
+    "benjamin-paine/stable-diffusion-v1-5-inpainting",
+    variant="fp16",
     torch_dtype=torch.float16,
 )
 prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
@@ -59,18 +69,13 @@ image.save("./yellow_cat_on_park_bench.png")
 :-------------------------:|:-------------------------:|
 <span style="position: relative;bottom: 150px;">Face of a yellow cat, high resolution, sitting on a park bench</span> | <img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/test.png" alt="drawing" width="300"/>
 
-### Original GitHub Repository
-
-1. Download the weights [sd-v1-5-inpainting.ckpt](https://huggingface.co/runwayml/stable-diffusion-inpainting/resolve/main/sd-v1-5-inpainting.ckpt)
-2. Follow instructions [here](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion).
-
 ## Model Details
 - **Developed by:** Robin Rombach, Patrick Esser
 - **Model type:** Diffusion-based text-to-image generation model
 - **Language(s):** English
 - **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
 - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).
-- **Resources for more information:** [
+- **Resources for more information:** [Paper](https://arxiv.org/abs/2112.10752).
 - **Cite as:**
 
       @InProceedings{Rombach_2022_CVPR,
@@ -99,10 +104,11 @@ Excluded uses are described below.
 ### Misuse, Malicious Use, and Out-of-Scope Use
 _Note: This section is taken from the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini), but applies in the same way to Stable Diffusion v1_.
 
-
 The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
+
 #### Out-of-Scope Use
 The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.
+
 #### Misuse and Malicious Use
 Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
@@ -140,7 +146,6 @@ Texts and images from communities and cultures that use other languages are like
 This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
 ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
 
-
 ## Training
 
 **Training Data**
@@ -209,7 +214,6 @@ Based on that information, we estimate the following CO2 emissions using the [Ma
 - **Compute Region:** US-east
 - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq.
 
-
 ## Citation
 
 ```bibtex
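The model card above notes that the inpainting UNet takes 5 extra input channels on top of the usual 4 latent channels (4 for the encoded masked image, 1 for the mask). As a rough numpy sketch of that channel layout, assuming a 512x512 image encoded to 64x64 latents (8x VAE downsampling); the helper name and shapes here are illustrative, not code from diffusers:

```python
import numpy as np

def build_unet_input(noisy_latents, masked_image_latents, mask):
    """Concatenate along the channel axis: 4 + 4 + 1 = 9 channels."""
    assert noisy_latents.shape[1] == 4         # standard SD latent channels
    assert masked_image_latents.shape[1] == 4  # encoded masked image
    assert mask.shape[1] == 1                  # binary inpainting mask
    return np.concatenate([noisy_latents, masked_image_latents, mask], axis=1)

batch = 1
latents = np.random.randn(batch, 4, 64, 64).astype(np.float32)
masked = np.random.randn(batch, 4, 64, 64).astype(np.float32)
mask = np.ones((batch, 1, 64, 64), dtype=np.float32)  # 1 = region to inpaint

unet_in = build_unet_input(latents, masked, mask)
print(unet_in.shape)  # (1, 9, 64, 64)
```

Since the 5 extra channels' weights were zero-initialized, the restored checkpoint initially behaves like the non-inpainting model and learns to use the mask conditioning during fine-tuning.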
safety_checker/model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:26bedfc237ed91ea9661957a1201f1a9dec54da7fd92f37cb2e9bff4b73ab639
+size 1215981800
sd-v1-5-inpainting.fp16.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3384ef4638923d8cfe838d73d280dfbabc521ab751e8b74f8a0cba8a91d512fd
+size 4265456609
sd-v1-5-inpainting.fp16.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a33284f5a9be288d1d97c4b1d66d186b1eda8d3703506318e3358bf05914cee
+size 2132692100
sd-v1-5-inpainting.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ef97ac1fe87ed0406433ad8710ff1da6e07e873de9a1a107b828844336d015ec
+size 4265216468
text_encoder/model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7e7b5cc5d68991c50f042031c04547e49f006e468d521d064257a7f465c2910b
+size 492265848
unet/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3c5e5441a3304b5fe1eb1b29279889440f0ebdbf969a078b191c6c7046a2cc7f
+size 3438225120
vae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37af64f891b66af7e83ce2dc8490e2fcaff9b4e681a725fb7a1376d8826bdeb2
+size 334643252
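Each weight file in this commit is stored as a Git LFS pointer: three lines giving the spec version, a `sha256` oid, and the payload size in bytes. The re-upload note's claim that "file integrity has been verified via checksum" can be reproduced with a small parser; this is a minimal sketch (the helper names are mine, not part of any tool), using the safety_checker pointer from this commit as example input:

```python
import hashlib

# Git LFS pointer text, verbatim from safety_checker/model.safetensors above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:26bedfc237ed91ea9661957a1201f1a9dec54da7fd92f37cb2e9bff4b73ab639
size 1215981800
"""

def parse_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def verify_payload(data, pointer_text):
    """Return True if `data` matches the pointer's size and digest."""
    p = parse_pointer(pointer_text)
    if len(data) != p["size"]:
        return False
    return hashlib.new(p["algo"], data).hexdigest() == p["digest"]

info = parse_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 1215981800
```

For the multi-gigabyte checkpoints here, you would stream the file through `hashlib.sha256()` in chunks rather than load the whole payload into memory as `verify_payload` does.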