mrtuandao committed on
Commit c5d116c
1 Parent(s): 7f71d96

Upload folder using huggingface_hub
README.md CHANGED
@@ -22,15 +22,25 @@ extra_gated_fields:
   I have read the License and agree with its terms: checkbox
 ---
 
+# Re-upload
+
+This repository is being re-uploaded to HuggingFace in accordance with [The CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license) under which this repository was originally uploaded, specifically **Section II** which grants:
+
+> ...a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
+
+Note that these files did not come from HuggingFace, but instead from [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-inpainting/files). As such, some files that were present in the original repository may not be present. File integrity has been verified via checksum.
+
+# Original Model Card
+
 Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
 
 The **Stable-Diffusion-Inpainting** was initialized with the weights of the [Stable-Diffusion-v-1-2](https://huggingface.co/CompVis/stable-diffusion-v-1-2-original) checkpoint. First came 595k steps of regular training, then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+”, with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598). For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and in 25% of cases mask everything.
 
-[![Open In Spaces](https://camo.githubusercontent.com/00380c35e60d6b04be65d3d94a58332be5cc93779f630bcdfc18ab9a3a7d3388/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f25463025394625413425393725323048756767696e67253230466163652d5370616365732d626c7565)](https://huggingface.co/spaces/runwayml/stable-diffusion-inpainting) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
-:-------------------------:|:-------------------------:|
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
+
 ## Examples:
 
-You can use this both with the [🧨Diffusers library](https://github.com/huggingface/diffusers) and the [RunwayML GitHub repository](https://github.com/runwayml/stable-diffusion).
+You can use this with the [🧨Diffusers library](https://github.com/huggingface/diffusers).
 
 ### Diffusers
 
@@ -38,8 +48,8 @@ You can use this both with the [🧨Diffusers library](https://github.com/huggin
 from diffusers import StableDiffusionInpaintPipeline
 
 pipe = StableDiffusionInpaintPipeline.from_pretrained(
-    "runwayml/stable-diffusion-inpainting",
-    revision="fp16",
+    "benjamin-paine/stable-diffusion-v1-5-inpainting",
+    variant="fp16",
     torch_dtype=torch.float16,
 )
 prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
@@ -59,18 +69,13 @@ image.save("./yellow_cat_on_park_bench.png")
 :-------------------------:|:-------------------------:|
 <span style="position: relative;bottom: 150px;">Face of a yellow cat, high resolution, sitting on a park bench</span> | <img src="https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/test.png" alt="drawing" width="300"/>
 
-### Original GitHub Repository
-
-1. Download the weights [sd-v1-5-inpainting.ckpt](https://huggingface.co/runwayml/stable-diffusion-inpainting/resolve/main/sd-v1-5-inpainting.ckpt)
-2. Follow instructions [here](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion).
-
 ## Model Details
 - **Developed by:** Robin Rombach, Patrick Esser
 - **Model type:** Diffusion-based text-to-image generation model
 - **Language(s):** English
 - **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
 - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).
-- **Resources for more information:** [GitHub Repository](https://github.com/runwayml/stable-diffusion), [Paper](https://arxiv.org/abs/2112.10752).
+- **Resources for more information:** [Paper](https://arxiv.org/abs/2112.10752).
 - **Cite as:**
 
       @InProceedings{Rombach_2022_CVPR,
@@ -99,10 +104,11 @@ Excluded uses are described below.
 ### Misuse, Malicious Use, and Out-of-Scope Use
 _Note: This section is taken from the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini), but applies in the same way to Stable Diffusion v1_.
 
-
 The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
+
 #### Out-of-Scope Use
 The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.
+
 #### Misuse and Malicious Use
 Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
 
@@ -140,7 +146,6 @@ Texts and images from communities and cultures that use other languages are like
 This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
 ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.
 
-
 ## Training
 
 **Training Data**
@@ -209,7 +214,6 @@ Based on that information, we estimate the following CO2 emissions using the [Ma
 - **Compute Region:** US-east
 - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq.
 
-
 ## Citation
 
 ```bibtex
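The model card's description of the inpainting UNet (4 regular latent channels plus 5 extra input channels: 1 for the mask and 4 for the encoded masked image, giving 9 in total) can be sketched with plain NumPy. This is an illustrative sketch only; the concatenation order shown follows the convention used by the diffusers inpainting pipeline and is an assumption, not something stated in the card.

```python
import numpy as np

# Assemble the 9-channel UNet input described in the model card:
# 4 noised image latents + 1 downsampled mask + 4 masked-image latents.
latents = np.random.randn(1, 4, 64, 64).astype(np.float32)              # noised latents
mask = np.zeros((1, 1, 64, 64), dtype=np.float32)                        # 1 = repaint here
masked_image_latents = np.random.randn(1, 4, 64, 64).astype(np.float32)  # encoded masked image

unet_input = np.concatenate([latents, mask, masked_image_latents], axis=1)
print(unet_input.shape)  # (1, 9, 64, 64)
```

The extra 5 channels are exactly why the inpainting checkpoint zero-initializes those weights after restoring the non-inpainting checkpoint: with zeroed weights, the added channels initially contribute nothing, so the model starts from the behavior of the base checkpoint.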
safety_checker/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:26bedfc237ed91ea9661957a1201f1a9dec54da7fd92f37cb2e9bff4b73ab639
+size 1215981800
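The added files above are Git LFS pointer files: small text stubs recording the `version`, `oid` (a SHA-256 digest of the real file), and `size` in bytes, while the actual weights live in LFS storage. As a minimal sketch, a pointer can be parsed with a hypothetical helper like this (`parse_lfs_pointer` is not part of this repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is recorded in bytes
    return fields

# The safety_checker pointer added in this commit:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:26bedfc237ed91ea9661957a1201f1a9dec54da7fd92f37cb2e9bff4b73ab639
size 1215981800"""

info = parse_lfs_pointer(pointer)
print(info["oid"], info["size"])
```

The `oid` field is what makes the checksum verification mentioned in the re-upload note possible: it commits the repository to an exact byte-for-byte digest of each large file.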
sd-v1-5-inpainting.fp16.ckpt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3384ef4638923d8cfe838d73d280dfbabc521ab751e8b74f8a0cba8a91d512fd
+size 4265456609
sd-v1-5-inpainting.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a33284f5a9be288d1d97c4b1d66d186b1eda8d3703506318e3358bf05914cee
+size 2132692100
sd-v1-5-inpainting.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ef97ac1fe87ed0406433ad8710ff1da6e07e873de9a1a107b828844336d015ec
+size 4265216468
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7e7b5cc5d68991c50f042031c04547e49f006e468d521d064257a7f465c2910b
+size 492265848
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3c5e5441a3304b5fe1eb1b29279889440f0ebdbf969a078b191c6c7046a2cc7f
+size 3438225120
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37af64f891b66af7e83ce2dc8490e2fcaff9b4e681a725fb7a1376d8826bdeb2
+size 334643252