---
language: ja
license: other
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- ja
- japanese
inference: true
# extra_gated_prompt: |-
#   One more step before getting this model.
#   This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
#   The CreativeML OpenRAIL License specifies:
#   1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content
#   2. rinna Co., Ltd. claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
#   3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)
#   Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
#
#   By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.
#
# extra_gated_fields:
#   I have read the License and agree with its terms: checkbox
---

# SFCOCO Stable Diffusion Model Card

SFCOCO Stable Diffusion is a Japanese-specific latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

This model was fine-tuned from [Japanese Stable Diffusion](https://huggingface.co/rinna/japanese-stable-diffusion), a powerful Japanese-specific latent text-to-image diffusion model.
We used the [Stable Diffusion text-to-image fine-tuning script](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) from [🤗 Diffusers](https://github.com/huggingface/diffusers) for fine-tuning.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nu-dialogue/clip-prefix-caption-jp/blob/master/notebooks/sfc2022_stable_diffusion.ipynb)

## Model Details
- **Developed by:** Atsumoto Ohashi
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** Japanese
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model (LDM)](https://arxiv.org/abs/2112.10752) that uses [Japanese Stable Diffusion](https://huggingface.co/rinna/japanese-stable-diffusion) as its pre-trained base model.
- **Resources for more information:** [Japanese Stable Diffusion GitHub Repository](https://github.com/rinnakk/japanese-stable-diffusion)

## Examples

First, install our package as follows. It is a modified version of the [🤗 Diffusers library](https://github.com/huggingface/diffusers) that can run Japanese Stable Diffusion.

```bash
pip install git+https://github.com/rinnakk/japanese-stable-diffusion
```

Run this command to log in with your Hugging Face Hub token if you haven't done so before:

```bash
huggingface-cli login
```
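
Alternatively, you can log in programmatically. The sketch below uses `login` from the `huggingface_hub` library (installed as a dependency of Diffusers); the token string is a placeholder that you must replace with your own access token.

```python
from huggingface_hub import login

# Authenticate with a Hugging Face access token instead of the CLI prompt.
# "hf_xxx" is a placeholder; paste your own token from
# https://huggingface.co/settings/tokens
login(token="hf_xxx")
```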

Running the pipeline with the K-LMS scheduler:
```python
import torch
from torch import autocast
from diffusers import LMSDiscreteScheduler
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline

model_id = "nu-dialogue/sfc2022-stable-diffusion"
device = "cuda"

# Use the K-LMS scheduler here instead of the pipeline's default scheduler.
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
pipe = JapaneseStableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, use_auth_token=True, torch_dtype=torch.float16
)
pipe = pipe.to(device)

prompt = "福澤諭吉像の写真"  # "a photo of the statue of Yukichi Fukuzawa"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

image.save("output.png")
```
_Note: `JapaneseStableDiffusionPipeline` is almost the same as diffusers' `StableDiffusionPipeline`, with a few lines added to initialize our models properly._
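
If you want reproducible outputs, you can fix the random seed. The snippet below is a minimal sketch that continues from the `pipe` object created above and assumes `JapaneseStableDiffusionPipeline` forwards a `generator` argument to the sampler, as diffusers' `StableDiffusionPipeline` does:

```python
# Continues from the snippet above (pipe, torch, and autocast already set up).
# Fixing the seed makes repeated runs produce the same image, assuming the
# pipeline accepts `generator` like the upstream StableDiffusionPipeline.
generator = torch.Generator(device="cuda").manual_seed(42)

with autocast("cuda"):
    image = pipe("福澤諭吉像の写真", guidance_scale=7.5, generator=generator)["sample"][0]

image.save("output_seed42.png")
```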

## Training

**Training Data**
We used the SFCOCO2021 and SFCOCO2022 datasets for training the model.
You can find these datasets in [this repository](https://github.com/nu-dialogue/clip-prefix-caption-jp).

**Training Procedure**
SFCOCO Stable Diffusion has the same architecture as Japanese Stable Diffusion and was fine-tuned from it.
We used the [Stable Diffusion text-to-image fine-tuning script](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) from [🤗 Diffusers](https://github.com/huggingface/diffusers); an illustrative launch command is sketched below.
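
The command below is only a hedged sketch of how that example script (`train_text_to_image.py`) is typically launched with `accelerate`; the data directory, hyperparameters, and output path are illustrative placeholders, not our actual training settings.

```bash
# Hypothetical settings; adjust the data path and hyperparameters for your setup.
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="rinna/japanese-stable-diffusion" \
  --train_data_dir="path/to/sfcoco_images" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-05 \
  --max_train_steps=15000 \
  --output_dir="sfc2022-stable-diffusion"
```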

## Citation

```bibtex
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
```

```bibtex
@misc{japanese_stable_diffusion,
    author       = {Shing, Makoto and Sawada, Kei},
    title        = {Japanese Stable Diffusion},
    howpublished = {\url{https://github.com/rinnakk/japanese-stable-diffusion}},
    month        = {September},
    year         = {2022},
}
```

*This model card was written by Atsumoto Ohashi and is based on the [Japanese Stable Diffusion Model Card](https://github.com/rinnakk/japanese-stable-diffusion).*