# Persian-to-Image Text-to-Image Pipeline

## Model Overview

This pipeline generates images from Persian text descriptions. It first translates the Persian text into English, then passes the translation to a fine-tuned Stable Diffusion model that generates the corresponding image. The pipeline combines two models: a translation model (`mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq`) and an image generation model (`ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en`).

## Model Details

### Translation Model
- **Model Name**: `mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq`
- **Architecture**: mT5
- **Purpose**: This model translates Persian text into English. It has been fine-tuned on the CelebA-HQ dataset for summarization tasks, making it effective for translating descriptions of facial features.

### Image Generation Model
- **Model Name**: `ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en`
- **Architecture**: Stable Diffusion 1.5
- **Purpose**: This model generates high-quality images from English text produced by the translation model. It has been fine-tuned on the CelebA-HQ dataset, which makes it particularly effective for generating realistic human faces based on text descriptions.

## Pipeline Description

The pipeline operates through the following steps:

1. **Text Translation**: The Persian input text is translated into English using the mT5-based translation model.
2. **Image Generation**: The translated English text is then used to generate the corresponding image with the Stable Diffusion model.
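
The two steps above amount to a simple function composition. The following is a minimal sketch of that data flow only; `make_pipeline`, `fake_translate`, and `fake_generate` are hypothetical names introduced for illustration, and the stand-in functions replace the real models so the sketch runs without downloading anything.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def make_pipeline(
    translate: Callable[[str], str],
    generate: Callable[[str], T],
) -> Callable[[str], T]:
    """Chain a Persian-to-English translator with an English-text image generator."""

    def run(persian_text: str) -> T:
        english_text = translate(persian_text)  # Step 1: text translation
        return generate(english_text)           # Step 2: image generation

    return run


# Hypothetical stand-ins (assumptions, not the real models).
def fake_translate(persian_text: str) -> str:
    return f"EN({persian_text})"


def fake_generate(english_text: str) -> dict:
    return {"prompt": english_text}


pipeline = make_pipeline(fake_translate, fake_generate)
result = pipeline("سلام")
print(result)
```

In the real pipeline, the translator and generator would be the mT5 and Stable Diffusion models described above.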

### Code Implementation

#### 1. Install Required Libraries

```python
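# Notebook syntax; drop the leading "!" when running in a shell.
# sentencepiece is required by T5Tokenizer.
!pip install transformers diffusers accelerate torch sentencepiece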
```

#### 2. Import Necessary Libraries

```python
import torch
from transformers import MT5ForConditionalGeneration, T5Tokenizer
from diffusers import StableDiffusionPipeline
```

#### 3. Set Device (GPU or CPU)
This code selects a GPU if one is available and otherwise falls back to the CPU.

```python
# Determine the device: GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```

#### 4. Define and Load the Persian-to-Image Model Class
The following class handles both translation and image generation tasks.

```python
# Define the model class
class PersianToImageModel:
    def __init__(self, translation_model_name, image_model_name, device):
        self.device = device

        # Load translation model
        self.translation_model = MT5ForConditionalGeneration.from_pretrained(translation_model_name).to(device)
        self.translation_tokenizer = T5Tokenizer.from_pretrained(translation_model_name)

        # Load image generation model
        self.image_model = StableDiffusionPipeline.from_pretrained(image_model_name).to(device)

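    def translate_text(self, persian_text):
        # Tokenize the Persian input and move the token IDs to the model's device
        input_ids = self.translation_tokenizer.encode(persian_text, return_tensors="pt").to(self.device)
        # Beam search with 4 beams; the output is capped at 512 tokens
        translated_ids = self.translation_model.generate(input_ids, max_length=512, num_beams=4, early_stopping=True)
        # Decode the first returned sequence, dropping special tokens such as padding
        translated_text = self.translation_tokenizer.decode(translated_ids[0], skip_special_tokens=True)
        return translated_text

    def generate_image(self, english_text):
        # The pipeline returns a list of images; take the first one
        image = self.image_model(english_text).images[0]
        return image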

    def __call__(self, persian_text):
        # Translate Persian text to English
        english_text = self.translate_text(persian_text)
        print(f"Translated Text: {english_text}")

        # Generate and return image
        return self.generate_image(english_text)
```

#### 5. Instantiate the Model
The following code snippet demonstrates how to instantiate the combined model.

```python
# Instantiate the combined model
translation_model_name = 'mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq'
image_model_name = 'ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en'

persian_to_image_model = PersianToImageModel(translation_model_name, image_model_name, device)
```

#### 6. Example Usage of the Model
Below are examples of how to use the model to generate images from Persian text.

```python
from IPython.display import display

# Persian text describing a person
persian_text = "این زن دارای موهای موج دار ، لب های بزرگ و موهای قهوه ای است و رژ لب دارد.این زن موهای موج دار و لب های بزرگ دارد و رژ لب دارد.فرد جذاب است و موهای موج دار ، چشم های باریک و موهای قهوه ای دارد."

# Generate and display the image
image = persian_to_image_model(persian_text)
display(image)

# Another example
persian_text2 = "این مرد جذاب دارای موهای قهوه ای ، سوزش های جانبی ، دهان کمی باز و کیسه های زیر چشم است.این فرد جذاب دارای کیسه های زیر چشم ، سوزش های جانبی و دهان کمی باز است."
image2 = persian_to_image_model(persian_text2)
display(image2)
```
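
`display` only works inside a notebook. Outside one, the generated images can be written to disk instead. The helper below is a small sketch, not part of the original pipeline: `save_image`, the `outputs` directory, and the file stems are hypothetical names, and the helper assumes only that the image object exposes a `.save(path)` method, as PIL images do.

```python
from pathlib import Path


def save_image(image, out_dir="outputs", stem="sample"):
    """Save an object exposing .save(path) as a PNG file and return its path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = out_path / f"{stem}.png"
    image.save(target)
    return target


# Usage with the images generated above (file stems are illustrative):
# save_image(image, stem="example_1")
# save_image(image2, stem="example_2")
```

Returning the path makes it easy to log or reuse the location of each saved file.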