Hilley committed on
Commit 2fe2568
Parent: af7e73b

Upload 3 files

checkpoints/base_speakers/EN_V2/README.md ADDED
+ ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-to-speech
+ ---
+
+ # MeloTTS
+
+ MeloTTS is a **high-quality multilingual** text-to-speech library by [MyShell.ai](https://myshell.ai). Supported languages include:
+
+
+ | Model card | Example |
+ | --- | --- |
+ | [English](https://huggingface.co/myshell-ai/MeloTTS-English-v2) (American) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/en/EN-US/speed_1.0/sent_000.wav) |
+ | [English](https://huggingface.co/myshell-ai/MeloTTS-English-v2) (British) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/en/EN-BR/speed_1.0/sent_000.wav) |
+ | [English](https://huggingface.co/myshell-ai/MeloTTS-English-v2) (Indian) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/en/EN_INDIA/speed_1.0/sent_000.wav) |
+ | [English](https://huggingface.co/myshell-ai/MeloTTS-English-v2) (Australian) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/en/EN-AU/speed_1.0/sent_000.wav) |
+ | [English](https://huggingface.co/myshell-ai/MeloTTS-English-v2) (Default) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/en/EN-Default/speed_1.0/sent_000.wav) |
+ | [Spanish](https://huggingface.co/myshell-ai/MeloTTS-Spanish) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/es/ES/speed_1.0/sent_000.wav) |
+ | [French](https://huggingface.co/myshell-ai/MeloTTS-French) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/fr/FR/speed_1.0/sent_000.wav) |
+ | [Chinese](https://huggingface.co/myshell-ai/MeloTTS-Chinese) (mixed with English) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/zh/ZH/speed_1.0/sent_008.wav) |
+ | [Japanese](https://huggingface.co/myshell-ai/MeloTTS-Japanese) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/jp/JP/speed_1.0/sent_000.wav) |
+ | [Korean](https://huggingface.co/myshell-ai/MeloTTS-Korean/) | [Link](https://myshell-public-repo-hosting.s3.amazonaws.com/myshellttsbase/examples/kr/KR/speed_1.0/sent_000.wav) |
+
+ Some other features include:
+ - The Chinese speaker supports `mixed Chinese and English`.
+ - Fast enough for `CPU real-time inference`.
+
+
+ ## Usage
+
+ ### Without Installation
+
+ An unofficial [live demo](https://huggingface.co/spaces/mrfakename/MeloTTS) is hosted on Hugging Face Spaces.
+
+ #### Use it on MyShell
+
+ There are hundreds of TTS models on MyShell, many more than just MeloTTS. See examples [here](https://github.com/myshell-ai/MeloTTS/blob/main/docs/quick_use.md#use-melotts-without-installation).
+ More can be found at the widget center of [MyShell.ai](https://app.myshell.ai/robot-workshop).
+
+ ### Install and Use Locally
+
+ Follow the installation steps [here](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#linux-and-macos-install) before using the following snippet:
+
+ ```python
+ from melo.api import TTS
+
+ # Speed is adjustable
+ speed = 1.0
+
+ # CPU is sufficient for real-time inference.
+ # You can set it manually to 'cpu', 'cuda', 'cuda:0' or 'mps'
+ device = 'auto'  # Will automatically use GPU if available
+
+ # English
+ text = "Did you ever hear a folk tale about a giant turtle?"
+ model = TTS(language='EN_V2', device=device)
+ speaker_ids = model.hps.data.spk2id
+
+ # Speaker keys differ between config versions: the config.json in this
+ # folder lists EN-US (American), EN-BR (British), EN-INDIA (Indian) and
+ # EN-AU (Australian), with no 'EN-Default' entry. Iterating over spk2id
+ # avoids a KeyError from hard-coded keys and writes one WAV per accent,
+ # named after its key (e.g. en-us.wav).
+ for accent, speaker_id in speaker_ids.items():
+     output_path = f'{accent.lower()}.wav'
+     model.tts_to_file(text, speaker_id, output_path, speed=speed)
+
+ ```
+
+
+ ## Join the Community
+
+ **Open Source AI Grant**
+
+ We are actively sponsoring open-source AI projects. The sponsorship includes GPU resources, funding and intellectual support (collaboration with top research labs). We welcome both research and engineering projects, as long as the open-source community needs them. Please contact [Zengyi Qin](https://www.qinzy.tech/) if you are interested.
+
+ **Contributing**
+
+ If you find this work useful, please consider contributing to the GitHub [repo](https://github.com/myshell-ai/MeloTTS).
+
+ - Many thanks to [@fakerybakery](https://github.com/fakerybakery) for adding the Web UI and CLI.
+
+ ## License
+
+ This library is released under the MIT License, which means it is free for both commercial and non-commercial use.
+
+ ## Acknowledgements
+
+ This implementation is based on [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), [VITS2](https://github.com/daniilrobnikov/vits2) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.
+
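
The README snippet reads speaker names from `model.hps.data.spk2id`. The config.json added in this same commit stores that mapping under `data.spk2id`, so a minimal sketch using only Python's standard library can list the speakers and sanity-check each id before loading the model. The local path and the helper name `sorted_speakers` are illustrative assumptions, and the bound check assumes the VITS convention that a speaker id indexes an embedding table with `n_speakers` rows.

```python
import json
from pathlib import Path

# Hypothetical local path, mirroring the file layout of this commit.
CONFIG_PATH = Path("checkpoints/base_speakers/EN_V2/config.json")


def sorted_speakers(config):
    """Return data.spk2id ordered by id, checking every id against n_speakers."""
    spk2id = config["data"]["spk2id"]
    n_speakers = config["data"]["n_speakers"]
    for name, speaker_id in spk2id.items():
        # Assumed VITS convention: the id indexes an embedding with n_speakers rows.
        if not 0 <= speaker_id < n_speakers:
            raise ValueError(f"speaker {name!r} has out-of-range id {speaker_id}")
    return dict(sorted(spk2id.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        config = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
        for name, speaker_id in sorted_speakers(config).items():
            print(f"{speaker_id}: {name}")
```

For the config in this commit, the mapping lists EN-US, EN-BR, EN-INDIA and EN-AU with ids 0, 1, 2 and 4; id 3 is unused.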
checkpoints/base_speakers/EN_V2/checkpoint.pth ADDED
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:794226eb7c1745f3ca281b290613d5f39aa5b0d3b16a117009966f4aaf184757
+ size 207769356
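
The checkpoint.pth entry above is a Git LFS pointer, not the roughly 208 MB weight file itself. In the Git LFS pointer format, each line is a `key value` pair: `oid` carries the SHA-256 of the real blob and `size` its length in bytes. A minimal sketch for checking a downloaded checkpoint against this pointer follows; the local path and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into (sha256_hex, size_bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def verify_blob(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    path = Path(path)
    # A cheap size check first; an un-pulled pointer file fails here.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so a ~200 MB checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:794226eb7c1745f3ca281b290613d5f39aa5b0d3b16a117009966f4aaf184757
size 207769356
"""

if __name__ == "__main__":
    sha, size = parse_lfs_pointer(POINTER)
    ckpt = Path("checkpoints/base_speakers/EN_V2/checkpoint.pth")  # assumed local path
    if ckpt.exists():
        print("checkpoint OK" if verify_blob(ckpt, sha, size) else "checkpoint mismatch")
```

Checking the size before hashing gives a quick signal that the LFS object was never fetched, since the pointer file itself is only a few hundred bytes.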
checkpoints/base_speakers/EN_V2/config.json ADDED
+ {
+   "train": {
+     "segment_size": 16384
+   },
+   "data": {
+     "sampling_rate": 44100,
+     "filter_length": 2048,
+     "hop_length": 512,
+     "add_blank": true,
+     "n_speakers": 256,
+     "spk2id": {
+       "EN-US": 0,
+       "EN-BR": 1,
+       "EN-INDIA": 2,
+       "EN-AU": 4
+     }
+   },
+   "model": {
+     "use_spk_conditioned_encoder": true,
+     "use_noise_scaled_mas": true,
+     "use_mel_posterior_encoder": false,
+     "use_duration_discriminator": true,
+     "inter_channels": 192,
+     "hidden_channels": 192,
+     "filter_channels": 768,
+     "n_heads": 2,
+     "n_layers": 6,
+     "n_layers_trans_flow": 3,
+     "kernel_size": 3,
+     "p_dropout": 0.1,
+     "resblock": "1",
+     "resblock_kernel_sizes": [
+       3,
+       7,
+       11
+     ],
+     "resblock_dilation_sizes": [
+       [
+         1,
+         3,
+         5
+       ],
+       [
+         1,
+         3,
+         5
+       ],
+       [
+         1,
+         3,
+         5
+       ]
+     ],
+     "upsample_rates": [
+       8,
+       8,
+       2,
+       2,
+       2
+     ],
+     "upsample_initial_channel": 512,
+     "upsample_kernel_sizes": [
+       16,
+       16,
+       8,
+       2,
+       2
+     ],
+     "n_layers_q": 3,
+     "use_spectral_norm": false,
+     "gin_channels": 256
+   },
+   "symbols": [
+     "_",
+     "AA",
+     "E",
+     "EE",
+     "En",
+     "N",
+     "OO",
+     "V",
+     "a",
+     "a:",
+     "aa",
+     "ae",
+     "ah",
+     "ai",
+     "an",
+     "ang",
+     "ao",
+     "aw",
+     "ay",
+     "b",
+     "by",
+     "c",
+     "ch",
+     "d",
+     "dh",
+     "dy",
+     "e",
+     "e:",
+     "eh",
+     "ei",
+     "en",
+     "eng",
+     "er",
+     "ey",
+     "f",
+     "g",
+     "gy",
+     "h",
+     "hh",
+     "hy",
+     "i",
+     "i0",
+     "i:",
+     "ia",
+     "ian",
+     "iang",
+     "iao",
+     "ie",
+     "ih",
+     "in",
+     "ing",
+     "iong",
+     "ir",
+     "iu",
+     "iy",
+     "j",
+     "jh",
+     "k",
+     "ky",
+     "l",
+     "m",
+     "my",
+     "n",
+     "ng",
+     "ny",
+     "o",
+     "o:",
+     "ong",
+     "ou",
+     "ow",
+     "oy",
+     "p",
+     "py",
+     "q",
+     "r",
+     "ry",
+     "s",
+     "sh",
+     "t",
+     "th",
+     "ts",
+     "ty",
+     "u",
+     "u:",
+     "ua",
+     "uai",
+     "uan",
+     "uang",
+     "uh",
+     "ui",
+     "un",
+     "uo",
+     "uw",
+     "v",
+     "van",
+     "ve",
+     "vn",
+     "w",
+     "x",
+     "y",
+     "z",
+     "zh",
+     "zy",
+     "!",
+     "?",
+     "…",
+     ",",
+     ".",
+     "'",
+     "-",
+     "SP",
+     "UNK"
+   ],
+   "num_tones": 11,
+   "num_languages": 3
+ }
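
Several fields in this config.json constrain one another. Assuming MeloTTS keeps the VITS/HiFi-GAN conventions of the projects its README acknowledges, the decoder upsamples each latent frame by the product of `upsample_rates`, which must equal `hop_length`: 8 × 8 × 2 × 2 × 2 = 512. At a `sampling_rate` of 44100 Hz that is about 86.1 frames per second, and the training `segment_size` of 16384 samples corresponds to 32 frames, roughly 0.37 s of audio. With `add_blank: true`, VITS-style front ends map each phone to its index in `symbols` and then insert the blank id 0 (the `_` symbol here) before, between and after tokens. The sketch below encodes those checks; the function names and the example phone sequence are illustrative, not MeloTTS APIs.

```python
import json
import math
from pathlib import Path

# Hypothetical local path, mirroring the file layout of this commit.
CONFIG_PATH = Path("checkpoints/base_speakers/EN_V2/config.json")


def audio_geometry(config):
    """Derive frame-level quantities and check decoder/STFT consistency."""
    data, model = config["data"], config["model"]
    upsample_factor = math.prod(model["upsample_rates"])
    # One latent frame becomes hop_length samples, so the stages must multiply to it.
    if upsample_factor != data["hop_length"]:
        raise ValueError(f"upsample product {upsample_factor} != hop_length {data['hop_length']}")
    if len(model["upsample_rates"]) != len(model["upsample_kernel_sizes"]):
        raise ValueError("expected one kernel size per upsampling stage")
    segment_samples = config["train"]["segment_size"]
    return {
        "frames_per_second": data["sampling_rate"] / data["hop_length"],
        "segment_frames": segment_samples // data["hop_length"],
        "segment_seconds": segment_samples / data["sampling_rate"],
    }


def intersperse(ids, blank=0):
    """VITS-style add_blank: place `blank` before, between and after the ids."""
    result = [blank] * (2 * len(ids) + 1)
    result[1::2] = ids
    return result


def encode(phones, symbols, add_blank=True):
    """Map phone strings to ids by their position in `symbols`."""
    symbol_to_id = {symbol: i for i, symbol in enumerate(symbols)}
    ids = [symbol_to_id[phone] for phone in phones]
    return intersperse(ids) if add_blank else ids


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        config = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
        print(audio_geometry(config))
        # Illustrative ARPAbet-like phones for "hello"; not MeloTTS's real frontend output.
        print(encode(["hh", "eh", "l", "ow"], config["symbols"], config["data"]["add_blank"]))
```

Interspersing blanks roughly doubles the token sequence length (2n + 1 ids for n phones), which is the trade-off VITS accepts for more flexible alignment.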