Committed by Siddhant
Commit bdd8c05
1 parent: f8cf7a0

import from zenodo

Files changed (19)
  1. README.md +50 -0
  2. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/config.yaml +261 -0
  3. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/backward_time.png +0 -0
  4. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/duration_loss.png +0 -0
  5. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/energy_loss.png +0 -0
  6. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/forward_time.png +0 -0
  7. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/gpu_max_cached_mem_GB.png +0 -0
  8. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/iter_time.png +0 -0
  9. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/l1_loss.png +0 -0
  10. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/loss.png +0 -0
  11. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/optim0_lr0.png +0 -0
  12. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/optim_step_time.png +0 -0
  13. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/pitch_loss.png +0 -0
  14. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/train_time.png +0 -0
  15. exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/train.loss.ave_5best.pth +3 -0
  16. exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/energy_stats.npz +0 -0
  17. exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/feats_stats.npz +0 -0
  18. exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/pitch_stats.npz +0 -0
  19. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - text-to-speech
+ language: ja
+ datasets:
+ - jsut
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 TTS pretrained model
+ ### `kan-bayashi/jsut_conformer_fastspeech2_transformer_prosody`
+ ♻️ Imported from https://zenodo.org/record/5499066/
+
+ This model was trained by kan-bayashi using jsut/tts1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ @inproceedings{hayashi2020espnet,
+   title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
+   author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
+   booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+   pages={7654--7658},
+   year={2020},
+   organization={IEEE}
+ }
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/config.yaml ADDED
@@ -0,0 +1,261 @@
+ config: conf/tuning/train_conformer_fastspeech2.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 54890
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 200
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - loss
+     - min
+ keep_nbest_models: 5
+ grad_clip: 1.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 1000
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 6000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/text_shape.phn
+ - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/speech_shape
+ valid_shape_file:
+ - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/valid/text_shape.phn
+ - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/valid/speech_shape
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ - 240000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_no_dev/text
+     - text
+     - text
+ -   - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/tr_no_dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/tr_no_dev/wav.scp
+     - speech
+     - sound
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/text
+     - text
+     - text
+ -   - exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 1.0
+ scheduler: noamlr
+ scheduler_conf:
+     model_size: 384
+     warmup_steps: 4000
+ token_list:
+ - <blank>
+ - <unk>
+ - a
+ - o
+ - i
+ - '['
+ - '#'
+ - u
+ - ']'
+ - e
+ - k
+ - n
+ - t
+ - r
+ - s
+ - N
+ - m
+ - _
+ - sh
+ - d
+ - g
+ - ^
+ - $
+ - w
+ - cl
+ - h
+ - y
+ - b
+ - j
+ - ts
+ - ch
+ - z
+ - p
+ - f
+ - ky
+ - ry
+ - gy
+ - hy
+ - ny
+ - by
+ - my
+ - py
+ - v
+ - dy
+ - '?'
+ - ty
+ - <sos/eos>
+ odim: null
+ model_conf: {}
+ use_preprocessor: true
+ token_type: phn
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: jaconv
+ g2p: pyopenjtalk_prosody
+ feats_extract: fbank
+ feats_extract_conf:
+     n_fft: 2048
+     hop_length: 300
+     win_length: 1200
+     fs: 24000
+     fmin: 80
+     fmax: 7600
+     n_mels: 80
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/feats_stats.npz
+ tts: fastspeech2
+ tts_conf:
+     adim: 384
+     aheads: 2
+     elayers: 4
+     eunits: 1536
+     dlayers: 4
+     dunits: 1536
+     positionwise_layer_type: conv1d
+     positionwise_conv_kernel_size: 3
+     duration_predictor_layers: 2
+     duration_predictor_chans: 256
+     duration_predictor_kernel_size: 3
+     postnet_layers: 5
+     postnet_filts: 5
+     postnet_chans: 256
+     use_masking: true
+     encoder_normalize_before: true
+     decoder_normalize_before: true
+     reduction_factor: 1
+     encoder_type: conformer
+     decoder_type: conformer
+     conformer_pos_enc_layer_type: rel_pos
+     conformer_self_attn_layer_type: rel_selfattn
+     conformer_activation_type: swish
+     use_macaron_style_in_conformer: true
+     use_cnn_in_conformer: true
+     conformer_enc_kernel_size: 7
+     conformer_dec_kernel_size: 31
+     init_type: xavier_uniform
+     transformer_enc_dropout_rate: 0.2
+     transformer_enc_positional_dropout_rate: 0.2
+     transformer_enc_attn_dropout_rate: 0.2
+     transformer_dec_dropout_rate: 0.2
+     transformer_dec_positional_dropout_rate: 0.2
+     transformer_dec_attn_dropout_rate: 0.2
+     pitch_predictor_layers: 5
+     pitch_predictor_chans: 256
+     pitch_predictor_kernel_size: 5
+     pitch_predictor_dropout: 0.5
+     pitch_embed_kernel_size: 1
+     pitch_embed_dropout: 0.0
+     stop_gradient_from_pitch_predictor: true
+     energy_predictor_layers: 2
+     energy_predictor_chans: 256
+     energy_predictor_kernel_size: 3
+     energy_predictor_dropout: 0.5
+     energy_embed_kernel_size: 1
+     energy_embed_dropout: 0.0
+     stop_gradient_from_energy_predictor: false
+ pitch_extract: dio
+ pitch_extract_conf:
+     fs: 24000
+     n_fft: 2048
+     hop_length: 300
+     f0max: 400
+     f0min: 80
+     reduction_factor: 1
+ pitch_normalize: global_mvn
+ pitch_normalize_conf:
+     stats_file: exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/pitch_stats.npz
+ energy_extract: energy
+ energy_extract_conf:
+     fs: 24000
+     n_fft: 2048
+     hop_length: 300
+     win_length: 1200
+     reduction_factor: 1
+ energy_normalize: global_mvn
+ energy_normalize_conf:
+     stats_file: exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/energy_stats.npz
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a2
+ distributed: true
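
The config pairs `optim: adam` and `lr: 1.0` with `scheduler: noamlr`, so the effective learning rate is determined almost entirely by `model_size: 384` and `warmup_steps: 4000`. The sketch below assumes the usual Noam formula (linear warmup, then inverse-square-root decay, scaled by `model_size ** -0.5`). The function name `noam_lr` is hypothetical and is not an ESPnet API; the default values are copied from the config.

```python
def noam_lr(step: int, base_lr: float = 1.0, model_size: int = 384,
            warmup_steps: int = 4000) -> float:
    """Learning rate at a 1-based optimizer step under the assumed Noam schedule."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear ramp while step < warmup_steps, inverse-square-root decay afterwards.
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * model_size ** -0.5 * scale


if __name__ == "__main__":
    # max_epoch (200) * num_iters_per_epoch (1000) gives at most 200000 steps.
    for step in (1, 1000, 4000, 16000, 200_000):
        print(f"step {step:>7}: lr = {noam_lr(step):.6e}")
```

Under these assumptions the rate peaks at step 4000 at `(384 * 4000) ** -0.5`, roughly 8.07e-4, and has halved by step 16000.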
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/backward_time.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/duration_loss.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/energy_loss.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/forward_time.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/gpu_max_cached_mem_GB.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/iter_time.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/l1_loss.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/loss.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/optim0_lr0.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/optim_step_time.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/pitch_loss.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/images/train_time.png ADDED
exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/train.loss.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91e1d03376f1d60ed83792efa024ad9e020b2bef0349bd7a402d2f8b0983402b
+ size 281529561
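
The three lines added for `train.loss.ave_5best.pth` are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded checkpoint against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path

# Pointer contents as shown in the diff (without the leading "+ " markers).
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:91e1d03376f1d60ed83792efa024ad9e020b2bef0349bd7a402d2f8b0983402b
size 281529561
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
    # Hypothetical usage once the checkpoint has been fetched locally:
    # matches_pointer("train.loss.ave_5best.pth", parse_lfs_pointer(POINTER))
```

Comparing the size before hashing keeps the check cheap for incomplete downloads of the roughly 281 MB file.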
exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/energy_stats.npz ADDED
Binary file (770 Bytes)

exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/feats_stats.npz ADDED
Binary file (1.4 kB)

exp/tts_train_transformer_raw_phn_jaconv_pyopenjtalk_prosody/decode_use_teacher_forcingtrue_train.loss.ave/stats/train/pitch_stats.npz ADDED
Binary file (770 Bytes)
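
The three `.npz` files are the statistics that `config.yaml` references through `normalize: global_mvn`, `pitch_normalize: global_mvn`, and `energy_normalize: global_mvn`. The diff shows only their sizes, so the sketch below assumes each archive holds a frame count, a per-dimension sum, and a per-dimension sum of squares; that layout is an assumption. It uses plain Python lists in place of arrays, and the helper names are hypothetical.

```python
import math


def mvn_from_stats(count, total, total_sq, eps=1e-20):
    """Per-dimension mean and std from a frame count, sums, and sums of squares."""
    means = [s / count for s in total]
    # Var[x] = E[x^2] - E[x]^2, floored at eps so the sqrt and division stay safe.
    stds = [math.sqrt(max(sq / count - m * m, eps))
            for sq, m in zip(total_sq, means)]
    return means, stds


def normalize(frame, means, stds):
    """Apply global mean-variance normalization to one feature frame."""
    return [(x - m) / s for x, m, s in zip(frame, means, stds)]


if __name__ == "__main__":
    # Toy two-dimensional "features"; real mel features here would have 80 dims.
    frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    count = len(frames)
    total = [sum(f[d] for f in frames) for d in range(2)]
    total_sq = [sum(f[d] ** 2 for f in frames) for d in range(2)]
    means, stds = mvn_from_stats(count, total, total_sq)
    print([normalize(f, means, stds) for f in frames])
```

Because the statistics are global (accumulated over the training set once), inference needs the same files that training used, which is presumably why they ship alongside the checkpoint.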
 
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+   model_file: exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/train.loss.ave_5best.pth
+ python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
+ timestamp: 1631246566.918887
+ torch: 1.7.1
+ yaml_files:
+   train_config: exp/tts_train_conformer_fastspeech2_transformer_teacher_raw_phn_jaconv_pyopenjtalk_prosody/config.yaml
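
The `timestamp` field in `meta.yaml` is a bare float. Assuming it is Unix epoch seconds (the file does not state its unit), a short sketch converts it to a readable UTC time; the names `PACKED_AT` and `to_utc` are hypothetical.

```python
from datetime import datetime, timezone

PACKED_AT = 1631246566.918887  # `timestamp` value copied from meta.yaml


def to_utc(ts: float) -> datetime:
    """Interpret `ts` as Unix epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


if __name__ == "__main__":
    print(to_utc(PACKED_AT).isoformat(timespec="seconds"))
    # prints 2021-09-10T04:02:46+00:00
```

Under that assumption the model was packed on 10 September 2021 (UTC).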