Emrys365 committed
Commit ba96e01
1 Parent(s): 7d88217

Update model

Files changed (45)
  1. README.md +379 -3
  2. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/44epoch.pth +3 -0
  3. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/config.yaml +236 -0
  4. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/enhanced_test_16k/RESULTS.md +24 -0
  5. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/enhanced_test_48k/RESULTS.md +18 -0
  6. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/clip.png +0 -0
  8. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/loss.png +0 -0
  27. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,379 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - espnet
+ - audio
+ - audio-to-audio
+ language: en
+ datasets:
+ - universal_se
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ENH model
+
+ ### `wyz/vctk_dns2020_whamr_conv_tasnet_small`
+
+ This model was trained by wyz using universal_se recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ To use the model in the Python interface, you could use the following code:
+
+ ```python
+ import soundfile as sf
+ from espnet2.bin.enh_inference import SeparateSpeech
+
+ # For model downloading + loading
+ model = SeparateSpeech.from_pretrained(
+     model_tag="wyz/vctk_dns2020_whamr_conv_tasnet_small",
+     normalize_output_wav=True,
+     device="cuda",
+ )
+ # For loading a downloaded model
+ # model = SeparateSpeech(
+ #     train_config="exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/config.yaml",
+ #     model_file="exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/xxxx.pth",
+ #     normalize_output_wav=True,
+ #     device="cuda",
+ # )
+
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
+ enhanced = model(audio[None, :], fs=fs)[0]
+ ```
+
+ module
+ <!-- Generated by scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Feb 29 00:21:34 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.21|53.13|-2.72|-2.72|0.00|-31.10|2.85|3.22|3.71|3.24|
+ |chime4_et05_simu_isolated_6ch_track|1.41|82.10|7.96|7.96|0.00|1.39|2.80|3.12|3.84|3.04|
+ |dns20_tt_synthetic_no_reverb|2.51|95.75|15.75|15.75|0.00|15.74|3.19|3.49|3.96|3.76|
+ |reverb_et_real_8ch_multich|1.22|70.41|3.82|3.82|0.00|1.50|2.85|3.24|3.70|3.42|
+ |reverb_et_simu_8ch_multich|1.64|87.62|9.39|9.39|0.00|-9.93|2.86|3.30|3.64|3.58|
+ |whamr_tt_mix_single_reverb_max_16k|1.74|89.96|9.31|9.31|0.00|6.20|3.12|3.39|4.02|3.41|
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 14 08:28:52 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|94.04|13.02|13.02|0.00|12.77|2.94|3.38|3.70|3.36|
+
+ ## ENH config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_enh_conv_tasnet_small.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+     step_size: 2
+     gamma: 0.99
+ init: null
+ model_conf:
+     normalize_variance_per_ch: true
+     always_forward_in_48k: true
+     categories:
+     - 1ch_8k
+     - 1ch_8k_r
+     - 1ch_16k_r
+     - 1ch_48k
+     - 1ch_24k
+     - 1ch_16k
+     - 2ch_8k
+     - 2ch_8k_r
+     - 2ch_16k
+     - 2ch_16k_r
+     - 5ch_8k
+     - 5ch_16k
+     - 8ch_8k_r
+     - 8ch_16k_r
+ criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 256
+         - 512
+         - 768
+         - 1024
+         hop_sz: null
+         eps: 1.0e-08
+         time_domain_weight: 0.5
+         normalize_variance: true
+         use_builtin_complex: true
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 1.0
+ -   name: si_snr
+     conf:
+         eps: 1.0e-07
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: conv
+ encoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ separator: tcn
+ separator_conf:
+     num_spk: 1
+     layer: 8
+     stack: 4
+     bottleneck_dim: 64
+     hidden_dim: 128
+     kernel: 3
+     causal: false
+     norm_type: gLN
+     nonlinear: relu
+ decoder: conv
+ decoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+ @inproceedings{ESPnet-SE,
+   author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
+   Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
+   title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
+   booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
+   pages = {785--792},
+   publisher = {{IEEE}},
+   year = {2021},
+   url = {https://doi.org/10.1109/SLT48900.2021.9383615},
+   doi = {10.1109/SLT48900.2021.9383615},
+   timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
+   biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
+   bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
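The training config above pairs a chunk iterator (`chunk_length: 32000`, `chunk_shift_ratio: 0.5`) with a convolutional encoder (`kernel_size: 120`, `stride: 60`). The sketch below works through the resulting window arithmetic. It rests on three assumptions. First, it treats chunking as a plain sliding window with a shift of `chunk_length * chunk_shift_ratio` samples. Second, it treats the encoder as an unpadded 1-D convolution. Third, `num_chunks` and `encoder_frames` are hypothetical helper names. ESPnet's actual iterator may behave differently, for example in how it picks start offsets or handles utterances shorter than one chunk. The sketch also does not model `always_forward_in_48k: true`, so `num_samples` should be whatever length actually reaches the encoder.

```python
def num_chunks(num_samples: int, chunk_length: int = 32000, shift_ratio: float = 0.5) -> int:
    """Count fixed-length windows cut from one utterance by a plain sliding window.

    Assumption: utterances shorter than chunk_length yield no chunk here;
    ESPnet's own handling of short utterances may differ.
    """
    shift = int(chunk_length * shift_ratio)  # 16000 samples with the config values
    if shift <= 0:
        raise ValueError("chunk shift must be positive")
    if num_samples < chunk_length:
        return 0
    return (num_samples - chunk_length) // shift + 1


def encoder_frames(num_samples: int, kernel_size: int = 120, stride: int = 60) -> int:
    """Output length of an unpadded 1-D convolution (assumed encoder behaviour)."""
    if num_samples < kernel_size:
        return 0
    return (num_samples - kernel_size) // stride + 1


if __name__ == "__main__":
    # An 80000-sample utterance (5 s at 16 kHz): chunks start at 0, 16000, 32000, 48000.
    print(num_chunks(80000))      # 4
    # Each 32000-sample chunk becomes (32000 - 120) // 60 + 1 encoder frames.
    print(encoder_frames(32000))  # 532
```

At 16 kHz, a 32000-sample chunk spans 2 s and the 16000-sample shift is 1 s. At 8 kHz the same sample counts span 4 s and 2 s.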
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/44epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c6941f087feebba839d5e8e208ef308d99c686082cacaa034f6159d858b50393
+ size 4604021
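`44epoch.pth` is stored through Git LFS, so the diff shows only a pointer file. The pointer records the object's SHA-256 (`oid`) and its byte count (`size`). After downloading the real checkpoint, you can check it against this pointer. Below is a minimal standard-library sketch. `parse_lfs_pointer`, `verify_lfs_object` and `verify_lfs_file` are hypothetical helpers, not part of Git LFS or ESPnet.

```python
import hashlib
import io
from typing import Tuple

# Pointer committed for 44epoch.pth in this diff (without the leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c6941f087feebba839d5e8e208ef308d99c686082cacaa034f6159d858b50393
size 4604021
"""


def parse_lfs_pointer(text: str) -> Tuple[str, str, int]:
    """Return (hash algorithm, hex digest, size in bytes) from a Git LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return algorithm, digest, int(fields["size"])


def verify_lfs_object(stream: io.BufferedIOBase, pointer_text: str,
                      block_size: int = 1 << 20) -> bool:
    """Check that a binary stream matches the oid and size recorded in a pointer."""
    algorithm, digest, size = parse_lfs_pointer(pointer_text)
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    hasher = hashlib.sha256()
    total = 0
    # Hash in blocks so a multi-megabyte checkpoint is never held in memory at once.
    for block in iter(lambda: stream.read(block_size), b""):
        hasher.update(block)
        total += len(block)
    return total == size and hasher.hexdigest() == digest


def verify_lfs_file(path: str, pointer_text: str) -> bool:
    """Open a downloaded file in binary mode and verify it against the pointer."""
    with open(path, "rb") as f:
        return verify_lfs_object(f, pointer_text)


# Example, once the checkpoint has been downloaded:
# verify_lfs_file("exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/44epoch.pth", POINTER)
```

If the function returns `False`, the checkpoint on disk differs from the committed object. For example, the download may be incomplete, or the file may still be the pointer text itself.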
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/config.yaml ADDED
@@ -0,0 +1,236 @@
+ config: conf/tuning/train_enh_conv_tasnet_small.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+     step_size: 2
+     gamma: 0.99
+ init: null
+ model_conf:
+     normalize_variance_per_ch: true
+     always_forward_in_48k: true
+     categories:
+     - 1ch_8k
+     - 1ch_8k_r
+     - 1ch_16k_r
+     - 1ch_48k
+     - 1ch_24k
+     - 1ch_16k
+     - 2ch_8k
+     - 2ch_8k_r
+     - 2ch_16k
+     - 2ch_16k_r
+     - 5ch_8k
+     - 5ch_16k
+     - 8ch_8k_r
+     - 8ch_16k_r
+ criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 256
+         - 512
+         - 768
+         - 1024
+         hop_sz: null
+         eps: 1.0e-08
+         time_domain_weight: 0.5
+         normalize_variance: true
+         use_builtin_complex: true
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 1.0
+ -   name: si_snr
+     conf:
+         eps: 1.0e-07
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: conv
+ encoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ separator: tcn
+ separator_conf:
+     num_spk: 1
+     layer: 8
+     stack: 4
+     bottleneck_dim: 64
+     hidden_dim: 128
+     kernel: 3
+     causal: false
+     norm_type: gLN
+     nonlinear: relu
+ decoder: conv
+ decoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
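The optimizer section of `config.yaml` uses Adam with `lr: 0.001` and a `steplr` scheduler with `step_size: 2` and `gamma: 0.99`. Assume `steplr` maps to PyTorch's `torch.optim.lr_scheduler.StepLR`, stepped once per epoch. Then the learning rate after `e` completed epochs is `lr * gamma ** (e // step_size)`. The function name below is hypothetical; the function only evaluates that formula and does not call ESPnet or PyTorch.

```python
def steplr_lr(epochs_completed: int, base_lr: float = 1e-3,
              gamma: float = 0.99, step_size: int = 2) -> float:
    """Learning rate after `epochs_completed` scheduler steps, StepLR-style.

    Assumes one scheduler step per epoch, so the rate is multiplied by
    `gamma` once every `step_size` epochs.
    """
    if epochs_completed < 0 or step_size <= 0:
        raise ValueError("epochs_completed must be >= 0 and step_size > 0")
    return base_lr * gamma ** (epochs_completed // step_size)


if __name__ == "__main__":
    # Epoch 44 matches the checkpoint file name; 100 is max_epoch.
    for epoch in (0, 1, 2, 44, 100):
        print(epoch, f"{steplr_lr(epoch):.3e}")
```

Under that assumption the rate decays slowly. It is about 8.0e-04 after 44 epochs, the epoch in the checkpoint's file name, and about 6.05e-04 by `max_epoch: 100`.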
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,24 @@
+ module
+ <!-- Generated by scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Feb 29 00:21:34 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.21|53.13|-2.72|-2.72|0.00|-31.10|2.85|3.22|3.71|3.24|
+ |chime4_et05_simu_isolated_6ch_track|1.41|82.10|7.96|7.96|0.00|1.39|2.80|3.12|3.84|3.04|
+ |dns20_tt_synthetic_no_reverb|2.51|95.75|15.75|15.75|0.00|15.74|3.19|3.49|3.96|3.76|
+ |reverb_et_real_8ch_multich|1.22|70.41|3.82|3.82|0.00|1.50|2.85|3.24|3.70|3.42|
+ |reverb_et_simu_8ch_multich|1.64|87.62|9.39|9.39|0.00|-9.93|2.86|3.30|3.64|3.58|
+ |whamr_tt_mix_single_reverb_max_16k|1.74|89.96|9.31|9.31|0.00|6.20|3.12|3.39|4.02|3.41|
+
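The `SI_SNR` column refers to scale-invariant signal-to-noise ratio in dB. The config also lists an `si_snr` criterion; its `weight: 0.0` presumably means it is tracked during training but adds nothing to the loss. Below is a minimal pure-Python sketch of the textbook definition, in three steps:

1. Zero-mean both signals.
2. Project the estimate onto the reference to get the target component.
3. Compare the target energy with the residual energy.

This is not ESPnet's scorer. Epsilon placement and other details may differ, so do not expect it to reproduce the table values exactly. The name `si_snr` here is just a local function.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (textbook definition, not ESPnet's code)."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and the same length")
    n = len(reference)
    # Remove the mean so a DC offset counts as neither signal nor noise.
    mean_est = sum(estimate) / n
    mean_ref = sum(reference) / n
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Project the estimate onto the reference: s_target = (<est, ref> / ||ref||^2) * ref
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # Everything not explained by the scaled reference counts as error.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    # Reference plus an orthogonal error of equal energy gives roughly 0 dB.
    print(round(si_snr([2.0, 0.0, 0.0, -2.0], ref), 6))
```

Because the estimate is projected onto the reference, rescaling the estimate leaves the score essentially unchanged. That is the "scale-invariant" part, and it is why gain differences alone do not lower SI-SNR.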
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,18 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 14 08:28:52 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|94.04|13.02|13.02|0.00|12.77|2.94|3.38|3.70|3.36|
+
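Both RESULTS.md files use the same pipe-delimited layout: one row per test set, with metrics as columns. STOI appears to be reported as a percentage. To compare the two files programmatically, a small parser such as the hypothetical `parse_results_table` below is enough. It is a sketch, not a tool shipped with ESPnet, and it expects the raw file contents, without the diff's leading `+` markers.

```python
from typing import Dict


def parse_results_table(markdown: str) -> Dict[str, Dict[str, float]]:
    """Map each dataset row of a RESULTS.md score table to {metric: value}."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(rows) < 2:
        raise ValueError("no markdown table found")
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    results = {}
    for row in rows[2:]:  # rows[1] is the |---|---| separator line
        name, *values = [cell.strip() for cell in row.strip("|").split("|")]
        results[name] = dict(zip(header[1:], (float(v) for v in values)))
    return results


# Example (paths as committed in this diff):
# with open("exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/enhanced_test_48k/RESULTS.md") as f:
#     scores = parse_results_table(f.read())
#     print(scores["vctk_noisy_tt_2spk"]["SI_SNR"])
```

Any non-table lines are ignored, including the environment block and the headings. Only the first table in the text is parsed.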
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/backward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/clip.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/forward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/grad_norm.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/iter_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/loss.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/loss_scale.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/optim_step_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202304'
+ files:
+   model_file: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/44epoch.pth
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
+ timestamp: 1723016741.554693
+ torch: 2.0.1+cu118
+ yaml_files:
+   train_config: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_small_raw/config.yaml
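`meta.yaml` records a `timestamp` alongside the packed files. Its magnitude suggests a Unix epoch time in seconds. Under that assumption it falls in August 2024, later than the February 2024 dates in both RESULTS.md environment blocks. It therefore most likely marks when the model was packed, not when it was trained or scored. Below is a minimal standard-library sketch; `packed_at` is a hypothetical name.

```python
from datetime import datetime, timezone


def packed_at(timestamp: float) -> datetime:
    """Interpret a meta.yaml timestamp as seconds since the Unix epoch, in UTC (assumption)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    print(packed_at(1723016741.554693).strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2024-08-07 07:45:41 UTC
```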