DPTNet trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using the codebase.
It was trained on the vocal ensemble separation task of the jaCappella dataset.
The accompanying paper was published at ICASSP 2023 (arXiv).

License

See the jaCappella dataset page.

Citation

See the jaCappella dataset page.

Configuration

data:
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 5.046
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
filterbank:
  kernel_size: 32
  n_filters: 64
  stride: 16
masknet:
  bidirectional: true
  chunk_size: 174
  dropout: 0
  ff_activation: relu
  ff_hid: 256
  hop_size: 128
  in_chan: 64
  mask_act: sigmoid
  n_repeats: 8
  n_src: 6
  norm_type: gLN
  out_chan: 64
optim:
  lr: 0.005
  optimizer: adam
  weight_decay: 1.0e-05
training:
  batch_size: 1
  early_stop: true
  epochs: 600
  gradient_clipping: 5
  half_lr: true
  loss_func: pit_sisdr
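
As a rough aid for reading the configuration, the following Python sketch derives a few sizes from the values above: the number of samples in one training segment, and the number of frames the learned filterbank would produce for it. The frame count assumes a strided 1-D convolution without padding, which is an assumption of this sketch and not something the configuration states. The function names are hypothetical.

```python
# Values copied from the configuration above.
SAMPLE_RATE = 48000   # data.sample_rate [Hz]
SEQ_DUR = 5.046       # data.seq_dur [s]
KERNEL_SIZE = 32      # filterbank.kernel_size [samples]
STRIDE = 16           # filterbank.stride [samples]


def segment_samples(seq_dur, sample_rate):
    """Number of audio samples in one training segment."""
    # round() guards against off-by-one errors from floating-point products.
    return round(seq_dur * sample_rate)


def encoder_frames(n_samples, kernel_size, stride):
    """Frame count of a strided 1-D convolution, assuming no padding."""
    return (n_samples - kernel_size) // stride + 1


n_samples = segment_samples(SEQ_DUR, SAMPLE_RATE)
n_frames = encoder_frames(n_samples, KERNEL_SIZE, STRIDE)
window_ms = 1000 * KERNEL_SIZE / SAMPLE_RATE
hop_ms = 1000 * STRIDE / SAMPLE_RATE

print(f"segment: {n_samples} samples")
print(f"encoder frames (no padding assumed): {n_frames}")
print(f"window: {window_ms:.3f} ms, hop: {hop_ms:.3f} ms")
```

Under these assumptions, a 5.046 s segment at 48 kHz holds 242208 samples and maps to 15137 encoder frames, with a window of about 0.667 ms and a hop of about 0.333 ms.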

Results (SI-SDR [dB]) on vocal ensemble separation

Method	Lead vocal	Soprano	Alto	Tenor	Bass	Vocal percussion
DPTNet	8.9	8.5	11.9	14.9	19.7	21.9
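
For readers unfamiliar with the metric, here is a minimal sketch of the commonly used SI-SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This is an illustration only. The evaluation code behind the table may differ in details such as mean removal or the stabilizing constant, and the function name and `eps` default are assumptions of this sketch.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length 1-D signals."""
    n = len(reference)
    if len(estimate) != n:
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean of each signal (a common convention, assumed here).
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Optimal scaling of the reference: alpha = <e, r> / ||r||^2.
    dot_er = sum(a * b for a, b in zip(e, r))
    energy_r = sum(b * b for b in r)
    alpha = dot_er / (energy_r + eps)

    # Energy of the scaled target and of the residual e - alpha * r.
    target_energy = alpha * alpha * energy_r
    residual_energy = sum((a - alpha * b) ** 2 for a, b in zip(e, r))

    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because of the projection step, rescaling the estimate does not change the score, which is why the metric is called scale-invariant.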