---
license: cc-by-nc-4.0
language:
- ja
tags:
- music
- speech
- audio
- audio-to-audio
- a cappella
- vocal ensemble
datasets:
- jaCappella
metrics:
- SI-SDR
---
# MRDLA trained with the jaCappella corpus for vocal ensemble separation
This model was trained by Tomohiko Nakamura using the accompanying codebase.
It was trained on the vocal ensemble separation task of the jaCappella dataset.
The corresponding paper was published at ICASSP 2023 (also available on arXiv).
## License
See the jaCappella dataset page.
## Citation
See the jaCappella dataset page.
For MRDLA, please cite the following paper.
```bibtex
@article{TNakamura202104IEEEACMTASLP,
  author  = {Nakamura, Tomohiko and Kozuka, Shihori and Saruwatari, Hiroshi},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title   = {Time-domain audio source separation with neural networks based on multiresolution analysis},
  year    = {2021},
  month   = apr,
  volume  = {29},
  pages   = {1687--1701},
  doi     = {10.1109/TASLP.2021.3072496},
}
```
## Configuration
```yaml
data:
  in_memory: true
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 6.0
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
loss_func:
  lambda_t: 10.0
  lambda_f: 1.0
  band: high
model:
  C_dec: 64
  C_enc: 64
  C_mid: 768
  L: 12
  activation: GELU
  context: false
  f_dec: 21
  f_enc: 21
  input_length: 288000
  padding_type: reflect
  signal_ch: 1
  wavelet: haar
optim:
  lr: 0.0001
  lr_decay_gamma: 0.3
  lr_decay_patience: 50
  optimizer: adam
  patience: 1000
  weight_decay: 0.0
training:
  batch_size: 16
  epochs: 1000
```
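
Note that the model's `input_length` follows from the data settings: 48000 Hz × 6.0 s = 288000 samples per training segment. The snippet below is a minimal sketch that loads the configuration and checks this relationship; the file name `config.yaml` is assumed for illustration and is not taken from the codebase.

```python
# Minimal sanity check of the configuration above (assumes it is saved as config.yaml).
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Training segment length in samples = sample_rate * seq_dur = 48000 * 6.0 = 288000,
# which must match the model's expected input length.
expected = int(cfg["data"]["sample_rate"] * cfg["data"]["seq_dur"])
assert cfg["model"]["input_length"] == expected, (cfg["model"]["input_length"], expected)
print("input_length OK:", expected)
```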
## Results (SI-SDR [dB]) on vocal ensemble separation
| Method | Lead vocal | Soprano | Alto | Tenor | Bass | Vocal percussion |
|---|---|---|---|---|---|---|
| MRDLA | 8.7 | 11.8 | 14.7 | 11.3 | 10.2 | 22.1 |
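
For reference, the snippet below is a minimal NumPy sketch of how SI-SDR (the metric in the table) is typically computed for a single estimated source; it follows the standard scale-invariant SDR definition and is not taken from the repository's evaluation code.

```python
# Minimal sketch of scale-invariant SDR (SI-SDR) in dB; variable names are illustrative.
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """SI-SDR in dB between a 1-D estimated source and its 1-D reference source."""
    # Remove DC offsets, as is common before computing SI-SDR.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
```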