AttentionPIT: Soft permutation invariant training for audio source separation with attention mechanism

Hirokazu Kameoka, Shogo Seki, Li Li, Chihiro Watanabe

Length: 00:09:47

11 May 2022

Permutation invariant training (PIT) is a framework for end-to-end time-domain audio source separation. Its goal is to train a separation network that takes a mixture signal as input and produces J source signals. At each iteration, PIT first finds the best output-target assignment and then updates the network parameters based on that assignment. However, PIT has two problems: first, its time complexity is O(J!), which makes it infeasible as J increases; second, the hard output-target assignment process makes it prone to getting stuck in bad local optima. To overcome these problems, we propose AttentionPIT, which uses an attention mechanism to find soft output-target assignments and runs in polynomial time in J, like fast PIT variants such as SinkPIT and HungarianPIT. The training loss of AttentionPIT is fully differentiable, allowing us to perform soft output-target assignment and network parameter updates simultaneously through backpropagation. Experiments on the LibriMix corpus revealed that while AttentionPIT works reasonably well on its own, it works even better when combined with SinkPIT and HungarianPIT so that AttentionPIT is run only in the early stages of training.
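To make the contrast between hard and soft assignment concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: the abstract does not give the AttentionPIT loss, so the soft variant below simply uses a row-wise softmax over negative pairwise costs as a stand-in for an attention mechanism. The function names (`pairwise_mse`, `hard_pit_loss`, `soft_assignment_loss`), the mean-squared-error cost, and the `temperature` parameter are all assumptions made for illustration.

```python
import itertools
import math


def pairwise_mse(outputs, targets):
    """cost[i][j] = mean squared error between output i and target j."""
    return [
        [sum((o - t) ** 2 for o, t in zip(out, tgt)) / len(out) for tgt in targets]
        for out in outputs
    ]


def hard_pit_loss(outputs, targets):
    """Classic PIT: enumerate all J! permutations and keep the cheapest.

    Returns (loss, perm), where output i is matched to target perm[i].
    """
    cost = pairwise_mse(outputs, targets)
    num_sources = len(cost)
    best_perm = min(
        itertools.permutations(range(num_sources)),
        key=lambda p: sum(cost[i][p[i]] for i in range(num_sources)),
    )
    loss = sum(cost[i][best_perm[i]] for i in range(num_sources)) / num_sources
    return loss, best_perm


def soft_assignment_loss(outputs, targets, temperature=1.0):
    """Illustrative soft assignment (an assumption, not the paper's loss).

    Each output attends over all targets with softmax(-cost / temperature),
    and the loss is the attention-weighted cost. This costs O(J^2) pairwise
    terms rather than O(J!) permutations.
    """
    cost = pairwise_mse(outputs, targets)
    weights = []
    for row in cost:
        logits = [-c / temperature for c in row]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    loss = sum(
        w * c for w_row, c_row in zip(weights, cost) for w, c in zip(w_row, c_row)
    ) / len(cost)
    return loss, weights
```

Note that a row-wise softmax does not enforce a one-to-one assignment, so two outputs may attend to the same target. How AttentionPIT handles this is not stated in the abstract.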

Tags:

permutation invariant training (pit)

attention

end-to-end audio source separation

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
