  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:09:47
11 May 2022

Permutation invariant training (PIT) is a framework for end-to-end time-domain audio source separation. Its goal is to train a separation network that takes a mixture signal as input and produces J source signals. At each iteration, PIT first finds the best output-target assignment and then updates the network parameters based on that assignment. However, PIT has two problems. First, its time complexity is O(J!), which makes it infeasible as J increases. Second, the hard output-target assignment process makes it prone to getting stuck in bad local optima. To overcome these problems, we propose AttentionPIT, which uses an attention mechanism to find soft output-target assignments and runs in time polynomial in J, as do fast PIT variants such as SinkPIT and HungarianPIT. The training loss of AttentionPIT is fully differentiable, so soft output-target assignment and network parameter updates can be performed simultaneously through backpropagation. Experiments on the LibriMix corpus showed that AttentionPIT works reasonably well on its own, and works even better when combined with SinkPIT and HungarianPIT so that AttentionPIT is run only in the early stages of training.
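
To make the contrast between hard and soft assignment concrete, here is a minimal NumPy sketch. It is not the AttentionPIT formulation from the talk, whose details the abstract does not give. The function names, the mean-squared-error cost, and the row-wise softmax with a `temperature` parameter are all assumptions chosen for illustration. `hard_pit_loss` enumerates all J! permutations, as classic PIT does, while `soft_assignment_loss` stands in for a differentiable, polynomial-time soft assignment.

```python
import itertools
import numpy as np


def pairwise_mse(outputs, targets):
    """cost[i, j] = mean squared error between output i and target j.

    outputs, targets: arrays of shape (J, T). MSE is an illustrative
    choice of cost, not one taken from the source.
    """
    diff = outputs[:, None, :] - targets[None, :, :]
    return np.mean(diff ** 2, axis=-1)


def hard_pit_loss(cost):
    """Classic PIT: try all J! permutations and keep the cheapest one."""
    J = cost.shape[0]
    best_perm, best_loss = None, np.inf
    for perm in itertools.permutations(range(J)):
        # perm[i] is the target index assigned to output i.
        loss = cost[np.arange(J), list(perm)].mean()
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return float(best_loss), best_perm


def soft_assignment_loss(cost, temperature=1.0):
    """Hypothetical soft assignment: row-wise softmax over negative costs.

    Each output spreads its weight over all targets, so the loss is a
    smooth function of the costs and needs only O(J^2) work. Rows are
    normalized independently, so the weights need not form a permutation.
    """
    logits = -cost / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    loss = (weights * cost).sum(axis=1).mean()
    return float(loss), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(3, 100))
    # Outputs are a shuffled, slightly noisy copy of the targets.
    outputs = targets[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 100))
    cost = pairwise_mse(outputs, targets)
    print(hard_pit_loss(cost))
    print(soft_assignment_loss(cost, temperature=0.1)[0])
```

In this sketch, a small `temperature` makes the soft weights nearly one-hot, so the soft loss approaches a hard row-wise assignment. A larger value spreads the weight across targets, which is the kind of relaxation the abstract describes as useful in the early stages of training.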
