  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:10:59
05 Oct 2022

Recent deep learning approaches have achieved impressive performance on visually guided sound source separation tasks. However, because real-world mixed/separated audio sample pairs are scarce, most methods rely heavily on the "Mix-and-Separate" paradigm to learn sound source separation, which often transfers poorly to real-world mixtures. To address this issue, we employ a semi-supervised learning technique that preserves audio-visual consistency to improve separation performance in real-world scenarios. In this way, our network is trained jointly on artificial and real-world mixtures. To the best of our knowledge, this is the first attempt to improve real-world generalization in this setting. We also design a category-guided audio-visual fusion module to learn audio-visual matching. Comparative experiments are performed on two publicly available datasets, MUSIC and AudioSet. The results demonstrate that our method often outperforms other state-of-the-art approaches in visual sound separation.
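The "Mix-and-Separate" paradigm the abstract refers to can be illustrated with a minimal sketch: two solo recordings are summed into an artificial mixture, a network predicts per-source masks, and the loss compares the masked estimates against the known solos. The `predict_masks` callable below is a hypothetical stand-in for the paper's audio-visual network; the spectrogram shapes and L1 loss are assumptions for illustration only.

```python
import numpy as np

def mix_and_separate_loss(src_a, src_b, predict_masks):
    """Mix two solo spectrograms, then score predicted separation masks.

    src_a, src_b: magnitude spectrograms of two solo recordings (same shape).
    predict_masks: callable (mixture -> (mask_a, mask_b)); here a hypothetical
    stand-in for the audio-visual separation network.
    """
    mixture = src_a + src_b                      # artificial mixture
    mask_a, mask_b = predict_masks(mixture)      # per-source masks in [0, 1]
    est_a, est_b = mask_a * mixture, mask_b * mixture
    # L1 reconstruction loss against the known solo sources
    return np.abs(est_a - src_a).mean() + np.abs(est_b - src_b).mean()

# Toy check with ideal ratio masks: the loss vanishes.
rng = np.random.default_rng(0)
a = rng.random((4, 4)) + 1e-3
b = rng.random((4, 4)) + 1e-3
ideal = lambda mix: (a / mix, b / mix)
print(mix_and_separate_loss(a, b, ideal))  # → ~0.0
```

Because the supervision signal comes entirely from summing known solos, a model trained this way never sees genuine recorded mixtures, which is the generalization gap the semi-supervised audio-visual consistency objective is meant to close.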
