Framewise multiple sound source localization and counting using binaural spatial audio signals

Lei Wang (Shanghai Jiao Tong University); Zhibin Jiao (Huawei Technologies Co., Ltd.); Qiyong Zhao (Huawei Technologies Co., Ltd.); Jie Zhu (Shanghai Jiao Tong University); Yang Fu (Huawei Technologies Co., Ltd.)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Sound source localization is the problem of estimating the positions of one or more sound sources. For binaural audio, localization is a key perceptual attribute that can be assessed subjectively or objectively. Objective evaluation of binaural sound localization typically exploits binaural or monaural cues to predict the directions of sound sources. Because multiple sound sources are often perceived simultaneously in everyday sound scenes, an objective sound localization model that can detect temporally overlapping sources is required. In this paper, we propose a binaural multiple sound source localization network (BMSSLnet), which predicts framewise azimuths in a binaural audio signal without prior knowledge of the number of sound sources. We formulate multiple azimuth prediction as a multi-label classification task and propose using separated multi-label cross-entropy and mean square error as the loss function. Experimental results show that, with up to three temporally overlapping sources, the proposed model achieves an average precision of 0.9 on the anechoic dataset and 0.75 on the reverberant dataset for spatial prediction. Framewise temporal prediction with an average accuracy of 38.3 ms is achieved.
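To make the multi-label formulation concrete, the following is a minimal sketch of how per-frame azimuth targets, a combined cross-entropy plus mean-square-error loss, and source counting could fit together. It is not the BMSSLnet implementation: the abstract does not define the "separated" cross-entropy term, the azimuth resolution, or the loss weighting, so the 72-bin grid, the plain sum of the two terms, the `mse_weight` parameter, the 0.5 threshold, and all function names here are assumptions made for illustration.

```python
import numpy as np

N_CLASSES = 72  # assumption: 5-degree azimuth bins; the abstract gives no resolution


def encode_azimuths(azimuths_deg, n_classes=N_CLASSES):
    """Build a multi-hot target for one frame: 1 at every active azimuth bin."""
    target = np.zeros(n_classes, dtype=np.float64)
    step = 360.0 / n_classes
    for az in azimuths_deg:
        target[int(round((az % 360.0) / step)) % n_classes] = 1.0
    return target


def bce_plus_mse(probs, target, mse_weight=1.0, eps=1e-7):
    """Multi-label (per-bin binary) cross-entropy plus MSE.

    Assumption: a plain weighted sum; the paper's exact combination may differ.
    """
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    mse = np.mean((probs - target) ** 2)
    return bce + mse_weight * mse


def decode_frame(probs, threshold=0.5, n_classes=N_CLASSES):
    """Return the azimuths (degrees) of all bins at or above the threshold.

    The number of returned azimuths is the estimated source count, so no
    prior knowledge of the source number is needed.
    """
    step = 360.0 / n_classes
    return [float(i * step) for i in np.flatnonzero(probs >= threshold)]
```

With this encoding, a frame containing sources at 0, 90 and 270 degrees yields a target with three active bins, and thresholding a network's per-bin probabilities gives both the azimuths and the source count for that frame.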
