ON MULTIPLE-INPUT/BINAURAL-OUTPUT ANTIPHASIC SPEAKER SIGNAL EXTRACTION

Xianrui Wang (Northwestern Polytechnical University); Ningning Pan (Northwestern Polytechnical University); Jacob Benesty (INRS); Jingdong Chen (Northwestern Polytechnical University)

06 Jun 2023

This paper studies the problem of target speaker signal extraction and antiphasic rendering with a microphone array in scenarios where two speakers are active. Building on important findings from psychoacoustics as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two (binaural) output signals. The TCN is trained in such a way that, when the listener hears the binaural output signals through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener’s head, while the competing speech signal is perceived on the opposite side and away from the listener’s head. Benefiting from both the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker’s signal while ignoring the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.
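The abstract refers to SIR improvement without defining it. As a rough illustration only, the sketch below uses the conventional definition of SIR as the ratio of target power to interference power in decibels, and takes the improvement as output SIR minus input SIR. The function names `sir_db` and `sir_improvement_db`, the toy sine signals, and the assumption that target and interference components are available separately are all hypothetical; none of this is taken from the paper, and the authors' evaluation may differ.

```python
import math


def sir_db(target, interference):
    """Signal-to-interference ratio in dB (conventional definition, assumed here)."""
    p_target = sum(x * x for x in target) / len(target)
    p_interf = sum(x * x for x in interference) / len(interference)
    return 10.0 * math.log10(p_target / p_interf)


def sir_improvement_db(target_in, interf_in, target_out, interf_out):
    """SIR improvement: output SIR minus input SIR, both in dB."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)


# Toy signals (hypothetical): two unit-amplitude sines stand in for the two speakers.
n = 1000
target = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
interf = [math.sin(2 * math.pi * 11 * k / n) for k in range(n)]

# Pretend some processing halves the interference amplitude and leaves the target intact.
interf_out = [0.5 * v for v in interf]

gain = sir_improvement_db(target, interf, target, interf_out)
print(f"SIR improvement: {gain:.2f} dB")
```

Halving the interference amplitude quarters its power, so the improvement in this toy case is 10·log10(4), roughly 6 dB. Note that SIR captures only the power ratio; it says nothing about the perceived spatial separation that the rendering is meant to provide.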

Tags: Signal processing systems

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
