ON MULTIPLE-INPUT/BINAURAL-OUTPUT ANTIPHASIC SPEAKER SIGNAL EXTRACTION
Xianrui Wang (Northwestern Polytechnical University); Ningning Pan (Northwestern Polytechnical University); Jacob Benesty (INRS); Jingdong Chen (Northwestern Polytechnical University)
This paper studies the problem of target speaker signal extraction and
antiphasic rendering with a microphone array in scenarios
where there are two active speakers. Building on important findings
from psychoacoustics as well as our recent work
on single-channel speech enhancement, we present a rendering-based
approach in which a temporal convolutional network (TCN)
is trained to take the multiple signals observed by the microphone
array as its inputs and generate two output (binaural) signals. The
TCN is trained in such a way that, when the binaural output signals
are listened to over headsets, the speech signal from
the desired speaker is perceived on one side of and close to the
listener’s head, while the competing speech signal is perceived on the
opposite side and away from the listener’s head. Benefiting from
rendering and the signal-to-interference ratio (SIR) improvement,
this antiphasic binaural presentation enables the listener to better
focus on the target speaker’s signal while ignoring the impact of the
competing speech. Modified rhyme tests (MRTs) are performed
to validate the superiority of the proposed method.
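
As a rough illustration of the SIR improvement mentioned above, the sketch below computes the SIR in dB as the ratio of target power to interference power and takes the difference between output and input. It is a minimal sketch using the standard textbook definition, not the evaluation code of the paper. The function names and the toy sinusoidal signals are assumptions made for illustration, and the sketch presumes the target and interference components are available separately, as they would be in a simulation.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def sir_db(target, interference):
    """SIR in dB: target power over interference power."""
    return 10.0 * math.log10(power(target) / power(interference))

def sir_improvement_db(target_in, interf_in, target_out, interf_out):
    """Output SIR minus input SIR, in dB."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)

# Toy signals (hypothetical): two sinusoids stand in for the two speakers.
target_in = [math.sin(0.10 * n) for n in range(1000)]
interf_in = [math.sin(0.37 * n) for n in range(1000)]

# Assume a processor that leaves the target untouched and scales the
# interference amplitude by 0.1, i.e. attenuates it by 20 dB.
target_out = list(target_in)
interf_out = [0.1 * v for v in interf_in]

gain = sir_improvement_db(target_in, interf_in, target_out, interf_out)
print(f"SIR improvement: {gain:.2f} dB")
```

In this toy case the improvement equals the 20 dB attenuation applied to the interference. In the binaural setting described above, such a measure could be evaluated per output channel, though the abstract does not specify how the SIR is computed.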