3-D Acoustic Modeling For Far-Field Multi-Channel Speech Recognition

Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

    Length: 13:07
04 May 2020

The conventional approach to automatic speech recognition in multi-channel reverberant conditions involves beamforming-based enhancement of the multi-channel speech signal followed by a single-channel neural acoustic model. In this paper, we propose to model the multi-channel signal directly using a convolutional neural network (CNN) based architecture that performs joint acoustic modeling over the three dimensions of time, frequency, and channel. The features input to the 3-D CNN are extracted by modeling the signal peaks in the spatio-spectral domain with a multivariate autoregressive (AR) modeling approach. This AR model efficiently captures the channel correlations in the frequency domain of the multi-channel signal. Experiments are conducted on the CHiME-3 and REVERB Challenge datasets using multi-channel reverberant speech. In these experiments, the proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained on beamformed audio (average relative improvements in word error rate of 16% and 6% for the CHiME-3 and REVERB Challenge datasets, respectively).
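As a rough illustration of the joint time-frequency-channel modeling idea, the sketch below builds a small 3-D CNN acoustic model in PyTorch. It is a minimal sketch, not the authors' implementation: the layer sizes, kernel shapes, pooling, and senone count are assumptions, and the multivariate-AR features from the paper are stood in for by a random tensor of the right shape.

```python
# Minimal sketch of a 3-D CNN acoustic model over (channel, time, frequency).
# All layer sizes, kernel shapes, and the number of senone targets are
# illustrative assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn

class ThreeDCNNAcousticModel(nn.Module):
    def __init__(self, n_senones=2000):
        super().__init__()
        # Conv3d treats the multi-channel feature maps as one volume:
        # depth = microphone channel, height = time, width = frequency.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(2, 5, 5), padding=(0, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 1, 2)),   # pool along frequency only
            nn.Conv3d(32, 64, kernel_size=(2, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 1, 2)),
        )
        self.classifier = nn.LazyLinear(n_senones)  # infers flattened size

    def forward(self, x):
        # x: (batch, 1, n_mics, n_frames, n_freq_bins)
        h = self.encoder(x)
        h = h.flatten(start_dim=1)
        return self.classifier(h)                   # senone logits

# Toy input: 4 utterance windows, 6 microphones, 21 frames, 40 frequency bins.
model = ThreeDCNNAcousticModel()
logits = model(torch.randn(4, 1, 6, 21, 40))
print(logits.shape)  # torch.Size([4, 2000])
```

Because the channel axis is kept as a separate convolution dimension rather than collapsed by beamforming, the learned kernels can exploit inter-microphone correlations alongside the usual time-frequency structure.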