Towards Improved Room Impulse Response Estimation for Speech Recognition
Anton J Ratnarajah (University of Maryland, College Park); Ishwarya Ananthabhotla (Reality Labs Research at Meta, Redmond, WA); Vamsi Krishna Ithapu (Reality Labs Research at Meta, Redmond, WA); Pablo Hoffmann (Reality Labs Research at Meta, Redmond, WA); Dinesh Manocha (University of Maryland at College Park); Paul Calamia (Reality Labs Research at Meta, Redmond, WA)
We propose to characterize and improve the performance of blind room impulse response (RIR) estimation systems in the context of a downstream application scenario: far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a GAN-based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features. The architecture uses a novel energy decay relief loss to capture energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 72% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
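To make the energy decay relief (EDR) idea concrete, here is a minimal NumPy sketch. It is not the paper's loss: the abstract does not give the formulation, so this assumes the common definition of EDR as the backward-integrated energy of a short-time Fourier transform, expressed in dB, and compares two RIRs with a mean absolute difference. The function names, the frame size `n_fft`, the hop size `hop`, and the L1 comparison are all illustrative assumptions.

```python
import numpy as np


def stft_power(x, n_fft=256, hop=128):
    """Power spectrogram of x, shape (frames, n_fft // 2 + 1), Hann window."""
    window = np.hanning(n_fft)
    # Zero-pad so that every sample of x falls inside some frame.
    n_frames = max(1, 1 + int(np.ceil(max(len(x) - n_fft, 0) / hop)))
    padded = np.zeros((n_frames - 1) * hop + n_fft)
    padded[: len(x)] = x
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def energy_decay_relief(rir, n_fft=256, hop=128, eps=1e-12):
    """EDR in dB: per frequency bin, the energy remaining from each frame onward."""
    power = stft_power(rir, n_fft, hop)
    # Backward integration over time (reverse, cumulative sum, reverse back).
    remaining = np.cumsum(power[::-1], axis=0)[::-1]
    return 10.0 * np.log10(remaining + eps)


def edr_loss(rir_est, rir_ref, n_fft=256, hop=128):
    """Assumed comparison: mean absolute EDR difference in dB (equal-length RIRs)."""
    edr_est = energy_decay_relief(rir_est, n_fft, hop)
    edr_ref = energy_decay_relief(rir_ref, n_fft, hop)
    return float(np.mean(np.abs(edr_est - edr_ref)))
```

For example, two synthetic RIRs built from the same noise with different exponential decay rates give a positive `edr_loss`, while comparing an RIR with itself gives zero. A training setup would need a differentiable version of the same computation in the chosen deep learning framework.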