Synthetic Data for DNN-Based DOA Estimation of Indoor Speech
Femke B. Gelderblom, Yi Liu, Johannes Kvam, Tor Andre Myrvoll
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:08:27
This paper investigates different room impulse response (RIR) simulation methods for synthesizing training data for deep neural network (DNN)-based direction of arrival (DOA) estimation of speech in reverberant rooms. Different sets of synthetic RIRs are obtained using the image source method (ISM) and more advanced methods that include diffuse reflections and/or source directivity. Multi-layer perceptron (MLP) DNN models are trained on generalized cross correlation (GCC) features extracted for each set. Finally, the models are tested on features obtained from measured RIRs. The study shows the importance of training with RIRs from directive sources, as the resulting DOA models achieved up to 51% error reduction compared to the steered response power with phase transform (SRP-PHAT) baseline (significant with p
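As a rough illustration of the GCC features mentioned in the abstract, the sketch below computes a generalized cross correlation between two microphone channels and reads the time delay off its peak. It is a minimal example under stated assumptions, not the authors' pipeline: the phase transform (PHAT) weighting, the naive stdlib-only DFT, and the names `dft`, `gcc_phat`, and `estimate_delay` are all hypothetical choices made here for illustration. The abstract does not specify how the features were weighted, framed, or fed to the MLP.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_lag):
    """Return (lags, gcc) for integer lags in [-max_lag, max_lag].

    A peak at lag d means `sig` is delayed by d samples relative to `ref`.
    PHAT weighting is an assumption of this sketch.
    """
    n = len(sig) + len(ref)  # zero-pad so circular correlation acts as linear
    X = dft(list(sig) + [0.0] * (n - len(sig)))
    Y = dft(list(ref) + [0.0] * (n - len(ref)))
    eps = 1e-12  # guards against division by zero in empty bins
    cross = []
    for a, b in zip(X, Y):
        c = a * b.conjugate()
        cross.append(c / (abs(c) + eps))  # keep phase, discard magnitude
    cc = [v.real for v in dft(cross, inverse=True)]
    lags = list(range(-max_lag, max_lag + 1))
    gcc = [cc[lag % n] for lag in lags]  # negative lags wrap to the end
    return lags, gcc


def estimate_delay(sig, ref, max_lag):
    """Lag (in samples) at which the GCC-PHAT function peaks."""
    lags, gcc = gcc_phat(sig, ref, max_lag)
    best = max(range(len(gcc)), key=gcc.__getitem__)
    return lags[best]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(29)]
    ref = noise + [0.0] * 3   # reference channel
    sig = [0.0] * 3 + noise   # same signal, arriving 3 samples later
    print(estimate_delay(sig, ref, max_lag=8))
```

In a DNN-based setup such as the one the abstract describes, the whole `gcc` vector for each microphone pair (rather than only its peak) would typically serve as the input feature; a conventional baseline would instead map the peak delay to an angle using the array geometry.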
Chairs:
Peter Vouras