NORD: Non-Matching Reference Based Relative Depth Estimation From Binaural Audio
Pranay Manocha (Princeton University); Israel D Gebru (Facebook); Anurag Kumar (Facebook Research); Dejan Markovic (Facebook Reality Labs); Alexander Richard (Facebook Reality Labs)
We propose NORD, a novel framework for estimating the relative depth between two binaural speech recordings. In contrast to existing depth estimation techniques, ours requires only audio signals as input. We train the framework on two tasks: depth preference (i.e., which input perceptually sounds closer to the listener’s head) and depth quantification (i.e., quantifying the depth difference between the inputs). Training also leverages recent advances in metric and multi-task learning, which make the framework invariant to both signal content (i.e., a non-matching reference) and directional cues (i.e., azimuth and elevation). The framework has further properties that make it suitable for use as an objective metric to benchmark binaural audio systems, particularly for depth perception and sound externalization, which we demonstrate through experiments. We also show that NORD generalizes well across different reverberation conditions and environments. Its outputs on the preference and quantification tasks correlate well with measured results.
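To make the two training tasks concrete, the following is a minimal sketch of how a preference objective and a quantification objective might be combined into one multi-task loss. It is not the authors' implementation: the function name `multitask_depth_loss`, the use of binary cross-entropy and L1 terms, and the `weight` parameter are all assumptions chosen for illustration, and the abstract does not state which losses NORD uses.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multitask_depth_loss(pref_logit, pred_diff, depth_a, depth_b, weight=1.0):
    """Hypothetical combined loss for a batch of recording pairs (A, B).

    pref_logit: model score, positive when A is predicted closer than B.
    pred_diff:  model estimate of the absolute depth difference |A - B|.
    depth_a, depth_b: ground-truth source distances for each recording.
    """
    pref_logit = np.asarray(pref_logit, dtype=float)
    pred_diff = np.asarray(pred_diff, dtype=float)
    depth_a = np.asarray(depth_a, dtype=float)
    depth_b = np.asarray(depth_b, dtype=float)

    # Preference task (assumed binary cross-entropy): target is 1 when A is closer.
    target = (depth_a < depth_b).astype(float)
    p = np.clip(sigmoid(pref_logit), 1e-7, 1.0 - 1e-7)
    bce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

    # Quantification task (assumed L1): regress the absolute depth difference.
    l1 = np.mean(np.abs(pred_diff - np.abs(depth_a - depth_b)))

    # Weighted sum of the two task losses.
    return bce + weight * l1, bce, l1
```

Because the targets depend only on the two distances, neither the speech content nor the source direction enters the loss, which is one plausible way to encourage the invariances described above.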