CO-ATTENTION-GUIDED BILINEAR MODEL FOR ECHO-BASED DEPTH ESTIMATION
Go Irie, Takashi Shibata, Akisato Kimura
SPS
Length: 00:06:20
Echoes reflect the geometric structure of the scene surrounding a sound source. In this paper, we address the problem of estimating depth maps of indoor scenes from echoes. First, we show experimentally that fusing multiple acoustic features, especially the spectrogram and the angular spectrum, can improve estimation accuracy. We then propose a novel bilinear model that incorporates dense co-attention for effective feature fusion. Our model obtains a compact fused feature while capturing second-order correlations both within and between features. Thorough evaluations on two datasets demonstrate the superiority of the proposed method over state-of-the-art echo-based depth estimation and feature fusion methods.
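As a rough illustration of the kind of fusion the abstract describes, the following NumPy sketch combines a dense co-attention step with a low-rank (Hadamard-product) bilinear pooling step. It is not the authors' architecture: the function names, the parameter matrices `W`, `U`, `V`, `P`, the mean pooling, and all dimensions are assumptions chosen for illustration, and the weights are random rather than learned. `X` stands in for a set of spectrogram features and `Y` for a set of angular-spectrum features.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(X, Y, W):
    """Dense co-attention between X (n, d_x) and Y (m, d_y).

    Every row of X attends over all rows of Y, and vice versa,
    through a shared affinity matrix A = X W Y^T.
    """
    A = X @ W @ Y.T                      # (n, m) pairwise affinities
    X_ctx = softmax(A, axis=1) @ Y       # (n, d_y): Y summarized for each row of X
    Y_ctx = softmax(A.T, axis=1) @ X     # (m, d_x): X summarized for each row of Y
    return X_ctx, Y_ctx


def low_rank_bilinear(x, y, U, V, P):
    """Compact bilinear pooling: z = P^T ((U^T x) * (V^T y)).

    Each output entry is a bilinear form in x and y, so it depends on
    second-order (pairwise product) terms of the two inputs.
    """
    return P.T @ ((U.T @ x) * (V.T @ y))


def init_params(d_x, d_y, rank, out_dim, seed=0):
    """Random stand-ins for weights that would normally be learned."""
    rng = np.random.default_rng(seed)
    d = d_x + d_y
    return {
        "W": 0.1 * rng.standard_normal((d_x, d_y)),
        "U": 0.1 * rng.standard_normal((d, rank)),
        "V": 0.1 * rng.standard_normal((d, rank)),
        "P": 0.1 * rng.standard_normal((rank, out_dim)),
    }


def fuse(X, Y, params):
    """Co-attend, mean-pool each stream, then fuse with low-rank bilinear pooling."""
    X_ctx, Y_ctx = co_attention(X, Y, params["W"])
    x = np.concatenate([X, X_ctx], axis=1).mean(axis=0)   # (d_x + d_y,)
    y = np.concatenate([Y, Y_ctx], axis=1).mean(axis=0)   # (d_y + d_x,)
    return low_rank_bilinear(x, y, params["U"], params["V"], params["P"])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 8))   # 5 hypothetical spectrogram feature vectors
    Y = rng.standard_normal((7, 6))   # 7 hypothetical angular-spectrum feature vectors
    z = fuse(X, Y, init_params(d_x=8, d_y=6, rank=16, out_dim=4))
    print(z.shape)
```

The low-rank factorization keeps the fused vector at `out_dim` entries instead of the (d_x + d_y)^2 entries a full outer product would need, which is one common way to obtain a compact bilinear feature.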