A Dynamic Stream Weight Backprop Kalman Filter For Audiovisual Speaker Tracking
Christopher Schymura, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa
Audiovisual speaker tracking is an application that has been tackled by a wide range of classical approaches based on Gaussian filters, most notably the well-known Kalman filter. Recently, a specific Kalman filter implementation was proposed for this task, which incorporated dynamic stream weights to explicitly control the influence of acoustic and visual observations during estimation. Inspired by recent progress in integrating uncertainty estimates into modern deep learning frameworks, this paper proposes a deep neural-network-based implementation of the Kalman filter with dynamic stream weights, whose parameters can be learned via standard backpropagation. This allows the parameters of the model and the dynamic stream weight estimator to be optimized jointly in a unified framework. An experimental study on audiovisual speaker tracking shows that the proposed model achieves performance comparable to state-of-the-art recurrent neural networks, with the additional advantages of requiring fewer parameters and providing explicit uncertainty information.
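The core mechanism, a Kalman measurement update whose audio and visual contributions are scaled by a learned stream weight, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear-Gaussian observation models for both streams, applies the stream weights in information (inverse-covariance) form, and uses hypothetical names such as StreamWeightNet and dsw_kalman_update. Because every operation is differentiable in PyTorch, gradients flow through the filter to both the model parameters and the stream weight estimator, which is the property that enables joint training via backpropagation.

```python
# Minimal sketch of a Kalman update with dynamic stream weights (not the
# authors' code). Assumes linear-Gaussian audio/visual observation models.
import torch
import torch.nn as nn


class StreamWeightNet(nn.Module):
    """Hypothetical estimator mapping reliability features to a weight in (0, 1)."""

    def __init__(self, feat_dim: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


def dsw_kalman_update(x_pred, P_pred, y_a, y_v, H_a, H_v, R_a, R_v, lam):
    """One measurement update in information form.

    Scaling the audio precision by lam and the visual precision by (1 - lam)
    mirrors weighting the two stream log-likelihoods. All operations are
    differentiable, so gradients reach both the filter parameters and lam.
    """
    info_pred = torch.linalg.inv(P_pred)
    info_a = lam * H_a.T @ torch.linalg.inv(R_a) @ H_a
    info_v = (1.0 - lam) * H_v.T @ torch.linalg.inv(R_v) @ H_v
    P_post = torch.linalg.inv(info_pred + info_a + info_v)
    innov = (lam * H_a.T @ torch.linalg.inv(R_a) @ (y_a - H_a @ x_pred)
             + (1.0 - lam) * H_v.T @ torch.linalg.inv(R_v) @ (y_v - H_v @ x_pred))
    x_post = x_pred + P_post @ innov
    return x_post, P_post


# Example usage with a 4-D state (2-D position and velocity) and 2-D
# observations; both streams observe position only.
n, m = 4, 2
H = torch.eye(m, n)
lam = StreamWeightNet(feat_dim=3)(torch.randn(3))  # dynamic stream weight
x_post, P_post = dsw_kalman_update(
    x_pred=torch.zeros(n), P_pred=torch.eye(n),
    y_a=torch.randn(m), y_v=torch.randn(m),
    H_a=H, H_v=H,
    R_a=0.5 * torch.eye(m), R_v=0.1 * torch.eye(m),
    lam=lam,
)
```

In this sketch the posterior covariance P_post is available at every step, which is the source of the explicit uncertainty information mentioned in the abstract; a purely recurrent tracker would not provide it directly.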