Audio-Visual Tracking of Multiple Speakers via a PMBM Filter

Jinzheng Zhao, Peipei Wu, Xubo Liu, Wenwu Wang, Yong Xu, Lyudmila Mihaylova, Simon Godsill

Length: 00:10:20
11 May 2022

Audio-visual tracking of multiple speakers requires estimating the state (e.g., location and velocity) of each speaker by leveraging information from both the audio and visual modalities. Jointly estimating the number of speakers and their states remains a challenging problem. We propose an Audio-Visual Poisson Multi-Bernoulli Mixture filter (AV-PMBM) that can not only predict the number of speakers but also accurately estimate their states. We also propose a novel sound source localization technique that combines direction-of-arrival (DOA) information with a deep-learning-based object detector to provide reliable audio measurements for the AV tracker. To our knowledge, this is the first attempt to apply the PMBM filter to multi-speaker tracking with audio-visual modalities. Experiments on the AV16.3 dataset demonstrate that AV-PMBM achieves state-of-the-art performance in terms of the optimal sub-pattern assignment (OSPA) metric.
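The OSPA metric used for evaluation compares an estimated set of speaker positions against the ground-truth set, penalizing both localization error (truncated at a cutoff c) and cardinality mismatch. The following is a minimal illustrative sketch of the standard OSPA distance, not the authors' evaluation code; the brute-force assignment is adequate for the small speaker counts typical of this task.

```python
import math
from itertools import permutations

def ospa(X, Y, c=1.0, p=2):
    """OSPA distance of order p with cutoff c between two point sets.

    X, Y are sequences of coordinate tuples (e.g., 2-D image positions).
    Uses brute-force optimal assignment, which is fine for a handful of
    speakers; large sets would need a Hungarian-algorithm solver instead.
    """
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)            # all mass is cardinality penalty
    if m > n:                      # ensure the smaller set is X
        X, Y, m, n = Y, X, n, m

    def dist(a, b):
        # cutoff-truncated Euclidean distance
        return min(c, math.dist(a, b))

    # optimal assignment of the m estimates into the n truths
    best = min(
        sum(dist(x, Y[j]) ** p for x, j in zip(X, perm))
        for perm in permutations(range(n), m)
    )
    # unassigned truths each incur the maximum per-point cost c
    cost = best + (c ** p) * (n - m)
    return (cost / n) ** (1.0 / p)
```

For example, a perfect estimate yields distance 0, while reporting one speaker when two are present contributes a cardinality penalty of c^p for the missed target before normalization.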