Improved Deep Speaker Localization and Tracking: Revised Training Paradigm and Controlled Latency

Alexander Bohlender (IDLab, Ghent University - imec); Liesbeth Roelens (IDLab, Ghent University - imec); Nilesh Madhu (IDLab, Ghent University - imec)


09 Jun 2023

Even without a separate tracking algorithm, a deep neural network (DNN) can estimate the directions of arrival (DOAs) of moving talkers, provided that the movement trajectories used for training allow it to generalize to real signals. Previously, we proposed a framework for generating training data with time-variant source activity and sudden DOA changes. Slowly moving sources could be seen as a special case thereof, but were not explicitly modeled. In this paper, we extend this framework by using small jumps between neighboring discrete DOAs to simulate gradual movements. Further, we investigate the benefit of a latency-controlled bidirectional recurrent layer in the DNN architecture. Because this layer requires only a strictly limited context of future frames, it may still be acceptable for real-time applications. Experiments with real recordings show that the revised data generation leads to more continuous DOA paths, whereas the future context enables quicker detection of speech onsets and offsets.
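To illustrate the idea of modeling gradual movement as small jumps between neighboring discrete DOAs, the following is a minimal sketch under stated assumptions, not the authors' data generator. The function name `simulate_doa_track`, the 72-point circular grid (5° resolution), and the per-frame probabilities `p_move`, `p_jump` and `p_toggle` are all hypothetical choices. A gradual movement is a ±1 step on the grid, a sudden DOA change is a uniform redraw, and source activity is a simple on/off process.

```python
import random


def simulate_doa_track(num_frames, num_doas=72, p_move=0.1, p_jump=0.01,
                       p_toggle=0.02, seed=None):
    """Return one DOA grid index per frame, or None while the source is silent.

    Hypothetical sketch: all parameters are illustrative assumptions.
    - gradual movement: step to a neighboring grid index (+/-1, wrapping
      around the circle)
    - sudden DOA change: jump to a uniformly drawn grid index
    - time-variant activity: toggle an on/off state with probability p_toggle
    """
    rng = random.Random(seed)
    doa = rng.randrange(num_doas)
    active = True
    track = []
    for _ in range(num_frames):
        if rng.random() < p_toggle:
            active = not active                      # speech onset/offset
        if rng.random() < p_jump:
            doa = rng.randrange(num_doas)            # sudden DOA change
        elif rng.random() < p_move:
            doa = (doa + rng.choice((-1, 1))) % num_doas  # small neighbor jump
        track.append(doa if active else None)
    return track


# Example: a continuously active, slowly moving source on a 5-degree grid.
path = simulate_doa_track(100, p_jump=0.0, p_toggle=0.0, seed=0)
degrees = [5 * idx for idx in path]
```

With `p_jump=0` and `p_toggle=0`, consecutive frames differ by at most one grid step, which mimics a slowly moving talker. Raising `p_jump` reintroduces the sudden changes of the earlier framework.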
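The latency argument can be made concrete with a toy example. The sketch below is a hypothetical, simplified illustration and not the paper's layer. A forward recurrence runs causally over all past frames. The backward recurrence for frame t restarts from a zero state and only sees frames t to t + `lookahead`, so each output can be emitted `lookahead` frames after its input. The scalar cell `rnn_step` and its weights are assumptions chosen for illustration. Practical latency-controlled BLSTMs usually process chunks with a right-context window rather than restarting per frame.

```python
import math


def rnn_step(x, h, w=0.5, u=0.9):
    """Toy scalar recurrent cell with hypothetical weights: h' = tanh(w*x + u*h)."""
    return math.tanh(w * x + u * h)


def latency_controlled_birnn(frames, lookahead):
    """Return (forward, backward) states per frame.

    The backward state at frame t depends only on frames t .. t+lookahead,
    so the algorithmic latency is `lookahead` frames.
    """
    num_frames = len(frames)
    # Forward direction: ordinary causal recurrence over all past frames.
    fwd, h = [], 0.0
    for x in frames:
        h = rnn_step(x, h)
        fwd.append(h)
    # Backward direction: restart from zero at each frame and run backward
    # over a truncated window of future frames only.
    bwd = []
    for t in range(num_frames):
        g = 0.0
        for k in range(min(t + lookahead, num_frames - 1), t - 1, -1):
            g = rnn_step(frames[k], g)
        bwd.append(g)
    return list(zip(fwd, bwd))


x = [0.1 * i for i in range(20)]
states = latency_controlled_birnn(x, lookahead=3)
```

Restarting per frame costs O(T·L) cell evaluations for T frames and look-ahead L. Chunk-based schemes amortize this cost at the price of a latency that varies within each chunk.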

Tags:

Acoustic and microphone array processing


Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person)

More Like This

A DNN BASED NORMALIZED TIME-FREQUENCY WEIGHTED CRITERION FOR ROBUST WIDEBAND DOA ESTIMATION

DIFFUSION-BASED SOUND SOURCE LOCALIZATION USING NETWORKS OF PLANAR MICROPHONE ARRAYS

Soft label coding for end-to-end sound source localization with ad-hoc microphone arrays
