PICKNET: REAL-TIME CHANNEL SELECTION FOR AD HOC MICROPHONE ARRAYS

Takuya Yoshioka, Xiaofei Wang, Dongmei Wang

Length: 00:14:54

13 May 2022

This paper proposes PickNet, a neural network model for real-time channel selection using an ad hoc microphone array. Assuming that at most one person is vocally active at any time, PickNet identifies, for each time frame, the device that is spatially closest to the active speaker by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short-time frame signals from the selected microphones are concatenated across frames to produce an output signal. Because personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet uses only limited acoustic context at each time frame, a system built on the proposed model works in real time and is robust to changes in acoustic conditions. A speech recognition-based evaluation was carried out using real conversational recordings obtained with various smartphones. The proposed model yielded significant word error rate improvements, at limited computational cost, over systems using a block-online beamformer and a single distant microphone.
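The frame-wise select-and-concatenate procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `frame_len` and `context_len` parameters, and the use of time-domain samples are all assumptions made for illustration, and the trained network is replaced by a hypothetical energy-based picker (`energy_pick`) that simply chooses the loudest channel over a short causal context window.

```python
from typing import Callable, List, Sequence, Tuple

Signal = Sequence[float]


def energy_pick(patches: Sequence[Signal]) -> int:
    """Hypothetical stand-in for the trained model: index of the loudest patch."""
    energies = [sum(x * x for x in patch) for patch in patches]
    return max(range(len(energies)), key=energies.__getitem__)


def select_and_concatenate(
    channels: Sequence[Signal],
    frame_len: int,
    context_len: int,
    pick_fn: Callable[[Sequence[Signal]], int] = energy_pick,
) -> Tuple[List[float], List[int]]:
    """For each frame, pick one channel and append its frame to the output.

    Assumption: the picker sees only the last `context_len` samples ending at
    the current frame (causal context), so the loop could run in real time.
    """
    n = min(len(ch) for ch in channels)  # process only the common length
    output: List[float] = []
    picks: List[int] = []
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        ctx_start = max(0, end - context_len)
        # One short context patch per channel, all covering the same time span.
        patches = [ch[ctx_start:end] for ch in channels]
        best = pick_fn(patches)
        picks.append(best)
        # Concatenate the selected channel's frame onto the output signal.
        output.extend(channels[best][start:end])
    return output, picks


# Toy example: the talker is near device 0 first, then near device 1.
device0 = [1.0] * 4 + [0.1] * 4
device1 = [0.1] * 4 + [1.0] * 4
out, picks = select_and_concatenate([device0, device1], frame_len=4, context_len=4)
print(picks)  # [0, 1]
```

In this toy case the output follows whichever device is louder in each frame. A real system would presumably replace `energy_pick` with a learned model operating on spectral patches, as the abstract describes.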

Tags: ad hoc microphone array, channel selection, real-time processing

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
