IMPROVED SIMULATION OF REALISTICALLY-SPATIALISED SIMULTANEOUS SPEECH USING MULTI-CAMERA ANALYSIS IN THE CHIME-5 DATASET
Jack Deadman, Jon Barker
SPS
Length: 00:15:15
Room simulation is an essential tool in the development of distant-microphone ASR and source separation. However, most commonly used simulated datasets adopt uninformed and potentially unrealistic speaker location distributions. In earlier work, we analysed a 50-hour audio-visual dataset of multiparty recordings made in real homes to estimate typical angular separations between speakers. We now refine and extend this work using a multi-camera analysis to estimate full 2-D speaker location distributions. Results show that commonly used simulated datasets use unrealistically large angular separations but unrealistically small ranges of target-to-interferer distance ratios. We generate more realistically distributed datasets and use them to re-evaluate state-of-the-art source separation and ASR approaches. Results suggest that imposing realistic angular separation distributions makes datasets more challenging; however, the pattern when using realistic distance ratios is more complicated and can depend on room size.
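
To make the two quantities discussed above concrete, the following is a minimal sketch of how angular separation and target-to-interferer distance ratio could be computed from 2-D positions. It is an illustration only, not the authors' code: the function names, the (x, y) coordinate convention in metres, and the single-point microphone are all assumptions introduced here.

```python
import math

# Hypothetical helpers for illustration; positions are (x, y) tuples in metres.

def angular_separation_deg(mic, src_a, src_b):
    """Angle in degrees (0 to 180) between two sources as seen from the microphone."""
    ang_a = math.atan2(src_a[1] - mic[1], src_a[0] - mic[0])
    ang_b = math.atan2(src_b[1] - mic[1], src_b[0] - mic[0])
    diff = abs(ang_a - ang_b) % (2 * math.pi)
    # Take the smaller of the two arcs around the circle.
    return math.degrees(min(diff, 2 * math.pi - diff))

def distance_ratio(mic, target, interferer):
    """Ratio of microphone-to-target distance over microphone-to-interferer distance."""
    return math.dist(mic, target) / math.dist(mic, interferer)

if __name__ == "__main__":
    mic = (0.0, 0.0)
    target = (2.0, 0.0)
    interferer = (0.0, 4.0)
    print(angular_separation_deg(mic, target, interferer))
    print(distance_ratio(mic, target, interferer))
```

Under these assumptions, a simulated dataset could be checked against a measured distribution by computing both values for each sampled speaker pair and comparing histograms.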