Probing Seismogenic Faults With Machine Learning

Paul Johnson, Chris Johnson

SPS

Length: 00:13:49

19 Oct 2022

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker speech mixture. Previous studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, co-speech gestures that are naturally timed with speech also contribute to speech perception. In this work, we explore the use of a co-speech gesture sequence, e.g., hand and body movements, as the speaker cue for speaker extraction. Such a cue can be easily obtained from low-resolution video recordings and is thus more readily available than face recordings. We propose two networks that use the co-speech gesture cue to perform attentive listening on the target speaker: one implicitly fuses the cue into the speaker extraction process; the other performs speech separation first and then explicitly uses the cue to associate a separated speech signal with the target speaker. The experimental results show that the co-speech gesture cue is informative for associating speech with the target speaker.

Tags:

International Conference on Image Processing

IEEE ICIP 2022

icip

Value-Added Bundle(s) Including this Product

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
