Motion Dynamics Improve Speaker-Independent Lipreading
Matteo Riva, Jürgen Schmidhuber, Michael Wand
SPS
We present a novel lipreading system that improves speaker-independent word recognition by decoupling motion and content dynamics. Our deep learning architecture processes motion and content in two distinct pipelines and subsequently merges them, yielding an end-to-end trainable system that fuses independently learned representations. Compared to a baseline with a standard architecture, we obtain an average relative word accuracy improvement of ~6.8% on unseen speakers and ~3.3% on known speakers.
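The two-pipeline fusion idea can be illustrated with a minimal NumPy sketch. This is not the authors' trained architecture: the dimensions, the frame-difference "motion" stream, the temporal-mean "content" stream, and the single-hidden-layer encoders are all illustrative assumptions; it only shows the structure of processing two streams separately and merging their representations before a shared classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w, b):
    # One hidden layer with ReLU, a stand-in for each pipeline's encoder.
    return np.maximum(0.0, x @ w + b)

# Hypothetical dimensions: a 10-frame clip of 64-dim mouth-region features.
T, D, H, C = 10, 64, 32, 5  # frames, feature dim, hidden dim, word classes
frames = rng.standard_normal((T, D))

# "Motion" stream: temporal differences between consecutive frames.
motion_in = np.diff(frames, axis=0).mean(axis=0)
# "Content" stream: static appearance, here simply the temporal mean.
content_in = frames.mean(axis=0)

# Separate (randomly initialized) weights for the two pipelines.
w_m, b_m = rng.standard_normal((D, H)), np.zeros(H)
w_c, b_c = rng.standard_normal((D, H)), np.zeros(H)

# Fusion by concatenating the two independently computed representations,
# followed by a shared classifier head (end-to-end trainable in practice).
fused = np.concatenate([branch(motion_in, w_m, b_m),
                        branch(content_in, w_c, b_c)])
w_out = rng.standard_normal((2 * H, C))
logits = fused @ w_out
pred = int(np.argmax(logits))
```

In a real system both encoders would be deep networks trained jointly with the fusion head, so the gradient of the word-classification loss shapes each stream's representation.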