Video-Driven Speech Reconstruction

Rodrigo Mira, Pingchuan Ma, Konstantinos Vougioukas, Stavros Petridis, Björn Schuller, Maja Pantic

SPS

Length: 13:52

04 May 2020

This demo will showcase our video-to-audio model, which reconstructs speech from short videos of spoken statements. The model is completely end-to-end: raw audio is generated directly from the input video, bypassing the need for separate lip-reading and text-to-speech models. This approach has two advantages: it does not require large transcribed datasets, and it does not rely on intermediate representations such as text, which discard the intonation and emotional content of speech. This demo will show, for the first time, the feasibility of end-to-end video-driven speech reconstruction for unseen speakers. The model is based on generative adversarial networks and achieves state-of-the-art performance for seen speakers on the GRID dataset in terms of word error rate, speech quality and speech intelligibility. It is also the first model that can generate high-quality, intelligible speech for unseen speakers. Additionally, it is the first model to produce intelligible speech when trained and tested on LRW, an 'in-the-wild' dataset containing thousands of utterances taken from television broadcasts.

The demo will be interactive and will involve recording live video of a new participant. This previously unseen speaker will be asked to utter a short sentence in front of the camera, but no audio will be recorded. The video will then be fed into the model, which will produce, in only a few seconds, a new version of the same video featuring the speech reconstructed by our end-to-end model.

The proposed model could have a significant impact on videoconferencing by alleviating common issues such as noisy environments, gaps in the audio and unvoiced syllables. The demo will be a first step in demonstrating the potential of this technology, which we believe will be attractive and relevant to the ICASSP audience.
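Word error rate (WER), one of the metrics cited for the GRID results, is the word-level edit distance between a reference transcript and a hypothesis transcript, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. The sketch below is a minimal illustration only, assuming whitespace-tokenized, already-normalized transcripts of the generated speech (for example, produced by a speech recognizer); how the authors obtained their transcripts is not described here, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes both transcripts are already normalized (case, punctuation)
    and can be split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous DP row: edit distances from ref[:i-1]
    # to every prefix hyp[:j]. Row 0 is the cost of inserting j words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Column 0: deleting all i reference words.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # GRID-style sentence (command, color, preposition, letter, digit, adverb).
    print(word_error_rate("bin blue at f two now", "bin blue at s two now"))
```

For this GRID-style example, one substituted word out of six reference words gives a WER of about 0.167. Note that WER can exceed 1.0 when the hypothesis contains many insertions.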
Samples of our work are available on our project page: https://sites.google.com/view/speech-synthesis/home/extension

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 14-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle