Multi-speaker Speech Synthesis from Electromyographic Signals by Soft Speech Unit Prediction
Kevin Scheck (University of Bremen); Tanja Schultz (University of Bremen)
Electromyographic (EMG) signals of the articulatory muscles reflect the speech production process even when the user speaks silently, i.e., moves the articulators without producing audible sound. We propose Speech-Unit-based EMG-to-Speech (SU-E2S), a system that synthesizes speech from EMG, preserving the articulated content while vocalizing it in another voice determined by an acoustic reference utterance. It builds on a Voice Conversion (VC) system that decomposes acoustic speech into continuous soft speech units and a speaker embedding and then reconstructs acoustic features. SU-E2S performs speech synthesis by predicting soft speech units from EMG and feeding them to the VC system. Experiments show that the SU-E2S output is on par, in terms of intelligibility, with predicting acoustic features directly from EMG, while adding the ability to synthesize speech in other voices.
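To make the two-stage pipeline concrete, below is a minimal sketch of the data flow: an EMG encoder predicts a sequence of soft speech units, and a VC-style decoder reconstructs acoustic features from those units conditioned on a speaker embedding from a reference utterance. All module names, layer choices, and dimensions (EMG channel count, unit size, embedding size, mel bins) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class EMGToSoftUnits(nn.Module):
    """Hypothetical EMG encoder: maps multi-channel EMG frames to
    continuous soft speech units (architecture and sizes assumed)."""

    def __init__(self, n_emg_channels=8, d_unit=256, d_hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_emg_channels, d_hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_hidden, d_hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.proj = nn.Linear(d_hidden, d_unit)

    def forward(self, emg):  # emg: (batch, channels, time)
        h = self.encoder(emg).transpose(1, 2)  # (batch, time, d_hidden)
        return self.proj(h)                    # (batch, time, d_unit)


class UnitsToAcoustics(nn.Module):
    """Stand-in for the VC decoder: reconstructs acoustic features
    (e.g., mel-spectrogram frames) from soft units, conditioned on an
    utterance-level speaker embedding that selects the output voice."""

    def __init__(self, d_unit=256, d_spk=192, n_mels=80, d_hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_unit + d_spk, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_mels),
        )

    def forward(self, units, spk_emb):
        # Broadcast the speaker embedding over time and fuse with units.
        spk = spk_emb.unsqueeze(1).expand(-1, units.size(1), -1)
        return self.net(torch.cat([units, spk], dim=-1))


# Toy end-to-end pass with random tensors standing in for real data.
emg = torch.randn(1, 8, 200)       # silently articulated EMG recording
spk_emb = torch.randn(1, 192)      # embedding of the reference voice
units = EMGToSoftUnits()(emg)      # predicted soft speech units
mels = UnitsToAcoustics()(units, spk_emb)  # features for a vocoder
print(mels.shape)                  # torch.Size([1, 200, 80])
```

Because the decoder only sees content through the soft units, swapping the speaker embedding changes the output voice without retraining the EMG encoder, which is the property the abstract highlights.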