  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

A new neural network architecture is proposed for converting Mel spectrograms into audio signals. The architecture is designed from the ground up to run on mobile devices: it is fully convolutional and non-autoregressive, and it relies on operators that parallelize easily on mobile CPUs and GPUs. Its upsampling block is a lightweight combination of a nearest-neighbor resize and a separable convolution, which provides fast upsampling with minimal checkerboard artifacts. The model is trained as a GAN and demonstrates stable training behavior. A method for evaluating the performance characteristics of neural vocoders on mobile devices is also described. The model is shown to run up to 20x faster than real time on a current-generation mobile CPU and up to 65x faster than real time on a current-generation mobile GPU, while being neutral or better in quality when evaluated against a comparably sized WaveRNN model.
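
The upsampling idea in the abstract (a nearest-neighbor resize followed by a separable convolution) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the 1-D time-axis layout, the kernel sizes, the zero padding, the depthwise-then-pointwise ordering, and every function name below are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List

Signal = List[List[float]]  # assumed layout: [channels][time]


def nearest_neighbor_upsample(x: Signal, factor: int) -> Signal:
    """Repeat every time step `factor` times along the time axis."""
    return [[v for v in channel for _ in range(factor)] for channel in x]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Filter each channel with its own odd-length kernel.

    Uses zero padding so the output length equals the input length, and
    cross-correlation (no kernel flip), as deep learning frameworks do.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([
            sum(kernel[k] * padded[t + k] for k in range(len(kernel)))
            for t in range(len(channel))
        ])
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """Mix channels at each time step (1x1 convolution); weights is [out_ch][in_ch]."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def upsample_block(x: Signal, factor: int,
                   depthwise_kernels: List[List[float]],
                   pointwise_weights: List[List[float]]) -> Signal:
    """Hypothetical block: resize first, then a separable (depthwise + pointwise) conv."""
    up = nearest_neighbor_upsample(x, factor)
    return pointwise_conv1d(depthwise_conv1d(up, depthwise_kernels), pointwise_weights)


# Two input channels, two time steps, upsampled 2x and mixed down to one channel.
x = [[1.0, 2.0], [0.0, 4.0]]
identity = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # depthwise kernels that pass the signal through
mix = [[1.0, 1.0]]                             # pointwise weights that sum both channels
y = upsample_block(x, 2, identity, mix)
```

Because the resize copies each input sample to a fixed number of output samples before any learned filtering, every output position receives the same amount of kernel coverage. That even coverage is the usual argument for why a resize followed by a convolution avoids the checkerboard patterns associated with strided transposed convolutions.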
