  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:07:19
12 May 2022

End-to-end raw-waveform modelling with learnable feature-extraction front-ends has shown promising results in various speech and audio tasks. Despite this success, there have been few attempts to understand how spectral and temporal feature integration from raw inputs helps recognize task-dependent information. Towards this aim, this work presents data-dependent and data-independent methods for understanding the modelling behavior of acoustic models. The first method employs time-frequency analysis to visualize input-specific response spectra as a function of short-time front-end block processing. The second method uses geometric properties of layer-wise weights to quantify the impact of architectural choices on signal propagation and trainability of the model. We demonstrate the potential of the proposed methods through case studies on speech classification, speaker identification, and spoofing classification tasks.
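
As a rough illustration of the first, data-dependent idea (response spectra computed per short-time block), here is a minimal NumPy sketch. It is not the authors' implementation: the random `kernels` merely stand in for learned first-layer convolution weights, and the helper names `frame_signal` and `response_spectra`, the frame length, hop size, and FFT size are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def response_spectra(x, kernels, frame_len=400, hop=160, n_fft=512):
    """Magnitude spectrum of the summed filter-bank response for each frame.

    kernels: (n_filters, kernel_len) array standing in for learned
    front-end weights (an assumption, not the model from the talk).
    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    # Taper each frame with a Hann window before filtering.
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spectra = []
    for frame in frames:
        # Filter the frame with every kernel, then sum the responses.
        response = sum(np.convolve(frame, k, mode="same") for k in kernels)
        spectra.append(np.abs(np.fft.rfft(response, n=n_fft)))
    return np.array(spectra)

# Toy input: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)

# Hypothetical "front-end": 8 random kernels of length 65.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((8, 65))

S = response_spectra(x, kernels)  # rows: frames, columns: frequency bins
```

Plotting `S` as an image (time on one axis, frequency on the other) gives an input-specific picture of what the filter bank passes for that utterance. With a trained model, the kernels would be read from the first layer instead of sampled at random.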
