Time-domain speech separation networks with graph encoding auxiliary

Wang Tingting (Nanjing University of Posts and Tel); Zexu Pan (National University of Singapore); Meng Ge (Tianjin University); Zhen Yang (Nanjing University of Posts and Telecommunication); Haizhou Li (The Chinese University of Hong Kong, Shenzhen)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
09 Jun 2023

End-to-end time-domain speech separation with a masking strategy has shown a performance advantage. In this approach, a 1-D convolutional layer serves as the speech encoder, mapping each sliding window of the waveform to a latent feature representation, i.e., an embedding vector. A large window leads to low resolution in speech processing, whereas a small window offers high resolution at the expense of high computational cost. In this work, we propose a graph encoding technique to model the fine structural knowledge of speech samples in a window of reasonable size. Specifically, we build a graph representation for each latent representation and encode the structural details with a graph convolutional network encoder. The encoded graph feature representation complements the original latent feature representation and benefits the separation and reconstruction of speech. Experiments on various models and datasets show that the proposed encoding technique significantly improves speech quality over other time-domain speech encoders.
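
The window-size trade-off and the idea of a graph convolution over the samples of one window can be illustrated with a minimal sketch. The abstract does not say how the graph is built or how the network is configured, so everything here is an assumption made for illustration: the chain graph linking neighbouring samples, the function names `frame_signal`, `chain_adjacency`, and `gcn_layer`, the weights, and the mean pooling. The propagation rule is the commonly used symmetric-normalised graph convolution, not necessarily the one used in the paper.

```python
import math


def frame_signal(signal, window, hop):
    """Split a waveform into overlapping windows, as a 1-D conv encoder would see it."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, hop)]


def chain_adjacency(n):
    """Adjacency matrix (with self-loops) of a chain graph over n samples.

    Assumption: linking each sample to its immediate neighbours is a
    hypothetical choice; the abstract does not describe the graph construction.
    """
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
        if i + 1 < n:
            adj[i][i + 1] = 1.0
            adj[i + 1][i] = 1.0
    return adj


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 * A * D^-1/2 * X * W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Toy waveform: 64 samples of a sine wave.
signal = [math.sin(0.1 * t) for t in range(64)]

# Larger windows give fewer frames (coarser resolution); smaller windows give
# more frames (finer resolution, more computation downstream).
coarse = frame_signal(signal, window=16, hop=8)
fine = frame_signal(signal, window=4, hop=2)
print(len(coarse), len(fine))

# Graph features for one coarse window: each sample is a node with one feature.
nodes = [[s] for s in coarse[0]]
weights = [[1.0, -1.0]]  # hypothetical 1-in, 2-out weight matrix
hidden = gcn_layer(chain_adjacency(len(nodes)), nodes, weights)

# Mean-pool node features into one vector that could complement the
# window's latent embedding (the pooling choice is also an assumption).
graph_embedding = [sum(h[o] for h in hidden) / len(hidden) for o in range(2)]
print(graph_embedding)
```

In this sketch the graph branch runs once per window of moderate size, so the within-window structure is modelled without shrinking the window itself.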
