Time-domain speech separation networks with graph encoding auxiliary

Wang Tingting (Nanjing University of Posts and Tel); Zexu Pan (National University of Singapore); Meng Ge (Tianjin University); Zhen Yang (Nanjing University of Posts and Telecommunication); Haizhou Li (The Chinese University of Hong Kong, Shenzhen)


09 Jun 2023

End-to-end time-domain speech separation with a masking strategy has shown a performance advantage. In such systems, a 1-D convolutional layer serves as the speech encoder, encoding each sliding window of the waveform into a latent feature representation, i.e., an embedding vector. A large window leads to low resolution in speech processing, whereas a small window offers high resolution at the expense of high computational cost. In this work, we propose a graph encoding technique to model the fine structural knowledge of speech samples within a window of reasonable size. Specifically, we build a graph representation for each latent representation and encode its structural details with a graph convolutional network encoder. The encoded graph feature representation complements the original latent feature representation and benefits the separation and reconstruction of speech. Experiments on various models and datasets show that the proposed encoding technique significantly improves speech quality over other time-domain speech encoders.
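To make the encoder idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It frames a waveform with a sliding window, applies a 1-D convolutional encoder (a per-frame projection followed by ReLU), and adds an auxiliary graph-convolution branch. The abstract does not specify how the graph is built or how the two representations are combined, so several choices here are assumptions made only for illustration: the samples of each window are treated as nodes of a chain graph with self-loops, the graph features are mean-pooled, and the result is concatenated with the convolutional embedding. All names (`frame_signal`, `chain_adjacency`, `encode`) and all sizes are hypothetical.

```python
import math
import random

def frame_signal(x, window, stride):
    """Split waveform x into overlapping frames of length `window`."""
    return [x[i:i + window] for i in range(0, len(x) - window + 1, stride)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def chain_adjacency(n):
    """Symmetrically normalised adjacency D^-1/2 (A + I) D^-1/2 of a chain graph.
    ASSUMPTION: neighbouring samples in a window are connected; the paper's
    actual graph construction is not given in the abstract."""
    a = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def encode(x, window, stride, conv_w, gcn_w):
    """Return one feature vector per frame: conv embedding + pooled graph feature."""
    adj = chain_adjacency(window)
    frames = frame_signal(x, window, stride)
    # 1-D conv encoder: each frame (length `window`) is projected onto the filters.
    conv_emb = relu(matmul(frames, conv_w))           # T x n_filters
    features = []
    for frame, emb in zip(frames, conv_emb):
        nodes = [[s] for s in frame]                  # window x 1 node features
        # One graph convolution layer: ReLU(A_hat X W).
        hidden = relu(matmul(matmul(adj, nodes), gcn_w))   # window x n_graph
        pooled = [sum(col) / window for col in zip(*hidden)]
        # ASSUMPTION: "complements" is modelled as simple concatenation.
        features.append(emb + pooled)
    return features

rng = random.Random(0)
window, stride, n_filters, n_graph = 16, 8, 4, 3      # illustrative sizes only
x = [math.sin(0.1 * t) for t in range(160)]           # toy waveform
conv_w = [[rng.uniform(-1, 1) for _ in range(n_filters)] for _ in range(window)]
gcn_w = [[rng.uniform(-1, 1) for _ in range(n_graph)]]
features = encode(x, window, stride, conv_w, gcn_w)
print(len(features), len(features[0]))
```

In this toy setup, framing the same 160-sample signal with a window of 4 and a stride of 2 instead of 16 and 8 yields several times as many frames for the separator to process, which illustrates the resolution versus cost trade-off that motivates keeping the window at a reasonable size and recovering fine structure through the graph branch.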

Tags:

Signal Processing for Communications and Networking

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
