SSI-Net: A MULTI-STAGE SPEECH SIGNAL IMPROVEMENT SYSTEM FOR ICASSP 2023 SSI CHALLENGE
Weixin Zhu (Tencent); Zilin Wang (Tsinghua University); Jiuxin Lin (Tsinghua University); Chang Zeng (National Institute of Informatics); Tao Yu (Tencent)
The ICASSP 2023 Speech Signal Improvement (SSI) Challenge
focuses on improving the speech signal quality of real-time
communication (RTC) systems. In this paper, we introduce the
speech signal improvement network (SSI-Net), our submission to the
ICASSP 2023 SSI Challenge, which satisfies the real-time constraint.
The proposed SSI-Net has a multi-stage architecture. In the first
stage, a time-domain restoration generative adversarial network
(TRGAN) restores the speech signal. In the second stage,
MTFAA-Lite, a lightweight variant of the multi-scale temporal
frequency convolutional network with axial self-attention
(MTFAA-Net), enhances the
fullband speech. In the subjective test on the SSI Challenge blind
test set, SSI-Net achieves a P.835 overall mean opinion
score (MOS) of 3.190 and a P.804 overall MOS of 3.178,
placing 3rd in Tracks 1 and 2.
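
The two-stage, frame-by-frame flow described above can be illustrated with a minimal sketch. It shows only how a restoration stage and an enhancement stage might be chained over a stream of frames. The names `restoration_stub`, `enhancement_stub`, and `run_pipeline` are hypothetical, and the stage bodies are trivial placeholders (clipping and a fixed gain), not TRGAN or MTFAA-Lite.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]
Stage = Callable[[Frame], Frame]


def restoration_stub(frame: Frame) -> Frame:
    """Hypothetical stand-in for the first (restoration) stage.

    Assumption: it only clips samples to [-1, 1]; it is not TRGAN.
    """
    return [max(-1.0, min(1.0, s)) for s in frame]


def enhancement_stub(frame: Frame) -> Frame:
    """Hypothetical stand-in for the second (enhancement) stage.

    Assumption: it only applies a fixed gain of 0.5; it is not MTFAA-Lite.
    """
    return [0.5 * s for s in frame]


def run_pipeline(frames: Iterable[Frame], stages: List[Stage]) -> Iterator[Frame]:
    """Process frames one at a time, passing each through every stage in order."""
    for frame in frames:
        for stage in stages:
            frame = stage(frame)
        yield frame


# Two short illustrative frames; the values are made up for the example.
stream = [[0.2, 1.5], [-2.0, 0.4]]
output = list(run_pipeline(stream, [restoration_stub, enhancement_stub]))
print(output)
```

Each frame is fully processed before the next one is read, which is the property a streaming system needs. Real stages would also carry internal state across frames, which this sketch omits.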