SUPERVISED AND SELF-SUPERVISED PRETRAINING BASED COVID-19 DETECTION USING ACOUSTIC BREATHING/COUGH/SPEECH SIGNALS
Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai
A rapid and accurate detection method for COVID-19 is crucial for curbing its pandemic spread. In this work, we propose a bi-directional long short-term memory (BiLSTM) network based COVID-19 detection method using breath/speech/cough signals. The three kinds of acoustic signals are first used to train individual models, one per task; the parameters of these models are then averaged to obtain an average model, which is used to initialize the BiLSTM model training of each task. We show that such an initialization significantly improves the detection performance on all three tasks; we call this supervised pre-training based detection. In addition, we take an existing pre-trained wav2vec2.0 model, continue pre-training it on the DiCOVA dataset, and use it to extract high-level representations that replace the conventional mel-frequency cepstral coefficient (MFCC) features; we call this self-supervised pre-training based detection. To reduce the information redundancy contained in the recorded sounds, silent segment removal, amplitude normalization and time-frequency masking are also applied. The proposed detection model is evaluated on the DiCOVA dataset; results show that our method achieves an area under the curve (AUC) score of 88.44% on the blind test set of the fusion track, and that using the high-level features together with MFCC helps to improve the diagnostic accuracy.
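As a concrete illustration of the supervised pre-training step, the sketch below averages the parameters of three identically structured task models and uses the result to re-initialize each task's training. The `BiLSTMClassifier` architecture, feature dimension, and hidden size are hypothetical placeholders, not the paper's exact configuration.

```python
# Minimal sketch of supervised pre-training via parameter averaging,
# assuming three task-specific PyTorch BiLSTM classifiers with identical
# architectures. All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Hypothetical BiLSTM binary classifier over frame-level features."""
    def __init__(self, feat_dim: int = 39, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out.mean(dim=1))  # mean-pool over time -> logit

def average_models(models):
    """Average the parameters of identically shaped models."""
    avg = {k: torch.zeros_like(v) for k, v in models[0].state_dict().items()}
    for m in models:
        for k, v in m.state_dict().items():
            avg[k] += v / len(models)
    return avg

# One model per task (breath, cough, speech), trained separately first.
tasks = {t: BiLSTMClassifier() for t in ("breath", "cough", "speech")}
# ... train each tasks[t] on its own data here ...
avg_state = average_models(list(tasks.values()))

# Re-initialize each task model from the average model before fine-tuning.
for model in tasks.values():
    model.load_state_dict(avg_state)
```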
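The self-supervised branch can be sketched with the HuggingFace wav2vec2.0 implementation. The checkpoint name below is a stand-in for the model adapted to DiCOVA (the continued pre-training itself is omitted), and 16 kHz mono input is assumed.

```python
# Sketch of extracting high-level wav2vec2.0 representations that replace
# (or complement) MFCC features. "facebook/wav2vec2-base" is a stand-in
# for the model continued-pre-trained on DiCOVA.
import torch
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

name = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
w2v = Wav2Vec2Model.from_pretrained(name).eval()

def high_level_features(waveform, sr=16000):
    """Return frame-level wav2vec2.0 representations of shape (time, 768)."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = w2v(inputs.input_values).last_hidden_state
    return hidden.squeeze(0)

feats = high_level_features(torch.randn(16000).numpy())  # 1 s dummy audio
```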
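The preprocessing pipeline (silent segment removal, amplitude normalization, and time-frequency masking) might look as follows; the energy threshold, frame size, and mask widths are illustrative assumptions, not values reported in the paper.

```python
# Sketch of the preprocessing: energy-based silent-segment removal, peak
# amplitude normalization, and SpecAugment-style time-frequency masking.
import torch
import torchaudio

def remove_silence(wave, frame=400, thresh_db=-40.0):
    """Drop non-overlapping frames more than 40 dB below the peak frame."""
    frames = wave.unfold(0, frame, frame)            # (n_frames, frame)
    energy_db = 10 * torch.log10(frames.pow(2).mean(dim=1) + 1e-10)
    keep = energy_db > energy_db.max() + thresh_db
    return frames[keep].reshape(-1)                  # voiced samples only

def normalize(wave):
    """Peak amplitude normalization to [-1, 1]."""
    return wave / (wave.abs().max() + 1e-10)

wave, sr = torch.randn(32000), 16000                 # dummy 2 s recording
wave = normalize(remove_silence(wave))

# Time-frequency masking applied to the mel spectrogram for augmentation.
spec = torchaudio.transforms.MelSpectrogram(sample_rate=sr)(wave)
spec = torchaudio.transforms.FrequencyMasking(freq_mask_param=15)(spec)
spec = torchaudio.transforms.TimeMasking(time_mask_param=20)(spec)
```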