FUSION OF MODULATION SPECTRAL AND SPECTRAL FEATURES WITH SYMPTOM METADATA FOR IMPROVED SPEECH-BASED COVID-19 DETECTION
Yi Zhu, Tiago Falk
SPS
Length: 00:15:19
Existing speech-based coronavirus disease 2019 (COVID-19) detection systems offer poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system that overcomes these limitations by fusing two different feature modalities with patient metadata in order to capture different properties of the disease. The first feature set is based on modulation spectral properties of speech. The second comprises spectral shape/descriptor features recently used for COVID-19 detection. Finally, patient symptom metadata are fused with these features to improve robustness and interpretability. Experiments are performed on the 2021 INTERSPEECH COVID Speech Sub-Challenge dataset with several different data partitioning paradigms. Results show the importance of the modulation spectral features. Metadata, in turn, performed poorly when used alone but provided invaluable insights when fused with the other features. Overall, a system relying on the fusion of all three modalities proved robust to unseen conditions and relied on interpretable features. The simplicity of the model suggests that it could be deployed on portable devices, making COVID-19 diagnostics accessible worldwide.
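To make the idea of fusing the three modalities concrete, the following is a minimal sketch of feature-level fusion, not the authors' implementation. The abstract does not specify the exact features, pooling, metadata fields, or classifier, so everything here is an assumption for illustration: the toy modulation spectrum (a second Fourier transform taken along the time axis of a magnitude spectrogram), the single spectral descriptor (spectral centroid), the 4x4 pooling grid, the hypothetical binary symptom flags, and all function and parameter names.

```python
import numpy as np


def modulation_spectrum(signal, frame_len=256, hop=64):
    """Toy modulation spectrum: magnitude STFT, then a second FFT along time."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_acoustic_bins)
    # The second transform runs across frames, so each column describes how
    # one acoustic frequency bin fluctuates over time.
    return np.abs(np.fft.rfft(spectrogram, axis=0))     # (n_modulation_bins, n_acoustic_bins)


def pool_grid(matrix, rows=4, cols=4):
    """Average the matrix over a coarse rows x cols grid (an assumed pooling scheme)."""
    return np.array([[block.mean() for block in np.array_split(band, cols, axis=1)]
                     for band in np.array_split(matrix, rows, axis=0)])


def spectral_centroid(signal, sample_rate):
    """One example of a spectral shape descriptor: the magnitude-weighted mean frequency."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12))


def fuse_features(signal, sample_rate, symptom_flags):
    """Concatenate modulation, spectral, and metadata features into one vector."""
    modulation_feats = pool_grid(modulation_spectrum(signal)).ravel()
    spectral_feats = np.array([spectral_centroid(signal, sample_rate)])
    metadata_feats = np.asarray(symptom_flags, dtype=float)
    return np.concatenate([modulation_feats, spectral_feats, metadata_feats])


# Synthetic stand-in for a speech recording: a 1 kHz tone amplitude-modulated at 4 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

# Hypothetical symptom flags, e.g. (fever, cough, loss of smell).
fused = fuse_features(signal, sample_rate, symptom_flags=[1, 0, 1])
print(fused.shape)
```

In a sketch like this, the fused vector would then be passed to a simple classifier. Because each entry maps back to a named modulation band, spectral descriptor, or symptom, the contribution of each modality can be inspected directly, which is one plausible reading of the interpretability claim above.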