NEURAL ARCHITECTURE SEARCH WITH MULTIMODAL FUSION METHODS FOR DIAGNOSING DEMENTIA

Michail Chatzianastasis (École Polytechnique); Loukas Ilias (National Technical University of Athens); Dimitris Askounis (National Technical University of Athens); Michalis Vazirgiannis (École Polytechnique)

07 Jun 2023

Alzheimer’s dementia (AD) affects memory, thinking, and language, degrading a person’s life. Early diagnosis is very important, as it enables the person to receive medical help and maintain quality of life. Consequently, leveraging spontaneous speech in conjunction with machine learning methods to recognize AD patients has emerged as a hot topic. Most previous works employ Convolutional Neural Networks (CNNs) to process the input signal. However, finding a suitable CNN architecture is time-consuming and requires domain expertise. Moreover, researchers either use early and late fusion approaches to fuse the different modalities or simply concatenate the representations of the different modalities during training, so inter-modal interactions are not captured. To tackle these limitations, we first exploit a Neural Architecture Search (NAS) method to automatically find a high-performing CNN architecture. Next, we exploit several fusion methods, including Multimodal Factorized Bilinear Pooling and Tucker Decomposition, to combine the speech and text modalities. To the best of our knowledge, no prior work has exploited a NAS approach and these fusion methods for the task of dementia detection from spontaneous speech. We perform extensive experiments on the ADReSS Challenge dataset and show the effectiveness of our approach over state-of-the-art methods.
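The abstract names Multimodal Factorized Bilinear Pooling (MFB) as one of the fusion methods but gives no implementation details. As a rough illustration of why such a method captures inter-modal interactions that plain concatenation misses, the sketch below follows the commonly published MFB formulation: project each modality, multiply element-wise, sum-pool over windows of size k, then apply power and L2 normalization. It is not the authors' implementation. The function names, the feature dimensions, and the random projection matrices `U` and `V` are all assumptions made for the example; in a trained model the projections would be learned.

```python
import numpy as np


def mfb_pool(x, y, U, V, k):
    """Factorized bilinear pooling of two feature vectors (illustrative only).

    x: (m,) features of one modality, y: (n,) features of the other.
    U: (m, k*o) and V: (n, k*o) are projection matrices (random here,
    learned in practice). Returns the (o,) sum-pooled interaction vector.
    """
    joint = (x @ U) * (y @ V)                 # element-wise product, shape (k*o,)
    return joint.reshape(-1, k).sum(axis=1)   # sum over each window of size k


def mfb_fuse(x, y, U, V, k):
    """MFB pooling followed by power (signed sqrt) and L2 normalization."""
    z = mfb_pool(x, y, U, V, k)
    z = np.sign(z) * np.sqrt(np.abs(z))       # power normalization
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z        # L2 normalization


# Hypothetical sizes: 8-dim speech vector, 6-dim text vector,
# factor size k = 3, fused output size o = 4.
rng = np.random.default_rng(0)
m, n, k, o = 8, 6, 3, 4
speech = rng.standard_normal(m)
text = rng.standard_normal(n)
U = rng.standard_normal((m, k * o))
V = rng.standard_normal((n, k * o))

fused = mfb_fuse(speech, text, U, V, k)
print(fused.shape)
```

Each pooled output is a low-rank bilinear form x^T (U_i V_i^T) y, so every speech feature is multiplied with every text feature. Concatenation, by contrast, only places the two vectors side by side.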

Tags:

Multimodal processing of language

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

