TAMM: A TASK-ADAPTIVE MULTI-MODAL FUSION NETWORK FOR FACIAL-RELATED HEALTH ASSESSMENTS ON 3D FACIAL IMAGES
Kai Wang, Xiaohong Liu, Zhengchao Luo, Fajin Feng, Guangyu Wang
Previous studies have shown that facial appearance is an important phenotypic indicator of human diseases and biological conditions. Recent advances in deep learning have demonstrated great potential in facial image analysis, including health status assessment. However, prior methods mainly focus on single-modality analysis of either 2D texture images or 3D facial meshes, limiting their ability to fully capture the relationships between biometric measurements and diseases. To address these issues, we propose a task-adaptive multi-modal fusion network, TAMM, for facial-related health assessments. Our model leverages both the geometric and texture features of 3D facial images through a task-adaptive Transformer (TAFormer), which dynamically extracts features from different modalities and scales for various tasks via spatial attention and cross-modal multi-scale attention, effectively capturing intra- and inter-modal relationships between features. Experimental results on a dataset of 19,775 patients demonstrate that TAMM achieves state-of-the-art performance on various regression and classification tasks, including age, BMI, and fatty liver disease prediction. Ablation studies show the importance of multi-modal fusion and task-specific adaptability in achieving optimal performance.
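To make the cross-modal fusion idea concrete, below is a minimal sketch of a bidirectional cross-modal attention block in PyTorch, assuming token-level geometric and texture features produced by upstream encoders. All names (CrossModalFusion, embed_dim, num_heads, head) are illustrative assumptions, not the authors' TAMM/TAFormer implementation, and the multi-scale and task-adaptive components are omitted for brevity.

```python
# Minimal sketch of bidirectional cross-modal attention fusion (PyTorch).
# Illustrative only; not the TAMM/TAFormer code.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse geometric and texture token sequences with cross-attention."""

    def __init__(self, embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Geometry tokens attend to texture tokens and vice versa.
        self.geo_to_tex = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.tex_to_geo = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_geo = nn.LayerNorm(embed_dim)
        self.norm_tex = nn.LayerNorm(embed_dim)
        # Task head: a single output for regression (e.g. age, BMI) or a logit
        # for binary classification (e.g. fatty liver disease).
        self.head = nn.Linear(2 * embed_dim, 1)

    def forward(self, geo_tokens: torch.Tensor, tex_tokens: torch.Tensor) -> torch.Tensor:
        # geo_tokens: (B, N_g, D) geometric features; tex_tokens: (B, N_t, D) texture features.
        geo_attn, _ = self.geo_to_tex(geo_tokens, tex_tokens, tex_tokens)
        tex_attn, _ = self.tex_to_geo(tex_tokens, geo_tokens, geo_tokens)
        geo_fused = self.norm_geo(geo_tokens + geo_attn)  # residual + layer norm
        tex_fused = self.norm_tex(tex_tokens + tex_attn)
        # Pool each stream and concatenate for the task-specific head.
        pooled = torch.cat([geo_fused.mean(dim=1), tex_fused.mean(dim=1)], dim=-1)
        return self.head(pooled)


# Usage with random tensors standing in for encoder outputs.
fusion = CrossModalFusion()
geo = torch.randn(2, 128, 256)  # e.g. mesh/point-based features
tex = torch.randn(2, 196, 256)  # e.g. texture patch features
print(fusion(geo, tex).shape)   # torch.Size([2, 1])
```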