Multi-Modal Approach to Food Classification: A Diet Tracking System with Spoken and Visual Inputs
Shivani Gowda Kallappanahalli (Loyola Marymount University); Yifan Hu (Loyola Marymount University); Mandy B Korpusik (Loyola Marymount University)
In this paper, we present multimodal approaches to dietary tracking. As health and well-being become increasingly important, mobile applications for dietary tracking have attracted considerable interest. However, these applications typically require users to log meals from memory, which is unreliable and often leads to underestimated nutritional intake, undermining the goal of nutrition tracking. Accurately recording dietary intake therefore calls for computational methods that incorporate images. We investigate multi-modal transfer-learning approaches, specifically a Vision-and-Language Transformer (ViLT), on a novel food-specific image-text dataset, achieving a held-out test set Micro-F1 score of 77.70% and a Macro-F1 score of 51.43% across 696 food categories. We aim to give other researchers new insight into the process of developing domain-specific, multi-modal deep learning models with small datasets.
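The sketch below illustrates the general setup the abstract describes: fine-tuning a Vision-and-Language Transformer that takes a meal photo and a text description and classifies the pair into one of 696 food categories. It is a minimal illustration, not the authors' released code; it assumes the HuggingFace `dandelin/vilt-b32-mlm` checkpoint, a single linear head over the pooled output, and a standard cross-entropy loss, any of which may differ from the paper's actual recipe.

```python
# Minimal sketch of ViLT fine-tuning for food classification (assumptions:
# HuggingFace "dandelin/vilt-b32-mlm" backbone + linear head; the paper's
# exact checkpoint, head design, and training setup are not specified here).
import torch
import torch.nn as nn
from PIL import Image
from transformers import ViltProcessor, ViltModel

NUM_FOOD_CLASSES = 696  # number of food categories reported in the paper


class ViltFoodClassifier(nn.Module):
    def __init__(self, checkpoint="dandelin/vilt-b32-mlm"):
        super().__init__()
        self.backbone = ViltModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.backbone.config.hidden_size, NUM_FOOD_CLASSES)

    def forward(self, **inputs):
        pooled = self.backbone(**inputs).pooler_output  # [batch, hidden]
        return self.head(pooled)                        # [batch, 696] logits


processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltFoodClassifier()

# One hypothetical training example: a meal photo plus its spoken/typed description.
image = Image.open("meal.jpg")  # placeholder path
text = "grilled chicken with brown rice and broccoli"
inputs = processor(images=image, text=text, return_tensors="pt")

logits = model(**inputs)
label = torch.tensor([42])  # dummy food-category id for illustration
loss = nn.CrossEntropyLoss()(logits, label)
loss.backward()  # fine-tune the backbone and head end to end
```

At evaluation time, the reported Micro- and Macro-F1 scores correspond to `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` and `average="macro"` computed over the held-out test set predictions.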