Length: 0:29:10
With recent progress in deep learning, there has been increased interest in visually grounded dialogue, which requires an AI agent to hold a meaningful conversation with humans in natural language about visual content in other modalities, e.g., pictures or videos. In this talk, I will present two case studies: one on generating responses for closed-domain, task-based multimodal dialogue systems, with applications in conversational multimodal search; and one on selecting/retrieving responses for open-domain multimodal systems, with applications in visual dialogue and visual question answering.
Throughout my talk I will highlight open challenges for deep learning and beyond, including context modelling, knowledge grounding, encoding history, multimodal fusion, evaluation techniques, and shortcomings of current datasets.