  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
  • Length: 0:29:10
19 Jan 2021

With recent progress in deep learning, there has been increased interest in visually grounded dialogue, which requires an AI agent to hold a meaningful natural-language conversation with humans about content in other modalities, e.g. pictures or videos. In this talk, I will present two case studies: one on generating responses for closed-domain, task-based multimodal dialogue systems, with applications in conversational multimodal search; and one on selecting/retrieving responses for open-domain multimodal systems, with applications in visual dialogue and visual question answering.
Throughout my talk I will highlight open challenges for deep learning and beyond, including context modelling, knowledge grounding, encoding history, multimodal fusion, evaluation techniques, and shortcomings of current datasets.
