Dialogue System with Missing Observation

Djallel Bouneffouf (IBM); Mayank Agarwal (IBM); Irina Rish (University of Montreal)

SPS

04 Jun 2023

Within the domain of dialogue, the ability to orchestrate multiple independently trained dialogue agents into a unified system is of particular importance. We define orchestration as the task of selecting the subset of skills that most appropriately answers a user input, using features extracted from both the user input and the individual skills. In this work, we study online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. To address this missing-feedback setting, we propose combining the attentive contextual bandit approach with an unsupervised learning mechanism such as clustering. By using clustering to estimate missing rewards, we are able to learn from every incoming event, including those whose rewards are missing. Promising empirical results are obtained on proprietary conversational datasets.
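
The idea of learning from events with missing feedback can be illustrated with a minimal sketch. It is not the authors' method: a plain LinUCB-style bandit stands in for the attentive contextual bandit, online k-means stands in for the clustering step, and the missing reward is assumed to be imputed as the mean observed reward for the same skill in the context's cluster. The class name `ClusterImputedLinUCB` and all of its parameters are hypothetical.

```python
import numpy as np


class ClusterImputedLinUCB:
    """LinUCB-style bandit that imputes missing rewards from context clusters.

    Illustrative only: names, defaults, and the imputation rule are assumptions.
    """

    def __init__(self, n_arms, dim, n_clusters=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        # Per-arm ridge-regression statistics (one arm per dialogue skill).
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Online k-means centroids over context vectors.
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.counts = np.zeros(n_clusters)
        # Sum and count of observed rewards per (cluster, arm) pair.
        self.r_sum = np.zeros((n_clusters, n_arms))
        self.r_cnt = np.zeros((n_clusters, n_arms))

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def _assign(self, x):
        """Assign x to its nearest centroid and move that centroid toward x."""
        k = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return k

    def update(self, x, arm, reward=None):
        """Update the chosen arm; reward=None marks missing feedback.

        Returns the reward actually used, or None if the event was skipped.
        """
        k = self._assign(x)
        if reward is None:
            if self.r_cnt[k, arm] == 0:
                return None  # no observed reward in this cluster yet; skip
            reward = self.r_sum[k, arm] / self.r_cnt[k, arm]  # imputed reward
        else:
            self.r_sum[k, arm] += reward
            self.r_cnt[k, arm] += 1
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
        return reward
```

In this sketch only observed rewards feed the per-cluster averages, so imputed values never reinforce themselves; whether the original system makes the same choice is not stated in the abstract.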

Tags:

Machine learning methods for language