Adviser: A Toolkit For Developing Multimodal And Socially-Engaged Conversational Agents
Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Voelkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu
Dialog systems, or chatbots, both text-based and multimodal, have received much attention in recent years, with a growing number of systems in industry, such as Amazon Alexa, Apple Siri, Microsoft Cortana, Google Duplex and XiaoIce, as well as in academia, such as MuMMER and Alana. However, open-source toolkits and frameworks for developing such systems are rare, especially for multimodal dialog systems that combine speech, text and vision. Most existing toolkits focus on core dialog components, with or without the option to access external speech processing services. To the best of our knowledge, only two toolkits, MuMMER (Foster et al., 2016) and \psi (Bohus et al., 2017), support multimodal processing and leverage social signals for conversational agents. Both provide a solid platform for building dialog systems; however, the former is not open-source, and the latter is based on the .NET platform, which can be less convenient for non-technical users such as linguists and cognitive scientists, who play an important role in dialog research. In this demonstration, we introduce ADVISER, a new option for building multimodal (speech, text, gaze and vision) dialog systems that is open-source and Python-based for easy use and fast prototyping. The toolkit is designed to be modular, flexible, transparent and user-friendly for both technically experienced and less technically experienced users. ADVISER focuses on task-oriented dialog systems that help users reach their goals in a minimal number of dialog turns while remaining friendly and likeable by taking social signals such as emotional state and engagement level into account.
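To make the modular design concrete, below is a minimal, self-contained sketch of the kind of publish/subscribe pipeline such a toolkit enables. All class names, topic names and heuristics here are illustrative assumptions, not ADVISER's actual API; the sketch only shows how independent services (an NLU module, a social-signal estimator and a dialog policy) can be wired together over shared topics.

    # Illustrative sketch only: a modular pub/sub dialog pipeline in the spirit
    # of ADVISER's design. Names and heuristics are hypothetical, not its API.
    from collections import defaultdict
    from typing import Callable


    class DialogBus:
        """Tiny topic-based message bus connecting independent dialog services."""

        def __init__(self) -> None:
            self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

        def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
            self._subscribers[topic].append(handler)

        def publish(self, topic: str, message: dict) -> None:
            for handler in self._subscribers[topic]:
                handler(message)


    class EmotionService:
        """Stand-in for a social-signal module (e.g. emotion from speech/vision)."""

        def __init__(self, bus: DialogBus) -> None:
            self.bus = bus
            bus.subscribe("user_utterance", self.on_utterance)

        def on_utterance(self, msg: dict) -> None:
            # A real system would run a classifier over audio/video features;
            # here, shouting in all caps stands in for detected frustration.
            emotion = "frustrated" if msg["text"].isupper() else "neutral"
            self.bus.publish("user_emotion", {"emotion": emotion})


    class NLUService:
        """Maps raw user text to a (toy) semantic act; real NLU would go here."""

        def __init__(self, bus: DialogBus) -> None:
            self.bus = bus
            bus.subscribe("user_utterance", self.on_utterance)

        def on_utterance(self, msg: dict) -> None:
            intent = "request_info" if "?" in msg["text"] else "inform"
            self.bus.publish("user_act", {"intent": intent, "text": msg["text"]})


    class PolicyService:
        """Picks the next system act, conditioning on task state and emotion."""

        def __init__(self, bus: DialogBus) -> None:
            self.bus = bus
            self.last_emotion = "neutral"
            bus.subscribe("user_emotion",
                          lambda m: setattr(self, "last_emotion", m["emotion"]))
            bus.subscribe("user_act", self.on_act)

        def on_act(self, act: dict) -> None:
            prefix = ("I'm sorry this is taking long. "
                      if self.last_emotion == "frustrated" else "")
            reply = prefix + ("Here is the information you asked for."
                              if act["intent"] == "request_info" else "Noted, thanks!")
            self.bus.publish("system_utterance", {"text": reply})


    if __name__ == "__main__":
        bus = DialogBus()
        EmotionService(bus)  # registered first so emotion updates before the policy runs
        NLUService(bus)
        PolicyService(bus)
        bus.subscribe("system_utterance", lambda m: print("SYSTEM:", m["text"]))

        bus.publish("user_utterance", {"text": "Where is the lecture hall?"})
        bus.publish("user_utterance", {"text": "THIS IS NOT WORKING"})

The design point illustrated is that no service knows about any other; modules communicate only via topics, so each component stays independently replaceable. This is what makes a toolkit of this kind accessible to users who want to swap in their own NLU, policy or social-signal models without touching the rest of the system.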