The past few years have seen increasing interest in developing neural-network-based agents for visually grounded dialogue, where the conversation participants communicate about visual content. I will start by discussing how visual grounding can be integrated with traditional task-oriented dialogue system components. Most current work in the field reports numeric results based solely on task success. I will argue that we can gain more insight by (i) analysing the linguistic output of alternative systems and (ii) probing the representations they learn. I will also introduce a new dialogue dataset we have developed using a data-collection setup designed to investigate linguistic common ground as it accumulates during visually grounded interaction.
Raquel Fernández is Associate Professor at the Institute for Logic, Language and Computation (ILLC), University of Amsterdam, where she leads the Dialogue Modelling Group. She received her PhD from King’s College London and has held research positions at the University of Potsdam and at CSLI, Stanford University. Her work revolves around language use, spanning topics such as computational semantics and pragmatics, the dynamics of dialogue interaction, visually grounded language processing, and child language acquisition.