Title : Grounded Language Learning in Virtual Environments
Tutor : Stephen Clark
Number : 42/20
Status : Not exceeded
Begin : Thursday, 19 November, 2020 at 10:30AM
End : Tuesday, 19 November, 2024 at 12:30PM
Closing date : Tuesday, 19 November, 2024 at 10:00PM
Location : Zoom :
Bookable : 12

To rewatch the talk:


Natural Language Processing is currently dominated by the application of text-based language models such as BERT and GPT. One feature of these models is that they rely entirely on the statistics of text, without making any connection to the world, which raises the interesting question of whether such models could ever properly “understand” the language. One way in which these models can be grounded is to connect them to images or videos, for example by conditioning the language models on visual input and using them for captioning.

In this talk I extend the grounding idea to a simulated virtual world: an environment which an agent can perceive, explore and interact with.

More specifically, a neural-network-based agent is trained, using distributed deep reinforcement learning, to associate words and phrases with things that it learns to see and do in the virtual world. The world is 3D, built in Unity, and contains recognisable objects, including some from the ShapeNet repository of assets.

One of the difficulties in training such networks is that they tend to overfit to their training data, so first we’ll demonstrate how the interactive, first-person perspective of an agent provides it with a particular inductive bias that helps it to generalise to out-of-distribution settings. Another difficulty is that training the agents typically requires a huge number of training examples, so we’ll show how meta-learning can be used to teach the agents to bind words to objects in a one-shot setting. Moreover, the agent is able to combine its knowledge of words obtained one-shot with its stable knowledge of word meanings learned over many episodes, providing a form of grounded language learning which is both “fast and slow”.

Joint work with Felix Hill.

Bio :

Stephen Clark is a Research Scientist at DeepMind, and an Honorary Professor at Queen Mary University of London. Previously he spent 18 years working at UK universities, first as a postdoctoral researcher at the University of Edinburgh; then as a member of Faculty at the Oxford University Department of Computer Science, and a Fellow of Keble College, Oxford; and finally as a member of Faculty at the University of Cambridge Department of Computer Science and Technology.

He holds a PhD in Computer Science and Artificial Intelligence from the University of Sussex, and an MA in Philosophy from Gonville and Caius College, Cambridge. He carries out research at the intersection of Computational Linguistics and Machine Learning, with much of his previous work focusing on the syntactic and semantic analysis of natural language text (including an ERC Starting Grant on compositional distributional models of meaning). His current research focus is the acquisition of language by artificial agents in the context of realistic virtual environments.