Multimodal Integration for Sample-Efficient Deep Reinforcement Learning

Leveraging multimodal learning to improve the transferability and sample efficiency of Reinforcement Learning (RL)

Led by Amos Storkey, Stefano Albrecht, Peter Bell, and Trevor McInroe, with Donghe Han (Postdoctoral Researcher)


In this project, we leverage multimodal learning to improve the transferability and sample efficiency of Reinforcement Learning (RL). We conjecture that combining image and text information sources can produce a synergistic effect for RL policies similar to the one it produces for image representation learning. Multimodal sources let us better understand the connections between textual descriptions of an environment, the actions available within it, and the goals to be achieved. This understanding narrows the scope of the policies and models that must be considered, reducing the data cost of training in a new environment. The idea is akin to how, for humans, reading the rules of a board game helps a person play that game well more immediately than simply taking random actions and observing what happens, while descriptions of actions and situations during play provide attentional and directional cues that help inform the next move.
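To make the idea concrete, below is a minimal, hypothetical sketch (not the project's actual architecture) of a policy network that conditions on both an image observation and a tokenised text description. The module names, layer sizes, and the choice to fuse modalities by simple concatenation are illustrative assumptions only.

```python
# Illustrative sketch: a policy conditioned on an image observation and a
# textual description of the environment/goal. All design choices here are
# assumptions for illustration, not the project's method.

import torch
import torch.nn as nn


class MultimodalPolicy(nn.Module):
    def __init__(self, num_actions: int, vocab_size: int = 1000, text_dim: int = 64):
        super().__init__()
        # Small CNN encoder for image observations (e.g. 3x84x84 frames).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
        )
        # Bag-of-embeddings encoder for textual descriptors
        # (rules, goals, descriptions of the current situation).
        self.text_encoder = nn.EmbeddingBag(vocab_size, text_dim)
        # Fuse the two modalities by concatenation and map to action logits.
        self.head = nn.Sequential(
            nn.Linear(256 + text_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, image: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)       # (B, 256)
        txt_feat = self.text_encoder(text_tokens)  # (B, text_dim)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))  # action logits


# Example usage with dummy data.
policy = MultimodalPolicy(num_actions=6)
obs = torch.randn(2, 3, 84, 84)         # batch of image observations
desc = torch.randint(0, 1000, (2, 12))  # batch of token-id text descriptions
logits = policy(obs, desc)              # shape (2, 6)
```

In practice, the text encoder would more plausibly be a pretrained language model, and the fusion step could use attention rather than concatenation; the sketch only shows where the textual signal enters the policy.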