Multimodal Integration for Sample-Efficient Deep Reinforcement Learning

Leveraging multimodal learning to improve the transferability and sample efficiency of Reinforcement Learning (RL)

Led by Amos Storkey, Stefano Albrecht, Peter Bell, and Trevor McInroe, with Donghe Han (Postdoctoral Researcher)


In this project, we leverage multimodal learning to improve the transferability and sample efficiency of Reinforcement Learning (RL). We conjecture that combining image and text information sources can produce a synergistic effect for RL policies similar to the one it produces for image representation learning. Multimodal sources let us better understand the connections between textual descriptions of an environment, the actions available within it, and the goals to be achieved. This understanding narrows the scope of the policies and models that must be considered, reducing the data cost of training in a new environment. The idea is akin to how, for humans, reading the rules of a board game helps a person play that game well more immediately than simply taking random actions and observing what happens, while descriptions of actions and situations during play provide attentional and directional cues that help inform the next move.
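To make the idea concrete, below is a minimal, hypothetical sketch (not the project's actual architecture) of a policy network that conditions on both an image observation and a tokenised text description. The module names, layer sizes, and the choice to fuse modalities by simple concatenation are illustrative assumptions only.

```python
# Illustrative sketch: a policy conditioned on an image observation and a
# textual description of the environment/goal. All design choices here are
# assumptions for illustration, not the project's method.

import torch
import torch.nn as nn


class MultimodalPolicy(nn.Module):
    def __init__(self, num_actions: int, vocab_size: int = 1000, text_dim: int = 64):
        super().__init__()
        # Small CNN encoder for image observations (e.g. 3x84x84 frames).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
        )
        # Bag-of-embeddings encoder for textual descriptors
        # (rules, goals, descriptions of the current situation).
        self.text_encoder = nn.EmbeddingBag(vocab_size, text_dim)
        # Fuse the two modalities by concatenation and map to action logits.
        self.head = nn.Sequential(
            nn.Linear(256 + text_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, image: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)       # (B, 256)
        txt_feat = self.text_encoder(text_tokens)  # (B, text_dim)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))  # action logits


# Example usage with dummy data.
policy = MultimodalPolicy(num_actions=6)
obs = torch.randn(2, 3, 84, 84)         # batch of image observations
desc = torch.randint(0, 1000, (2, 12))  # batch of token-id text descriptions
logits = policy(obs, desc)              # shape (2, 6)
```

In practice, the text encoder would more plausibly be a pretrained language model, and the fusion step could use attention rather than concatenation; the sketch only shows where the textual signal enters the policy.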