IPAB Workshop - 10/07/2025

Speaker: Marina Aoyama

Title: Learning Task-Informed Exploration Policies for Dynamic Robotic Tasks

Abstract: Dynamic robotic tasks, such as striking a puck into a goal beyond the robot's reachable workspace, often require the robot to estimate object properties before task execution, as the robot cannot recover from failure without human intervention. To address this, we propose a task-informed exploration approach using reinforcement learning, in which the robot learns an exploration policy guided by the sensitivity of a privileged task policy to errors in estimated properties. Additionally, we introduce an uncertainty-based mechanism to transition from exploration to task execution. We validate our method on striking and pushing tasks using a KUKA iiwa robot arm, achieving significant improvements over baseline approaches.

Speaker: Jack Rome

Title: Perceptual Geometry for Policy Learning: Surface Normals and Multi-Objective Encoders

Abstract: In robotic cloth manipulation, accurately capturing the underlying geometry of deformable objects is critical. This talk explores the use of surface normals as a primary visual input modality, emphasizing their geometric richness and representational uniformity compared to standard RGB inputs. A central challenge addressed in this work is the trade-off between representation quality for downstream reinforcement learning (RL) tasks and visual reconstruction fidelity. Convolutional neural networks (CNNs) are commonly employed as encoders due to their ability to produce compact latent representations suitable for actor-critic frameworks. However, these same encoders often fall short in generating high-fidelity reconstructions. This talk presents methods for designing encoder architectures that bridge this gap, producing representations that are simultaneously effective for policy learning and capable of reconstructing detailed input observations. Through this lens, we discuss architectural trade-offs, training strategies, and implications for real-world manipulation tasks.

Speaker: Kale-Ab Tessera

Title: Remembering the Markov Property in Cooperative MARL

Abstract: Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents' behaviour. Yet in practice, current model-free MARL algorithms rely on simple recurrent function approximators to handle partial information. In this talk, I will argue that their empirical success is not due to effective Markov signal recovery, but rather to learning simple conventions that sidestep environment observations and memory altogether. I will present a targeted case study showing how co-adapting agents learn brittle conventions that fail when paired with non-adaptive partners. Importantly, the same models can learn grounded policies when task design enforces it, demonstrating that the problem lies not in the models themselves but in benchmark design. I will also discuss why many modern MARL environments may not adequately test Dec-POMDP assumptions, and make the case for designing new cooperative environments built on two key principles: (1) grounding behaviours in observations and (2) requiring memory-based reasoning about other agents, ensuring that success demands genuine skill instead of fragile, co-adapted agreements.