Tuesday 3 February 2026, 13.00 - 14.00, IF, G.03
Host: Ian Simpson
Speaker: Chaeeun Lee
Title: Process-Supervised Multi-Agent Reinforcement Learning for Reliable Clinical Reasoning
Abstract: Clinical decision-making requires nuanced reasoning over heterogeneous evidence and traceable justifications. While recent LLM multi-agent systems (MAS) show promise, they largely optimise for outcome accuracy while overlooking process-grounded reasoning aligned with clinical standards. One critical real-world case of this is gene–disease validity curation, where experts must determine whether a gene is causally implicated in a disease by synthesising diverse biomedical evidence. We introduce an agent-as-tool reinforcement learning framework for this task with two objectives: (i) process-level supervision to ensure reasoning follows valid clinical pathways, and (ii) efficient coordination via a hierarchical multi-agent system. Our evaluation on the ClinGen dataset shows that with outcome-only rewards, MAS with a GRPO-trained Qwen3-4B supervisor agent substantially improves outcome accuracy compared to a base model supervisor (0.732 vs. 0.195), but results in poor process alignment (0.392 F1). Conversely, with process + outcome rewards, MAS with GRPO-trained supervisor achieves higher outcome accuracy (0.750) while significantly improving process fidelity to 0.512 F1 (>10% improvement over baseline). We also find that decomposing the task into specialised agents and providing process-based rewards improves robustness on unseen data compared to a standard single-agent baseline.