Multi-agent simulation and constrained, multi-objective reinforcement learning technologies for install scheduling optimisation | CDT in Machine Learning Systems

This research project aims to develop advanced reinforcement learning algorithms within a multi-agent simulation framework to model real-world conditions, generate high-fidelity reward structures, and optimise scheduling and resource allocation under uncertainty.

Deadline for Application & Eligibility

The deadline to submit your first-stage application form is 13 February 2026, 23:59.

You must follow the CDT MLSystems application process as described on the CDT MLSystems webpages.

Please contact the project's PI ahead of submitting your application to first check suitability and interest.

Eligibility: this project is open to applications from Home students only.

Co-funding Company

GoFibre

GoFibre is an independent Scottish broadband provider on a mission to bring top-quality digital connectivity to homes and businesses across Scotland and the north of England.

https://gofibre.co.uk/

Supervisory team

University of Edinburgh PI: Fengxiang He - f.he@ed.ac.uk (School of Informatics)
Personal website: https://fengxianghe.github.io/

Company supervisor: Chetna Arora - chetna.arora@gofibre.co.uk

Abstract

GoFibre, a leading rural broadband provider in Scotland and Northern England, is undergoing rapid network expansion, targeting 250,000 premises within three years. This growth introduces complex operational challenges, notably uncertainty arising from dynamic factors such as customer availability, traffic, and on-site variability. These challenges can be formalised as a constrained, multi-objective optimisation problem with evolving constraints. This PhD project aims to develop advanced reinforcement learning algorithms within a multi-agent simulation framework to model real-world conditions, generate high-fidelity reward structures, and optimise scheduling and resource allocation under uncertainty.

Project Background

At GoFibre we’re on an exciting journey to revolutionise broadband capabilities for homes and businesses in rural towns and villages across Northern England and Scotland, connecting communities and giving them digital capability equal to that of their city counterparts, whilst being as environmentally conscious as possible and creating social value in the areas we serve.

We’re growing fast and we don’t intend to slow down anytime soon as we play our part in ensuring future-proof full fibre coverage. Collaboration, innovation, commitment, and continual improvement of our business and ourselves are the cornerstones of our collective success.

Project Definition:

Optimise customer installation scheduling using AI to improve engineer productivity, reduce reschedules, and lower operational costs.

Current Challenges:

Manual or semi-automated scheduling processes with limited optimisation.
Inefficiencies in engineer allocation and routing.
Frequent reschedules due to customer availability or operational constraints.

AI Opportunity:

AI can dynamically optimise scheduling by predicting reschedules, learning from historical data, and continuously improving route efficiency.

Expected Outcomes:

Higher installation success rates and reduced rescheduling.
Improved field resource utilisation and reduced travel costs.
Real-time adaptive scheduling based on predictive insights.

Priority: Strong operational benefit with clear potential for automation and AI-driven efficiency gains.

Project Aims

This project aims to solve a constrained, multi-objective optimisation problem with a rich and evolving constraint structure. The student is expected to design (1) constrained, multi-objective reinforcement learning technologies to address the problem, and (2) a multi-agent simulation system that mimics the real-world environment and generates high-quality reward and penalty signals.
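
As an illustrative sketch only, and not necessarily the formulation the project will adopt, a problem of this kind is often written as a constrained, multi-objective Markov decision process. All symbols below are assumptions introduced for illustration: a policy $\pi$, a reward vector $\mathbf{r}$ with $k$ components (for example completed installations, travel cost, and reschedules), constraint cost functions $c_i$ with budgets $d_i$, and a discount factor $\gamma$. The maximisation over the vector $\mathbf{J}(\pi)$ is understood in the Pareto sense.

```latex
\max_{\pi} \; \mathbf{J}(\pi)
  = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathbf{r}(s_t, a_t) \right] \in \mathbb{R}^{k}
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t) \right] \le d_i,
\qquad i = 1, \dots, m .
```

An evolving constraint structure would correspond to the cost functions $c_i$ or budgets $d_i$ changing over time, which this static form does not capture.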

Expected Outcome and Impact

Target Outcomes:

Papers:

Multi-agent Evaluation Environments and Benchmarks (Month 9)
Paper: Multi-agent LLM Planning for Games (Month 15)
Paper: Mitigating Forgetting in Online Adaptation for Multi-Agent settings (Month 24)
Paper: Efficient online adaptation (Month 36)
Thesis/Paper: An LLM-Based AI Companion for Multi-Agent Collaboration (Month 48)

Demonstrator: Practical demonstrator tooling showing methods in practice in an environment

Code: open source codebase for implementing the methodology

Video: Demonstration video of approach

Data and Methodology

Data:

Open-access data and simulation environments will be used in developing and trialling algorithms.
Real-world data will be used to build the simulation environment and to trial the prototype system.

Methodology:

Predictive analytics technologies based on machine learning (including large language models and other foundation models) for predicting installation conditions and job durations.

Constrained, multi-objective reinforcement learning for operational research in general.
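
To make the idea of combining several objectives with a constraint more concrete, the following is a minimal sketch of one standard technique, Lagrangian relaxation with a weighted-sum scalarisation, applied to a toy choice among a few fixed candidate schedules. It is not the project's algorithm and involves no learning from data. The function name, parameters, and all numbers are hypothetical and made up for illustration.

```python
def lagrangian_primal_dual(rewards, costs, weights, budget, lr=0.05, steps=2000):
    """Toy primal-dual loop over a finite set of candidate schedules.

    rewards: one reward vector per candidate (one entry per objective)
    costs:   one constraint cost per candidate (e.g. overtime hours)
    weights: scalarisation weights, one per objective (an assumption)
    budget:  maximum allowed average constraint cost
    Returns the fraction of steps each candidate was chosen and the
    final Lagrange multiplier.
    """
    # Weighted-sum scalarisation of the multi-objective reward.
    scalar = [sum(w * r for w, r in zip(weights, row)) for row in rewards]
    lam = 0.0
    counts = [0] * len(costs)
    for _ in range(steps):
        # Primal step: pick the candidate maximising the Lagrangian.
        scores = [s - lam * c for s, c in zip(scalar, costs)]
        choice = max(range(len(scores)), key=scores.__getitem__)
        counts[choice] += 1
        # Dual step: raise the multiplier when the constraint is violated,
        # lower it otherwise, and keep it non-negative.
        lam = max(0.0, lam + lr * (costs[choice] - budget))
    mix = [n / steps for n in counts]
    return mix, lam


# Made-up example: three candidate schedules.
REWARDS = [[10.0, -8.0], [7.0, -4.0], [4.0, -1.0]]  # (installs done, -travel hours)
COSTS = [3.0, 1.0, 0.0]                               # overtime hours
mix, lam = lagrangian_primal_dual(REWARDS, COSTS, weights=[1.0, 0.5], budget=2.0)
average_cost = sum(m * c for m, c in zip(mix, COSTS))
```

With these numbers the loop settles into alternating between the first two candidates, so the average overtime stays close to the budget of 2 hours. A fixed weight vector recovers only one trade-off between the objectives; exploring the full set of trade-offs would need a different approach.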

Timescale and Expected Outputs/activities (over 4 years)

Expected outcomes:

The project has four expected outcomes:

A multi-agent simulation system for the install scheduling environment, in which agents learn from real data to mimic real-world conditions, including the rewards for scheduling optimisation actions and the constraints and penalties imposed on scheduling. The environment will allow the developed algorithms to be run and tested.
Novel algorithms for constrained, multi-objective reinforcement learning problems in the context of multi-party logistics. The algorithms will apply to the developed simulation environment.
A prototype system that integrates all developed algorithms. The prototype will be deployed to real-world scenarios.
Performance evaluation and assurance, both theoretical and empirical, covering generalisability (performance on unseen data), cumulative regret over time (performance in dynamic environments), convergence speed (computational overhead), stability of deployment in fluctuating environments, and regulatory compliance.
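
As a rough illustration of the kind of interface such a simulation environment might expose, the sketch below defines a deliberately simplified, single-engineer environment that returns a reward vector and a constraint cost at each step. It is not multi-agent and is not learned from data. Every name, parameter, and default value in it is hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    travel_minutes: float    # hypothetical travel time to the premises
    show_probability: float  # hypothetical chance the customer is available


class ToyInstallEnv:
    """Hypothetical one-engineer, one-day install scheduling environment."""

    def __init__(self, jobs, shift_minutes=480.0, install_minutes=90.0, seed=0):
        self.jobs = list(jobs)
        self.shift_minutes = shift_minutes
        self.install_minutes = install_minutes
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new day and return the indices of jobs still to attempt."""
        self.remaining = set(range(len(self.jobs)))
        self.elapsed = 0.0
        return sorted(self.remaining)

    def _overtime(self):
        return max(0.0, self.elapsed - self.shift_minutes)

    def step(self, job_index):
        """Attempt one job; return (observation, reward_vector, cost, done)."""
        if job_index not in self.remaining:
            raise ValueError("job already attempted or unknown")
        job = self.jobs[job_index]
        self.remaining.remove(job_index)
        overtime_before = self._overtime()
        self.elapsed += job.travel_minutes
        # Customer availability is the only source of randomness here.
        customer_home = self.rng.random() < job.show_probability
        if customer_home:
            self.elapsed += self.install_minutes
        # Two objectives: completed installs, and travel time (negated so
        # that larger is better for both components).
        reward_vector = (1.0 if customer_home else 0.0, -job.travel_minutes)
        # Constraint cost: overtime minutes added by this step.
        cost = self._overtime() - overtime_before
        done = not self.remaining
        return sorted(self.remaining), reward_vector, cost, done
```

Keeping the reward as a vector and the constraint cost as a separate signal, rather than folding both into one scalar, leaves the choice of trade-off to the algorithm being tested.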

Timetable:

Intensive training provided by the CDT (months 1 to 12)
Literature review (months 1 to 6)
Developing the multi-agent simulation system (months 7 to 18)
Designing the reinforcement learning algorithms (months 19 to 36)
Developing the evaluation tool and assurance results (months 37 to 42)
Developing the prototype system for deployment (months 43 to 48)

Student Requirements

A good Bachelor’s degree (First Class Honours or international equivalent) or Master’s degree in a relevant subject (mathematics, statistics, economics, or related subject).
Strong programming skills in Python, PyTorch, TensorFlow, etc. are a plus but not necessary.
A strong mathematical background.
Proficiency in English (both oral and written).
Relevant research experience in machine learning, statistics, etc. is preferred.
More on the CDT MLSystems requirements for candidates

This article was published on 2025-11-04