UG Research Experience

The CDT MLSystems is running a Research Experience Scheme for Home Undergraduate students to allow them to gain research experience via Summer internships.

Take the next step in your learning journey and gain research experience with our Summer internship scheme.

Context

The MLSystems Research Experience scheme aims to give Home students the opportunity to experience research in an academic setting so that they can:

  • Work alongside leading researchers on real-world challenges in data science, artificial intelligence, cybersecurity and computational systems;
  • Develop research skills through gaining advanced technical and analytical skills;
  • Get an insight into postgraduate study and research careers in Academia;
  • Strengthen their CV and expand their professional network.

The CDT’s wider goal is to increase the number of its Home applicants and to diversify the profiles of the students recruited onto its PhD programmes. Ultimately, the scheme also aims to help improve diversity in academic research careers in Artificial Intelligence and Computing Science. We therefore strongly encourage applications from students who are from an under-represented group in these areas.

Research Experience 2026

Eligibility of candidates

To apply for this scheme candidates must:

  • be eligible for Home fees under the UKRI rules (which are detailed on the CDT MLSystems webpage);
  • be undertaking their first undergraduate degree (or integrated Masters);
  • be in the middle years of their degree (neither their first nor their final year);
  • be studying in a subject that relates to the MLSystems CDT remit: Computing Science, Machine Learning, AI, Mathematics, Engineering and Physics;
  • not have applied for a PhD degree yet;
  • have the right to work in the UK. All appointees will be required to complete HR documentation and provide proof of right to work if successful. The placement cannot begin without those checks being carried out and an employment contract being signed.

Although this is not an eligibility criterion, the scheme will prioritise appointing candidates who belong to a group that is under-represented in the areas of Machine Learning and Computing Systems research, such as female students, students from a lower socio-economic background, students from an ethnic minority in the UK, students who are disabled or have caring responsibilities, students who are care-experienced, and students who are refugees, asylum seekers or estranged from their family.

Details of the internships

  • Internships run during the Summer period, between 15 June and 31 August 2026
  • Internships are between 4 and 8 weeks with part-time options available (minimum 50%)
  • Interns will receive a short-term employment contract and will be paid at UoE grade 03 (i.e. £2,060.75 per month, full-time equivalent)
  • Internships are 100% in-person in the Informatics Forum, Edinburgh EH8 9AB (this is not a distance internship scheme)
  • Interns will be allocated two supervisors and a PhD tutor

Application process

Candidates for an MLSystems UG Summer Research Experience internship should:

  1. Read through the available projects and select their preferred project (it is recommended to email the project's PI prior to application to check suitability and ask any questions). Candidates can also choose up to two secondary projects.
  2. Complete and submit the online application form, see button below (a Word version is also available to download to help prepare the submission)
  3. Email the below documents to: mlsystems-enquiries@inf.ed.ac.uk
    1. A maximum one-page motivation statement explaining why doing their preferred research project would be useful to them in the future;
    2. A CV, including the name and email contact of a referee (personal tutor or equivalent);
    3. An interim transcript.

The online form must be submitted AND the email with all the supporting documents must be sent by the application deadline: 20 April 2026, 23:59.

Selected candidates might be contacted by the project's team for a chat after they submit their application; however, there will be no formal interview. Candidates will receive the outcome of their application by email, by the end of May at the latest.

Timeline

Step Timeframe
Applications open 20/03/2026
Applications close 20/04/2026, 23:59
Candidate selection by 18/05/2026
First placements start 15/06/2026
Last placements finish 31/08/2026

Research Experience Projects available for Summer 2026

Duration: 8 weeks (part-time available)

Preferred start date: 15/06/2026 (flexible)

Project team: Jianyi Cheng, jianyi.cheng@ed.ac.uk (PI), Elizabeth Polgreen and Leiqi Ye.

Modern systems are extremely complex, and ensuring that they behave correctly is challenging. Formal verification checks whether a design meets its specification by identifying discrepancies between the intended behaviour and the actual implementation.

However, verification is time-consuming and labour-intensive. More than 60% of development time at Arm is spent on verification. What if we could automate this process using LLMs?

This project explores how LLMs can enable automated verification techniques that reduce the effort required to verify realistic designs. In collaboration with researchers at EPFL, the project will investigate how well these techniques work on practical modules.

Depending on the student’s interests, the project may take a number of directions:

  1. (Hardware verification with EBMC) Automatically generating specifications for hardware using LLMs, verifying them with the EBMC model checker, and evaluating the generated assertions in terms of precision, size, and effectiveness.
  2. (Hardware verification with EBMC) Using LLMs to generate auxiliary specifications that help EBMC prove that a specification of interest holds.
  3. (Software verification with Frama-C) Given a C function and a specification, using LLMs to generate auxiliary invariants that enable Frama-C to prove that the specification holds.
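
To give a flavour of direction 3, here is a minimal, purely illustrative Python sketch of one common trick: cheaply filtering LLM-proposed candidate invariants by bounded execution before handing the survivors to a real prover such as Frama-C. The loop, the candidate invariants, and their names are all hypothetical, not taken from the project.

```python
# Illustrative sketch only: bounded checking of candidate loop invariants
# (e.g. proposed by an LLM) for a toy summation loop. Survivors would then
# be passed to a real prover; nothing here calls Frama-C itself.

def sum_to(n):
    """Yield (i, total) at every loop head of a simple summation loop."""
    total = 0
    i = 0
    while i <= n:
        yield i, total      # state observed at the loop head
        total += i
        i += 1
    yield i, total          # state after the loop exits

# Hypothetical candidate invariants, phrased as predicates over the loop state.
candidates = {
    "total == i*(i-1)//2": lambda i, total: total == i * (i - 1) // 2,
    "total == i*i":        lambda i, total: total == i * i,  # wrong on purpose
}

def bounded_check(inv, bound=10):
    """True if the invariant holds at every observed loop state up to `bound`."""
    return all(inv(i, t) for n in range(bound) for i, t in sum_to(n))

surviving = [name for name, inv in candidates.items() if bounded_check(inv)]
print(surviving)  # only the correct candidate survives
```

Bounded checking cannot prove an invariant, but it discards obviously wrong LLM output cheaply, which keeps the expensive prover calls for plausible candidates only.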

Students interested in pursuing research may explore additional directions that could lead to research publications.

Tool references
1. Frama-C: https://julien-signoles.fr/publis/2016_rv.pdf 
2. EBMC: https://github.com/diffblue/hw-cbmc

Candidate Requirements
Essential: Strong programming skills, particularly in C or Verilog.
Desirable: Formal verification background.

Duration: 8 weeks (part-time available)

Preferred start date: 20/07/2026 (flexible)

Project team: Ajitha Rajan, arajan@ed.ac.uk (PI), Jackson Woodruff and Yulin Jin.

Background
Quantisation is the process of reducing the precision of the numbers inside a neural network — for example, converting 32-bit floating point numbers to 8-bit integers. This makes models smaller and faster to run, at the cost of a small drop in accuracy.
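The core operation can be sketched in a few lines of plain Python: uniform affine quantisation maps floats to integers via a scale and zero-point derived from the observed value range. This is an illustrative sketch of the general technique, not code from the project.

```python
# Minimal sketch of uniform affine quantisation (FP32 -> INT8 and back).
# Derive a scale/zero-point from the data range, quantise, then dequantise;
# the round-trip error is bounded by roughly half the scale.

def compute_qparams(xs, qmin=-128, qmax=127):
    """Derive scale and zero-point from the observed value range."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zp))

def dequantize(q, scale, zp):
    return (q - zp) * scale

xs = [-1.0, -0.5, 0.0, 0.5, 2.0]
scale, zp = compute_qparams(xs)
roundtrip = [dequantize(quantize(x, scale, zp), scale, zp) for x in xs]
max_err = max(abs(a - b) for a, b in zip(xs, roundtrip))
print(max_err)
```

Real pipelines add per-channel parameters, calibration over a dataset, and integer-only arithmetic, but the scale/zero-point mapping above is the building block they all share.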

Project task
Build a compiler pipeline in MLIR that automatically quantizes floating-point neural network models (e.g. FP32 to INT8/INT4) and includes a testing framework to benchmark accuracy vs. performance trade-offs.

Steps:

  1. MLIR Dialect & Passes
    • Define a simple quant dialect or extend the existing mlir::quant dialect
    • Write analysis passes to identify quantization-sensitive layers (e.g. skip first/last layers)
    • Implement a lowering pass: FP32 ops → quantized integer ops
  2. Quantization Strategies
    • Implement post-training quantization (PTQ) — quantize a pre-trained model without retraining
    • Add per-channel vs per-tensor quantization modes as a comparison (time permitting)
    • Calibrate scale factors using a small representative dataset
  3. Testing & Evaluation Framework
    • Write unit tests for each MLIR pass using mlir-opt + FileCheck
    • Build an end-to-end benchmark: run a small model (e.g. MobileNet) through your pipeline
    • Measure and report: model size, inference latency, and accuracy degradation
  4. Analysis
    • Compare INT8 vs INT4 results
    • Discuss which layers are most sensitive to quantization
    • Reflect on MLIR's strengths/limitations for this task
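
As a toy illustration of the per-channel vs per-tensor comparison in step 2 (in plain Python, not MLIR; the weight values are invented): per-channel scales adapt to each output channel's range, which typically lowers quantisation error when channel magnitudes differ widely.

```python
# Toy sketch: symmetric INT8 quantise/dequantise error under a single
# per-tensor scale vs one scale per output channel (row). Hypothetical weights.

def quant_error(weights, scales):
    """Mean absolute error after symmetric INT8 round-trip, per-row scales."""
    err, count = 0.0, 0
    for row, s in zip(weights, scales):
        for w in row:
            q = max(-128, min(127, round(w / s)))
            err += abs(w - q * s)
            count += 1
    return err / count

# One channel with a much larger range than the other.
weights = [[0.01, -0.02, 0.015], [1.5, -2.0, 0.7]]

per_tensor_scale = max(abs(w) for row in weights for w in row) / 127
per_tensor = quant_error(weights, [per_tensor_scale] * len(weights))

per_channel_scales = [max(abs(w) for w in row) / 127 for row in weights]
per_channel = quant_error(weights, per_channel_scales)

print(per_channel <= per_tensor)  # True on this skewed example
```

The same trade-off appears in the MLIR pipeline: per-channel parameters cost extra metadata and slightly more complex lowering, in exchange for lower accuracy degradation.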
Candidate Requirements
Essential: C++ programming; enjoys programming.
Desirable: Exposure to compilers.

Duration: 8 weeks (part-time available)

Preferred start date: 15/06/2026 (flexible)

Project team: Adriana Sejfia, asejfia@ed.ac.uk (PI), Jingjie Li and Karen Zheng.

Context
Software teams often track and make publicly available the issues found in their software (e.g. using a tool like Bugzilla). Increasingly, the automation of different software engineering activities is powered by artifacts produced by human developers and users, such as the issues reported in issue tracking systems.

Aim
The aim of this project is to provide insights into how developers address, understand, and fix certain kinds of issues. In issue tracking systems (see an example of a report here: https://issues.chromium.org/issues/492894211), the reporters (who can be users or third parties) provide details about the kind of issue or bug they found. Developers then comment on it and jointly try to come to an understanding of what the work consists of and how to fix it. We aim to collate this kind of data for a subcategory of bugs and try to characterize the discussions: their nature, their length, and their relation to the complexity of the bug/issue. This problem will require reasoning about both code (to understand the complexity) and text. We will analyze the comments leveraging LLM-based NLP techniques and the code leveraging lightweight static analysis tools.
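
A hypothetical sketch of the kind of per-issue summary such a dataset might collate before any LLM-based analysis: discussion length plus a crude complexity proxy from the linked patch. The issue records below are invented for illustration.

```python
# Illustrative only: summarise issue-tracker discussions for downstream
# analysis. Real data would come from a tracker's API; these records are fake.

issues = [
    {"id": 1,
     "comments": ["crash on load", "can you bisect?", "fixed in r2"],
     "patch_lines_changed": 4},
    {"id": 2,
     "comments": ["memory grows over time", "repro attached",
                  "profiler output", "root cause in cache layer",
                  "patch up for review", "landed"],
     "patch_lines_changed": 180},
]

def characterise(issue):
    """Reduce one issue to the discussion metrics named in the aim above."""
    comments = issue["comments"]
    return {
        "id": issue["id"],
        "n_comments": len(comments),
        "avg_comment_words": sum(len(c.split()) for c in comments) / len(comments),
        "patch_lines_changed": issue["patch_lines_changed"],
    }

summary = [characterise(i) for i in issues]
# Does discussion length track patch size in this tiny (fabricated) sample?
print(summary[1]["n_comments"] > summary[0]["n_comments"])
```

The interesting research question is exactly what this sketch cannot answer: whether such correlations hold at scale and how the *content* of the comments (analysed with LLMs) relates to code complexity.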

Deliverables
The project should deliver:

  1. a dataset of issues tracked in large open-source systems,
  2. an LLM-based framework for analyzing such issues, and
  3. a technical report (or a publication) presenting the insights gained by applying the framework to the data.

Ultimately, we hope the report will guide and inform automated tools that rely on issue tracking systems.

Candidate Requirements
Essential: Strong programming skills in Python and experience with ML.
Desirable: Experience with big data and LLMs.

Duration: 8 weeks (part-time available)

Preferred start date: 15/06/2026 (flexible)

Project team: Rik Sarkar, rsarkar@inf.ed.ac.uk (PI); Michele Ciampi and Thomas Wong

Zero-knowledge proofs are a reliable and secure way of creating trustworthy decentralised systems. Several zero-knowledge frameworks for ML have been published recently, such as zkLLM (2024), zkGPT (2025), and zkTorch (2025). It is unclear how the performance of these frameworks scales and compares.

Measurements carried out in this project on different frameworks and models of different sizes will form the basis of practical system development, and act as preliminary data for further research in this area.

Aims

The student will select a set of tasks and models of various sizes to be benchmarked under different frameworks. The selection will be carried out in consultation with the supervisors and tutor.

The tutor has experience with these frameworks and will help the intern set them up on the research cluster.

The intern will record several metrics for the frameworks, including proof generation time (on CPU and GPU), memory usage, proof size, verification time, and accuracy degradation due to quantisation.
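
A harness for these metrics might look like the following sketch. The `prove`/`verify` functions are stand-ins invented for illustration; a real run would call into a framework such as zkLLM or zkTorch and would also record memory usage and accuracy.

```python
# Skeleton benchmark harness for the metrics listed above. The prover and
# verifier here are fake placeholders so the sketch is self-contained.

import time

def prove(model_size):
    """Placeholder prover: fake proof whose size grows with the model."""
    time.sleep(0.001 * model_size)       # stand-in for real proving work
    return b"\x00" * (64 * model_size)   # stand-in proof bytes

def verify(proof):
    """Placeholder verifier."""
    return len(proof) > 0

def benchmark(framework_name, model_size):
    t0 = time.perf_counter()
    proof = prove(model_size)
    prove_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    ok = verify(proof)
    verify_s = time.perf_counter() - t0

    return {"framework": framework_name, "model_size": model_size,
            "prove_s": prove_s, "verify_s": verify_s,
            "proof_bytes": len(proof), "verified": ok}

results = [benchmark("placeholder-framework", s) for s in (1, 4, 16)]
for r in results:
    print(r["model_size"], r["proof_bytes"], r["verified"])
```

Keeping each result as one flat record per (framework, model size) pair makes it easy to emit CSV for the scaling plots the benchmark suite would publish.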

Deliverables

1. An open-source, reproducible benchmark suite useful to the community.
2. A short report accompanying the benchmark suite, describing the measurement results and the strengths, weaknesses and bottlenecks of various state-of-the-art frameworks.

Candidate Requirements
Essential: Good programming skills in Python; familiarity with basic machine learning.
Desirable: Familiarity with basic cryptography.

Duration: 8 weeks (part-time available)

Preferred start date: 15/06/2026 (flexible)

Project team: Tariq Elahi, t.elahi@ed.ac.uk (PI), Marc Juarez Miro and Aradhika Bagchi

Context
Online platforms increasingly need reliable ways to verify users’ ages in order to restrict access to age-inappropriate content. In the past these checks were easily circumvented: for example, a website might simply ask the user to tick a box affirming they were over 18 before entering a restricted site. Governments around the world, including the UK and the US, have made it a legal obligation for websites and online platforms, such as Discord, Reddit, and Matrix, to enforce age verification more strongly.


One popular age verification method is "Age Estimation" based on facial recognition machine learning models. The user’s camera takes pictures (selfies) of their face, and the model then infers their age. Based on the result of the inference step, the user is either admitted to the website or denied access. This method requires users to submit sensitive personal information (the biometric information in their selfie), which raises privacy concerns.

Aims
The aim of this project is to investigate privacy-preserving machine learning techniques that can accurately estimate a user’s age from facial images while reducing the ability to infer other sensitive attributes such as gender, identity, or ethnicity.

The work will focus on developing and subsequently evaluating a system that learns a compressed facial representation designed specifically for age estimation, incorporating techniques from privacy-preserving machine learning such as adversarial training and differential privacy.
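
One standard adversarial-training ingredient, sketched below purely as an illustration (not the project's actual design), is the gradient-reversal trick: a layer that is the identity on the forward pass but flips and scales the gradient on the backward pass, so the encoder learns representations that *hurt* an adversary predicting the sensitive attribute.

```python
# Minimal framework-free sketch of a gradient-reversal layer. In a real
# system this would sit between the face encoder and an adversarial head
# that tries to predict e.g. gender or identity from the representation.

class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam          # strength of the adversarial signal

    def forward(self, x):
        return x                # identity: features pass through unchanged

    def backward(self, grad):
        # The adversary's gradient arrives here; reversing it means the
        # encoder descends *against* the adversary's objective.
        return [-self.lam * g for g in grad]

grl = GradientReversal(lam=0.5)
features = [0.2, -1.3, 0.7]
assert grl.forward(features) == features     # forward is the identity

adversary_grad = [0.1, -0.4, 0.0]
print(grl.backward(adversary_grad))
```

The hyperparameter `lam` controls the accuracy/privacy trade-off the project's experiments would measure: larger values suppress sensitive attributes more aggressively, usually at some cost to age-estimation accuracy.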

Expected deliverables
(1) implementation and evaluation of a prototype privacy-preserving age-verification system using facial image datasets;
(2) experiments measuring the trade-off between age-prediction accuracy and privacy leakage;
(3) a short technical report summarising the approach, results, and potential implications.

Candidate Requirements
Essential: Comfortable with Python and the command line, with some background in security concepts (e.g. having taken the Computer Security course in 3rd year).
Desirable: Experience with machine learning libraries and deploying jobs on GPU clusters.