Meet our 2021 cohort.

Raman Dutt
Personal webpage
PhD project: Self-supervised Transformers for Medical Image Analysis
Supervisors: Timothy Hospedales, Sotiros Tsaftaris

Medical image analysis has benefited significantly from advances in deep learning and computer vision. For a long time, the combination of supervised learning and Convolutional Neural Networks (CNNs) has dominated computer vision and has therefore been successfully adopted in medical imaging. This combination requires large, annotated and structured datasets to perform well, which remains a major bottleneck for medical imaging. Self-supervised learning is a training paradigm that does not require annotated datasets, and recent advances have matched the results obtained through supervised learning. Further, a new model termed the Vision Transformer (ViT) has been introduced that delivers competitive results with respect to CNNs on different imaging tasks and benchmarks. Beyond performance-based comparisons, ViTs have also exhibited several other interesting properties, leading to the development of different variants that improve performance and efficiency. However, despite their success with natural images, the adoption of these two approaches remains limited for medical imaging tasks. In this project, we aim to explore the intersection of self-supervised learning and vision transformers to improve medical image analysis.

Hans-Christof Gasser
PhD project: Integrating immune-visibility targets into the computational protein design process
Supervisors: Ajitha Rajan, Diego Oyarzún, Javier Alfaro

Proteins have an arsenal of medical applications that include disrupting protein interactions, acting as potent vaccines, and replacing genetically deficient proteins. On the one hand, therapeutics must avoid triggering unwanted immune responses directly against the therapeutic protein or against a vector protein.
On the other hand, vaccines should support a robust immune reaction targeting a broad range of pathogen variants. Taken together, there is a pressing need for techniques capable of integrating diverse immune-response objectives into the computational protein design process. To foster development in this area, we are developing the CAPE (Controlled Amplitude of Present Epitopes) set of tools. In its final stage, we aim to offer tools from a range of computational approaches that modify immune visibility to a broad range of immune-system components.

Thibaut Goldsborough
PhD project: Deep morphological analysis of tissue images for the segmentation and downstream characterisation of the cellular composition of tissues
Supervisors: Peter Bankhead, Hakan Bilen

Biomedical research often depends upon analysing biological tissues using a range of imaging techniques. The continual improvement of imaging platforms requires the constant development of image analysis pipelines that can deal with the higher dimensionality, complexity and size of datasets. This project aims to provide an open-source, easy-to-implement image analysis pipeline to identify and characterise the cells that make up the diverse tissues in the human body. To do this, the project will focus on bridging the latest developments in deep learning with biological and mathematical insights into tissue imaging.

Sebestyén Kamp
Personal profile
PhD project: Analysing cancer networks evolution using graph neural networks
Supervisors: Giovanni Stracquadanio, Ian Simpson

Cancer is one of the deadliest diseases worldwide. In order to understand how a normal cell becomes cancerous and escapes treatment, it is important to examine tumour progression.
Here we propose a method that applies modern experimental techniques and artificial intelligence to uncover hidden relations. Using measurements from single cells, rather than from thousands of cells together, we can reconstruct how cancer evolves over time. From these experimental measurements we can compute the expression of each protein-coding gene and thus build a network that describes the underlying connections of genes and pathways. Through computational modelling with graph neural networks, we can infer how tumours evolve over time and which molecular programs are activated or deactivated.

Charlotte Merzbacher
Personal webpage
PhD project: Integrating Genome-Scale Mechanistic Modelling Approaches for Dynamic Pathway Metabolic Engineering
Supervisors: Diego Oyarzún, Oisin MacAodha

Manufacturing complex pharmaceutical chemicals can be expensive and can produce toxic and unsustainable byproducts. Metabolic engineers can instead modify microorganisms such as bacteria, using genetic engineering techniques, to create these chemicals. Forcing bacteria to produce large amounts of a foreign chemical can cause them to grow more slowly, reducing overall product yield. Inserting genetic regulatory circuits that automatically respond to cellular conditions can help organisms dynamically respond to their environment and continue producing the desired product. However, creating these more complex circuits can require many costly experimental iterations. Computational optimization methods can speed up the design process and reduce laboratory costs by simulating many potential pathways and suggesting the best options for experimentation. One commonly used computational method is Flux Balance Analysis (FBA), which assumes all chemicals in the cell remain at constant concentrations.
This assumption is inaccurate for dynamic control circuits, which change enzyme expression levels in response to metabolite concentrations. Ordinary differential equations (ODEs) offer an alternative approach that describes how biochemical concentrations change over time. Combining these two approaches would enable experimental metabolic engineers to model pathways under dynamic control while drawing on existing accurate models of host organism metabolism. In addition to integrating the two methods, we propose to design an algorithm to predict the best control methods for a genetically engineered pathway.

Dominic Phillips
PhD project: AI-driven enhanced sampling of molecular dynamics simulations for applications in drug discovery
Supervisors: Benedict Leimkuhler, Antonia Mey, Flaviu Cipcigan
Project co-funded by IBM Research

Medications contain active ingredients designed to interact with particular biological molecules in the body. Often these molecules are large, consisting of thousands of atoms. The ability to simulate large biological molecules thus aids the development of new, effective medicines. However, a key challenge is that these simulations can be prohibitively expensive, taking months to run on a supercomputer. Various so-called enhanced sampling simulation methods have been proposed to speed up these calculations. These methods usually work by incorporating expert knowledge of which of a molecule’s bonds or atoms are most important for describing its dynamic behaviour. Enhanced sampling methods therefore tend to work well on smaller, well-studied molecules for which prior expert knowledge exists, but less well on the larger, more complex molecules of biological significance. Recent developments in machine learning promise new ways of improving enhanced sampling methods by automatically learning these relevant bonds and atoms from simulation data.
However, several limitations, such as poor accuracy and the lack of a standardised approach, prevent these innovations from being widely adopted by the pharmaceutical industry. We propose to develop a new enhanced sampling simulation framework that incorporates aspects of machine learning and experimental knowledge of biological molecules. The aim is to produce a framework that improves on the state of the art in speed and accuracy. The framework will be applied first to small biological molecules before being scaled up to larger molecules studied in crucial medical research areas, such as antibiotic resistance and Alzheimer’s disease.

Ben Philps
PhD project: Domain Generalization, Adaption and Model Robustness for improving trustworthiness in segmentation of stroke, White Matter Hyperintensities, abnormalities in brain MRI and related medical imaging domains
Supervisors: Maria Valdez Hernandes, Miguel O. Bernabeu

Recent advances in AI systems have yielded state-of-the-art performance in a number of medical imaging domains. This project specifically examines the task of segmenting lesions such as White Matter Hyperintensities, stroke and inflammation abnormalities in Fluid Attenuated Inversion Recovery (FLAIR) and T1-weighted magnetic resonance imaging (MRI). However, introducing such AI systems into clinical practice poses a number of challenges beyond the predictive accuracy of a model. Ethical and legislative issues around the safety and trustworthiness of AI systems prevent their use in clinical settings. These requirements are underpinned by numerous technical properties, of which a core issue is Domain Generalization (DG). DG seeks to ensure that models continue to behave as expected when the data that the model is tasked with analysing changes. Specifically, changes in scanning technologies and patient demographics can easily degrade model performance and potentially lead to unsafe outcomes.
However, examining the generalization performance of DG strategies requires careful experimental design and testing frameworks, which are lacking in the medical imaging domain in particular. This project will therefore seek not only to develop strategies that ensure strong generalization performance on brain MRI data, but also to provide a broader medical imaging generalization testing framework for evaluating future work.

Barry Ryan
PhD project: Identification of sub-groups within Parkinson's Disease using Patient Similarity Networks
Supervisors: Ian Simpson, Riccardo Marioni

Motor loss symptoms resulting from Parkinson’s disease (PD) have a large number of causes and manifestations, many of which remain unknown. Treatment of PD is difficult, and current drug therapies have a large number of limitations. Individuals with PD undergoing treatment experience ON/OFF phases in which treatment does or does not alleviate symptoms. Furthermore, treatment is only effective for a few years. Beginning PD treatment early can delay the onset of the disease and slow disease progression. PD has been shown to have a hereditary incidence, and through the study of familial cases of PD, novel genetic associations have been found. These genetic cases account for approximately 30% of PD cases; however, they have not yet led to a neuroprotective therapy. Genetic associations have nevertheless increased the understanding of the disease, resulting in many new genetic, electronic health record and longitudinal datasets. As PD has many different causes, it is important to identify sub-groups of PD individuals in a robust manner. Some therapies may be very effective for a small sub-group, but these findings are clouded by large clinical trials that include many individuals not suited to the therapy.
Incorporating these many novel datasets in a single computer network can identify novel genetic associations, identify sub-groups for targeted clinical trials and identify biomarkers which could be useful for early diagnosis.

Fiona Smith
PhD project: TBC
Supervisors: Jacques Fleuriot, Ewa Majdak-Paredes

Aleksandra Sobieska
PhD project: Combining machine learning and polymer physics simulations to study the mechanisms of large-scale genome organisation in health and disease
Supervisors: Chris Brackley, Kartic Subr

The spatial organisation of the genome in the cell nucleus plays a vital role, via gene regulation, in the development of genetic diseases such as cancers. Inactive genome regions (containing silenced genes) tend to be close to other inactive regions, and at the nuclear periphery. Active genes tend to be located in the nuclear interior. Gene expression often requires that gene promoters come into physical contact with enhancers (regulatory sites which can be far away from the gene along the chromosome) within the so-called “active compartment”. As cells differentiate, different genes move between the active and inactive compartments. However, the mechanisms driving this are still not fully understood. So-called “chromosome-conformation-capture” experiments, such as Hi-C, from which 3D genome structure can be inferred, are time-consuming, expensive, and difficult to interpret. Computational methods which can make predictions, help target new experiments or help us better understand these experimental datasets are therefore very useful. For example, machine learning (ML) has previously been used to predict Hi-C maps using data on DNA sequence and “histone modifications” (HMs; these are chemical “tags” which mark, or alter the properties of, different chromosome regions).
Additionally, polymer physics simulations, where chromosomes are represented as chains of “beads” in simple mechanistic models, have been successfully used to understand some of the mechanisms driving the organisation, and also to make predictions. In this project, ML and physics-based methods are combined with the aim of understanding higher-level mammalian chromosome organisation at the whole-nucleus scale. In particular, it is not clear what controls the relative positions of genes and chromosomes within the nucleus, and how these change through differentiation. Previous work with polymer simulations has used HM data to assign properties to beads; in the context of a lower-resolution whole-genome model, how to do this a priori is unclear. To overcome this, we use ML to generate the rules for converting HM data into a physics force field. The end result will be an ML-aided polymer model which can predict whole-genome organisation through differentiation. The mechanistic nature of polymer simulations allows interventions to be made, probing different scenarios to gain understanding of the underlying mechanisms. Publicly available Hi-C and HM data are used for model development, training, and validation. We will work closely with experimental collaborators to ensure the modelling tools can be quickly deployed to answer key biomedical questions, and can be applied to the latest experimental data.

Xiao Yang
PhD project: Platformization in Diagnostic AI: Examination of Different Strategies for Scaling-up
Supervisors: Robin Williams, Michael Barany

The process of deploying AI in healthcare practice is slower than expected. The challenges of procurement, validation, integration, and post-surveillance are potentially insurmountable for a standalone AI tool. My project was launched at a particular juncture, when suppliers, users and regulators are simultaneously resorting to an intermediary strategy: the platform.
This refers to a strategy for solving the challenges of diagnostic AI deployment as a community, while connecting external and internal players in the long term. Drawing on STS theories and focusing on qualitative research methods, I will reflect upon this new solution in order to understand how this new institutional and technological machinery is being created to allow various diagnostic AI tools to be sustainably exploited at scale.

Yongshuo Zong
PhD project: Enhancing the Trustworthiness of Vision Large Language Models for Healthcare
Supervisors: Timothy Hospedales, Yongxin Yang

The rapid advancement of large language models (LLMs) has paved the way for the development of Vision Large Language Models (VLLMs), which build upon LLMs to offer promising capabilities in image understanding and vision-language reasoning. Leveraging this potential, VLLMs are increasingly being applied in healthcare for tasks such as medical image analysis and the generation of medical reports. However, despite their promising applications, VLLMs face significant trustworthiness challenges, including safety concerns, a lack of robustness, and a tendency to produce inaccurate or hallucinated content. Our research is dedicated to enhancing the trustworthiness of VLLMs within healthcare contexts. By implementing targeted interventions and developing new methodologies, we aim to improve the safety, reliability, and overall effectiveness of VLLMs, thereby better supporting healthcare professionals and improving patient care outcomes.

This article was published on 2024-11-22