Meet our 2022 cohort.

Passara Chanchotisatien
PhD Project: Wearable Sensor-Based Insights into Respiratory Illnesses: Activity Patterns, Symptoms, and Environmental Factors
Supervisors: DK Arvind, Amos Storkey, Mark Miller
Respiratory illnesses, encompassing conditions such as asthma, chronic obstructive pulmonary disease (COPD), and long Covid, pose significant challenges to individuals and healthcare systems worldwide. This research leverages wearable sensor technology, including RESpeck and Airspeck devices, to gather high-frequency data on activity patterns, symptom occurrences, and environmental exposures in individuals with respiratory conditions. Statistical and machine learning techniques are applied to classify activities, vital signs (such as respiratory rate and breathing flow/effort), and symptoms (such as cough episodes), in order to uncover causal relationships. By integrating these multidimensional data streams and collaborating with clinical experts, this research aims to investigate the relationship between respiratory health, physical activity level, and environmental factors for the purpose of predicting changes in patients' wellness.

Wolf De Wulf
Personal webpage
PhD Project: Understanding Neural Spatial Codes in Goal-Directed Navigation
Supervisors: Matthias Hennig, Matthew Nolan
Investigating how neurons in the brain work together to form intelligence is a fundamental aspect of both neuroscience and artificial intelligence research. Understanding the neural codes behind a particular behaviour facilitates the development of treatments or cures for conditions related to that behaviour. The hippocampal formation is known to contain several types of neurons that encode different aspects of how we navigate the world. Various questions concerning these neurons remain to be answered.
What exactly do they encode? How do they communicate? How do they relay information to other brain regions? Do they perform similar computations for other tasks? For this PhD project, I propose to investigate neural recordings of mice performing a navigation task to increase our understanding of the neural codes behind navigation. Ultimately, this research can facilitate investigations into how navigation deficits arise in neural disorders such as early-stage Alzheimer’s disease and autism spectrum disorder.

Maria Dolak
I am interested in combining bioinformatics and machine learning tools to investigate mechanistic pathways of cognitive disorders using population genomics. This could guide further research and healthcare decisions, e.g. in dementia, or lead to the repurposing of existing drugs. I am also interested in more fundamental neuroscience research in memory using computational methods.

Achille Fraisse
PhD Project: Understanding Bacterial Response to Antibiotic Treatment Through Single-cell Analysis
Supervisors: Meriem El Karoui, Diego Oyarzún
Antimicrobial resistance is a rising health concern, and studying bacteria under antibiotic exposure at the single-cell level is a promising approach to better understand it. One experimental approach is to expose individual bacteria trapped in a micrometre-wide channel to antibiotics and image them using fluorescence microscopy. Thousands of bacteria can be imaged in parallel over tens of generations using a microfluidics device called the “mother machine”, thus generating time series for individual cells. Whilst recent deep learning-based packages have been developed to segment and track the bacteria in the resulting images, the analysis of the resulting time series has so far relied on semi-manual procedures.
The objective of my PhD is therefore to develop robust data analysis methods for these time series datasets. During my Individual Project I used time-series clustering to analyse trajectories of individual bacteria exposed to a DNA-damaging antibiotic (ciprofloxacin). The methods used were Dynamic Time Warping and kernel PCA for data featurization, and k-means or DBSCAN for clustering on the featurized data. I found evidence that there are sub-populations undergoing different stress patterns in response to antibiotic exposure. I also found that some sub-populations exhibited different survival rates. This type of time-series data obtained from mother machine experiments is very rich and can be exploited in much more detail than is currently done. The first objective of my PhD will be to extend this work to other data with multiple growth conditions, antibiotics, and antibiotic combinations. The second objective will be to develop a toolbox for time series analysis of single-cell data from microfluidic devices.

Aryo Gema
Personal webpage
PhD Project: Knowledge-Augmented Language Models for Temporal Patient Information Modelling
Supervisors: Beatrice Alex, Pasquale Minervini
Electronic Health Records (EHRs) hold important information about patients’ health over time. A significant part of this data is hidden in unstructured notes from medical professionals. Large Language Models (LLMs), advanced computer programs that can understand and process human language, can be used to extract insights from these narratives. However, a challenge arises when trying to include a vast amount of medical knowledge in these programs due to the high computational resources required and the possibility of mistakes, which is especially important in medical decision-making. This project proposes a solution in several steps.
First, I plan to teach LLMs medical knowledge by using computationally efficient methods. Then, I will assess how much it helps if these programs are taught to find more information from external literature when needed. I will test these improvements using tasks that are agreed upon in the medical community. Next, I will explore techniques that teach these computer programs to be better at explaining why they make certain predictions by explaining the logical steps they have considered. Lastly, we want these programs to understand not just one doctor’s note, but all the notes from a patient’s history. This can help them give better advice by knowing how things changed over time. In summary, this project seeks to enhance the usability of EHRs by teaching these computer programs to better understand medical stories, using external knowledge, and improving accuracy and practicality in clinical applications.

Dominik Grabarczyk
PhD Project: Generative NLP for Designing mRNA Therapeutics
Supervisors: Shay Cohen, Javier Alfaro
mRNA therapeutics can be seen as templates for the actual drugs. When injected into our bodies, these templates find their way into our cells, where they are used to produce the drugs. They are useful because creating mRNA in a lab is a lot cheaper than creating drugs. One of their use cases is to make vaccines, as was done during the SARS-CoV-2 pandemic. Their design is also relatively easy. In the context of vaccines, one has to find or create a protein that is representative of the virus and that our immune systems can easily recognise. This protein is then translated into mRNA, which needs to be easy for our bodies to use. AI has been used to great effect both in finding and translating these parts, greatly speeding up the development of such drugs. However, it might be possible to find mRNA sequences directly.
This would be beneficial because one can then find the right balance between making a good drug and one that is easily made by our body. This PhD will attempt to use AI to find mRNA sequences and see if this works at least as well as existing approaches. Then these AI tools will be modified to create more easily usable mRNA sequences. This will be challenging as it will require that the sequences differ significantly from natural ones.

Iris Ho
PhD Project: Unveiling Biological Significance: Exploring Health trajectory Representations with Transformer Models in Large Cohorts
Supervisors: Sohan Seth, Konrad Rawlik, Bruce Guthrie
In the realm of healthcare, traditional binary definitions of medical conditions prove inadequate in capturing the intricacies of an individual's medical history, socio-demographic factors, and the nuanced progression of diseases. This pioneering research aims to harness the potential of generative pre-trained transformer (GPT)-based medical history representations to revolutionize healthcare understanding across multiple dimensions. Firstly, it will explore multi-morbidity by utilizing GPT-based medical history representations for clustering health history trajectories and directly comparing results with manual curation, providing nuanced insights into complex disease relationships. Secondly, it will investigate the biological significance of these representations as a phenotype in prominent cohorts, conducting genome-wide association studies (GWAS) to unravel genetic and epigenetic associations shaping health trajectories. These associations will be further integrated with existing GWAS data to establish biological pathways and causal mechanisms underpinning the variations in medical history representations.
Lastly, the research ventures into disease progression analysis, exploring the paths individuals follow within their health histories, and probing whether genetics and epigenetics influence the speed and branching of these trajectories. In essence, this work pioneers a holistic approach that combines machine learning, genomics, and epidemiology to redefine medical conditions, unveil their biological foundations, and illuminate the dynamics of disease progression, ultimately contributing to a more personalized and effective healthcare landscape.

Michal Kobiela
PhD Project: Computational optimization for the design of genetic circuits under misspecified models
Supervisors: Michael Gutmann, Diego Oyarzún
Synthetic biology seeks to design biological systems that perform novel functions. This is accomplished by introducing new genes into living cells. These genes possess the ability to impact one another, giving rise to interconnected networks of interactions referred to as genetic circuits. The construction of genetic circuits requires the specification of multiple components that need to be optimized. This encompasses both continuous controls and the selection of discrete architectural choices, which results in an extensive design space that is challenging to explore through wet lab experiments. To expedite the design of genetic circuits, computational optimization approaches are used to perform this design search, where the behaviour of the system is predicted by mechanistic models based on differential equations. While these mathematical models are useful, they only provide a simplified view of complex biological systems and our knowledge of their parameters is frequently limited, resulting in model misspecification.
To address this issue, we propose to use Bayesian approaches to optimize genetic circuits simulated with a misspecified model or collection of models that cannot be entirely trusted. By enforcing risk-averse objectives we will promote designs that are most likely to function in real-life experiments, even when the models' reliability is limited. Such objectives need to be maximized through computational optimization. We will investigate the use of modern Bayesian optimization techniques, which claim to be efficient in high-dimensional, mixed (discrete and continuous) spaces. We will assess whether the assumptions that allow these methods to be effective are applicable to genetic circuit design. This has the potential to enable us to efficiently explore more complicated design spaces that are intractable to investigate using previously used methods. Consequently, this can enable better optimization of known architectures and the discovery of novel designs that can perform desired functions.

Scott Pirrie
PhD Project: Prioritisation of cancer driver gene candidates within structural variation hotspots
Supervisors: Colin Semple, Stuart Aitken
Structurally diverse tumours such as high-grade serous ovarian carcinoma (HGSOC) and glioblastoma often show regions subject to recurrent structural variants (SVs) across tumour samples, and it is known that some of these variants are targets of selection and play critical roles in tumourigenesis. Some have direct functional consequences, such as amplifications of oncogenes, while others appear to affect the expression of nearby genes, but most remain mysterious (Li et al, Nature, 2020). It remains challenging to detect the influence of selection in generating the SVs observed, against the background of the highly variable mutation rate across the tumour genome, without a proxy for this rate.
We developed such a proxy, in the form of accurate genome-wide regression models of DNA double-strand break (DSB) susceptibility (Ballinger et al, Genome Biol, 2019), allowing us to model the probability of observing recurrent SV breakpoints at any locus due to chromatin-mediated mutational bias alone. However, there are now opportunities to improve these models using additional data, and discover novel driver genes. Firstly, our unpublished HGSOC data show that gene expression patterns can be a potent tool in discerning particularly promising candidate genes from passenger genes within hotspots, and we will include these patterns to improve model predictions. Secondly, abundant genome-wide CRISPR screen data have accumulated, providing rich gene essentiality data for 112 ovarian cancer and 161 glioma cell lines (Boehm et al, Nature, 2021) which have not previously been used in driver gene prioritisation. Thirdly, recent work has shown that SV length distributions have great potential in driver gene prediction (Shia et al, Nature, 2023), though this approach has not been used with WGS data. We will explore the potential of these three additional data types using our unpublished HGSOC and glioblastoma WGS/RNAseq cohorts, as well as published WGS data for 2700 other tumours (ICGC PCAWG, Nature, 2020), to predict novel cancer driver genes within SV hotspots.

Stefani Tirkova
Personal webpage
PhD Project: Patient Stratification of Hard-to-Diagnose Diseases using Patient Similarity Networks
Supervisors: Ian Simpson, Riccardo Marioni
The primary objective of this PhD project is to enhance diagnostic processes for complex and hard-to-diagnose diseases by leveraging Patient Similarity Networks (PSNs). Building on methodologies developed in previous work on Autism Spectrum Disorder (ASD), this research will first focus on Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS).
Using multi-modal data from the UK Biobank (UKBB), which includes genetic, phenotypic, and electronic health records (EHRs), this study aims to stratify ME/CFS patients into homogeneous subgroups. The study will employ Random Forest analysis to identify and rank informative features across diverse data types. These features will be used to construct PSNs, and community detection algorithms, such as the Louvain method, will be utilised to identify potential subgroups within the ME/CFS cohort. Predictive models for ME/CFS diagnosis and subgroup classification will be developed to estimate the prevalence of the condition in the larger UKBB population. The methodologies developed will be validated by extending the approach to other hard-to-diagnose conditions, with a particular focus on fibromyalgia. By improving disease stratification and diagnostic accuracy, this research aims to contribute significantly to personalised treatment approaches and the efficient execution of clinical trials, thereby advancing our understanding of complex disease mechanisms.

Ke Wang
PhD Project: Deep Dependency Parsing Techniques for Enhanced RNA Secondary Structure Prediction: Bridging NLP Techniques with Molecular Biology
Supervisors: Shay Cohen, Grzegorz Kudla
In the realm of computational biology, accurately predicting RNA secondary structure is pivotal for understanding intricate biological processes and diagnosing diseases. Traditional methods, while pioneering, are limited in capturing the multifaceted dynamics inherent to RNA structures. This research proposal outlines the development of RNA-BiaffineParser, an innovative deep learning model that employs dependency parsing techniques from the field of Natural Language Processing (NLP) for RNA secondary structure prediction.
By identifying similarities between NLP's syntactic parsing and RNA folding mechanisms, the proposal presents a groundbreaking interdisciplinary approach to tackle this enduring challenge. Notably, overfitting remains a significant concern for high-parameter machine learning models. To address this, our research aims to integrate predictions of RNA structures alongside Turner's nearest-neighbour free energy parameters. We propose to utilize thermodynamic regularization during the model training phase to ensure a close alignment between predicted RNA structures and calculated free energies, thus mitigating the risk of overfitting and enhancing model robustness. The proposed research sets the stage for a transformative computational tool capable of predicting the secondary structures of novel RNA families, thereby contributing to both computational biology and linguistics.

Lars Werne
PhD Project: Computational modelling of episodic memory in the context of Post-traumatic Stress Disorder
Supervisors: Peggy Seriès, Angus Chadwick
In humans, episodic memory describes the ability to recall events one has experienced in the past. Computational models of episodic memory have been proposed, in which artificial neurons, arranged in biologically plausible networks, communicate in ways conjectured to correspond to how the biological brain stores or retrieves such memories. Briefly, it is widely believed that patterns of neural activity in the brain’s neocortex, initiated by sensory stimuli, reshape synaptic connections between neurons in the hippocampus. This reshaping is thought to occur in a way that later enables the collective activity of the hippocampal neurons to re-instantiate the pattern of neocortical activity.
In computational models, this population activity can be represented as a point in a high-dimensional vector space; during the recall of a specific memory, the hippocampal activity may be thought of as traversing a low-dimensional subregion of that space, called a ‘continuous attractor’. Investigating plausible computational models of cognitive processes may be a means of validating or formulating hypotheses about the neural underpinnings of mental disorders, which, in many cases, are insufficiently understood. For instance, this approach has notably been chosen in the context of memory to investigate disruptions of working (or short-term) memory in schizophrenia. Inspired by these research efforts, we propose to formulate and study a computational model of episodic memory to gain theoretical insights into the neural mechanisms underlying Post-traumatic Stress Disorder (PTSD). Sparked by traumatic, high-impact life events, PTSD has a twofold effect on episodic memory. On the one hand, the traumatic episode may be re-experienced as nightmares or intrusive memories. On the other hand, PTSD patients have commonly reported deficits in episodic memory unrelated to the episode. Although these effects have been observed across many studies, their neural correlates are not precisely known. Several computational models of PTSD have been proposed but have generally focussed on a behavioural level, not representing the underlying neural activity. Gaining an improved understanding of the relevant neural mechanisms would be beneficial as it may, e.g., inform novel avenues for treatment. In this project, we will aim to deduce a computational neural model of episodic memory. We will take a previously proposed continuous attractor model of memories stored in the hippocampus as a starting point, supplementing it with a module representing the neocortex.
We will build on theoretical results and insights from past modelling studies to incorporate a plausible mechanism by which external stimuli may cue the storage or retrieval of episodic memories. When evaluating the model’s behaviour through simulations, we will be particularly interested in investigating neural mechanisms by which a stored memory may become more likely to be recalled by partial cues. These investigations could yield insights into the putative causes of flashbacks in PTSD. Further, we hope to explore the extent to which these exact mechanisms may negatively affect the recall of other unrelated memories.

This article was published on 2024-11-22