ANC Workshop - 11/03/2025

Research at the Biomolecular Control Group

1) Deep representation learning for single-cell genotype-phenotype mapping
Lucas Guirardel - Y2 PhD student, School of Biological Sciences
High-throughput single-cell variant effect assays compare a larger number of variants than functional assays, but they are also less direct and need computational methods to infer variant impact. We present a supervised representation learning method based on single-cell RNA sequencing data and compare different training objectives. We show that the learned embeddings carry clinically relevant information. These representation models can be coupled to explainable Multiple Instance Learning models to identify cells that carry more information on variant impact.

2) Impact of DNA representations on sequence-to-expression machine learning models
Yuxin Shen - Y3 PhD student, School of Biological Sciences
The growing demand for biological products drives efforts to maximize heterologous protein expression. While one-hot encoding enables highly accurate sequence-to-expression ML models, these models often fail to generalize. Here we show that mechanistic sequence features can improve model generalization, making the models more useful for sequence design. We also explore strategies to integrate different feature sets, including geometric stacking with a graph neural network. Our findings highlight the value of domain knowledge and feature engineering for accurate expression prediction.

3) Representation learning to analyze time series data from microfluidic experiments
Achille Fraisse - Y2 PhD student, School of Informatics
Microfluidic devices allow us to track the lives of bacteria, including their elongation and division patterns. Time series of those features can then be generated, but there is a lack of tools to analyze such data. I used an autoencoder model to learn representations of these curves. I then showed that the encoder captures information about the time series, such as the growth condition or the presence of an antibiotic.

Mar 11 2025 13.00 - 14.00
Lucas Guirardel (PhD student, School of Biological Sciences)
Yuxin Shen (PhD student, School of Biological Sciences)
Achille Fraisse (PhD student, School of Informatics)
Event host: Diego Oyarzun
G.03, Informatics Forum