Friday 24 October - 11am

Speaker: Lucy Farnik (University of Bristol)

Title: Opening the black box: Finding small units of interpretable computation in LLMs

Abstract: LLMs are black boxes. We do not understand how they make their decisions, which means we cannot guarantee that they behave the way we want. How can we address this? In this talk, I’ll give an overview of the field of AI interpretability and discuss my latest paper, which introduced a method for finding small units of interpretable computation inside LLMs. This lets us see how an LLM takes a handful of concepts and uses them to decide whether another (often more complex) concept is or isn’t relevant to the situation at hand.

Biography: Lucy Farnik is a PhD student at the University of Bristol. Her research focuses on large language models, specifically LLM robustness and safety. She has collaborated with researchers from Google DeepMind, Oxford, and UC Berkeley, and has published multiple papers at ICML and ICLR. Before moving into ML research, she was a senior developer at a US-based startup. You can read more about her at lucyfarnik.github.io.