ANC Seminar - 12/11/2024

Peter Flach (Professor of Artificial Intelligence at the University of Bristol)

Event host: Henry Gouk

Title: Artificial Intelligence, Measured for Safety -- Towards an actionable science of AI metrology

Abstract: Artificial Intelligence safety is an area of intense scrutiny, with the UK and US both having recently established AI Safety Institutes. However, our understanding of the risks posed by contemporary AI systems is preliminary and incomplete. It doesn't help that the media-fuelled narrative around AI is exceedingly simplistic, usually extrapolating along a fictitious one-dimensional, exponentially accelerating timeline: from Kasparov's defeat by Deep Blue in 1997, through further game-playing AI such as Watson in Jeopardy (2011) and DeepMind's AlphaGo (2016), to today's GenAI chatbots generating seemingly convincing, human-like text and AI underpinning this year's Nobel prizes in physics and chemistry.

Objectively speaking, this portrayal is fraught with difficulties, for a wide range of fundamental reasons. One is that, in competitive settings such as the first three, what really matters is not just the observed outcome but a robust estimate of its likelihood: if we re-ran these contests a number of times, what distribution of wins and losses would we expect? Another is that in many cases the task and intended outcomes are ill-defined: what does it mean to accurately predict protein structures? How do we measure human-likeness of text? Does the AI system convey a degree of confidence with its outputs? Can it explain its reasoning, and take corrections or feedback into account? Real-life situations are multi-faceted, and narrowing performance assessment to a single directly observable metric – itself often a mere proxy for what we are really interested in – is misleading, if not dangerous.

In this talk I will describe current and planned research towards an actionable science of AI metrology. I will review classical and recent work on producing calibrated probability estimates, which directly addresses issues around confidence and distribution of outcomes. I will then explore the links between performance assessment of machines and human evaluation. Taking inspiration from cognitive science and psychometrics will allow us to come up with more meaningful measuring instruments, standards and benchmarks, and to move away from the overly simplistic league table approach that has been dominant in machine learning and AI for too long.

Bio: Peter Flach is Professor of Artificial Intelligence at the University of Bristol. An internationally leading scholar in the evaluation and improvement of machine learning models using ROC analysis and calibration, he has also published on mining highly structured data, on knowledge-driven and explainable AI, and on the methodology of data science. He is the author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012). The latter has, to date, sold over 20,000 copies and has established itself as a key reference in machine learning, with translations into Russian, Mandarin and Japanese.

Event type: Seminar

Date: Tuesday, 12th November

Time: 11:00

Location: G.03

Speaker(s): Peter Flach

Chair/Host: Henry Gouk