Tuesday 11 November 2025

Host and Speaker: Chris Williams

Title: Comparing Machine Learning Models

Abstract: In machine learning, we very often wish to compare two or more models on one or more tasks. Despite this, very few textbooks give advice on how to do so, and the comparison is often badly done. In this talk I will cover various aspects of the problem, including: the question we wish to answer; factors of variation and experimental design; issues with null hypothesis significance testing (NHST); and frequentist and Bayesian tests. The analysis closely follows that of Benavoli et al. (2017), "Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis", JMLR 18(77), 1-36.