Wednesday, 31st May, 4pm: Mario Giulianelli (ILCC Seminar)

Title: What comes next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Abstract

Any unique language production context affords speakers multiple plausible communicative intents, and any intent can be produced in multiple plausible ways. Given the same story prompt, for example, different humans may tell very different stories.

Using multiple-reference datasets, we characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric (data) uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references provides the level of detail necessary to understand a model's representation of uncertainty.
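
The abstract does not say which variability measures are used, so the following is only an illustrative sketch of the instance-level idea, not the speaker's method. It assumes lexical variability is approximated by the mean pairwise Jaccard distance between word sets, and it compares that quantity for a set of human references and a set of model samples for the same input. The function names (`unigram_distance`, `variability_gap`) and the toy strings are hypothetical.

```python
from itertools import combinations


def unigram_distance(a: str, b: str) -> float:
    """Jaccard distance between the lowercased word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def pairwise_distances(texts):
    """Distances for every unordered pair of texts (needs at least two texts)."""
    return [unigram_distance(a, b) for a, b in combinations(texts, 2)]


def mean(values):
    return sum(values) / len(values)


def variability_gap(references, samples):
    """Mean model-sample variability minus mean human variability for one input.

    Assumption: a gap near zero is read as the generator matching human
    variability on this input; this is a simplification for illustration.
    """
    return mean(pairwise_distances(samples)) - mean(pairwise_distances(references))


# Toy data for a single test input (invented for illustration).
references = ["the cat sat", "a dog ran", "the cat ran"]
samples = ["the cat sat", "the cat sat", "the cat sat"]

gap = variability_gap(references, samples)
print(f"variability gap: {gap:.3f}")
```

In this toy case the samples are identical while the references differ, so the gap is negative: under the assumed measure, the generator varies less than the humans do on this input. Repeating the computation per test input, rather than pooling over a corpus, is what makes the analysis instance-level.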

(This work will be on arXiv on Monday; I will share the link then.)

Bio

Mario is a PhD candidate at the Institute for Logic, Language and Computation of the University of Amsterdam, where he works in the Dialogue Modelling Group. What he is most excited about is studying human communication strategies using computational models of language understanding and generation. This is the main topic of his PhD, which he addresses with machine learning, information theory, and statistical modelling. His interests also include the analysis, interpretability, and fair evaluation of NLP models, as well as the application of NLP techniques to the study of language variation and change in communities of speakers.
