Friday, 2nd May at 11am - Jaap Jumelet

Speaker: Jaap Jumelet (University of Groningen, Center for Language and Cognition)

Talk Title: MultiBLiMP: A Multilingual Benchmark of Linguistic Minimal Pairs

Abstract: We introduce MultiBLiMP, a massively multilingual benchmark of linguistic minimal pairs that covers 101 languages and 6 linguistic phenomena and contains more than 120,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline that leverages the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP evaluates the linguistic abilities of LLMs at an unprecedented multilingual scale and highlights the shortcomings of the current state of the art in modelling low-resource languages.

Bio: I am a post-doctoral researcher at the University of Groningen, where I work with Arianna Bisazza. I am interested in the intersection of linguistics and NLP: what role can linguistics play in helping us understand LLM behaviour, and what role can LLM behaviour play in gaining novel insights into the structure of language?

Prior to my post-doc, I did a PhD at the ILLC, University of Amsterdam, where I worked with Jelle Zuidema and Raquel Fernández.
