Speaker: Jaap Jumelet (University of Groningen, Center for Language and Cognition)
Talk Title: MultiBLiMP: A Multilingual Benchmark of Linguistic Minimal Pairs
Abstract: We introduce MultiBLiMP, a massively multilingual benchmark of linguistic minimal pairs that covers 101 languages and 6 linguistic phenomena and contains more than 120,000 minimal pairs. Our minimal pairs are created with a fully automated pipeline that leverages the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP evaluates the linguistic abilities of LLMs at an unprecedented multilingual scale and highlights the shortcomings of the current state of the art in modelling low-resource languages.
Bio: I am a post-doctoral researcher at the University of Groningen, where I work with Arianna Bisazza. I am interested in the intersection of linguistics and NLP: what role can linguistics play in helping us understand LLM behaviour, and what role can LLM behaviour play in yielding novel insights into the structure of language?
Before my post-doc, I did my PhD at the ILLC (Institute for Logic, Language and Computation), University of Amsterdam, where I worked with Jelle Zuidema and Raquel Fernández.