Friday, 4th October - 12.00 Eneko Agirre : Seminar

Title: LLMs and low-resource languages

Abstract:

Generative AI models are now multilingual, raising new questions about their relative performance across languages and local cultures, especially for communities with fewer speakers. In this talk I will explore some of those questions and the lessons we learned along the way. Is it possible to build high-performing LLMs for low-resource languages? We have built a high-performing open model for Basque, accompanied by a fully reproducible end-to-end evaluation suite. Do LLMs think better in English than in the local language? Our experiments show that LLMs do not fully exploit their multilingual potential when prompted in non-English languages. Do LLMs know about local culture? We probed the complex interaction between language and global/local knowledge, showing for the first time that local knowledge is transferred from the low-resource to the high-resource language, a sign that prior findings may not hold when evaluated on local topics. The evaluation suite was recognised with a best resource paper award at ACL 2024.

Bio:

Eneko Agirre is Full Professor of Informatics and Head of HiTZ Basque Center of Language Technology at the University of the Basque Country, UPV/EHU, in San Sebastian, Spain. He has been a visiting researcher or professor at New Mexico State, Melbourne, Southern California, Stanford and New York Universities. He has been active in Natural Language Processing and Computational Linguistics since his undergraduate days. He received the Spanish Informatics Research Award in 2021, and is one of the 74 fellows of the Association for Computational Linguistics (ACL). He was President of ACL's SIGLEX, a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and Action Editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM). He is a recipient of three Google Research Awards and six best paper awards and nominations, most recently at ACL 2024. Dissertations under his supervision received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics, and has given more than 20 invited talks, mostly international.

https://hitz.eus/eneko