Friday 22 August 2025 - 11.00 | ILCC

Speaker: Vivek Iyer (School of Informatics)

Title: Personalized Reward Models for Multilingual Generation Tasks from Implicit User Preferences

Abstract: Reward models (RMs) evaluate the responses of large language models (LLMs) by measuring how closely they align with human preferences. However, human preferences vary significantly across users. Personalization of RMs remains underexplored, with initial works focusing on explicitly stated persona preferences. In this work, we introduce the problem of Personalized Reward Modelling from implicitly expressed preferences available in a persona’s usage data. Given privacy issues with accessing real user data, we propose a novel framework to generate synthetic training and evaluation data for this task, consisting of personalized chosen-rejected pairs coupled with synthetic usage data. We also move beyond the English-only focus of previous personalization research by addressing multilingual and cross-lingual generation tasks like open-ended generation and story transcreation, which forms the second major contribution of this work. Our experiments show that even SOTA generative LLMs, like Gemini 2.5 Pro, achieve only 64% binary classification accuracy on some tasks, with ablations revealing significant challenges in reasoning over implicit preferences. Finally, we train Personalized Reward Models across 2 tasks and 6 personalization dimensions, and show that we can achieve similar or better performance with low-rank adaptation (LoRA) fine-tuning of open-source 7B-parameter reward models. We intend to publicly release all data, code and models to facilitate further research.

Biography: Vivek Iyer is a third-year PhD student at the University of Edinburgh, supervised by Dr. Alexandra Birch. His interests primarily revolve around creative and open-ended tasks in Machine Translation, such as transcreation, cultural localization and personalization. He is an Apple AI/ML Scholar of 2025, and recently finished an internship with Apple's Machine Translation team, where he worked on Personalized Reward Models, the subject of this talk.