2 December 2019 - Valerio Basile (University of Turin)

Title

Computational Linguistics against Hate: Resources, Models, and Evaluation to Monitor and Contrast Abusive Language Online.

Abstract

The explosion of social media represents a cornucopia for the scholar interested in modeling natural language and online human interaction.

Unfortunately, sharks swim in this ocean of data. Hate speech, cyberbullying, misogyny, and homophobia are all phenomena that find expression in online social media at a worrying and increasing rate.

While legislators try to keep up, with varying degrees of success, from a computational perspective we need to model this family of problems and test our models empirically. Natural Language Processing and Machine Learning provide useful tools to create computational models of hate speech and related phenomena, able to predict their presence in unseen data. However, supervised learning relies on manually annotated data, created with procedures that are not only costly but also increasingly problematic as we turn to highly subjective and controversial phenomena. In this talk, I will first give an overview of the current state of the art in hate speech detection, including the results of recent large-scale evaluation campaigns. In the second part of the talk, I will present the results of our recent efforts to harmonize supervised machine learning with human biases and conflicting definitions.

Biography

Valerio Basile is an assistant professor at the Department of Computer Science of the University of Turin. He received his PhD in 2015 from the University of Groningen with a thesis on Natural Language Generation from logical forms, including the creation of the semantically annotated corpus Groningen Meaning Bank. He then worked as a postdoctoral researcher at Inria Sophia Antipolis on the European project ALOOF, on natural language processing and Web-based semantics for domestic robots. In recent years, his research interests have shifted to sentiment analysis, particularly on online social media, and to the modeling of multilingual abusive language and hate speech.