Friday, 16th June - 11am, Arya McCarthy: Seminar

Title: Thousand-language learning, projection, and translation

Abstract:

The breadth of information digitized in the world’s languages offers opportunities for linguistic insights and computational tools with a pan-lingual perspective. We can achieve this by projecting lexical information across languages, either at the type or the token level. First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic support for these hypotheses and reveal additional nuance. Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We show applications to pronoun clusivity and multilingual machine translation. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.
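
For illustration, the token-level projection idea can be sketched as follows (a minimal sketch under assumed inputs, not the speaker's implementation): given a word-aligned sentence pair, morphological tags on the high-resource source side are copied across alignment links to the low-resource target. The function name project_tags and the UniMorph-style tag strings below are illustrative assumptions.

# A sketch of token-level annotation projection across one word-aligned
# sentence pair. Names, tags, and data are illustrative, not from the talk.
def project_tags(src_tags, alignment, tgt_len):
    """Copy source-side morphological tags to aligned target tokens."""
    tgt_tags = [None] * tgt_len        # unaligned target tokens stay untagged
    for s, t in alignment:             # (source index, target index) links
        tgt_tags[t] = src_tags[s]
    return tgt_tags

# Hypothetical example: UniMorph-style tags projected over a crossing alignment.
src_tags = ["PRO;1;PL;INCL", "V;PST"]            # e.g. inclusive "we", "went"
alignment = [(0, 1), (1, 0)]
print(project_tags(src_tags, alignment, 2))      # ['V;PST', 'PRO;1;PL;INCL']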


Bio:

Arya McCarthy is a Ph.D. candidate at Johns Hopkins University, working on massively multilingual natural language processing. He is advised by David Yarowsky in the Center for Language and Speech Processing. His work focuses on improving translation and computational modeling of low-resource languages, primarily through weakly supervised natural language processing at the scale of thousands of languages. His work is supported by an Amazon Fellowship and a Frederick Jelinek Fellowship.
