Friday, 14th February, 11am - Jakob Prange: Seminar

Talk Title: Linguistic Graphs - Small Language Models?

Abstract: Language models such as the GPT series are getting larger and larger. On some tasks and metrics, large language models (LLMs) outperform humans; on many others, they are easily fooled. It is beyond debate that modern LLM capabilities are extremely impressive, but also that they do not handle language the way a human does. Linguistic theories, supported by psychological evidence, maintain that humans process language hierarchically, and this hierarchical structure can be encoded computationally as trees or DAGs. In the face of the impressive performance of LLMs, the question arises whether linguistic graphs can tell us something about the structure of language that sequential neural networks cannot. Put more precisely: can combined neuro-symbolic models be better models of language than purely neural ones? After summarizing several recent publications that address this problem from different angles, I finally arrive at the question of what a "good model of language" is, or should be.

In Prange et al. (TACL, 2021), we propose novel methods for top-down tree-structured prediction that account for the internal structure of linguistic categories called CCG supertags. Traditionally treated as opaque labels, supertags form an open-ended and sparse distribution. Our best model needs only a fraction of the parameters of state-of-the-art alternatives to match their performance on frequent tags, and additionally recovers a sizeable portion of rare and even unseen ones.

In a different series of studies (Prange et al., NAACL-HLT 2022; Prange and Chersoni, *SEM 2023), we examine whether and how different linguistic graph representations can complement and improve the GPT-2 model. We develop several neural graph encoding methods, following the maxim "simple but effective". The final representation requires only a handful of parameters per token and is twice as fast as R-GCN, a popular graph-convolution-based method, while still showing a perplexity advantage over the neural baseline.
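To make the idea of tree-structured supertag decoding concrete, here is a minimal, hypothetical Python sketch (not code from Prange et al., 2021): it parses a CCG supertag such as (S\NP)/NP into its internal binary tree and lists the nodes in the root-first order a top-down decoder would predict them, instead of treating the whole supertag as one opaque label. The Category class and parse function are illustrative names invented for this sketch.

```python
# Illustrative sketch only -- not the authors' implementation.
# A CCG supertag like "(S\NP)/NP" (a transitive verb) is not an atomic label:
# it is a small binary tree of atomic categories (S, NP, ...) joined by slashes.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Category:
    """Either an atomic category (atom set) or a functor with slash, result, argument."""
    atom: Optional[str] = None
    slash: Optional[str] = None            # "/" or "\"
    result: Optional["Category"] = None
    argument: Optional["Category"] = None

    def top_down_nodes(self) -> List[str]:
        """Node labels in the root-first order a top-down decoder would emit them."""
        if self.atom is not None:
            return [self.atom]
        return [self.slash] + self.result.top_down_nodes() + self.argument.top_down_nodes()


def parse(tag: str) -> Category:
    """Parse a supertag string such as '(S\\NP)/NP' into its category tree."""
    pos = 0

    def primary() -> Category:
        nonlocal pos
        if tag[pos] == "(":
            pos += 1                       # consume "("
            node = functor()
            pos += 1                       # consume ")"
            return node
        start = pos
        while pos < len(tag) and tag[pos] not in "/\\()":
            pos += 1
        return Category(atom=tag[start:pos])

    def functor() -> Category:
        nonlocal pos
        node = primary()
        while pos < len(tag) and tag[pos] in "/\\":   # unparenthesized slashes associate left
            slash = tag[pos]
            pos += 1
            node = Category(slash=slash, result=node, argument=primary())
        return node

    return functor()


if __name__ == "__main__":
    transitive_verb = parse("(S\\NP)/NP")
    print(transitive_verb.top_down_nodes())   # ['/', '\\', 'S', 'NP', 'NP']
```

Because even unseen supertags are assembled from the same small inventory of atomic categories and slash directions, a decoder that predicts this tree node by node can in principle generate tags it never observed as whole labels, which is what the abstract's claim about rare and unseen tags rests on.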
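As a rough illustration of the "simple but effective" graph-encoding idea, the following hypothetical PyTorch sketch (again, not the architecture from Prange et al., 2022 or Prange and Chersoni, 2023) augments each token's language-model hidden state with an embedding of its incoming graph edge label plus the hidden state of its head node, keeping the added machinery far lighter than a full graph-convolution stack such as R-GCN. TinyGraphEncoder and its input names are invented for this example.

```python
# Illustrative sketch only -- a generic lightweight graph feature, not the paper's model.
import torch
import torch.nn as nn


class TinyGraphEncoder(nn.Module):
    """Fuse each token's hidden state with its head's state and an edge-label embedding."""

    def __init__(self, hidden_size: int, num_edge_labels: int):
        super().__init__()
        self.edge_embedding = nn.Embedding(num_edge_labels, hidden_size)
        self.combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, hidden_states, head_indices, edge_labels):
        """
        hidden_states: (batch, seq_len, hidden)  e.g. GPT-2 outputs
        head_indices:  (batch, seq_len) position of each token's head in the graph
        edge_labels:   (batch, seq_len) label id of the edge to that head
        """
        # Gather the hidden state of each token's head node.
        idx = head_indices.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        head_states = torch.gather(hidden_states, 1, idx)
        # Combine the head state with an embedding of the edge label.
        graph_feature = head_states + self.edge_embedding(edge_labels)
        # Fuse the graph feature with the original token representation.
        fused = torch.cat([hidden_states, graph_feature], dim=-1)
        return self.combine(fused)


if __name__ == "__main__":
    batch, seq_len, hidden, labels = 2, 5, 16, 8
    enc = TinyGraphEncoder(hidden, labels)
    h = torch.randn(batch, seq_len, hidden)
    heads = torch.randint(0, seq_len, (batch, seq_len))
    rels = torch.randint(0, labels, (batch, seq_len))
    print(enc(h, heads, rels).shape)   # torch.Size([2, 5, 16])
```

The design intuition this toy example tries to capture is the one stated in the abstract: per token, the graph contributes only a label embedding and a single fusion layer, rather than the stacked message-passing layers of a method like R-GCN.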
References:

Jakob Prange, Nathan Schneider, and Vivek Srikumar, 2021. "Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories." Transactions of the Association for Computational Linguistics (TACL), MIT Press, 9:243-260. URL: https://doi.org/10.1162/tacl_a_00364

Jakob Prange, Nathan Schneider, and Lingpeng Kong, 2022. "Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling." In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), ACL, 4375-4391. URL: https://aclanthology.org/2022.naacl-main.325/

Jakob Prange and Emmanuele Chersoni, 2023. "Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures." In: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM), ACL, 456-468. URL: https://aclanthology.org/2023.starsem-1.40/

Bio: Jakob Prange is a research associate at the Chair for Natural Language Understanding in the Faculty of Applied Informatics at the University of Augsburg, Germany. He previously held a Distinguished Postdoctoral Fellowship at the Hong Kong Polytechnic University after completing his PhD at Georgetown University under Nathan Schneider. In his research, Jakob integrates symbolic linguistic frameworks into neural models to make them more efficient and interpretable, and applies neural learning and prediction techniques to better understand the relationships between language, context, and representation. His current goals are to make NLP models smaller and to use them for social good, e.g. by detecting and understanding greenwashing in company reports.

Date: Friday, 14 February 2025, 11:00-12:00
Venue: IF G.03 and on Teams
This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk.