Wednesday 26 November - 11am

Speaker: Le-Minh Nguyen (Japan Advanced Institute of Science and Technology, JAIST)

Title: Efficient Large Language Models: From Speculative Decoding to Model Pruning      

Abstract: Inference with modern Large Language Models (LLMs) is both computationally intensive and resource-demanding, posing challenges for real-world deployment. Two promising research directions—speculative decoding and post-training pruning (PTP)—offer complementary solutions for improving efficiency. However, existing approaches in both areas face notable limitations. Training-based speculative decoding requires a draft model, which is often difficult to obtain and lacks generalizability, while training-free methods typically yield only modest speedups. Similarly, current PTP techniques perform optimally only within narrow sparsity ranges, limiting their robustness across different architectures and compression levels.
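For readers unfamiliar with speculative decoding, a minimal sketch of the generic draft-then-verify loop may help (this illustrates the standard idea, not SPECTRA itself; token acceptance here is simplified to exact match, whereas real systems use probabilistic acceptance, and `target_next`/`draft_next` are hypothetical stand-ins for model forward passes):

```python
# Toy sketch of draft-then-verify speculative decoding.
# target_next(seq) / draft_next(seq) each return the next token for a sequence;
# in a real system the verification step scores all k draft positions in one
# batched target forward pass, which is where the speedup comes from.

def speculative_decode(target_next, draft_next, prompt, n_new, k=4):
    seq = list(prompt)
    target_passes = 0  # count of (batched) target forward passes
    while len(seq) - len(prompt) < n_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(seq + proposal))
        # 2) One target pass verifies the proposal; keep the longest
        #    agreeing prefix, then append the target's own token.
        target_passes += 1
        accepted = []
        for tok in proposal:
            t = target_next(seq + accepted)
            accepted.append(t)          # always keep the target's token
            if tok != t:                # first disagreement: stop this round
                break
        else:
            # all k draft tokens accepted: target yields one bonus token
            accepted.append(target_next(seq + accepted))
        seq += accepted
    return seq[: len(prompt) + n_new], target_passes
```

With a perfect draft, each verification pass yields k+1 tokens instead of one; with a poor draft, the loop degrades gracefully to one target token per pass, so output quality is unchanged either way.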

To address these challenges, we introduce two frameworks designed to accelerate and optimize LLMs without additional training or performance degradation. SPECTRA is a novel speculative decoding framework that leverages both internal and external speculation to achieve up to 4.08× faster inference, surpassing state-of-the-art training-free methods. OPTIPRUNE, on the other hand, provides a unified pruning strategy effective across all sparsity levels by dynamically adapting between uniform and non-uniform sparsity and integrating relative-importance and Hessian-based criteria.
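To make the post-training pruning setting concrete, here is a minimal sketch of one standard activation-aware criterion (a Wanda-style score, shown only as background; it is not OPTIPRUNE, and `prune_layer` is a hypothetical helper): each weight is scored by its magnitude times the norm of the input feature it multiplies, and the lowest-scored fraction is zeroed without any retraining.

```python
import numpy as np

def prune_layer(W, X, sparsity):
    """Post-training pruning sketch (Wanda-style magnitude-times-activation
    score; illustrative background only, not the talk's OPTIPRUNE).

    W: weight matrix of shape (out_dim, in_dim), y = W @ x
    X: calibration activations of shape (n_samples, in_dim)
    sparsity: fraction of weights to zero out
    """
    # Per-input-feature activation norms estimated from calibration data.
    norms = np.linalg.norm(X, axis=0)            # (in_dim,)
    # Score each weight by |w_ij| * ||x_j||: cheap weights feeding weak
    # activations are the safest to remove.
    scores = np.abs(W) * norms[None, :]          # (out_dim, in_dim)
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    # Threshold at the k-th smallest score and zero everything at or below it.
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]
    return W * (scores > thresh)
```

Hessian-based criteria such as those the abstract mentions refine this idea by also weighting each removal's estimated effect on the layer's output reconstruction error.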

Together, these methods demonstrate complementary pathways toward efficient and scalable LLM deployment. Extensive experiments across diverse benchmarks and model architectures validate their effectiveness and robustness. 

Biography: Le-Minh Nguyen is a Professor in the School of Information Science and Director of the Interpretable AI Center at JAIST, where he leads the Machine Learning and Natural Language Understanding Laboratory. He is currently on sabbatical at Imperial College London, UK (until April 2026). His research interests include machine learning and deep learning, natural language processing, legal text processing, and explainable AI. He serves as an action editor of TACL (a leading journal in NLP), a board member of VLSP (Vietnamese Language and Speech Processing), and an editorial board member of AI & Law and the Journal of Natural Language Processing (Cambridge). He is a steering committee member of Juris-informatics (Jurisin) in Japan, a research area that studies legal issues from an informatics perspective.