ICSA Colloquium Talk - 27/02/2025

Title: The Hidden Bloat in Machine Learning Systems

Abstract: In this talk, we explore the problem of bloat in machine learning (ML) systems and present key findings from our study. We begin by analyzing bloat in ML containers, which are widely used in modern cloud computing environments.

Next, we examine code bloat in shared libraries of popular ML frameworks such as TensorFlow and PyTorch. We introduce our approaches for debloating both ML containers and shared libraries. Our study reveals that both ML containers and shared libraries include a substantial amount of unnecessary files and code, which bloat their size and lead to performance degradation, resource waste, and increased security vulnerabilities.

Speaker Bio: Huaifeng Zhang is a Ph.D. student in the Department of Computer Science at Chalmers University of Technology, advised by Prof. Ahmed Ali-Eldin Hassan. His research interests include software engineering, machine learning systems, cloud computing and blockchain. He also held positions at ByteDance and Google.