Hongzheng Chen (Cornell University) - 19/05/2025

Title: Allo: Catalyzing Accelerator Design and Programming for Machine Learning

Abstract: As the benefits of technology scaling diminish, specialized hardware accelerators are crucial for the performance of emerging machine learning applications. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. While new accelerator design languages (ADLs) aim to enhance or replace HLS, they are typically more effective for simple applications with a single kernel than for hierarchical designs with multiple kernels.

In this talk, I will introduce Allo, a composable programming model for efficient hardware accelerator design (published at PLDI'24). Allo decouples hardware customizations, including compute, memory, communication, and data types, from the algorithm specification, and encapsulates them as a set of verifiable customization primitives. Allo also preserves the hierarchical structure of an input program by combining customizations from different functions in a bottom-up, type-safe manner, enabling both temporal and spatial composition. Our evaluation shows that Allo outperforms state-of-the-art HLS tools and ADLs on all test cases in the PolyBench suite.

I will then delve into two case studies demonstrating Allo's effectiveness on large-scale designs. First, I will describe a spatial accelerator for large language models (LLMs) prototyped on an AMD U280 FPGA, achieving a 1.9x speedup and a 5.7x improvement in energy efficiency over an NVIDIA A100 GPU during generative inference. Second, I will showcase a convolutional neural network (CNN) design deployed on the AMD Ryzen AI Engine that delivers substantial speedups over prior methods. The related papers were published at FCCM'24 and FPGA'25.
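To make the decoupling concrete, below is a minimal sketch in the style of the examples from the PLDI'24 paper and the open-source Allo release (https://github.com/cornell-zhang/allo). The exact primitive names (allo.customize, allo.grid, s.reorder, s.pipeline, s.build) are assumptions to verify against the current Allo documentation.

    # Minimal Allo-style sketch (assumed API, based on published examples).
    # The algorithm is plain typed Python; hardware customizations are
    # applied separately as schedule primitives, never inlined as pragmas.
    import allo
    from allo.ir.types import float32

    M, N, K = 32, 32, 32

    def gemm(A: float32[M, K], B: float32[K, N]) -> float32[M, N]:
        # Algorithm specification only: no hardware directives here.
        C: float32[M, N] = 0.0
        for i, j, k in allo.grid(M, N, K):
            C[i, j] += A[i, k] * B[k, j]
        return C

    # Hardware customizations, decoupled from the algorithm above.
    s = allo.customize(gemm)
    s.reorder("k", "j")           # compute customization: loop interchange
    s.pipeline("j")               # pipeline the new innermost loop
    mod = s.build(target="vhls")  # emit Vivado/Vitis HLS code

Because the schedule is a separate object, the same gemm specification can be re-customized for different targets, and, as the abstract describes, customizations of callee functions can be composed bottom-up into their callers without touching the algorithm.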
Bio: Hongzheng Chen is a fourth-year Ph.D. student at Cornell University, supervised by Prof. Zhiru Zhang. His research interests broadly lie in compilers, programming systems, and accelerator architecture for large-scale heterogeneous computing, with an emphasis on optimizing machine learning workloads. He has published over 10 papers at top-tier computer systems and hardware conferences, including ASPLOS, PLDI, SC, and FPGA. His work has received three Best Paper nominations at the FPGA conference, one of which won the Best Paper Award. He was selected as one of the ML and Systems Rising Stars in 2024.

May 19 2025, 14.00 - 15.00
G.03, Informatics Forum