Tuesday 24 March 2026
Host: Antonio Vergari

Speaker: Adrian Javaloy
Title: An Embarrassingly Simple Way to Optimize Orthogonal Matrices at Scale
Abstract: Have you ever had a project where you wanted to learn rotation matrices? Did you get discouraged because you didn't know how to do it, or because current optimizers wouldn't scale enough to enable your groundbreaking idea? Say no more: in this talk I will introduce you to our new orthoptimizer (yes, we made up a word). POGO enables the use of modern ML optimizers while ensuring that orthogonality constraints are effectively met. Remarkably, these improvements come at little to no cost, as POGO is fast and GPU-friendly, consisting of only 5 matrix products, and in practice it maintains orthogonality at all times. POGO greatly outperforms recent orthoptimizers and can optimize problems with thousands of orthogonal matrices in minutes where alternatives would take hours. Thus, POGO sets a milestone towards finally exploiting orthogonality constraints in ML at scale. (No excuses: POGO comes in a small and convenient PyTorch library: https://github.com/adrianjav/pogo.)

Speaker: Leander Kurscheidt
Title: MAP Predictions With Guaranteed Constraints
Abstract: MAP predictions are very common in machine learning, be it the most likely position of a pedestrian at an intersection or the most likely value in missing-value imputation. But what if we need to meet certain guarantees? The pedestrian may face obstacles they cannot simply walk over, and the imputed value has to adhere to the data schema. I will start general and discuss which classes of constrained MAP problems can be solved accurately, efficiently, and scalably, and outline an approach to solving them. However, these classes have special, specific requirements that must be met.
I flank this talk by briefly discussing an approximate alternative and showing results on position prediction on the Stanford Drone Dataset and on constrained missing-value imputation.

Speaker: Andreas Grivas
Title: Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
Abstract: Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), including byte-level LLMs, which are tokeniser-free but prohibitively slow. However, existing MTP methods often sacrifice expressiveness by assuming independence between future tokens. In this work, we investigate the trade-off between expressiveness and latency in MTP within the framework of probabilistic circuits (PCs). Our framework, named MTPC, allows one to explore different ways to encode the joint distribution over future tokens by selecting different circuit architectures, generalising classical models such as (hierarchical) mixture models, hidden Markov models and tensor networks. We show the efficacy of MTPC by retrofitting existing byte-level LLMs, such as EvaByte. Our experiments show that, when combined with speculative decoding, MTPC significantly speeds up generation compared to MTP with independence assumptions, while guaranteeing to retain the performance of the original verifier LLM. We also rigorously study the optimal trade-off between expressiveness and latency when exploring the possible parameterisations of MTPC, such as PC architectures and partial layer sharing between the verifier and draft LLMs.

Speaker: Samuele Bortolotti
Title: Reasoning Shortcuts in Neuro-symbolic AI
Abstract: Neuro-symbolic (NeSy) AI aims to develop deep neural networks whose predictions comply with prior knowledge, such as safety or structural constraints. As such, it represents one of the most promising avenues for developing reliable and trustworthy AI systems.
The core idea behind NeSy AI is to combine neural and symbolic steps: neural networks are typically responsible for mapping low-level inputs into high-level symbolic concepts, while symbolic reasoning infers predictions compatible with the extracted concepts and the prior knowledge. Despite their promise, it was recently shown that -- whenever the concepts are not supervised directly -- NeSy models can be affected by Reasoning Shortcuts (RSs). That is, they can achieve high label accuracy by grounding the concepts incorrectly. RSs can compromise the interpretability of the model's explanations, its performance in out-of-distribution scenarios, and therefore the overall reliability of the system. At the same time, RSs are difficult to detect and prevent unless concept supervision is available, which is typically not the case in practical settings. In this talk, we provide a general introduction to reasoning shortcuts, discussing their causes and consequences in intuitive terms. We then review existing approaches for addressing RSs, including mitigation techniques and awareness strategies, and map their respective benefits and limitations.

Mar 24 2026, 13.00 - 15.00
Tuesday 24 March 2026
Speakers: Adrian Javaloy, Leander Kurscheidt, Andreas Grivas, Samuele Bortolotti
IF, G.03
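To give a flavour of what a reasoning shortcut looks like, here is a minimal, self-contained toy example (not code from any of the talks; the task and knowledge are hypothetical). The symbolic knowledge says the label is the XOR of two binary concepts. Because XOR is invariant under jointly negating its inputs, a concept extractor that flips both concepts on every input still achieves perfect label accuracy while getting every concept wrong:

```python
# Toy reasoning-shortcut demo (hypothetical task, not from the talk).
# Knowledge: label = c1 XOR c2. A "shortcut" extractor that negates
# both concepts satisfies the knowledge on every example, yet its
# concept groundings are always wrong.
from itertools import product

def knowledge(c1, c2):
    # symbolic reasoning step: infer the label from the concepts
    return c1 ^ c2

def shortcut_extractor(x):
    # flips both concepts: a perfectly wrong grounding
    return (1 - x[0], 1 - x[1])

# inputs whose true concepts are the inputs themselves
data = list(product([0, 1], repeat=2))
true_labels = [knowledge(*x) for x in data]
shortcut_labels = [knowledge(*shortcut_extractor(x)) for x in data]

label_acc = sum(a == b for a, b in zip(shortcut_labels, true_labels)) / len(data)
concept_acc = sum(shortcut_extractor(x) == x for x in data) / len(data)
print(label_acc, concept_acc)  # 1.0 0.0
```

Without direct concept supervision, nothing in the label signal distinguishes this shortcut from the intended grounding, which is exactly why RSs are hard to detect.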