A Turing AI Reasoning Workshop on 10 June 2022 organized by ELIAI

This one-day Turing AI Reasoning Workshop was hosted by ELIAI and organized by Professor Antonio Vergari and Director Mirella Lapata. The multidisciplinary workshop offered researchers and students at the University of Edinburgh a space to discuss and share their research on any aspect or application of reasoning, including reasoning with knowledge and interactions, visual reasoning and understanding, human cognition, reasoning with natural language, and neuro-symbolic reasoning and learning. Five thematic sessions, comprising fifteen talks and three panel discussions, cross-fertilized these research fields. Talks spanned the following topics: question answering, causal representation learning, visual and natural language entailment, scene and video understanding, abstract and logical reasoning, probabilistic reasoning under constraints, cognitive aspects of reasoning and generalization in humans, and reasoning over incomplete knowledge bases. (Regretfully, not all of the workshop participants are included in the group picture below.)

[Group photo of workshop participants]

The sessions, presenters, talk titles, abstracts, and panels are listed below.

Reasoning with Knowledge and Interactions

Ricky Zhu
Title: Automated Construction and Maintenance of Probabilistic Knowledge Bases from Logs
Abstract: Knowledge bases (KBs) are ideal vehicles for tackling many challenges, such as query answering and root cause analysis. Because the world changes over time, previously acquired knowledge can become outdated. We therefore need methods to update the knowledge when new information arrives and to repair any identified faults in the constructed KBs. However, to the best of our knowledge, there is little research in this area. We propose a system called TREAT (Tacit Relation Extraction and Transformation) that automatically constructs a continuously evolving probabilistic KB whose knowledge remains probabilistically and logically consistent and up to date.

Ionela Mocanu
Title: Multi-Agent Epistemic Learning with PAC Semantics
Abstract: Since knowledge engineering is an inherently challenging and somewhat unbounded task, machine learning has been widely proposed as an alternative. But most machine learning systems focus on inferring representations of an underlying environment under the assumption of a single agent. In many real-world scenarios, however, we need to explicitly model multiple agents, where intelligent agents act towards achieving goals either by coordinating with the other agents or by overseeing the opponents' moves in a competitive context. In this sense, agents must reason about the knowledge of the other agents and make decisions based on it. While a number of sophisticated formal logics have been proposed for modelling such contexts, drawing on areas such as philosophy, knowledge representation and game theory, they unfortunately do not address the problem of knowledge acquisition. In the current work, we consider the problem of agents having knowledge about the world and other agents, and then acquiring new knowledge (both about the world and about other agents) in service of answering queries. We put forward a model of implicit learning, or more generally learning to reason, which bypasses the intractable step of producing an explicit representation of the learned knowledge. We show that polynomial-time learnability results can be obtained when reasoning is performed from the perspective of a single agent using the "only knowing" modal logic.
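As a concrete illustration of the "learning to reason" setting the abstract describes, here is a minimal single-agent sketch in Python: rather than inducing an explicit knowledge base, a query is accepted if it is witnessed true on a sufficiently large fraction of the observed (possibly partial) examples, in the spirit of PAC semantics. The variables, observations, and acceptance threshold are invented for illustration; the talk's contribution is the multi-agent epistemic extension of this idea.

```python
# Toy sketch of query answering under PAC semantics (single-agent case).
# Each observation is a partial assignment to propositional variables;
# None means the variable was not observed. All names and data are
# illustrative, not from the talk.

observations = [
    {"rain": 1, "wet_grass": 1, "sprinkler": None},
    {"rain": 1, "wet_grass": 1, "sprinkler": 0},
    {"rain": 0, "wet_grass": 0, "sprinkler": 0},
    {"rain": 1, "wet_grass": None, "sprinkler": 0},
    {"rain": 0, "wet_grass": 1, "sprinkler": 1},
]

def witnessed(query, obs):
    """A clause (disjunction of literals) is witnessed true if some literal
    is observed with the right polarity."""
    return any(obs.get(var) == val for var, val in query)

def accept(query, observations, epsilon=0.25):
    """Accept the query if it is witnessed on at least a 1 - epsilon fraction
    of the observations, without ever building an explicit KB."""
    hits = sum(witnessed(query, obs) for obs in observations)
    return hits / len(observations) >= 1 - epsilon

# Query: "rain implies wet grass", i.e. the clause (not rain) or wet_grass.
query = [("rain", 0), ("wet_grass", 1)]
print(accept(query, observations))   # True: witnessed in 4 of 5 observations
```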
Ram Ramamoorthy
Title: Learning Relational and Cross-Modal Representations for Human-Robot Interaction
Abstract: This talk concerns human-robot interaction paradigms wherein the human user wishes to give detailed instructions to the robot, imparting information ranging from aspects of a high-level plan to specifications for forceful interactions in dexterous manipulation tasks. This is important in practice for a variety of reasons, including improving the safety of human-robot interaction and addressing the limitations of what can be achieved merely by presenting positive examples. With this motivation, I will outline results from recent projects aimed at devising structured models, such as variational autoencoders, whose latent spaces and loss functions can be shaped accordingly. We will survey results from the following papers:
1. Y. Hristov, A. Lascarides, S. Ramamoorthy, Interpretable Latent Spaces for Learning from Demonstration, Conference on Robot Learning (CoRL), 2018. https://arxiv.org/abs/1807.06583
2. Y. Hristov, D. Angelov, A. Lascarides, M. Burke, S. Ramamoorthy, Disentangled Relational Representations for Explaining and Learning from Demonstration, Conference on Robot Learning (CoRL), 2019. https://arxiv.org/abs/1907.13627, https://sites.google.com/view/explain-n-repeat
3. Y. Hristov, S. Ramamoorthy, Learning from Demonstration with Weakly Supervised Disentanglement, International Conference on Learning Representations (ICLR), 2021. https://arxiv.org/abs/2006.09107, https://sites.google.com/view/weak-label-lfd
4. C. Innes, S. Ramamoorthy, Elaborating on Learned Demonstrations with Temporal Logic Specifications, Robotics: Science and Systems (R:SS), 2020. https://arxiv.org/abs/2002.00784, https://sites.google.com/view/ltl-dmp-rss-2020/

Visual Reasoning and Understanding

Oisin Mac Aodha
Title: The Limits of Self-Supervision in Vision and the Potential for Reasoning "In the Wild"
Abstract: The success of our machine learning solutions hinges on the expressiveness of the representations we use for our data. In computer vision, recent self-supervised techniques have begun to close the gap with conventional supervised methods in terms of the effectiveness of the representations they can extract from unlabeled data. However, when we probe these methods on more challenging "fine-grained" datasets, we observe that their performance still lags. In this talk, I will present our recent work on evaluating the limits of self-supervised learning in vision and will hint at the possibilities that data captured "in the wild" offers for benchmarking open-ended visual reasoning.
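For readers less familiar with the area, the sketch below shows the kind of contrastive (InfoNCE-style) objective that drives many of the recent self-supervised methods the talk evaluates. The embeddings, temperature, and "augmentations" are synthetic placeholders; this is a generic textbook illustration, not any particular method discussed in the talk.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1[i] and z2[i] are L2-normalised embeddings of two augmented views of
    image i; each positive pair is contrasted against all other images."""
    logits = z1 @ z2.T / temperature                 # (n, n) similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives lie on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
z1 = z / np.linalg.norm(z, axis=1, keepdims=True)                 # "view" 1
z2 = z + 0.05 * rng.normal(size=z.shape)                          # slightly perturbed
z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)               # "view" 2
print(info_nce(z1, z2))   # low loss: matching views are the most similar pairs
```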
Arushi Goel
Title: Scene Graph Generation: A Structured and Holistic Representation of Images
Abstract: Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in the training data. In this talk, I will discuss a novel framework for SGG training that exploits relation labels based on their informativeness. Our model-agnostic training procedure imputes missing informative relations for less informative samples in the training data and trains an SGG model on the imputed labels along with the existing annotations. This approach can be used in conjunction with state-of-the-art SGG methods and significantly improves their performance on multiple metrics on the popular Visual Genome benchmark for scene graph generation. Furthermore, we obtain considerable improvements for unseen triplets in a more challenging zero-shot setting.

Davide Moltisanti
Title: Understanding Adverbs in Videos
Abstract: Given a video showing a person performing an action, we are interested in understanding how the action is performed (e.g. chop quickly or finely). Current methods for this underexplored task model adverbs as invertible action modifiers in a joint visual-text embedding space. However, these methods do not guide the model to look for salient visual cues in the video to learn how actions are performed. We thus suspect that models learn spurious data correlations rather than the actual visual signature of an adverb. We first aim to demonstrate this, showing that when videos are altered (e.g. objects are masked, playback is edited) adverb recognition performance does not drop considerably. To address this limitation, we then plan to design a mixture-of-experts method that is trained to look for specific visual cues, e.g. the model should look at temporal dynamics for speed adverbs (e.g. quickly/slowly) or at spatial regions for completeness adverbs (e.g. fully/partially).

Panel #1: Logical and Visual Reasoning

Human Cognition and Reasoning

Bonan Zhao
Title: Bootstrap Learning of Complex Causal Concepts with Adaptor Grammars
Abstract: Human learning and generalisation benefit from bootstrapping: we arrive at complex concepts by starting small and building upon past successes. Extending previous work on Bayesian-symbolic modelling of concept learning, we propose a computational account of human-like causal conceptual bootstrapping based on combinatory logic and adaptor grammars. In an intractably large hypothesis space, our model tackles the search problem for complex concepts via its native caching mechanism and facilitatory learning curricula. In a series of experiments, we demonstrate that people indeed succeed in identifying a compound causal concept only after experiencing training data in a "helpful" order, where they first form an initial concept and then bootstrap learning of the compound ground truth by reusing this newly acquired concept. We show that a caching mechanism like that used in adaptor grammars is key to explaining human-like bootstrapping patterns in causal generalisation under facilitatory curricula and the hindering pattern under a misleading "learning trap" curriculum.
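As a rough illustration of the caching idea (not the adaptor-grammar model itself), the toy program-induction sketch below adds a learned sub-concept to its library, so that a compound concept later becomes reachable with a much shallower search. All primitives, examples, and names are hypothetical.

```python
from itertools import product

# Hypothetical primitive concepts; the point is only to show how caching a
# learned concept shortens the search for a compound one.
PRIMITIVES = {
    "add1": lambda x: x + 1,
    "double": lambda x: 2 * x,
}

def compose(fs):
    def g(x):
        for f in fs:
            x = f(x)
        return x
    return g

def search(library, examples, max_depth=2):
    """Enumerate compositions of library functions up to max_depth and return
    the first one consistent with all (input, output) examples."""
    for depth in range(1, max_depth + 1):
        for names in product(library, repeat=depth):
            h = compose([library[n] for n in names])
            if all(h(x) == y for x, y in examples):
                return names
    return None

library = dict(PRIMITIVES)

# Stage 1: learn a simple concept and cache it in the library.
stage1 = [(1, 4), (2, 6)]                       # consistent with double(add1(x)) = 2x + 2
learned = search(library, stage1)
library["concept1"] = compose([PRIMITIVES[n] for n in learned])

# Stage 2: the compound target 4x + 6 is found at depth 2 only because
# concept1 was cached; from primitives alone it would need depth 4.
stage2 = [(1, 10), (2, 14)]
print(search(library, stage2))                  # e.g. ('concept1', 'concept1')
```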
Alex Doumas
Title: Relation Learning and Cross-Domain Transfer in Humans and Machines
Abstract: People readily generalise prior knowledge to novel situations and stimuli. Advances in machine learning and artificial intelligence have begun to approximate and even surpass human performance in specific domains, but machine learning systems struggle to generalise information to untrained situations. We present a model that demonstrates human-like extrapolatory generalisation by learning and explicitly representing an open-ended set of relations characterising regularities within the domains it is exposed to. First, we show that when the model is trained to play one video game (e.g., Breakout) it generalises to a new game (e.g., Pong) with different rules, dimensions, and characteristics in a single shot. Second, the model can learn representations from different domains (e.g., 3D shape images) that immediately support learning in disparate domains like video games and psychological tasks. By exploiting well-established principles from cognitive psychology and neuroscience, the model learns structured (i.e., symbolic) representations without feedback, and without requiring knowledge of the relevant relations to be given a priori. The model's ability to generalise between different domains demonstrates the flexible transfer afforded by a capacity to learn not only statistical relations, but also structured relations that are useful for characterising the domain to be learned. We show that this kind of flexible, relational generalisation is only possible because the model is capable of representing relations explicitly, a capacity that is notably absent in extant statistical machine learning algorithms.

Oghenerukevwe Kwakpovwe
Title: Constraints on the Development of Inductive Biases
Abstract: Children from as young as 10 months old can acquire semantic knowledge of natural-kind objects as they grow, and can inductively generalise these categories to label novel stimuli. Insights into this process and how it develops over time are fundamental to understanding children's cognition, as well as the development of mental functions that build on it, such as memory and language; this makes it a particularly interesting research area for cognitive scientists seeking a better understanding of first-language (L1) acquisition. This project aims to build a computational model of the developmental trajectory of the biases that arise during inductive generalisation, which can help us understand some of the potential constraints on the process and how these change over time as an agent receives more data. It will do this using Bayesian program induction.

Tadeg Quillien
Title: How Do People Make Causal Judgments?
Abstract: Everything that happens has a multitude of causes, yet people make causal judgments effortlessly. How and why do people highlight one particular cause (e.g. the lightning bolt that set the forest ablaze) out of the set of factors that contributed to the event (the oxygen in the air, the dry weather…)? I argue that people make causal judgments by simulating counterfactuals: possible ways that the event could have happened. They tend to imagine counterfactual possibilities that are both a priori likely and similar to what actually happened. Then, they judge that a factor C caused effect E if C and E are highly correlated across these counterfactual possibilities. In a reanalysis of existing empirical data and a set of new experiments, I find that this theory uniquely accounts for people's causal intuitions.
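The following is a minimal sketch of this counterfactual-sampling account, using the forest-fire example from the abstract. The priors, stability parameter, and causal model are invented for illustration and are not the exact model from the talk.

```python
import random
from statistics import correlation  # Python 3.10+

# Toy causal model: fire occurs iff lightning strikes and oxygen is present.
# Hypothetical parameters for a sketch of counterfactual-sampling causal judgment.
P_LIGHTNING = 0.1   # lightning strikes are a priori unlikely
P_OXYGEN = 0.99     # oxygen is a priori almost certain
STABILITY = 0.5     # probability of keeping the actually-observed value

actual = {"lightning": 1, "oxygen": 1}

def sample_counterfactual(var, prior):
    """Resample a cause: stay close to what actually happened,
    but with some probability fall back on the prior."""
    if random.random() < STABILITY:
        return actual[var]
    return 1 if random.random() < prior else 0

def fire(lightning, oxygen):
    return lightning and oxygen

random.seed(0)
samples = []
for _ in range(10_000):
    l = sample_counterfactual("lightning", P_LIGHTNING)
    o = sample_counterfactual("oxygen", P_OXYGEN)
    samples.append((l, o, int(fire(l, o))))

ls, oxs, fs = zip(*samples)
print("corr(lightning, fire):", round(correlation(ls, fs), 2))  # high
print("corr(oxygen, fire):   ", round(correlation(oxs, fs), 2))  # low
```

Across the sampled counterfactual worlds the rare cause (lightning) covaries strongly with the fire while the near-certain background condition (oxygen) barely does, matching the intuition that the lightning is "the" cause.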
Panel #2: Human Reasoning and Cognition

Neuro-Symbolic Reasoning and Learning

Andrea Valenti
Title: ChemAlgebra: A Benchmark for Algebraic Machine Reasoning through Chemical Reactions Prediction
Abstract: Transformer architectures are currently the state-of-the-art models in a number of applications. Their impressive performance on machine learning tasks suggests that Transformers could be suitable candidates for machine reasoning tasks. However, current benchmarks for reasoning (mostly based on natural language) tend to contain spurious correlations that deep learning models can exploit to get artificially good results. Chemical reaction prediction, on the other hand, is a much more suitable task, as it is less ambiguous and harder to shortcut. In this presentation, we introduce ChemAlgebra, a new benchmark for rigorously assessing the reasoning capabilities of deep learning models.

Siddharth N.
Title: Drawing out of Distribution with Neuro-Symbolic Generative Models
Abstract: Learning general-purpose representations from perceptual inputs is a hallmark of human intelligence. For example, people can write out numbers or characters, or even draw doodles, by characterising these tasks as different instantiations of the same generic underlying process: compositional arrangements of different forms of pen strokes. Crucially, learning to do one task, say writing, implies reasonable competence at another, say drawing, on account of this shared process. We present Drawing out of Distribution (DooD), a neuro-symbolic generative model of stroke-based drawing that can learn such general-purpose representations. In contrast to prior work, DooD operates directly on images, requires no supervision or expensive test-time inference, and performs unsupervised amortised inference with a symbolic stroke model that better enables both interpretability and generalisation. We evaluate DooD on its ability to generalise across both data and tasks. We first perform zero-shot transfer from one dataset (e.g. MNIST) to another (e.g. Quickdraw), across five different datasets, and show that DooD clearly outperforms different baselines. An analysis of the learnt representations further highlights the benefits of adopting a symbolic stroke model. We then adopt a subset of the Omniglot challenge tasks and evaluate DooD's ability to generate new exemplars (both unconditionally and conditionally) and to perform one-shot classification, showing that it matches the state of the art. Taken together, these results demonstrate that DooD does indeed capture general-purpose representations across both data and tasks, and takes a further step towards building general and robust concept-learning systems.

Antonio Vergari
Title: Semantic Probabilistic Layers for Neuro-Symbolic Learning
Abstract: We design a predictive layer for reasoning over structured-output prediction (SOP) tasks that can be plugged into any neural network, guaranteeing that its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations and symbolic constraints over a structured output space while remaining amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to the solutions of the constraint. As such, they can faithfully and efficiently model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in accuracy on challenging SOP tasks, including hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.
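To illustrate the central idea of restricting a distribution's support to constraint-satisfying outputs, here is a deliberately brute-force toy with a hypothetical label hierarchy; the actual SPL performs this computation tractably with probabilistic circuits rather than by enumeration.

```python
from itertools import product
import math

# Toy illustration: renormalise a factorised distribution over label vectors
# onto the set of constraint-satisfying outputs. Labels and probabilities are
# made up; this is not the SPL implementation.

LABELS = ["animal", "dog", "cat"]       # hypothetical hierarchy: dog/cat imply animal

def satisfies(y):
    animal, dog, cat = y
    return (not dog or animal) and (not cat or animal)

def constrained_predict(p):
    """p[i] is the network's marginal probability for label i. Return the most
    probable label vector among those respecting the constraint."""
    scores = {}
    for y in product([0, 1], repeat=len(p)):
        if not satisfies(y):
            continue                     # constraint-violating outputs get zero mass
        scores[y] = math.prod(pi if yi else 1 - pi for pi, yi in zip(p, y))
    z = sum(scores.values())             # renormalise over the feasible set
    return max(scores, key=scores.get), {y: s / z for y, s in scores.items()}

# Independent thresholding of these marginals would output (0, 1, 0): "dog but
# not animal", violating the hierarchy. The constrained layer cannot do that.
best, dist = constrained_predict([0.4, 0.9, 0.1])
print(best)                               # (1, 1, 0): consistent with the hierarchy
```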
Kobby Nuamah
Title: Deep Algorithmic Question Answering: Towards a Compositionally Hybrid AI for Algorithmic Reasoning
Abstract: An important aspect of artificial intelligence (AI) is the ability to reason in a step-by-step "algorithmic" manner that can be inspected and verified for its correctness. This is especially important in the domain of question answering (QA). While neural network models with end-to-end training pipelines perform well in specific applications such as image classification and language modelling, they cannot, on their own, successfully perform algorithmic reasoning, especially if the task spans multiple domains. We discuss a few notable exceptions and point out how they are still limited when the QA problem is widened to include other intelligence-requiring tasks. Our claim is that the challenge of algorithmic reasoning in QA can be effectively tackled with a "systems" approach to AI, featuring a hybrid use of symbolic and sub-symbolic methods, including deep neural networks. In this talk, I propose an approach to algorithmic reasoning for QA, "Deep Algorithmic Question Answering", based on three desirable properties that such an AI system should possess: interpretability, generalizability, and robustness. Additionally, I will discuss how we are trying to achieve these objectives in our work on the FRANK question answering system.

Natural Language and Explanations

Rimvydas Rubavicius
Title: Interactive Symbol Grounding with Complex Referential Expressions
Abstract: We present a procedure for learning to ground symbols from a sequence of stimuli consisting of an arbitrarily complex noun phrase (e.g. "all but one green square above both red circles") and its designation in the visual scene. Our distinctive approach combines: (a) lazy few-shot learning to relate open-class words like "green" and "above" to their visual percepts; and (b) symbolic reasoning with closed-class word categories like quantifiers and negation. We use this combination to estimate new training examples for grounding symbols that occur within a noun phrase but are not designated by that noun phrase (e.g. "red" in the above example), thereby potentially gaining data efficiency. We evaluate the approach in a visual reference resolution task, in which the learner starts out unaware of concepts that are part of the domain model and of how they relate to visual percepts.

Xin Du
Title: On the Role of Representation Learning for Treatment Effect Estimation
Abstract: Learning causal effects from observational data greatly benefits a variety of domains such as health care, education, and sociology. For instance, one could estimate the impact of a new drug on specific individuals to assist clinical planning and improve survival rates. In this work, we focus on the problem of estimating the Conditional Average Treatment Effect (CATE) from observational data. This problem poses two challenges: on the one hand, we have to derive a causal estimator for the causal quantity from observational data in the presence of confounding bias; on the other hand, we have to deal with the identification of the CATE when the distributions of covariates over the treatment-group units and the control units are imbalanced. To overcome these challenges, we propose a neural network framework called Adversarial Balancing-based representation learning for Causal Effect Inference (ABCEI), based on recent advances in representation learning. To ensure the identification of the CATE, ABCEI uses adversarial learning to balance the distributions of covariates in the treatment and control groups in the latent representation space, without any assumptions on the form of the treatment selection/assignment function. (A generic illustration of the CATE quantity itself appears after the panel listing below.)

Panel #3: Neuro-Symbolic Reasoning and Learning, Natural Language and Human Interaction
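As referenced in the treatment-effect abstract above, here is a generic plug-in (T-learner) sketch of the CATE quantity, CATE(x) = E[Y(1) - Y(0) | X = x], on synthetic data. It is a baseline illustration of what is being estimated, not the ABCEI method, which additionally balances representations adversarially; the data-generating process and model choice are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic observational data with a confounded treatment assignment.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))
t = (rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))).astype(int)      # treatment depends on x
y = 2.0 * x[:, 0] + t * (1.0 + x[:, 0]) + rng.normal(scale=0.1, size=n)

# T-learner: fit separate outcome models for treated and control units,
# then take the difference of their predictions as the CATE estimate.
mu1 = LinearRegression().fit(x[t == 1], y[t == 1])   # outcome model for treated
mu0 = LinearRegression().fit(x[t == 0], y[t == 0])   # outcome model for controls

x_test = np.array([[0.0], [1.0]])
cate_hat = mu1.predict(x_test) - mu0.predict(x_test)
print(cate_hat)   # roughly [1.0, 2.0], matching the true effect 1 + x
```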