IPAB Workshop - 30/01/2025

Speaker: Peize Li 

Title: 6-DoF Grasp Detection with Visual Foundation Model

Abstract: Robotic grasping remains a critical capability for effective robot manipulation in complex environments. As robots undertake increasingly diverse tasks, there is a growing need for grasping models that generalize across objects and work with flexible user inputs. Traditional methods that rely on 3D structural data lack an understanding of the overall scene context, and they lack flexibility in cluttered or unpredictable settings. Recent advances in deep learning, including Visual Foundation Models (VFMs) and transformer-based architectures, have opened new possibilities for grasp detection. This research explores integrating VFMs into the grasp detection task and develops the Graspformer framework, which aims to improve robotic manipulation through enhanced visual and spatial understanding.

 

Speaker: Eric Liu

Title: Optimizing Information Dynamics: A Study on Network Topology and Graph Reduction

Abstract: In network science, analysing the pathways and dynamics of information diffusion is crucial, particularly on social media, where interactions are complex and multifaceted. This project develops a robust model for identifying the key nodes and edges in information diffusion by combining weights for distinct social media behaviours with the underlying network topology. To process large network datasets while preserving their connectivity properties, the project incorporates graph reduction techniques that manage the computational complexity inherent in large-scale network analysis. This approach ensures that the diffusion model remains efficient and scalable. The ultimate goal is to improve the predictive accuracy of information-spread models, providing theoretical and practical insights into designing more effective communication strategies across digital platforms.

 

Speaker: Felix Ingham

Title: Vision vs. Audio Architectures for Passive Sonar Datasets: A Comparative Analysis of Different Training Strategies

Abstract: The underwater sonar environment is inherently complex, presenting unique challenges that call for sophisticated feature extraction. While traditional approaches have relied heavily on domain expertise for feature engineering, the success of transformer and CNN architectures in other domains creates opportunities to explore data-driven methods for the sonar domain.

In this talk, I will examine the performance of vision-based architectures applied to Mel-frequency cepstral coefficients (MFCCs) and of audio-based models applied to raw time series. The discussion will focus on the impact of pre-training versus training from scratch, the influence of the pre-training dataset, and the effectiveness of various training strategies. I will also briefly highlight ongoing work on parameter-efficient fine-tuning techniques, such as LoRA, and their potential benefits for adapting models to the sonar domain.