
Chris Agia

I am a 4th-year PhD student in Computer Science at Stanford University, advised by Jeannette Bohg and Marco Pavone. I'm a member of the Stanford Artificial Intelligence Laboratory and the Center for Research on Foundation Models, and I work jointly with the Interactive Perception and Robot Learning Lab and the Autonomous Systems Lab.

My research is structured around the following core thrusts:

  1. Endowing robots with the ability to plan for a wide range of tasks;
  2. Increasing the reliability of AI-driven robots during deployment;
  3. Establishing connections between model behavior and data;
  4. Facilitating effective interaction between robots and humans.

I set aside weekly time for research discussion and mentorship. If you feel this could be of benefit, please book an open slot here.

I'm also a Clear Ventures Deeptech Fellow interested in startups. Beyond research, I enjoy sports (Stanford Club Men's Soccer) and music!

I had the pleasure of working with Jiajun Wu during my first year at Stanford. Before that, I graduated from the Engineering Science, Robotics program at the University of Toronto, where I was advised by Florian Shkurti at RVL and the Vector Institute. I've been fortunate to collaborate with Liam Paull at MILA, David Meger and Gregory Dudek at McGill, Goldie Nejat at the University of Toronto, and colleagues from Microsoft Research and Meta AI Research. In industry, I had the opportunity to work on multi-agent reinforcement learning in mixed reality environments at Microsoft, perception and localization for self-driving vehicles at Huawei Noah's Ark Lab with Bingbing Liu, and language-agnostic ABI simulators at Google Cloud.

cagia[at]cs.stanford.edu  /  CV  /  LinkedIn  /  GitHub  /  Scholar

Education
Ph.D. in Computer Science
Department of Computer Science, Stanford University
Sep 2021 | Stanford, CA

School of Engineering Fellowship
2 x TA for Principles of Robot Autonomy 1
Upcoming TA for Principles of Robot Autonomy 2
B.A.Sc. in Engineering Science, Robotics
Faculty of Applied Science and Engineering, University of Toronto
Sep 2016 - May 2021 | Toronto, ON

President's Scholarship Program
NSERC Undergraduate Research Award
Dean's Honour List - 2018-2021

Research
Preprints
Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics
Yixuan Huang, Christopher Agia, Jimmy Wu, Tucker Hermans, Jeannette Bohg
In submission, 2024
arXiv / Project Site

How can we plan to solve unseen, long-horizon tasks from a single, partial-view point cloud of the scene, and how can we do so without access to long-horizon training data? Points2Plans leverages transformer-based relational dynamics to learn the symbolic and geometric effects of robot skills, then composes the skills at test time to generate a long-horizon symbolic and geometric plan.
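
As a rough illustration of the compose-at-test-time idea, here is a minimal sketch assuming a learned point-cloud encoder and per-skill dynamics models; the names and the exhaustive search are placeholders, not the actual Points2Plans implementation.

```python
from itertools import product

def plan(encode, dynamics, satisfies_goal, point_cloud, skills, horizon):
    """Hypothetical helpers: encode maps a partial-view point cloud to a latent
    relational state, and dynamics[skill] predicts the state after that skill."""
    z0 = encode(point_cloud)
    for seq in product(skills, repeat=horizon):  # naive enumeration, for clarity only
        z = z0
        for skill in seq:
            z = dynamics[skill](z)               # predicted symbolic/geometric effects
        if satisfies_goal(z):
            return list(seq)                     # first sequence predicted to reach the goal
    return None
```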

Journal Papers
Text2Motion: From Natural Language Instructions to Feasible Plans
Kevin Lin*, Christopher Agia*, Toki Migimatsu, Marco Pavone, Jeannette Bohg
Special Issue: Large Language Models in Robotics, Autonomous Robots (AR), 2023
arXiv / Project Site / Journal

Pretrained large language models can readily produce high-level robot plans from natural language instructions, but should these plans be executed without verifying them at the geometric level? We propose Text2Motion, a language-based planner that tests whether LLM-generated plans (a) satisfy user instructions and (b) are geometrically feasible prior to executing them.
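
The verify-before-execute loop can be sketched as follows; the helper functions are hypothetical stand-ins, not the Text2Motion codebase.

```python
def plan_and_execute(propose_plans, is_geometrically_feasible, execute, instruction, scene):
    """propose_plans: LLM-based generator of candidate skill sequences (assumed).
    is_geometrically_feasible: e.g., a motion planner or learned value functions (assumed)."""
    for plan in propose_plans(instruction, scene):
        if is_geometrically_feasible(plan, scene):  # check the plan before acting on it
            return execute(plan)
    raise RuntimeError("No feasible plan found for the given instruction.")
```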

Semantic Anomaly Detection with Large Language Models
Amine Elhafsi, Rohan Sinha, Christopher Agia, Edward Schmerling, Issa A. D. Nesnas, Marco Pavone
Special Issue: Large Language Models in Robotics, Autonomous Robots (AR), 2023
arXiv / Project Site / Journal

Some system-level failures are not caused by the failure of any individual component of the autonomy stack, but by system-level deficiencies in semantic reasoning. Such edge cases, dubbed semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. We introduce a runtime monitor based on large language models to recognize failure-inducing semantic anomalies.

Lightweight Semantic-aided Localization with Spinning LiDAR Sensor
Yuan Ren, Bingbing Liu, Ran Cheng, Christopher Agia
[Patented]. IEEE Transactions on Intelligent Vehicles (T-IV), 2021
PDF / IEEExplore

How can semantic information be leveraged to improve localization accuracy in changing environments? We present a robust LiDAR-based localization algorithm that exploits both semantic and geometric properties of the scene with an adaptive fusion strategy.

A Sim-to-Real Pipeline for Deep Reinforcement Learning for Autonomous Robot Navigation in Cluttered Rough Terrain
Han Hu*, Kaicheng Zhang*, Aaron Hao Tan, Michael Ruan, Christopher Agia, Goldie Nejat
IEEE Robotics and Automation Letters (RA-L) at IROS, 2021 | Prague, CZ
PDF / Video / IEEExplore

Deep Reinforcement Learning is effective for learning robot navigation policies in rough terrain and cluttered simulated environments. In this work, we introduce a series of techniques that are applied in the policy learning phase to enhance transferability to real-world domains.

Conference Papers
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia, Rohan Sinha, Jingyun Yang, Zi-ang Cao, Rika Antonova, Marco Pavone, Jeannette Bohg
To appear at the Conference on Robot Learning (CoRL), 2024
arXiv / Project Site

Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. In this work, we present Sentinel, a runtime monitor that detects unseen failures of generative robot policies at deployment time without requiring any failure data.
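
One plausible reading of the consistency side of such a monitor, sketched under the assumption of a stochastic generative policy (e.g., a diffusion policy); this is illustrative only and not the Sentinel implementation.

```python
import numpy as np

def consistency_alarm(policy_sample, observation, n_samples=8, threshold=0.5):
    """policy_sample(observation) -> sampled action sequence (assumed stochastic)."""
    samples = np.stack([policy_sample(observation) for _ in range(n_samples)])
    spread = samples.std(axis=0).mean()  # dispersion across sampled action sequences
    return spread > threshold            # large disagreement -> flag a potential failure
```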

Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction
Jakob Thumm, Christopher Agia, Marco Pavone, Matthias Althoff
To appear at the Conference on Robot Learning (CoRL), 2024
arXiv / Project Site / YouTube

How can we integrate human preferences into robot plans in a zero-shot manner, i.e., without requiring tens of thousands of data points of human feedback? We propose Text2Interaction, a planning framework that invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safe controller.
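
As a hypothetical example of the kind of artifact such a framework might request from an LLM, here is a motion preference expressed as executable Python that a planner could score; this is illustrative and not the paper's actual prompt or output format.

```python
import numpy as np

def prefer_upright_handover(trajectory: np.ndarray) -> float:
    """Toy preference: reward end-effector trajectories that keep the handed-over
    object upright, assuming column 3 stores the object's tilt angle (radians)."""
    tilt = np.abs(trajectory[:, 3])
    return float(np.exp(-tilt.mean()))  # 1.0 when perfectly upright, decays with tilt
```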

Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Rohan Sinha, Amine Elhafsi, Christopher Agia, Edward Schmerling, Marco Pavone
Robotics: Science and Systems (RSS), 2024 | Delft, Netherlands
Outstanding Paper Award
arXiv / Project Site / NVIDIA Media / TechXplore

How can we mitigate the computational expense and latency of LLMs for real-time anomaly detection and reactive planning? We propose a two-stage reasoning framework, whereby a fast LLM embedding model flags potential observational anomalies while a slower generative LLM assesses the safety-criticality of flagged anomalies and selects a safety-preserving fallback plan.
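
A minimal sketch of the two-stage idea, assuming placeholder scoring and reasoning functions (not the paper's code): the cheap embedding-based check runs at control frequency, and the slow generative LLM is queried only when it fires.

```python
def monitor_step(observation_text, embed, anomaly_score, slow_llm_assess, fallback_plan, threshold=0.8):
    score = anomaly_score(embed(observation_text))  # fast: embedding-based anomaly score
    if score < threshold:
        return "nominal"
    verdict = slow_llm_assess(observation_text)     # slow: generative LLM reasons about safety
    if verdict == "safety-critical":
        return fallback_plan(observation_text)      # choose a safety-preserving fallback
    return "nominal"
```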

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky*, Karl Pertsch*, ..., Christopher Agia, ..., Sergey Levine, Chelsea Finn
Robotics: Science and Systems (RSS), 2024 | Delft, Netherlands
arXiv / Project Site / Documentation

We introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories (or 350 hours of interaction data), collected across 564 scenes and 86 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability.

Modeling Considerations for Developing Deep Space Autonomous Spacecraft and Simulators
Christopher Agia, Guillem Casadesus Vila, Saptarshi Bandyopadhyay, David S. Bayard, Kar-Ming Cheung, Charles H. Lee, Eric Wood, Ian Aenishanslin, Steven Ardito, Lorraine Fesq, Marco Pavone, Issa A. D. Nesnas
IEEE Aerospace Conference (AeroConf), 2024 | Montana, US
PDF / arXiv / Project Site / YouTube

Future space exploration missions to unknown worlds will require robust reasoning, planning, and decision-making capabilities, enabled by the right choice of onboard models. In this work, we aim to understand what onboard models a spacecraft needs for fully autonomous space exploration.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration
IEEE International Conference on Robotics and Automation (ICRA), 2024 | Yokohama, Japan
Best Paper Award
arXiv / Project Site / Blogpost / Code

Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments?

STAP: Sequencing Task-Agnostic Policies
Christopher Agia*, Toki Migimatsu*, Jiajun Wu, Jeannette Bohg
IEEE International Conference on Robotics and Automation (ICRA), 2023 | London, UK
PDF / arXiv / Project Site / Code

Solving sequential manipulation tasks requires coordinating geometric dependencies between actions. We develop a scalable framework for training skills independently, and then combine the skills at planning time to solve unseen long-horizon tasks. Planning is formulated as a maximization problem over the expected success of the skill sequence, which we demonstrate is well-approximated by the product of Q-values.
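
In symbols (notation mine, matching the description above), planning selects skill parameters that maximize the product of the skills' Q-values along the predicted state sequence:

```latex
% a_k: parameters of the k-th skill, s_k: state before skill k,
% Q_k: the skill's learned Q-function, T_k: a transition model used for rollouts.
a_{1:N}^{\star} = \arg\max_{a_1,\dots,a_N} \; \prod_{k=1}^{N} Q_k(s_k, a_k),
\qquad s_{k+1} = T_k(s_k, a_k).
```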

Taskography: Evaluating Robot Task Planning over Large 3D Scene Graphs
Christopher Agia*, Krishna Murthy Jatavallabhula*, Mohamed Khodeir, Ondrej Miksik, Mustafa Mukadam, Vibhav Vineet, Liam Paull, Florian Shkurti
Conference on Robot Learning (CoRL), 2021 | London, UK
PDF / Poster / arXiv / Project Site / Code

3D Scene Graphs (3DSGs) are informative abstractions of our world that unify symbolic, semantic, and metric scene representations. We present a benchmark for robot task planning over large 3DSGs and evaluate classical and learning-based planners, showing that real-time planning requires 3DSGs and planners to be jointly adapted to better exploit 3DSG hierarchies.

Latent Attention Augmentation for Robust Autonomous Driving Policies
Ran Cheng*, Christopher Agia*, David Meger, Florian Shkurti, Gregory Dudek
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021 | Prague, CZ
PDF / IEEExplore

Pretraining visual representations for robotic reinforcement learning can improve sample efficiency and policy performance. In this paper, we take an alternate approach and propose to augment the state embeddings of a self-driving agent with attention in the latent space, accelerating the convergence of Actor-Critic algorithms.

S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds
Ran Cheng*, Christopher Agia*, Yuan Ren, Bingbing Liu
Conference on Robot Learning (CoRL), 2020 | Cambridge, US
PDF / Talk / Video / arXiv

Small-scale semantic reconstruction methods have had little success in large outdoor scenes due to exponentially increasing sparsity and computationally expensive designs. We propose a sparse convolutional network architecture based on the Minkowski Engine, achieving state-of-the-art results for semantic scene completion in 2D/3D space from LiDAR point clouds.

Depth Prediction for Monocular Direct Visual Odometry
Ran Cheng, Christopher Agia, David Meger, Gregory Dudek
Conference on Computer and Robotic Vision (CRV), 2020 | Ottawa, CA
PDF / Talk / Poster / IEEExplore

Direct methods are able to track motion with considerable long-term accuracy. However, scale-inconsistent estimates arise from random or unit depth initialization. We integrate dense depth prediction with the Direct Sparse Odometry system to accelerate convergence in the windowed bundle adjustment and promote estimates with consistent scale.

Theses
Contextual Graph Representations for Task-driven 3D Perception and Planning
Christopher Agia, Florian Shkurti
Division of Engineering Science, University of Toronto, 2020 | Toronto, CA
PDF

We evaluate the suitability of existing simulators for research at the intersection of task planning and 3D scene graphs and construct a benchmark for comparison of symbolic planners. Furthermore, we explore the use of Graph Neural Networks to harness invariances in the relational structure of planning domains and learn representations that afford faster planning.

Patents

Several components of my industry research projects were patented alongside submission to conference and journal venues.

Systems and Methods for Generating a Road Surface Semantic Segmentation Map from a Sequence of Point Clouds
Christopher Agia, Ran Cheng, Yuan Ren, Bingbing Liu
Application No. 17/676,131. U.S. Patent and Trademark Office, 2022

Relates to processing point clouds for autonomous driving of a vehicle. More specifically, relates to processing a sequence of point clouds to generate a birds-eye-view (BEV) image of an environment of the vehicle which includes pixels associated with road surface labels.

Methods and Systems for Semantic Scene Completion for Sparse 3D Data
Ran Cheng*, Christopher Agia*, Yuan Ren, Bingbing Liu
Application No. 17/492,261. U.S. Patent and Trademark Office, 2022

Relates to methods and systems for generating semantically completed 3D data from sparse 3D data such as point clouds.

Experience
Visiting Research Scholar
NASA Jet Propulsion Laboratory, Mobility and Robotics Systems
Jun 2023 - Sep 2023 | Pasadena, California

Research on deep space robotic autonomy.

Software Engineering Intern
Microsoft, Mixed Reality and Robotics
May 2021 - Aug 2021 | Redmond, Washington

Research & development at the intersection of mixed reality, artificial intelligence, and robotics. Created a pipeline enabling the training and HoloLens 2 deployment of multi-agent reinforcement learning scenarios within shared digital spatial-semantic representations built with Scene Understanding.

Robotics & ML Researcher
Vector Institute, Robot Vision and Learning Lab | Advised by Prof. Florian Shkurti
Department of Computer Science, University of Toronto
May 2020 - Apr 2021 | Toronto, ON

Research in artificial intelligence and robotics. Topics include task-driven perception via learning map representations for downstream control tasks with graph neural networks, and visual state abstraction for Deep Reinforcement Learning based self-driving control.

Software Engineering Intern
Google, Cloud
May 2020 - Aug 2020 | San Francisco, CA

Designed a Proxy-Wasm ABI Test Harness and Simulator that supports both low-level and high-level mocking of interactions between a Proxy-Wasm extension and a simulated host environment, allowing developers to test plugins in a safe and controlled environment.

Robotics & ML Research Intern
Mobile Robotics Lab | Supervised by Prof. David Meger, Prof. Gregory Dudek
School of Computer Science, McGill University
Jan 2020 - May 2020 | Toronto, ON

Machine learning and robotics research on the topics of Visual SLAM and Deep Reinforcement Learning in collaboration with the Mobile Robotics Lab.

Deep Learning Research Intern
Huawei Technologies, Noah's Ark Research Lab
May 2019 - May 2020 | Toronto, ON

Research and development for autonomous systems (self-driving technology). Research focus and related topics: 2D/3D semantic scene completion, LiDAR-based segmentation, road estimation, visual odometry, depth estimation, and learning-based localization.

Autonomy Engineer - Object Detection
aUToronto, Object Detection Team | SAE/GM Autodrive Challenge
Aug 2019 - Apr 2020 | Toronto, ON

Developed a state-of-the-art deep learning pipeline for real-time 3D detection and tracking of vehicles, pedestrians, and cyclists from multiple sensor inputs.

Robotics Research Intern
Autonomous Systems and Biomechatronics Lab | Advised by Prof. Goldie Nejat
Department of Mechanical and Industrial Engineering, University of Toronto
May 2018 - Aug 2018 | Toronto, ON

Search and rescue robotics - research on the topics of Deep Reinforcement Learning and Transfer Learning for autonomous robot navigation in rough and hazardous terrain. ROS (Robot Operating System) software development for various mobile robots.

Software Engineering Intern
General Electric, Grid Solutions
May 2017 - Aug 2017 | Markham, ON

Created customer-end software tools used to accelerate the transition/setup process of new protection and control systems upon upgrade. Designed the current Install-Base and Firmware Revision History databases used by GE internal service teams.

Projects
“Give the pupils something to do, not something to learn;
and the doing is of such a nature as to demand thinking; learning naturally results.”
- John Dewey

I've worked on a number of exciting software, machine learning, and deep learning projects. Their applications cover a range of industries: Robotics, Graphics, Health Care, Finance, Transportation, Logistics, to name a few!

The majority of these projects were accomplished in teams! The results also reflect the efforts of the many talented individuals I've had the opportunity to collaborate with and learn from over the years.

Links to the source code are embedded in the project titles.

Instruction Prediction as a Constructive Task for Imitation and Adaptation
Stanford University, CS330 Deep Multi-task and Meta Learning

Can natural language serve as an abstract planning medium for solving long-horizon tasks when obtaining additional demonstrations is prohibitively expensive? We show: (a) policies trained to predict both actions and instructions (multi-task) improve performance by 30%; (b) policies can be adapted to novel tasks (meta learning) solely from language instructions. Project report / Poster

Controllable and Image-Free StyleGAN Retraining for Expansive Domain Transfer
Stanford University, CS348i Computer Graphics in the Era of AI

StyleGAN has a remarkable capacity to generate photorealistic images in a controllable manner thanks to its disentangled latent space. However, such architectures can be difficult and costly to train, and domain adaptation methods tend to forego sample diversity and image quality. We prescribe a set of amendments to StyleGAN-NADA which improve on the pitfalls of text-driven (image-free) domain adaptation of pretrained StyleGANs. Project report / Presentation

Bayesian Temporal Convolutional Networks
University of Toronto, CSC413 Neural Networks and Deep Learning

In this project, we explore the application of variational inference via Bayes by Backprop to the increasingly popular temporal convolutional network (TCN) architecture for time series predictive forecasting. Comparisons are made against the state of the art in a series of ablation studies. Project report
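
For reference, the variational objective minimized by Bayes by Backprop is the standard one from Blundell et al. (2015), applied here to the TCN weights:

```latex
% q(w \mid \theta): variational posterior over network weights, P(w): prior,
% P(\mathcal{D} \mid w): likelihood of the time-series data.
\mathcal{F}(\mathcal{D}, \theta)
  = \mathrm{KL}\!\left[\, q(w \mid \theta) \,\|\, P(w) \,\right]
  - \mathbb{E}_{q(w \mid \theta)}\!\left[ \log P(\mathcal{D} \mid w) \right]
```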

SfMLearner on Mars
University of Toronto, ROB501 Computer Vision for Robotics

Adapted the SfMLearner framework from Unsupervised Learning of Depth and Ego-Motion from Video to The Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset (dataset webpage), and evaluated its feasibility for tracking in low-textured martian-like environments from monochrome image sequences. Project report

3D Shape Reconstruction
University of Toronto, APS360 Applied Fundamentals of Machine Learning

An empirical study of various 3D Convolutional Neural Network architectures for predicting the full voxel geometry of objects given their partial signed distance field encodings (from the ShapeNetCore database). Project report

Autonomous Packing Robot
University of Toronto, AER201 Robot Competition

Designed, built, and programmed a robot that systematically sorts and packs up to 50 pills/minute to assist those suffering from dementia. An efficient user interface was created to allow a user to input packing instructions. Team placed 3rd/50. Detailed project documentation / Youtube video

Automated Robotic Garbage Collection
Canadian Engineering Competition 2019, Programming Challenge

Based on the robotics Sense-Plan-Act Paradigm, we created an AI program to handle high-level (path planning, goal setting) and low-level (path following, object avoidance, action execution) tasks for an automated waste collection system to be used in fast food restaurants. 4th place Canada. Presentation

Hospital Triage System
Ontario Engineering Competition 2019, Programming Challenge

Developed a machine learning software solution to predict the triage score of emergency patients, allocate available resources to patients, and track key hospital performance metrics to reduce emergency wait times. 1st place Ontario. Presentation / Team photo

Warehouse Logistics Planning
UTEK Engineering Competition 2019, Programming Challenge

Created a logistics planning algorithm that assigned mobile robots to efficiently retrieve warehouse packages. Our solution combined traditional algorithms such as A* Path Planning with heuristic-based clustering. 1st place UofT. Presentation / Team photo

Smart Intersection - Yonge and Dundas
University of Toronto, MIE438 Robot Design

We propose a traffic intersection model which uses computer vision to estimate lane congestion and manage traffic flow accordingly. A mockup of our proposal was fabricated to display the behaviour and features of our system. Detailed report / YouTube video

Insurance Fraud Detection
CIBC Data Studio Hackathon, Programming Challenge

Developed an unsupervised learning system utilizing Gaussian Mixture Models to identify insurance claim anomalies for CIBC.
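
A minimal sketch of GMM-based anomaly scoring with scikit-learn; feature extraction from the actual claims data is assumed to happen elsewhere, and the threshold is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_claim_model(claim_features: np.ndarray, n_components: int = 8) -> GaussianMixture:
    gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
    return gmm.fit(claim_features)

def flag_anomalies(gmm: GaussianMixture, claim_features: np.ndarray, quantile: float = 0.01):
    log_density = gmm.score_samples(claim_features)  # per-claim log-likelihood under the mixture
    threshold = np.quantile(log_density, quantile)   # lowest-density claims are suspicious
    return log_density < threshold                   # True = candidate anomalous claim
```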

Solar Array Simulation
Blue Sky Solar Racing, Strategic Planning Team

Created a simulator that ranks the performance of any solar array CAD model by predicting the instantaneous energy generated under various daylight conditions.

Gomoku AI Engine
University of Toronto, Class Competition

Developed an AI program capable of playing Gomoku against both human and virtual opponents. The software's decision-making process is driven by experimentally tuned heuristics designed to emulate the reasoning of a human opponent.

Word Pairing - Semantic Similarity
University of Toronto, Class Competition

Programmed an intelligent system that approximates the semantic similarity between any pair of words by parsing data from large novels and computing cosine similarities and Euclidean distances between vector descriptors of each word.
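
The core similarity computation is just cosine similarity between sparse word descriptors; a minimal sketch (descriptor construction from the novels is assumed done elsewhere):

```python
import math

def cosine_similarity(u: dict, v: dict) -> float:
    """u, v: sparse descriptors mapping context word -> co-occurrence count."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```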