Visual Learning and Reasoning for Robotics

Full-day workshop at RSS 2021

Virtual Conference

July 13, 2021, Pacific Time (PT)

Welcome! This workshop includes three live events:
  • Invited Talks (25 min talk + 5 min Q&A)
  • Spotlight Talks (4 min talk + 2 min Q&A)
  • Panel Discussion (60 min)
To attend the workshop, please use the PheedLoop platform provided by RSS 2021.

For the panel discussion, you can also post questions at this link.


Time (PT) Invited Speaker Title
10:15 - 10:30 - Opening Remarks
| Video |
10:30 - 11:00

Andrew Davison
Imperial College London
Representations for Spatial AI
| Video |
11:00 - 11:30

Raquel Urtasun
University of Toronto / Waabi
Interpretable Neural Motion Planning
| Video |
11:30 - 12:00 Spotlight Talks
ZePHyR: Zero-shot Pose Hypothesis Rating
Brian Okorn (Carnegie Mellon University); Qiao Gu (Carnegie Mellon University)*; Martial Hebert (Carnegie Mellon University); David Held (Carnegie Mellon University)
| PDF | Video |

ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer
Eslam Bakr (Valeo)*; Ahmad ElSallab (Valeo Deep Learning Research)
| PDF | Video |

Lifelong Interactive 3D Object Recognition for Real-Time Robotic Manipulation
Hamed Ayoobi (University of Groningen)*; S. Hamidreza Kasaei (University of Groningen); Ming Cao (University of Groningen); Rineke Verbrugge (University of Groningen); Bart Verheij (University of Groningen)
| PDF | Video |

Predicting Diverse and Plausible State Foresight For Robotic Pushing Tasks
Lingzhi Zhang (University of Pennsylvania)*; Shenghao Zhou (University of Pennsylvania); Jianbo Shi (University of Pennsylvania)
| PDF | Video |

Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos
Haoyu Xiong (University of Toronto, Vector Institute)*; Quanzhou Li (University of Toronto, Vector Institute); Yun-Chun Chen (University of Toronto, Vector Institute); Homanga Bharadhwaj (University of Toronto, Vector Institute); Samarth Sinha (University of Toronto, Vector Institute); Animesh Garg (University of Toronto, Vector Institute, NVIDIA)
| PDF | Video |

12:00 - 12:30

Abhinav Gupta
CMU / Facebook AI Research
No RL, No Simulation
| Video |
12:30 - 1:00

Shuran Song
Columbia University
Unfolding the Unseen: Deformable Cloth Perception and Manipulation
| Video |
1:00 - 2:30 - Break
2:30 - 3:00

Saurabh Gupta
Learning to Move and Moving to Learn
| Video |
3:00 - 3:30

Sergey Levine
UC Berkeley / Google
Scalable Robotic Learning
| Video |
3:30 - 4:00 Spotlight Talks
3D Neural Scene Representations for Visuomotor Control
Yunzhu Li (MIT)*; Shuang Li (MIT); Vincent Sitzmann (MIT); Pulkit Agrawal (MIT); Antonio Torralba (MIT)
| PDF | Video |

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Nicklas A Hansen (UC San Diego)*; Hao Su (UC San Diego); Xiaolong Wang (UC San Diego)
| PDF | Video |

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
Ruihan Yang (UC San Diego)*; Minghao Zhang (Tsinghua University); Nicklas A Hansen (UC San Diego); Huazhe Xu (UC Berkeley); Xiaolong Wang (UC San Diego)
| PDF | Video |

Interaction Prediction and Monte-Carlo Tree Search for Robot Manipulation in Clutter
Baichuan Huang (Rutgers University)*; Abdeslam Boularias (Rutgers University); Jingjin Yu (Rutgers University)
| PDF | Video |

A Simple Method for Complex In-Hand Manipulation
Tao Chen (MIT)*; Jie Xu (MIT); Pulkit Agrawal (MIT)
| PDF | Video |

4:00 - 5:00 Invited Speakers
Panel Discussion
| Video |


Visual perception is essential for achieving robot autonomy in the real world. To perform complex tasks in unknown environments, a robot needs to actively acquire knowledge through physical interaction and reason about the objects it observes. This raises a series of research challenges in developing computational tools that close the perception-action loop. Given recent advances in computer vision and deep learning, we seek new solutions for performing real-world robotic tasks effectively and at low computational cost.

We focus on two parallel themes in this workshop:

Call for Papers

We're inviting submissions! If you're interested in (remotely) presenting a spotlight talk, please submit a short paper (or extended abstract) to CMT. We suggest extended abstracts of 2 pages in the RSS format. A maximum of 4 pages will be considered. References will not count towards the page limit. The review process is double-blind. Significant overlap with work submitted to other venues is acceptable, but it must be explicitly stated at the time of submission.

Important Dates:


Kuan Fang
Stanford University

David Held

Yuke Zhu
UT Austin / NVIDIA

Dinesh Jayaraman
Univ. of Pennsylvania

Animesh Garg
Univ. of Toronto / NVIDIA

Lin Sun
Magic Leap

Yu Xiang

Greg Dudek
McGill / Samsung

Past Workshops


For further information, please contact us at rssvlrr [AT] gmail [DOT] com