The Curious Robot: Learning Visual Representations via Physical Interactions

Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong‐Lae Park, A. Gupta

Published 2016 in European Conference on Computer Vision

ABSTRACT

What is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in the case of biological agents, visual representation learning does not require millions of semantic labels. We argue that biological agents use physical interactions with the world to learn visual representations, unlike current vision systems, which just use passive observations (images and videos downloaded from the web). For example, babies push objects, poke them, put them in their mouth and throw them to learn representations. Towards this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps and observes objects in a tabletop environment. It uses four different types of physical interactions to collect more than 130K datapoints, with each datapoint providing supervision to a shared ConvNet architecture, allowing us to learn visual representations. We show the quality of learned representations by observing neuron activations and performing nearest neighbor retrieval on this learned representation. Quantitatively, we evaluate our learned ConvNet on image classification tasks and show improvements compared to learning without external data. Finally, on the task of instance retrieval, our network outperforms the ImageNet network on recall@1 by 3%.

PUBLICATION RECORD

Publication year
2016
Venue
European Conference on Computer Vision
Publication date
2016-04-05
Fields of study
Computer Science
Identifiers
DOI 10.1007/978-3-319-46475-6_1; arXiv 1604.01360
Source metadata
Semantic Scholar

REFERENCES

Generative Adversarial Nets
2018, cited by this paper
Generative Image Modeling Using Style and Structure Adversarial Networks
2016, cited by this paper
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
2016, cited by this paper
Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards
2016, cited by this paper
A convex polynomial force-motion model for planar sliding: Identification and application
2016, cited by this paper
Deep spatial autoencoders for visuomotor learning
2015, cited by this paper
Learning to See by Moving
2015, cited by this paper
Unsupervised Visual Representation Learning by Context Prediction
2015, cited by this paper
Adapting Deep Visuomotor Representations with Weak Pairwise Constraints
2015, cited by this paper
Learning Image Representations Tied to Ego-Motion
2015, cited by this paper
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
2015, cited by this paper
Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments
2015, cited by this paper
Dense Optical Flow Prediction from a Static Image
2015, cited by this paper
Unsupervised Learning of Visual Representations using Videos
2015, cited by this paper
Learning contact-rich manipulation skills with guided policy search
2015, cited by this paper
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
2015, cited by this paper
Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours
2015, influential reference
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
2014, cited by this paper
A data-driven statistical framework for post-grasp manipulation
2014, cited by this paper
Designing deep networks for surface normal estimation
2014, cited by this paper
3D ShapeNets: A deep representation for volumetric shapes
2014, cited by this paper
Real-time grasp detection using convolutional neural networks
2014, cited by this paper
Auto-Encoding Variational Bayes
2013, cited by this paper
Data-Driven Grasp Synthesis—A Survey
2013, cited by this paper
Deep learning for detecting robotic grasps
2013, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
A Framework for Push-Grasping in Clutter
2011, cited by this paper
A large-scale hierarchical multi-view RGB-D object dataset
2011, cited by this paper
Object identification with tactile sensors using bag-of-features
2009, cited by this paper
Exoskeletal Force-Sensing End-Effectors With Embedded Optical Fiber-Bragg-Grating Sensors
2009, cited by this paper
Deep learning from temporal coherence in video
2009, cited by this paper
Deep Boltzmann Machines
2009, cited by this paper
ImageNet: A large-scale hierarchical image database
2009, influential reference
Greedy Layer-Wise Training of Deep Networks
2006, cited by this paper
Using Experience for Assessing Grasp Reliability
2004, cited by this paper
Active vision
2004, cited by this paper
Robotic grasping and contact: a review
2000, cited by this paper
Boltzmann machines
1998, cited by this paper
Robot Grasp Synthesis Algorithms: A Survey
1996, cited by this paper
Stable Pushing: Mechanics, Controllability, and Planning
1995, cited by this paper
Automatic planning of robot pushing operations
1993, cited by this paper
Object Handling Using Two Arms Without Grasping
1993, cited by this paper
Reducing uncertainty of objects by robot pushing
1990, cited by this paper
Task-level planning of pick-and-place robot motions
1989, cited by this paper
Constructing Force-Closure Grasps
1988, cited by this paper
Planning Collision-Free Motions for Pick-and-Place Operations
1983, cited by this paper
Movement-Produced Stimulation in the Development of Visually Guided Behavior
1963, cited by this paper
2009 IEEE 8th International Conference on Development and Learning
year unknown, cited by this paper

CITED BY

CAVER: Curious Audiovisual Exploring Robot
2025, cites this paper
Learning for embodiment and embodiment for learning
2025, cites this paper
Tactile Robotics: An Outlook
2025, cites this paper
Unsupervised Discovery of Objects Physical Properties Through Maximum Entropy Reinforcement Learning
2025, cites this paper
Push, See, Predict: Emergent Perception Through Intrinsically Motivated Play
2025, cites this paper
PUGS: Zero-Shot Physical Understanding with Gaussian Splatting
2025, cites this paper
Object Pose Estimation through Dexterous Touch
2025, cites this paper
Embodied Intelligence: A Synergy of Morphology, Action, Perception and Learning
2025, cites this paper
Analyzing Visual Attention in Virtual Crime Scene Investigations Using Eye-Tracking and VR: Insights for Cognitive Modeling
2025, cites this paper
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
2024, cites this paper
Learning dexterity from human hand motion in internet videos
2024, cites this paper
Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
2024, cites this paper
Investigating Representations for Vision And Touch in Contact Rich Robot Scooping Tasks
2024, cites this paper
Aligning Cyber Space With Physical World: A Comprehensive Survey on Embodied AI
2024, influential citation
Planning and learning to perceive in partially unknown environments
2024, cites this paper
Physical Property Understanding from Language-Embedded Feature Fields
2024, cites this paper
EdgeOL: Efficient in-situ Online Learning on Edge Devices
2024, cites this paper
Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play
2023, cites this paper
Planning for Learning Object Properties
2023, cites this paper
MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation
2023, cites this paper
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
2023, cites this paper
MimicTouch: Learning Human's Control Strategy with Multi-Modal Tactile Feedback
2023, cites this paper
Learning to Act for Perceiving in Partially Unknown Environments
2023, cites this paper
ALP: Action-Aware Embodied Learning for Perception
2023, cites this paper
Combining Vision and Tactile Sensation for Video Prediction
2023, cites this paper
Inferring Fluid Dynamics via Inverse Rendering
2023, cites this paper
ENTL: Embodied Navigation Trajectory Learner
2023, cites this paper
A Noise Rate Estimation Method for Image Classification with Label Noise
2023, cites this paper
VideoDex: Learning Dexterity from Internet Videos
2022, cites this paper
Taxim: An Example-based Simulation Model for GelSight Tactile Sensors and its Sim-to-Real Applications
2022, cites this paper
Beyond Object Recognition: A New Benchmark towards Object Concept Learning
2022, cites this paper
Real-World Robot Learning with Masked Visual Pre-training
2022, cites this paper
A Review on Machine Learning Styles in Computer Vision—Techniques and Future Directions
2022, cites this paper
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
2022, cites this paper
On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies
2022, cites this paper
Learning Algorithm in Two-Stage Selective Prediction
2022, cites this paper
Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
2022, cites this paper
Heuristic grasping of convex objects using 3D imaging and tactile sensing in uncalibrated grasping scenarios
2022, cites this paper
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
2022, cites this paper
Action-Conditioned Contrastive Policy Pretraining
2022, cites this paper
Action space noise optimization as exploration in deterministic policy gradient for locomotion tasks
2022, cites this paper
EspialCog: General, Efficient and Robust Mobile User Implicit Authentication in Noisy Environment
2022, cites this paper
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes
2021, cites this paper
Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
2021, cites this paper
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning
2021, cites this paper
VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
2021, cites this paper
Enabling On-Device Self-Supervised Contrastive Learning with Selective Data Contrast
2021, cites this paper
Editorial: ViTac: Integrating Vision and Touch for Multimodal and Cross-Modal Perception
2021, cites this paper
Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
2021, cites this paper
Curious Representation Learning for Embodied Intelligence
2021, cites this paper
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
2021, cites this paper
Single Image Depth Estimation: An Overview
2021, cites this paper
SiT: Self-supervised vIsion Transformer
2021, cites this paper
Reinforcement Learning with Prototypical Representations
2021, cites this paper
Deep Learning in Robotics: Survey on Model Structures and Training Strategies
2021, cites this paper
Data-Driven Robotic Grasping in the Wild
2021, cites this paper
Effects of Motion-Relevant Knowledge From Unlabeled Video to Human–Object Interaction Detection
2021, cites this paper
Firefighting robot with deep learning and machine vision
2021, cites this paper
ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations
2021, cites this paper
Electronic Skins for Intelligent Soft Robots
2020, cites this paper
Electronic skins and machine learning for intelligent soft robots
2020, cites this paper
Learning to See before Learning to Act: Visual Pre-training for Manipulation
2020, cites this paper
Multi-Task Reinforcement Learning with Soft Modularization
2020, cites this paper
State-Only Imitation Learning for Dexterous Manipulation
2020, cites this paper
Enabling Efficient On-Device Self-Supervised Contrastive Learning by Data Selection
2020, cites this paper
Evaluating Learned State Representations for Atari
2020, cites this paper
Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency
2020, cites this paper
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
2020, cites this paper
Swoosh! Rattle! Thump! - Actions that Sound
2020, cites this paper
3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators
2020, cites this paper
Self-supervised representation learning by predicting visual permutations
2020, cites this paper
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions
2020, cites this paper
Rotated Ring, Radial and Depth Wise Separable Radial Convolutions
2020, cites this paper
Rigid-Soft Interactive Learning for Robust Grasping
2020, cites this paper
Embodied tactile perception and learning
2020, cites this paper
NDSGD: A Practical Method to Improve Robustness of Deep Learning Model on Noisy Dataset
2020, cites this paper
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
2020, cites this paper
Watching the World Go By: Representation Learning from Unlabeled Videos
2020, cites this paper
Multimodal Representation Learning for Robotic Cross-Modality Policy Transfer
2020, cites this paper
Learning from Noisy Labels with Noise Modeling Network
2020, cites this paper
Can multisensory training aid visual learning? A computational investigation
2019, cites this paper
State Representation Learning for Control: An Overview
2019, cites this paper
Invariant Feature Mappings for Generalizing Affordance Understanding Using Regularized Metric Learning
2019, cites this paper
Intelligent Autonomous Things on the Battlefield
2019, cites this paper
Beyond Supervised Learning: A Computer Vision Perspective
2019, cites this paper
Multigrid Predictive Filter Flow for Unsupervised Learning on Videos
2019, cites this paper
Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces
2019, cites this paper
Human Visual Understanding for Cognition and Manipulation - A primer for the roboticist
2019, cites this paper
DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions
2019, cites this paper
ViTa-SLAM: A Bio-inspired Visuo-Tactile SLAM for Navigation while Interacting with Aliased Environments
2019, cites this paper
PyRobot: An Open-source Robotics Framework for Research and Benchmarking
2019, cites this paper
Towards Adversarial Training for Mobile Robots
2019, cites this paper
A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms
2019, cites this paper
Estimating Mass Distribution of Articulated Objects through Physical Interaction
2019, cites this paper
Intelligent Control Navigation Emerging on Multiple Mobile Robots Applying Social Wound Treatment
2019, cites this paper
Neural Re-Simulation for Generating Bounces in Single Images
2019, cites this paper
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
2019, cites this paper
Deep unsupervised state representation learning with robotic priors: a robustness analysis
2019, cites this paper
Self-supervised Representation Learning Using 360° Data
2019, cites this paper
A Weakly Supervised Multi-task Ranking Framework for Actor–Action Semantic Segmentation
2019, cites this paper