Aligning 3D models to RGB-D images of cluttered scenes

Saurabh Gupta, Pablo Arbeláez, Ross B. Girshick, Jitendra Malik

Published 2015 in Computer Vision and Pattern Recognition

ABSTRACT

The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene and then using a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel surface normals in images containing renderings of synthetic objects. When tested on real data, our method outperforms alternative algorithms trained on real data. We then use this coarse pose estimate along with the inferred pixel support to align a small number of prototypical models to the data, and place into the scene the model that fits best. We observe a 48% relative improvement in performance at the task of 3D detection over the current state-of-the-art [34], while being an order of magnitude faster.
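The abstract states that the pose CNN takes per-pixel surface normals as input. As a rough illustration only, the sketch below shows one common way to estimate such normals from a depth map: back-project pixels to 3D with a pinhole camera model, then take the cross product of finite-difference tangent vectors. The function name `depth_to_normals` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this example; this is not the authors' implementation, and the paper may compute normals differently.

```python
import numpy as np


def depth_to_normals(depth, fx, fy, cx, cy):
    """Estimate unit surface normals per pixel from a depth map.

    Hypothetical helper for illustration: assumes a pinhole camera with
    focal lengths (fx, fy) and principal point (cx, cy), and a depth map
    with no missing values.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project every pixel to a 3D point in the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

    # Tangent vectors along image columns and rows (finite differences).
    d_col = np.gradient(points, axis=1)
    d_row = np.gradient(points, axis=0)

    # The normal is perpendicular to both tangents.
    normals = np.cross(d_col, d_row)
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.maximum(length, 1e-12)

    # Orient normals toward the camera (negative z in this convention).
    away = normals[..., 2] > 0
    normals[away] *= -1
    return normals


# Usage: a fronto-parallel plane 2 m from the camera should yield
# normals pointing straight back at the camera.
plane = np.full((4, 5), 2.0)
plane_normals = depth_to_normals(plane, fx=500.0, fy=500.0, cx=2.0, cy=1.5)
```

Real RGB-D depth maps contain holes and sensor noise, so a practical version would likely mask invalid pixels and smooth the depth before differentiating.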

PUBLICATION RECORD

Publication year: 2015
Venue: Computer Vision and Pattern Recognition
Publication date: 2015-06-01
Fields of study: Computer Science
Identifiers: DOI 10.1109/CVPR.2015.7299105
Source metadata: Semantic Scholar

REFERENCES

Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation
2015, cited by this paper
FPM: Fine Pose Parts-Based Model with 3D CAD Models
2014, cited by this paper
3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction
2014, cited by this paper
Learning hierarchical sparse features for RGB-(D) object recognition
2014, cited by this paper
3DNN: 3D Nearest Neighbor
2014, cited by this paper
Simultaneous Detection and Segmentation
2014, influential reference
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
2014, cited by this paper
Scene Parsing with Object Instances and Occlusion Ordering
2014, cited by this paper
Instance Segmentation of Indoor Scenes Using a Coverage Loss
2014, cited by this paper
What Are You Talking About? Text-to-Image Coreference
2014, cited by this paper
Unsupervised feature learning for 3D scene labeling
2014, cited by this paper
Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models
2014, cited by this paper
Viewpoints and keypoints
2014, cited by this paper
Sliding Shapes for 3D Object Detection in Depth Images
2014, influential reference
Scene understanding with complete scenes and structured representations
2014, influential reference
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013, cited by this paper
Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
2013, cited by this paper
CPMC-3D-O2P: Semantic segmentation of RGB-D images using CPMC and Second Order Pooling
2013, cited by this paper
Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
2013, cited by this paper
Detailed 3D Representations for Object Recognition and Modeling
2013, cited by this paper
Support Surface Prediction in Indoor Scenes
2013, influential reference
SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
2013, cited by this paper
Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
2013, cited by this paper
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
2013, cited by this paper
Semantic Segmentation with Second-Order Pooling
2012, cited by this paper
An interactive approach to semantic modeling of indoor scenes with an RGBD camera
2012, cited by this paper
Detection-based object labeling in 3D scenes
2012, cited by this paper
RGB-(D) scene labeling: Features and algorithms
2012, cited by this paper
Semantic segmentation using regions and parts
2012, cited by this paper
Analyzing 3D Objects in Cluttered Images
2012, cited by this paper
Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor
2012, cited by this paper
Indoor Segmentation and Support Inference from RGBD Images
2012, cited by this paper
Semantic Labeling of 3D Point Clouds for Indoor Scenes
2011, cited by this paper
A category-level 3-D object dataset: Putting the Kinect to work
2011, cited by this paper
A large-scale hierarchical multi-view RGB-D object dataset
2011, cited by this paper
Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry
2010, cited by this paper
Object Detection with Discriminatively Trained Part Based Models
2010, cited by this paper
Histograms of oriented gradients for human detection
2005, cited by this paper
Efficient variants of the ICP algorithm
2001, cited by this paper
Human Face Detection in Visual Scenes
1995, cited by this paper

CITED BY

Autoencoder Models for Point Cloud Environmental Synthesis from WiFi Channel State Information: A Preliminary Study
2025, cites this paper
SDA-Net: A Global Feature Point Cloud Completion Network Based on Serialization and Dual Attention
2025, cites this paper
Language-Embedded 6D Pose Estimation for Tool Manipulation
2025, cites this paper
A Point Cloud Completion Network Integrating Mamba and Transformer Architectures
2025, cites this paper
Structure preserving point cloud completion and classification with coarse-to-fine information
2025, cites this paper
CompletionMamba: Taming State Space Model for Point Cloud Completion
2025, cites this paper
3-D Dynamic Multitarget Detection Algorithm Based on Cross-View Feature Fusion
2024, cites this paper
CIS2VR: CNN-based Indoor Scan to VR Environment Authoring Framework
2024, cites this paper
Opening Articulated Structures in the Real World
2024, cites this paper
Cross-Video Pedestrian Tracking Algorithm with a Coordinate Constraint
2024, cites this paper
DBSCAN and Yolov5 based 3D object detection and its adaptation to a mobile platform
2024, cites this paper
A Survey of Point Cloud Completion
2024, cites this paper
AR-assisted assembly method based on instance segmentation
2024, cites this paper
Image attention transformer network for indoor 3D object detection
2024, influential citation
ScanToVR: An RGB-D to VR Reconstruction Framework
2023, influential citation
An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images
2023, cites this paper
Facilitating cell segmentation with the projection-enhancement network
2023, cites this paper
Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training
2023, cites this paper
Augmented Reality System Based on Real-Time Object 6D Pose Estimation
2023, cites this paper
Development of Vision Guided Real-Time Trajectory Planning System for Autonomous Ground Refuelling Operations using Hybrid Dataset
2023, cites this paper
The Projection-Enhancement Network (PEN)
2023, cites this paper
Shape Anchor Guided Holistic Indoor Scene Understanding
2023, cites this paper
Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects
2023, cites this paper
Progress and perspectives of point cloud intelligence
2023, cites this paper
SMA-Net: Deep learning-based identification and fitting of CAD models from point clouds
2022, cites this paper
Towards 3D Scene Understanding by Referring Synthetic Models
2022, cites this paper
LGP-Net: Local Geometry Preserving Network for Point Cloud Completion
2022, cites this paper
Face attribute analysis from structured light: an end-to-end approach
2022, cites this paper
Scene Reconstruction with Functional Objects for Robot Autonomy
2022, cites this paper
Point Cloud Scene Completion With Joint Color and Semantic Estimation From Single RGB-D Image
2022, cites this paper
Point Cloud Completion Network Applied to Vehicle Data
2022, cites this paper
MLFT-Net: Point Cloud Completion Using Multi-Level Feature Transformer
2022, cites this paper
Accurate Instance-Level CAD Model Retrieval in a Large-Scale Database
2022, cites this paper
Multistage Adaptive Point-Growth Network for Dense Point Cloud Completion
2022, cites this paper
KT-Net: Knowledge Transfer for Unpaired 3D Shape Completion
2021, cites this paper
Unsupervised Learning of the 4D Audio-Visual World from Sparse Unconstrained Real-World Samples
2021, influential citation
CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition
2021, cites this paper
Planar Shape Based Registration for Multi-modal Geometry
2021, cites this paper
GASCN: Graph Attention Shape Completion Network
2021, cites this paper
Point Projection Network: A Multi-View-Based Point Completion Network with Encoder-Decoder Architecture
2021, cites this paper
A survey of LiDAR and camera fusion enhancement
2021, cites this paper
Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly
2021, cites this paper
Unreal mask: one-shot multi-object class-based pose estimation for robotic manipulation using keypoints with a synthetic dataset
2021, cites this paper
ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion
2021, cites this paper
Deep learning with small datasets: using autoencoders to address limited datasets in construction management
2021, cites this paper
Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation
2021, cites this paper
Unsupervised 3D Shape Coverage Estimation with Applications to Colonoscopy
2021, cites this paper
Part-Level Car Parsing and Reconstruction in Single Street View Images
2021, cites this paper
An efficient network for category-level 6D object pose estimation
2021, cites this paper
A multilevel fusion network for 3D object detection
2021, cites this paper
Recent advances in 3D object detection based on RGB-D: A survey
2021, cites this paper
MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching
2021, cites this paper
A novel detection fusion network for solid waste sorting
2020, cites this paper
CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning
2020, cites this paper
Visual Perception with Synthetic Data
2020, cites this paper
Fully Convolutional Geometric Features for Category-level Object Alignment
2020, cites this paper
Geometry to the Rescue: 3D Instance Reconstruction from a Cluttered Scene
2020, cites this paper
Deep learning-based mobile augmented reality for task assistance using 3D spatial mapping and snapshot-based RGB-D data
2020, cites this paper
Mixing deep learning with classical vision for object recognition
2020, cites this paper
Robotic Manipulation Based on 3D Vision: A Survey
2020, cites this paper
WatchPose: A View-Aware Approach for Camera Pose Data Collection in Industrial Environments
2020, cites this paper
Indoor 3D Scene Understanding Using Depth Sensors
2020, cites this paper
Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation
2020, cites this paper
Representations and Benchmarking of Modern Visual SLAM Systems
2020, cites this paper
CAD-Deform: Deformable Fitting of CAD Models to 3D Scans
2020, cites this paper
Data-Driven Indoor Scene Modeling from a Single Color Image with Iterative Object Segmentation and Model Retrieval
2020, cites this paper
The Visual Segmentation of Scene Information and Applications in Predictive Haptics
2020, cites this paper
View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors
2020, cites this paper
Learning 3D Part Assembly from a Single Image
2020, cites this paper
Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects
2020, cites this paper
CPS: Class-level 6D Pose and Shape Estimation From Monocular Images
2020, cites this paper
SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans
2020, cites this paper
A survey on deep geometry learning: From a representation perspective
2020, cites this paper
Robotic Grasping Using Semantic Segmentation and Primitive Geometric Model Based 3D Pose Estimation
2020, cites this paper
A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators
2020, influential citation
Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review
2020, cites this paper
Unsupervised Learning in Space and Time: A Modern Approach for Computer Vision using Graph-based Techniques and Deep Neural Networks
2020, influential citation
Deep Coupled ISTA Network for Multi-Modal Image Super-Resolution
2020, cites this paper
Capturing the Geometry of Object Categories from Video Supervision
2020, cites this paper
S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds
2020, cites this paper
Structural Deep Metric Learning for Room Layout Estimation
2020, cites this paper
Rgbd Based Generative Adversarial Network For 3D Semantic Scene Completion
2020, cites this paper
Single Shot 6D Object Pose Estimation
2020, cites this paper
Scene Restoration and Semantic Classification Network Using Depth Map and Discrete Pooling Technology
2019, cites this paper
RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques
2019, influential citation
Real-World Robotic Perception and Control Using Synthetic Data
2019, cites this paper
3D Shape Completion and Canonical Pose Estimation with Structured Neural Networks
2019, cites this paper
PVFE: Point-Voxel Feature Encoders for 3D Object Detection
2019, cites this paper
Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts
2019, cites this paper
L3DOC: Lifelong 3D Object Classification
2019, cites this paper
3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images
2019, cites this paper
3D model retrieval and pose estimation for indoor images by simulating scene context
2019, cites this paper
End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans
2019, cites this paper
Coupled Ista Network for Multi-modal Image Super-resolution
2019, cites this paper
Silhouette Guided Point Cloud Reconstruction beyond Occlusion
2019, cites this paper
Generalized Feedback Loop for Joint Hand-Object Pose Estimation
2019, cites this paper
Embodied Visual Recognition
2019, cites this paper
Deep Reinforcement Learning of Volume-Guided Progressive View Inpainting for 3D Point Scene Completion From a Single Depth Image
2019, cites this paper
Instance- and Category-level 6D Object Pose Estimation
2019, cites this paper
Speedup 3-D Texture-Less Object Recognition Against Self-Occlusion for Intelligent Manufacturing
2019, cites this paper