Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Saurabh Gupta, Ross B. Girshick, Pablo Arbeláez, Jitendra Malik

Published 2014 in European Conference on Computer Vision

ABSTRACT

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.
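The three quantities named in the abstract (horizontal disparity, height above ground, and angle with gravity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with known intrinsics, a gravity direction supplied by the caller, valid (non-zero) depth at every pixel, and the lowest observed point as the ground level. The function name `geocentric_channels` and the `baseline_m` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np

def geocentric_channels(depth_m, fx, fy, cx, cy, gravity, baseline_m=0.075):
    """Sketch of a three-channel geocentric encoding of a depth image.

    Returns an (H, W, 3) array holding, per pixel: horizontal disparity,
    height above the lowest observed point, and the angle in degrees
    between the surface normal and the gravity direction.
    Assumes depth_m > 0 everywhere and gravity given in camera coordinates.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project pixels to 3D camera coordinates (x right, y down, z forward).
    z = depth_m.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)

    # Unit gravity vector (assumed known here, not estimated).
    g = np.asarray(gravity, dtype=float)
    g = g / np.linalg.norm(g)

    # Channel 1: disparity is inversely proportional to depth.
    disparity = baseline_m * fx / z

    # Channel 2: coordinate along "up" (-g), measured from the lowest point,
    # which this sketch treats as the ground.
    up_coord = -(points @ g)
    height = up_coord - up_coord.min()

    # Channel 3: normals from the cross product of finite-difference tangents.
    # The normal's sign depends on the cross-product order chosen here.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    normals = np.cross(du, dv)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-12
    cos_angle = np.clip(normals @ g, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))

    return np.stack([disparity, height, angle_deg], axis=-1)

# Usage: a fronto-parallel wall 2 m away, with gravity along the image's down axis.
wall = np.full((4, 6), 2.0)
channels = geocentric_channels(wall, fx=500.0, fy=500.0, cx=2.5, cy=1.5,
                               gravity=(0.0, 1.0, 0.0))
```

For the vertical wall in the usage example, the normal is perpendicular to gravity, so the angle channel is 90 degrees everywhere, and the bottom image row has zero height. A real pipeline would also need to handle missing depth and estimate the gravity direction, both of which this sketch leaves out.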

PUBLICATION RECORD

Publication year: 2014
Venue: European Conference on Computer Vision
Publication date: 2014-07-21
Fields of study: Computer Science
Identifiers: DOI 10.1007/978-3-319-10584-0_23; arXiv 1407.5736
Source metadata: Semantic Scholar
