Mask6D: Masked Pose Priors for 6D Object Pose Estimation

Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Published 2021 in IEEE International Conference on Acoustics, Speech, and Signal Processing

ABSTRACT

Robust 6D object pose estimation from monocular RGB images remains challenging in cluttered or occluded scenes. One reason is that current pose estimation networks struggle to extract discriminative, pose-aware features with 2D feature backbones, especially when occlusion of the target limits the available RGB information. To mitigate this, we propose a pose estimation-specific pre-training strategy named Mask6D. Our approach uses pose-aware 2D-3D correspondence maps and visible mask maps as additional modalities, which are combined with RGB images for reconstruction-based model pre-training. A 2D-3D correspondence map relates the transformed 3D object model to 2D pixels and therefore reflects the pose of the target in the camera coordinate system. The visible mask map, in turn, guides the model to disregard cluttered background information. In addition, an object-focused pre-training loss function is designed to further help the network suppress background interference. Finally, we fine-tune the pre-trained, pose prior-aware network with a conventional pose training strategy to obtain reliable pose predictions. Extensive experiments verify that our method outperforms previous end-to-end pose estimation methods.
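The two ingredients named in the abstract, a 2D-3D correspondence map and an object-focused loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole projection, the nearest-depth visibility test, the squared-error loss, and every name below (`correspondence_map`, `object_focused_loss`, `bg_weight`) are assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def transform(R: Sequence[Sequence[float]], t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid pose (R, t) to a model point: p_cam = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def correspondence_map(points, R, t, fx, fy, cx, cy, width, height) -> Dict[Pixel, Vec3]:
    """Map each covered pixel (u, v) to the model-frame 3D point it images.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy). When several points land on one pixel, the one
    closest to the camera is kept (a simple visibility test).
    """
    nearest_depth: Dict[Pixel, float] = {}
    corr: Dict[Pixel, Vec3] = {}
    for p in points:
        x, y, z = transform(R, t, p)
        if z <= 0:  # behind the camera
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue
        if (u, v) not in nearest_depth or z < nearest_depth[(u, v)]:
            nearest_depth[(u, v)] = z
            corr[(u, v)] = p
    return corr


def object_focused_loss(pred, target, mask, bg_weight: float = 0.0) -> float:
    """Weighted mean squared error over a 2D map.

    Pixels inside the visible mask get weight 1; background pixels get
    bg_weight (hypothetical parameter), so bg_weight = 0 ignores them.
    """
    num = 0.0
    den = 0.0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, q, m in zip(pred_row, target_row, mask_row):
            w = 1.0 if m else bg_weight
            num += w * (p - q) ** 2
            den += w
    return num / den if den > 0 else 0.0


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, 1.0)]
corr = correspondence_map(model_points, identity, (0.0, 0.0, 2.0),
                          fx=100.0, fy=100.0, cx=50.0, cy=50.0,
                          width=100, height=100)
loss = object_focused_loss([[1.0, 5.0]], [[0.0, 0.0]], [[1, 0]])
```

In this toy setup the third model point projects to the same pixel as the first but lies farther from the camera, so it is treated as occluded and does not appear in the map; the large error on the background pixel does not contribute to the loss when `bg_weight` is zero.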

PUBLICATION RECORD

Publication year
2021
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Publication date
2021-06-05
Fields of study
Computer Science, Engineering
Identifiers
DOI 10.1109/ICASSP48485.2024.10448345
arXiv 2401.05431
Source metadata
Semantic Scholar

CITED BY

Monocular RGB 6D object pose estimation for augmented reality: a survey
2026 (cites this paper)
VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation
2025 (cites this paper)
Machine Learning Techniques for Pose Based Human Activity Recognition
2025 (cites this paper)
Enhancing Face Forgery Detection with Augmented Feature Distillation
2025 (cites this paper)
Autonomous Plug-in Charging for Wheeled Mobile Manipulator in Unstructured Environments
2025 (cites this paper)
Complex-Valued Transformer for Short-Wave Signal Recognition
2023 (cites this paper)
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool
2023 (cites this paper)
Throughput Maximization Using Deep Complex Networks for Industrial Internet of Things
2023 (cites this paper)
D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network Using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
2023 (cites this paper)
Building Blocks for a Complex-Valued Transformer Architecture
2023 (cites this paper)
MAE-Based Self-Supervised Pretraining Algorithm for Heart Rate Estimation of Radar Signals
2023 (cites this paper)
Few-Shot Specific Emitter Identification via Deep Metric Ensemble Learning
2022 (cites this paper)
Automatic Modulation Classification using Graph Convolutional Neural Networks for Time-frequency Representation
2022 (cites this paper)