Dual-Scale Attention Networks for Efficient Monocular Depth Estimation

Zhen He, Zhongqi Sun, Jialong Yang, Changkun Du, Yuanqing Xia

Published in 2025 at the Cybersecurity and Cyberforensics Conference

ABSTRACT

This paper proposes an innovative self-supervised monocular depth estimation algorithm, the Dual-Scale Attention Module (DSAM). The method combines the advantages of Convolutional Neural Networks (CNNs) and Transformers by adapting the CNN architecture and introducing a spatial-channel synergistic attention mechanism (UniSA) for multi-scale feature processing, significantly improving the accuracy and robustness of depth estimation. Specifically, the CNN adaptation enhances local feature extraction and expands the receptive field by stacking depthwise separable dilated convolutions with different dilation rates. Compared to existing self-supervised monocular depth estimation methods, DSAM demonstrates stronger adaptability to complex scenes and dynamic objects, achieving significant progress in capturing fine-grained depth variations and handling abrupt depth changes. Built on a self-supervised learning framework, the method does not rely on manually labeled depth data and performs well across multiple datasets. Experimental results show that DSAM outperforms existing methods on several key metrics, with especially significant performance improvements on the KITTI dataset. The contributions of this paper are a new dual-scale attention mechanism, a self-supervised depth estimation framework, and an adapted CNN architecture, providing innovative solutions for feature extraction, feature fusion, and global context modeling in depth estimation tasks.
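The abstract's claim that stacking dilated convolutions with different dilation rates expands the receptive field can be checked with simple arithmetic: each stride-1 convolution adds (kernel_size − 1) × dilation to the effective receptive field. The dilation schedule (1, 2, 4) below is a hypothetical example for illustration, not one stated in the abstract.

```python
def receptive_field(layers):
    """Effective receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs.
    Each layer grows the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Hypothetical schedule of three 3x3 convs with dilations 1, 2, 4:
print(receptive_field([(3, 1), (3, 2), (3, 4)]))  # -> 15
# The same depth without dilation covers far less:
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # -> 7
```

This is why dilation is attractive for depth estimation backbones: the field grows without adding parameters or reducing spatial resolution, and pairing it with depthwise separable convolutions keeps the cost per layer low.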
