Dual-Scale Attention Networks for Efficient Monocular Depth Estimation

Zhen He, Zhongqi Sun, Jialong Yang, Changkun Du, Yuanqing Xia

Published in 2025 at the Cybersecurity and Cyberforensics Conference

ABSTRACT

This paper proposes an innovative self-supervised monocular depth estimation algorithm, the Dual-Scale Attention Module (DSAM). The method combines the advantages of Convolutional Neural Networks (CNNs) and Transformers by adapting the CNN architecture and introducing a spatial-channel synergistic attention mechanism (UniSA) for multi-scale feature processing, significantly improving the accuracy and robustness of depth estimation. Specifically, the CNN adaptation enhances local feature extraction and expands the receptive field by stacking depthwise separable dilated convolutions with different dilation rates. Compared to existing self-supervised monocular depth estimation methods, DSAM demonstrates stronger adaptability to complex scenes and dynamic objects, achieving significant progress in capturing fine-grained depth variations and handling abrupt depth changes. Built on a self-supervised learning framework, the method does not rely on manually labeled depth data and performs well across multiple datasets. Experimental results show that DSAM outperforms existing methods on several key metrics, with especially significant performance improvements on the KITTI dataset. The contributions of this paper are a new dual-scale attention mechanism, a self-supervised depth estimation framework, and an adapted CNN architecture, providing innovative solutions for feature extraction, feature fusion, and global context modeling in depth estimation tasks.
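The abstract's claim that stacking dilated convolutions with different dilation rates expands the receptive field can be checked with simple arithmetic: each stride-1 convolution adds (kernel_size − 1) × dilation to the effective receptive field. The dilation schedule (1, 2, 4) below is a hypothetical example for illustration, not one stated in the abstract.

```python
def receptive_field(layers):
    """Effective receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs.
    Each layer grows the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Hypothetical schedule of three 3x3 convs with dilations 1, 2, 4:
print(receptive_field([(3, 1), (3, 2), (3, 4)]))  # -> 15
# The same depth without dilation covers far less:
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # -> 7
```

This is why dilation is attractive for depth estimation backbones: the field grows without adding parameters or reducing spatial resolution, and pairing it with depthwise separable convolutions keeps the cost per layer low.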
