Learning Cross-Modal Visuomotor Policies for Autonomous Drone Navigation
Yuhang Zhang, Jiaping Xiao, Mir Feroskhan
Published 2025 in IEEE Robotics and Automation Letters

ABSTRACT
Developing effective vision-based navigation algorithms that adapt to various scenarios is a significant challenge for autonomous drone systems, with vast potential across diverse real-world applications. This paper proposes a novel visuomotor policy learning framework for monocular autonomous navigation that combines cross-modal contrastive learning with deep reinforcement learning (DRL). Our approach first leverages contrastive learning to extract consistent, task-focused visual representations from high-dimensional RGB images by aligning them with depth images, and then directly maps these representations to action commands with DRL. This framework enables RGB images to capture structural and spatial information similar to depth images, which remains largely invariant under changes in lighting and texture, thereby maintaining robustness across various environments. We evaluate our approach through simulated and physical experiments, showing that our visuomotor policy outperforms baseline methods in both effectiveness and resilience to unseen visual disturbances. Our findings suggest that the key to enhancing transferability in monocular RGB-based navigation lies in achieving consistent, well-aligned visual representations across scenarios, an aspect often lacking in traditional end-to-end approaches.
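The abstract describes aligning RGB-image representations with depth-image representations through cross-modal contrastive learning before mapping them to actions with DRL. The following is a minimal, hypothetical sketch of such an alignment objective using a symmetric InfoNCE loss in PyTorch; the encoder architecture, embedding size, temperature, and function names are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of cross-modal contrastive alignment between RGB and
# depth embeddings, in the spirit of the paper's description. Architectures
# and hyperparameters are assumptions, not the authors' actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Small CNN mapping an image (C x 64 x 64) to a unit-norm embedding."""

    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


def cross_modal_info_nce(z_rgb: torch.Tensor, z_depth: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE: the RGB embedding of each frame is pulled toward the
    depth embedding of the same frame and pushed away from other frames'."""
    logits = z_rgb @ z_depth.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z_rgb.size(0), device=z_rgb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    rgb_encoder = ConvEncoder(in_channels=3)
    depth_encoder = ConvEncoder(in_channels=1)
    rgb = torch.rand(16, 3, 64, 64)       # batch of RGB frames
    depth = torch.rand(16, 1, 64, 64)     # paired depth frames
    loss = cross_modal_info_nce(rgb_encoder(rgb), depth_encoder(depth))
    loss.backward()
    print(loss.item())

In a full pipeline of this kind, the RGB encoder's embedding would then serve as the state input to the DRL policy that outputs action commands.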
PUBLICATION RECORD
- Publication year: 2025
- Venue: IEEE Robotics and Automation Letters
- Publication date: 2025-06-01
- Fields of study: Computer Science, Engineering
- Source metadata: Semantic Scholar