Acoustic communications are the most widely exploited technology in the so-called Internet of Underwater Things (IoUT). UnderWater (UW) environments are often characterized by harsh propagation conditions, limited bandwidth, fast-varying channels, and long propagation delays. Moreover, IoUT nodes are usually battery-powered devices with limited processing capabilities. It is therefore necessary to design optimization algorithms that address the challenging propagation features while respecting the limited device capabilities. Given the nodes' energy and processing constraints, it is crucial to adjust the transmission parameters to the channel conditions while keeping the communication procedures lightweight and energy-efficient. In this work, we introduce a novel Multi-Player Multi-Armed Bandit (MP-MAB) framework for modulation adaptation in multi-hop IoUT acoustic networks. As opposed to the widely used, computation-demanding Deep Reinforcement Learning (DRL) techniques, MP-MAB algorithms are simple and lightweight, and make decisions iteratively by selecting one among multiple choices, or arms. The framework is fully distributed and dynamically selects the best modulation technique at each IoUT node by leveraging high-level statistics (e.g., network throughput), without the need to extract hard-to-obtain channel features (e.g., channel state). We evaluate the performance of the proposed framework using the DESERT UW simulator and compare it with state-of-the-art centralized DRL-based solutions for cognitive and heterogeneous networks, namely DRL-MCS, DRL-AM, PPO, and SAC, as well as with a multi-agent, distributed version of PPO. The results highlight that, despite its simplicity and fully distributed nature, the proposed framework achieves superior performance in UW networks in terms of throughput, convergence speed, and energy efficiency.
Compared to DRL-MCS and DRL-AM, our approach improves network throughput by up to 33% and 20%, respectively, and reduces energy consumption by up to 18% and 16%. When compared to PPO, SAC, and Multi-PPO, the proposed solution achieves up to 11%, 34%, and 38% higher throughput, and up to 7%, 17%, and 33% lower energy consumption, respectively.
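The arm-selection idea sketched in the abstract can be illustrated with a generic UCB1 bandit where each arm is a candidate modulation scheme and the reward is a high-level statistic such as normalized throughput. This is a minimal, hypothetical sketch of the multi-armed-bandit concept, not the paper's MP-MAB algorithm; the arm names and the Bernoulli throughput proxy are illustrative assumptions.

```python
import math
import random

class UCBModulationSelector:
    """UCB1 bandit sketch: each arm is a candidate modulation scheme."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}   # times each arm was played
        self.means = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        # Play each arm once first, then pick the arm maximizing the UCB1 index.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        t = sum(self.counts.values())
        return max(
            self.arms,
            key=lambda a: self.means[a] + math.sqrt(2 * math.log(t) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Reward is a high-level statistic (e.g., normalized throughput);
        # update the running mean incrementally.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Toy usage on a synthetic channel where BPSK succeeds most often on average.
random.seed(0)
true_rates = {"BPSK": 0.7, "QPSK": 0.5, "8PSK": 0.3}
bandit = UCBModulationSelector(true_rates)
for _ in range(2000):
    arm = bandit.select()
    reward = float(random.random() < true_rates[arm])  # Bernoulli throughput proxy
    bandit.update(arm, reward)
```

In a distributed deployment, each node would run such a selector independently, which matches the lightweight, model-free flavor described above: only the observed reward feeds the decision, with no channel-state estimation.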
Bandits Under the Waves: A Fully-Distributed Multi-Armed Bandit Framework for Modulation Adaptation in the Internet of Underwater Things
F. Busacca, L. Galluccio, S. Palazzo, Andrea Panebianco, R. Raftopoulos
Published 2026 in IEEE Transactions on Network and Service Management
PUBLICATION RECORD
- Fields of study
Computer Science, Engineering, Environmental Science
- Source metadata
Semantic Scholar