In a cluster, a state manager is often used to maintain the states of all hosts and provide those states to schedulers for task scheduling. With the evolution of cloud computing, cluster scales have been expanding. In a large-scale cluster, the state manager, constrained by its limited processing capacity, tends to synchronize with hosts only periodically. As a result, resources released by tasks remain marked as occupied in the state manager's view and cannot be reallocated until the next synchronization, which lowers cluster resource utilization. We refer to these released but unsynchronized resources as Shadow Resources (SR). According to our theoretical analysis, for a cluster with 4k hosts that appears fully loaded in the state manager's view, the shadow resources can account for up to 6.22%. Most existing studies overlook shadow resources, while a few attempt to mitigate their impact by introducing host task queues. However, this approach relies on idealized task runtime prediction and increases system complexity, making it challenging to implement in practice. To overcome the challenges of collecting fleeting shadow resources and utilizing synchronized resources effectively, we propose the Shadow Resource Management Architecture (SRMA), which can be embedded into a large-scale cluster under global management. SRMA logically partitions the cluster and collects the shadow resources within each partition through frequent state synchronization. An external SR load balancer, employing the Max-Resource partition First (MRF) algorithm, assigns tasks to SR schedulers, which allocate the collected shadow resources to the tasks. Experiments show that SRMA effectively utilizes shadow resources, achieving resource utilization close to that of an ideal cluster with no synchronization delay. It also outperforms other related architectures in task latency and throughput time and provides a benefit equivalent to an increase of 8.5% in the resources of the original cluster.
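The abstract names the Max-Resource partition First (MRF) algorithm but does not spell out its rules. The following is a minimal sketch of one plausible reading, in which the load balancer sends each task to the partition that currently holds the most collected shadow resources. The `Partition` class, the single `shadow_cpus` dimension, and the fallback to the global scheduler are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    """Hypothetical model of a logical partition and its collected shadow resources."""
    name: str
    shadow_cpus: float  # released by tasks, not yet synchronized to the global state manager


def pick_partition_mrf(partitions: List[Partition], cpu_demand: float) -> Optional[Partition]:
    """Assumed MRF rule: choose the partition with the most shadow resources.

    Returns None when even the largest partition cannot fit the task.
    """
    if not partitions:
        return None
    best = max(partitions, key=lambda p: p.shadow_cpus)
    return best if best.shadow_cpus >= cpu_demand else None


def assign(partitions: List[Partition], cpu_demand: float) -> Optional[str]:
    """Assign a task to a partition's SR scheduler and consume the shadow resources."""
    target = pick_partition_mrf(partitions, cpu_demand)
    if target is None:
        # Assumption: the task is left to the regular global scheduler.
        return None
    target.shadow_cpus -= cpu_demand
    return target.name


if __name__ == "__main__":
    parts = [Partition("p0", 2.0), Partition("p1", 6.0), Partition("p2", 4.0)]
    for _ in range(4):
        print(assign(parts, 3.0))
```

Under this reading, picking the largest pool first spreads tasks across partitions as their shadow resources are consumed, and a task that no partition can fit is simply not placed on shadow resources.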
Is Your Cluster Truly Fully Loaded? Exploring Shadow Resources in Host State Synchronization
Jiawen Liu, Yuehao Xu, Zhijun Ding
Published 2025 in IEEE International Conference on Cloud Computing
PUBLICATION RECORD
- Publication year
2025
- Venue
IEEE International Conference on Cloud Computing
- Publication date
2025-07-07
- Fields of study
Computer Science
- Source metadata
Semantic Scholar