A cluster deduplication system can coordinate the work of multiple nodes, which can better alleviate the disk index bottleneck existing in the large-scale data backup system. However, there is a problem of isolated islands of information among nodes during data deduplication. When the servers use the query mode to route data, a large amount of system overhead is required to ensure a high deduplication rate and low throughput rate. At the same time, while the servers cannot obtain a higher deduplication rate if the servers adopt the stateless routing method. Data routing strategy can greatly affect the overall performance of the system. The concept of data frequency is proposed in this paper, and the classified routing strategy is designed. In the metadata server, a byte-shaped Bloom filter for recording the occurrence frequency of data blocks is maintained to record the occurrence frequency of data blocks. The values in the Bloom filter are counted. Then the frequency of the data blocks is compared with the configured threshold value to determine whether the data is "hot data". We use stateful routing to send "clod data" to the storage nodes and use stateless routing to send the hot data to the storage nodes. Experimental results show that the classifying routing algorithm based on the frequency of data can greatly reduce the overhead of the system while guaranteeing the deduplication rate of the deduplication system as well as improve system throughput and real-time processing capabilities. Compared with the fully stateful routing scheme, our method only loses less than 2% of the deduplication rate, which reduces the communication query overhead by more than 25% and improves the real-time processing capability of the storage system.
Research on Routing Strategy in Cluster Deduplication System
Qin-lu He,G. Bian,Weiqi Zhang,Fan Zhang,Shengqiang Duan,Fenglang Wu
Published 2021 in IEEE Access
ABSTRACT
PUBLICATION RECORD
- Publication year
2021
- Venue
IEEE Access
- Publication date
Unknown publication date
- Fields of study
Computer Science, Engineering
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-43 of 43 references · Page 1 of 1
CITED BY
Showing 1-5 of 5 citing papers · Page 1 of 1