COPR -- Efficient, large-scale log storage and retrieval

Julian Reichinger, Thomas Krismayer, Jan Rellermeyer

Published 2024 in Unknown venue

ABSTRACT

Modern, large-scale monitoring systems have to process and store vast amounts of log data in near real time. At query time, these systems must find relevant logs based on the content of the log messages, using support structures that scale to such data volumes while remaining efficient to use. We present COPR (Compressed Probabilistic Retrieval), a novel algorithm that answers Multi-Set Multi-Membership Queries and can serve as an alternative to existing indexing structures for streamed log data. In our experiments, COPR required up to 93% less storage space than the tested state-of-the-art inverted index and produced up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch. COPR also achieved up to 250 times higher query throughput than the tested inverted index and up to 240 times higher query throughput than the tested membership sketch.
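To make the query type concrete: a Multi-Set Multi-Membership Query asks, for a given token, which of many sets contain it. In the log setting, each set can be thought of as a block of log lines reduced to its message tokens. The minimal sketch below contrasts two simplified baselines of the kinds the abstract compares against: an exact inverted index and one plain Bloom filter per block (Bloom, 1970, appears among the references). It does not implement COPR, whose internals this record does not describe, and it is not the specific state-of-the-art index or sketch tested in the paper. The names `BloomFilter`, `build_inverted_index`, `build_sketches`, `blocks`, `m_bits`, and `k_hashes` are hypothetical and chosen for illustration only.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Plain Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, m_bits: int = 64, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_inverted_index(blocks):
    """Exact structure (hypothetical helper): token -> set of block IDs."""
    index = defaultdict(set)
    for block_id, tokens in blocks.items():
        for token in tokens:
            index[token].add(block_id)
    return index


def query_exact(index, token):
    # dict.get does not trigger the defaultdict factory, so unknown tokens add nothing.
    return sorted(index.get(token, set()))


def build_sketches(blocks, m_bits=64, k_hashes=3):
    """Approximate structure (hypothetical helper): one Bloom filter per block."""
    sketches = {}
    for block_id, tokens in blocks.items():
        bf = BloomFilter(m_bits, k_hashes)
        for token in tokens:
            bf.add(token)
        sketches[block_id] = bf
    return sketches


def query_sketch(sketches, token):
    # Never misses a block that holds the token, but may report extra blocks.
    return sorted(bid for bid, bf in sketches.items() if bf.might_contain(token))


# Hypothetical log blocks, each reduced to its set of message tokens.
blocks = {
    0: {"ERROR", "disk", "full"},
    1: {"INFO", "user", "login"},
    2: {"ERROR", "user", "timeout"},
}
index = build_inverted_index(blocks)
sketches = build_sketches(blocks)
print(query_exact(index, "ERROR"))      # [0, 2]
print(query_sketch(sketches, "ERROR"))  # contains 0 and 2, possibly more
```

The sketch shows the trade-off the abstract refers to: the inverted index answers exactly but stores a posting for every (token, block) pair, while the per-block filters use a fixed number of bits per block and may return blocks that do not contain the token. The abstract reports that COPR needs less storage than the tested inverted index and yields fewer false positives than the tested membership sketch; how it achieves this is not covered here.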

PUBLICATION RECORD

Publication year: 2024
Venue: Unknown venue
Publication date: 2024-02-28
Fields of study: Computer Science
Identifiers: arXiv 2402.18355
Source metadata: Semantic Scholar

REFERENCES

Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying (2021), influential reference
Classification assessment methods (2020), cited by this paper
Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics (2020), cited by this paper
Techniques for Inverted Index Compression (2019), influential reference
Fast and scalable minimal perfect hashing for massive key sets (2017), influential reference
MILC: Inverted List Compression in Memory (2017), cited by this paper
Fast Scalable Construction of (Minimal Perfect Hash) Functions (2016), influential reference
Compact Data Structures - A Practical Approach (2016), influential reference
Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems (2016), cited by this paper
When Bloom Filters Are No Longer Compact: Multi-Set Membership Lookup for Network Applications (2016), cited by this paper
NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison (2013), cited by this paper
Compressed static functions with applications (2013), cited by this paper
Practical perfect hashing in nearly optimal space (2013), cited by this paper
Superconductivity research and life at the National Institute of Standards and Technology (2001), cited by this paper
Direct Construction of Minimal Acyclic Subsequential Transducers (2000), cited by this paper
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract) (2000), cited by this paper
Binary Interpolative Coding for Effective Index Compression (2000), influential reference
A fast string searching algorithm (1977), cited by this paper
Space/time trade-offs in hash coding with allowable errors (1970), cited by this paper
This paper is included in the Proceedings of the 15th USENIX Symposium on Operating Systems Design (year unknown), cited by this paper
