Techniques for Inverted Index Compression

Published 2019 in ACM Computing Surveys

ABSTRACT

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it leads to a better exploitation of the computer memory hierarchy for faster query processing and, at the same time, allows reducing the number of storage machines. The aim of this article is twofold: first, surveying the encoding algorithms suitable for inverted index compression and, second, characterizing the performance of the inverted index through experimentation.
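
As a rough illustration of the abstract's description, the sketch below builds inverted lists as sorted sequences of document IDs and compresses one list by storing the gaps between consecutive IDs in a byte-aligned variable-length code. The function names, the toy documents, and the choice of Variable-Byte coding are assumptions made for this example only; the survey covers many encoders, and this is not presented as its recommended method.

```python
from typing import Dict, Iterable, List


def build_inverted_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of IDs of the documents containing it."""
    index: Dict[str, List[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            # doc_id only grows, so every list stays sorted without extra work
            index.setdefault(term, []).append(doc_id)
    return index


def vbyte_encode(postings: Iterable[int]) -> bytes:
    """Encode a strictly increasing list of non-negative IDs as gaps.

    Each gap is split into 7-bit groups, least significant first;
    the high bit is set only on the last byte of a gap.
    """
    out = bytearray()
    prev = -1
    for doc_id in postings:
        gap = doc_id - prev  # at least 1 for strictly increasing input
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode: rebuild gaps, then prefix-sum them into IDs."""
    postings: List[int] = []
    prev, gap, shift = -1, 0, 0
    for byte in data:
        if byte & 0x80:  # last byte of the current gap
            gap |= (byte & 0x7F) << shift
            prev += gap
            postings.append(prev)
            gap, shift = 0, 0
        else:
            gap |= byte << shift
            shift += 7
    return postings


if __name__ == "__main__":
    index = build_inverted_index(["a b", "b c", "a c"])
    print(index["a"])  # sorted document IDs for the term "a"
    encoded = vbyte_encode([3, 7, 300, 100000])
    print(len(encoded), vbyte_decode(encoded))
```

Small gaps fit in a single byte, which is why sorted lists with dense, clustered IDs tend to compress well under this kind of scheme.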

PUBLICATION RECORD

Publication year: 2019
Venue: ACM Computing Surveys
Publication date: 2019-08-28
Fields of study: Computer Science
Identifiers: DOI 10.1145/3415148; arXiv 1908.10598
Source metadata: Semantic Scholar

REFERENCES

Inverted Index Compression (2019, cited by this paper)
A Semi-Supervised Approach to Message Stance Classification (2019, cited by this paper)
An Experimental Study of Index Compression and DAAT Query Processing Methods (2019, cited by this paper)
Fast Dictionary-Based Compression for Inverted Indexes (2019, cited by this paper)
Huffman Coding (2019, influential reference)
On Slicing Sorted Integer Sequences (2019, cited by this paper)
Index Compression Using Byte-Aligned ANS Coding and Two-Dimensional Contexts (2018, influential reference)
Compact inverted index storage using general-purpose compression libraries (2018, cited by this paper)
On Optimally Partitioning Variable-Byte Codes (2018, cited by this paper)
Space and Time-Efficient Data Structures for Massive Datasets (2018, cited by this paper)
Elias Revisited: Group Elias SIMD Coding (2018, cited by this paper)
Dynamic Elias-Fano Representation (2017, cited by this paper)
Roaring bitmaps: Implementation of an optimized software library (2017, influential reference)
Clustered Elias-Fano Indexes (2017, influential reference)
Stream VByte: Faster byte-oriented integer compression (2017, cited by this paper)
Faster BlockMax WAND with Variable-sized Blocks (2017, cited by this paper)
ANS-Based Index Compression (2017, cited by this paper)
Compressing Integer Sequences (2016, cited by this paper)
Universal indexes for highly repetitive document collections (2016, cited by this paper)
Consistently faster and smaller compressed bitmaps with Roaring (2016, cited by this paper)
Compressing Graphs and Indexes with Recursive Graph Bisection (2016, cited by this paper)
Vectorized VByte Decoding (2015, influential reference)
Scalability Challenges in Web Search Engines (2015, influential reference)
The use of asymmetric numeral systems as an accurate replacement for Huffman coding (2015, cited by this paper)
Optimal Space-time Tradeoffs for Inverted Indexes (2015, cited by this paper)
Compression, SIMD, and Postings Lists (2014, cited by this paper)
Better bitmap performance with Roaring bitmaps (2014, influential reference)
Partitioned Elias-Fano indexes (2014, influential reference)
An Introduction to Information Retrieval (2013, influential reference)
Unicorn: A System for Searching the Social Graph (2013, cited by this paper)
Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding (2013, influential reference)
DACs: Bringing direct access to variable-length codes (2013, influential reference)
Decoding billions of integers per second through vectorization (2012, influential reference)
Searching web data: An entity retrieval and high-performance indexing model (2012, cited by this paper)
Quasi-succinct indices (2012, cited by this paper)
Frame of Reference (2012, cited by this paper)
SkimpyStash: RAM space skimpy key-value store on flash-based storage (2011, influential reference)
SIMD-based decoding of posting lists (2011, cited by this paper)
Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units (2011, influential reference)
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming (2010, cited by this paper)
Fast integer compression using SIMD instructions (2010, cited by this paper)
Index compression using 64-bit words (2010, influential reference)
VI.6 Leonardo of Pisa (known as Fibonacci) (2010, cited by this paper)
Information Retrieval: Implementing and Evaluating Search Engines (2010, influential reference)
Tournament Coding of Integer Sequences (2009, influential reference)
Asymmetric numeral systems (2009, influential reference)
Re-Pair Compression of Inverted Lists (2009, cited by this paper)
Inverted index compression and query processing with optimized document ordering (2009, cited by this paper)
Search Engines - Information Retrieval in Practice (2009, cited by this paper)
Challenges in building large-scale information retrieval systems: invited talk (2009, cited by this paper)
Performance of compressed inverted list caching in search engines (2008, cited by this paper)
Rank and select revisited and extended (2007, cited by this paper)
Sorting Out the Document Identifier Assignment Problem (2007, cited by this paper)
Lazy, adaptive rid-list intersection, and its application to index anding (2007, cited by this paper)
Inverted files for text search engines (2006, cited by this paper)
Super-Scalar RAM-CPU Cache Compression (2006, cited by this paper)
Binary codes for locally homogeneous sequences (2006, influential reference)
Codes for the World Wide Web (2005, influential reference)
Compression and Coding Algorithms (2005, cited by this paper)
Practical Implementation of Rank and Select Queries (2005, influential reference)
Enhanced Byte Codes with Restricted Prefix Properties (2005, cited by this paper)
Binary codes for non-uniform sources (2005, influential reference)
The Webgraph framework II: codes for the World-Wide Web (2004, influential reference)
Inverted Index Compression Using Word-Aligned Binary Codes (2004, influential reference)
Selecting the Golomb Parameter in Rice Coding (2004, cited by this paper)
Compressing Inverted Files (2004, cited by this paper)
Chapter 3 – Universal Codes (2003, cited by this paper)
Efficient IR-Style Keyword Search over Relational Databases (2003, cited by this paper)
Efficient query evaluation using a two-level retrieval process (2003, cited by this paper)
(S, C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases (2003, influential reference)
Inverted file compression through document identifier reassignment (2003, cited by this paper)
Index compression through document reordering (2002, influential reference)
Low Redundancy in Static Dictionaries with Constant Query Time (2002, influential reference)
Compression of inverted indexes for fast query evaluation (2002, cited by this paper)
Offline dictionary-based compression (2000, cited by this paper)
Binary Interpolative Coding for Effective Index Compression (2000, cited by this paper)
Compressing Integers for Fast File Access (1999, influential reference)
Managing Gigabytes: Compressing and Indexing Documents and Images (1999, influential reference)
Compact pat trees (1998, cited by this paper)
Arithmetic coding revisited (1998, cited by this paper)
Compressing relations and indexes (1998, cited by this paper)
Compressed inverted files with reduced decoding overheads (1998, influential reference)
On the implementation of minimum-redundancy prefix codes (1996, influential reference)
Self-indexing inverted files for fast text retrieval (1996, influential reference)
Exploiting clustering in inverted file compression (1996, cited by this paper)
Parameterised compression for sparse bitmaps (1992, cited by this paper)
Some practical universal noiseless coding techniques, part 3, module PSl14,K+ (1991, cited by this paper)
Construction of optimal graphs for bit-vector compression (1989, cited by this paper)
Compression of concordances in full-text retrieval systems (1988, influential reference)
Succinct static data structures (1988, cited by this paper)
Robust transmission of unbounded strings using Fibonacci representations (1987, influential reference)
Data compression (1987, cited by this paper)
Arithmetic coding for data compression (1987, influential reference)
Improved hierarchical bit-vector compression in document retrieval systems (1986, cited by this paper)
Novel Compression of Sparse Bit-Strings — Preliminary Report (1985, influential reference)
Data compression via textual substitution (1982, cited by this paper)
A Compression Method for Clustered Bit-Vectors (1978, influential reference)
Huffman Coding in Bit-Vector Compression (1978, cited by this paper)
Generalized Kraft Inequality and Arithmetic Coding (1976, cited by this paper)
Source coding algorithms for fast data compression (1976, cited by this paper)

CITED BY

Forward Index Compression for Learned Sparse Retrieval (2026, cites this paper)
Construction of distinct k-mer color sets via set fingerprinting (2026, cites this paper)
Fast Pseudoalignment Queries on Compressed Colored de Bruijn Graphs (2025, cites this paper)
Compact Data Structures for Collections of Sets (2025, cites this paper)
Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search (2025, cites this paper)
Caching Document Identifiers to Speedup Query Processing in Search Servers (2025, cites this paper)
Score-Fitted Indexes and Constant Length Indexes for Information Retrieval (2025, cites this paper)
Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis (2025, cites this paper)
Columnar Formatted Inverted Index for Highly-Paralleled, Vectorized Query Processing (2025, cites this paper)
BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors (2025, cites this paper)
Kaminari: a resource-frugal index for approximate colored k-mer queries (2025, cites this paper)
Towards Reliable Configuration Management in Clouds: A Lightweight Consistency Validation Mechanism for Virtual Private Clouds (2025, cites this paper)
Empirical Asymptotic Growth of Dynamic Pruning Mechanisms (2025, cites this paper)
Fast Indexing for Temporal Information Retrieval (2025, cites this paper)
SE-MSLC: Semantic Entropy-Driven Keyword Analysis and Multi-Stage Logical Combination Recall for Search Engine (2025, cites this paper)
Efficient Inverted Index-based Approximate Retrieval over High-dimensional Learned Sparse Representations (2024, cites this paper)
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations (2024, cites this paper)
A novel InfluxDB-based inverted index method (2024, cites this paper)
Learned Data Compression: Challenges and Opportunities for the Future (2024, influential citation)
Binary Interpolative Coding Revisited (2024, cites this paper)
Improved Learned Sparse Retrieval with Corpus-Specific Vocabularies (2024, cites this paper)
Two-level massive string dictionaries (2024, cites this paper)
A Comparative Analysis of the Lossless Data Compression Methods for Unsparsed Tabular Data (2024, cites this paper)
COPR -- Efficient, large-scale log storage and retrieval (2024, influential citation)
Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs (2024, cites this paper)
Partitioned Inverted Index Compression Using Hierarchical Dirichlet Process (2024, cites this paper)
PLA-index: A k-mer Index Exploiting Rank Curve Linearity (2024, cites this paper)
Where the patterns are: repetition-aware compression for colored de Bruijn graphs (2024, cites this paper)
Compression and In-Situ Query Processing for Fine-Grained Array Lineage (2024, cites this paper)
A novel hashing-inverted index for secure content-based retrieval with massive encrypted speeches (2024, cites this paper)
Fulgor: a fast and compact k-mer index for large-scale matching and color queries (2024, influential citation)
Exploiting Data-pattern-aware Vertical Partitioning to Achieve Fast and Low-cost Cloud Log Storage (2024, cites this paper)
PLA-complexity of k-mer multisets (2024, cites this paper)
Trie and LOUDS hybrid model for efficient e-commerce processing in cloud environment (2024, cites this paper)
Profiling and Visualizing Dynamic Pruning Algorithms (2023, cites this paper)
Tradeoff Options for Bipartite Graph Partitioning (2023, influential citation)
LogGrep: Fast and Cheap Cloud Log Storage by Exploiting both Static and Runtime Patterns (2023, cites this paper)
Spectrum preserving tilings enable sparse and modular reference indexing (2023, cites this paper)
SDOT: Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-Based Three-Tier Forensic Classification Framework (2023, cites this paper)
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors (2023, cites this paper)
Modelo para definir índices de corrupción en convocatorias de contratación en Colombia basado en Big Data y procesamiento del lenguaje natural (2023, cites this paper)
Many are Better than One: Algorithm Selection for Faster Top-K Retrieval (2023, cites this paper)
Analyzing and Improving the Scalability of In-Memory Indices for Managed Search Engines (2023, cites this paper)
Trie-Compressed Adaptive Set Intersection (2023, cites this paper)
Meta-colored compacted de Bruijn graphs (2023, cites this paper)
Bridging Dense and Sparse Maximum Inner Product Search (2023, cites this paper)
Dataset Discovery and Exploration: A Survey (2023, cites this paper)
Lossy Compression Options for Dense Index Retention (2023, cites this paper)
Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations (2022, cites this paper)
POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression (2022, cites this paper)
Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory (2022, cites this paper)
Trie-Compressed Intersectable Sets (2022, cites this paper)
Efficient Immediate-Access Dynamic Indexing (2022, cites this paper)
Accelerating Learned Sparse Indexes Via Term Impact Decomposition (2022, cites this paper)
iRun: Horizontal and Vertical Shape of a Region-Based Graph Compression (2022, cites this paper)
Locality-preserving minimal perfect hashing of k-mers (2022, influential citation)
MMH-index: Enhancing Apache Lucene with High-Performance Multi-Modal Indexing and Searching (2022, cites this paper)
Compressing bipartite graphs with a dual reordering scheme (2022, cites this paper)
Space-Efficient Random Walks on Streaming Graphs (2022, cites this paper)
TencentCLS: The Cloud Log Service with High Query Performances (2022, cites this paper)
CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases (2022, cites this paper)
On weighted k-mer dictionaries (2022, influential citation)
BioMDSE: A Multimodal Deep Learning-Based Search Engine Framework for Biofilm Documents Classifications (2022, cites this paper)
Sparse and skew hashing of K-mers (2022, cites this paper)
Exploring Data Analytics Without Decompression on Embedded GPU Systems (2022, cites this paper)
Adaptive Succinctness (2021, cites this paper)
Cost-Effective Updating of Distributed Reordered Indexes (2021, cites this paper)
Fast direct access to variable length codes (2021, cites this paper)
Faster Index Reordering with Bipartite Graph Partitioning (2021, cites this paper)
G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression (2021, cites this paper)
Efficient Inverted Index Compression Algorithm Characterized by Faster Decompression Compared with the Golomb-Rice Algorithm (2021, cites this paper)
Fast and Compact Set Intersection through Recursive Universe Partitioning (2021, cites this paper)
Rank/Select Queries over Mutable Bitmaps (2020, cites this paper)
TADOC: Text analytics directly on compression (2020, cites this paper)
Examining the Additivity of Top-k Query Processing Innovations (2020, cites this paper)
Efficient and Effective Query Auto-Completion (2020, influential citation)
Compressed Indexes for Fast Search of Semantic Data (2019, cites this paper)