Faster Compact Top-k Document Retrieval

Published 2012 in Data Compression Conference

ABSTRACT

An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA'12] takes O(m+k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce its space to 1.5n to 3n bytes, with O(m + (k + log log n) log log n) time, on typical texts. The index is up to 25 times faster than the best previous compressed solutions and requires at most 5% more space in practice (in some cases, as little as one half). Apart from replacing classical data structures with compressed ones, our main idea is to replace suffix tree sampling with frequency thresholding to achieve compression.
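To make the query concrete, the following brute-force sketch in Python uses the document-array formulation common in this literature. It concatenates the documents, builds a suffix array, finds the range of suffixes that start with the pattern, and counts how often each document appears in that range. This is not the index described above. It scans every occurrence in the range, so its cost after the search grows with the number of occurrences instead of being O(m+k), and avoiding that cost is what the optimal and compressed indexes are for. The helper names (`build_index`, `sa_range`, `top_k`), the `\x00` separator (assumed absent from the documents), and tie-breaking by document id are illustrative assumptions.

```python
from collections import Counter


def build_index(docs):
    """Hypothetical helper: concatenate docs and build a naive suffix array
    plus the document array DA, where DA[j] is the document containing
    suffix SA[j]. Sorting full suffixes is only suitable for tiny inputs."""
    text_parts = []
    doc_of_pos = []
    for d, s in enumerate(docs):
        text_parts.append(s + "\x00")          # separator assumed absent from docs
        doc_of_pos.extend([d] * (len(s) + 1))
    text = "".join(text_parts)
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    da = [doc_of_pos[i] for i in sa]
    return text, sa, da


def sa_range(text, sa, pattern):
    """Return [lo, hi) such that SA[lo:hi] are the suffixes starting with pattern."""
    m = len(pattern)

    def prefix(j):
        return text[sa[j]:sa[j] + m]            # length-m prefix of the j-th suffix

    lo, hi = 0, len(sa)
    while lo < hi:                              # first j with prefix(j) >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                              # first j with prefix(j) > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def top_k(index, pattern, k):
    """Brute force: after the search, visit every occurrence in the range."""
    text, sa, da = index
    lo, hi = sa_range(text, sa, pattern)
    freq = Counter(da[lo:hi])                   # occurrences of pattern per document
    # Highest frequency first; ties broken by smaller document id.
    return sorted(freq.items(), key=lambda p: (-p[1], p[0]))[:k]


idx = build_index(["abracadabra", "cadabra", "abba"])
print(top_k(idx, "ab", 2))   # [(0, 2), (1, 1)]
```

The `Counter` pass over `da[lo:hi]` is the only step whose cost depends on how many times the pattern occurs. Everything before it is a plain suffix-array search.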

PUBLICATION RECORD

Publication year: 2012
Venue: Data Compression Conference
Publication date: 2012-11-22
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1109/DCC.2013.43, arXiv 1211.5353
Source metadata: Semantic Scholar

REFERENCES

Top-k document retrieval in optimal time and linear space (2012, influential reference)
Space-Efficient Top-k Document Retrieval (2012, influential reference)
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays (2011, cited by this paper)
Space-Efficient Data-Analysis Queries on Grids (2011, cited by this paper)
Improved Compressed Indexes for Full-Text Document Retrieval (2011, cited by this paper)
Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval (2011, cited by this paper)
Alphabet-Independent Compressed Text Indexing (2011, influential reference)
Succinct Trees in Practice (2010, cited by this paper)
Top-k Ranked Document Search in General Text Databases (2010, cited by this paper)
Efficient index for retrieving top-k most frequent documents (2010, cited by this paper)
Fully-functional succinct trees (2010, influential reference)
Information Retrieval: Implementing and Evaluating Search Engines (2010, cited by this paper)
Space-Efficient Framework for Top-k String Retrieval Problems (2009, influential reference)
Directly Addressable Variable-Length Codes (2009, cited by this paper)
Practical Rank/Select Queries over Arbitrary Sequences (2008, cited by this paper)
Compressed full-text indexes (2007, cited by this paper)
Practical Entropy-Compressed Rank/Select Dictionary (2006, cited by this paper)
Position-Restricted Substring Searching (2006, cited by this paper)
Practical Implementation of Rank and Select Queries (2005, cited by this paper)
High-order entropy-compressed text indexes (2003, cited by this paper)
Efficient algorithms for document retrieval problems (2002, influential reference)
An analysis of the Burrows-Wheeler transform (2001, cited by this paper)
A novel autosomal dominant distal myopathy with early respiratory failure (2001, cited by this paper)
Suffix arrays: a new method for on-line string searches (1993, cited by this paper)
A Generalized Suffix Tree and its (Un)expected Asymptotic Behaviors (1993, cited by this paper)
Log-logarithmic worst-case range queries are possible in space Θ(N) (1983, cited by this paper)
Linear Pattern Matching Algorithms (1973, cited by this paper)

CITED BY

The Capocelli Prize (2019, cites this paper)
Lempel-Ziv compressed structures for document retrieval (2019, influential citation)
Efficient Semantic Ranking for Top-K Document Retrieval Based on SG-Reversed Index (2018, cites this paper)
Inverted Treaps (2017, cites this paper)
Practical Compact Indexes for Top-k Document Retrieval (2017, cites this paper)
Time-Optimal Top-k Document Retrieval (2017, cites this paper)
The Quantile Index - Succinct Self-Index for Top-k Document Retrieval (2017, cites this paper)
Elias-Fano meets Single-Term Top-k Document Retrieval (2017, cites this paper)
Improved Range Minimum Queries (2016, cites this paper)
Document retrieval on repetitive string collections (2016, influential citation)
Improved Single-Term Top-k Document Retrieval (2015, influential citation)
General Document Retrieval in Compact Space (2015, cites this paper)
Space-Efficient Frameworks for Top-k String Retrieval (2014, cites this paper)
Score-safe term-dependency processing with hybrid indexes (2014, cites this paper)
Efficient Compressed Indexing for Approximate Top-k String Retrieval (2014, influential citation)
New space/time tradeoffs for top-k document retrieval on sequences (2014, cites this paper)
Space-efficient data structures for string searching and retrieval (2014, cites this paper)
Compact Indexes for Flexible Top-k Retrieval (2014, cites this paper)
Document Retrieval on Repetitive Collections (2014, cites this paper)
Plug and Play with Succinct Data Structures (2014, cites this paper)
Spaces, Trees, and Colors (2013, cites this paper)
Practical Compressed Suffix Trees (2013, cites this paper)
Space-Efficient Data Structures for Information Retrieval (2013, cites this paper)
Top-k Document Retrieval in Compact Space and Near-Optimal Time (2013, cites this paper)
Optimal Top-k Document Retrieval (2013, cites this paper)
From Theory to Practice: Plug and Play with Succinct Data Structures (2013, influential citation)
Top-k Document Retrieval in External Memory (2013, cites this paper)
Indexes for Document Retrieval with Relevance (2013, cites this paper)