On the Difficulty of Nearest Neighbor Search

Published 2012 in International Conference on Machine Learning

ABSTRACT

Fast approximate nearest neighbor (NN) search in large databases is becoming popular, and several powerful learning-based formulations have been proposed recently. However, little attention has been paid to a more fundamental question: how difficult is (approximate) nearest neighbor search in a given data set, and which data properties affect that difficulty, and how? This paper introduces the first concrete measure, called Relative Contrast, that can be used to evaluate the influence of several crucial data characteristics, such as dimensionality, sparsity, and database size, simultaneously in arbitrary normed metric spaces. Moreover, we present a theoretical analysis proving how the difficulty measure (relative contrast) determines the complexity of Locality Sensitive Hashing (LSH), a popular approximate NN search method. Relative contrast also explains why a family of PCA-based heuristic hashing algorithms performs well in practice. Finally, we show that most previous work on measuring the meaningfulness or difficulty of NN search can be derived from the proposed measure as special asymptotic cases for dense vectors.
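The abstract does not spell out how relative contrast is computed. The sketch below assumes the dataset-level form as the ratio of the expected mean query-to-database distance to the expected nearest-neighbor distance under an L_p norm, with expectations replaced by averages over sampled queries. The helper names (`lp_distance`, `relative_contrast`, `uniform_points`), the uniform synthetic data, and the chosen dimensions and sizes are illustrative assumptions, not the paper's code or experimental setup.

```python
import math
import random


def lp_distance(x, y, p=2.0):
    """L_p distance between two equal-length vectors (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(database, queries, p=2.0):
    """Estimate relative contrast as E[D_mean] / E[D_min] (assumed definition).

    For each query q, D_mean is the mean distance from q to every database
    point and D_min is the distance from q to its nearest database point.
    Both expectations are approximated by sums over the sampled queries;
    the common 1/len(queries) factor cancels in the ratio.
    """
    if not database or not queries:
        raise ValueError("database and queries must be non-empty")
    total_mean, total_min = 0.0, 0.0
    for q in queries:
        dists = [lp_distance(q, x, p) for x in database]
        total_mean += sum(dists) / len(dists)
        total_min += min(dists)
    if total_min == 0.0:
        raise ValueError("every query coincides with a database point; D_min is zero")
    return total_mean / total_min


def uniform_points(n, dim, rng):
    """Hypothetical synthetic data: n points drawn uniformly from [0, 1]^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 8, 32, 128):
        db = uniform_points(500, dim, rng)
        qs = uniform_points(20, dim, rng)
        # A ratio close to 1 means the nearest neighbor is barely closer than
        # an average point, i.e. NN search is harder.
        print(dim, round(relative_contrast(db, qs), 3))
```

On uniformly distributed data the printed ratio should shrink toward 1 as `dim` grows, which is one way to see the dimensionality effect the abstract mentions; the exact values depend on the random seed and sample sizes.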

PUBLICATION RECORD

Publication year
2012
Venue
International Conference on Machine Learning
Publication date
2012-06-26
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1206.6411
Source metadata
Semantic Scholar

REFERENCES

A new approach to interdomain routing based on secure multi-party computation
2012, cited by this paper
Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality
2012, cited by this paper
Iterative quantization: A procrustean approach to learning binary codes
2011, cited by this paper
The Concentration of Fractional Distances
2007, influential reference
An Investigation of Practical Approximate Nearest Neighbor Algorithms
2004, cited by this paper
Locality-sensitive hashing scheme based on p-stable distributions
2004, cited by this paper
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces
2001, influential reference
When Is "Nearest Neighbor" Meaningful?
1999, influential reference
Similarity Search in High Dimensions via Hashing
1999, influential reference

CITED BY

MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search
2026, cites this paper
AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems
2026, cites this paper
Evaluating and Generating Query Workloads for High Dimensional Vector Similarity Search
2025, cites this paper
VIBE: Vector Index Benchmark for Embeddings
2025, cites this paper
Toward Efficient and Scalable Design of In-Memory Graph-Based Vector Search
2025, cites this paper
Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data
2025, cites this paper
Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art
2025, cites this paper
Understanding Time Series Anomaly State Detection through One-Class Classification
2024, cites this paper
Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space
2024, influential citation
Implementing and Evaluating E2LSH on Storage
2024, cites this paper
Efficient Approximate Maximum Inner Product Search Over Sparse Vectors
2024, cites this paper
A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"
2023, influential citation
Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search
2023, cites this paper
AdANNS: A Framework for Adaptive Semantic Search
2023, cites this paper
Scalable and space-efficient Robust Matroid Center algorithms
2023, cites this paper
Cover Trees Revisited: Exploiting Unused Distance and Direction Information
2023, cites this paper
Fast and Scalable Mining of Time Series Motifs with Probabilistic Guarantees
2022, cites this paper
DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing
2022, cites this paper
PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search
2021, cites this paper
The role of local dimensionality measures in benchmarking nearest neighbor search
2021, influential citation
High-Dimensional Similarity Search for Scalable Data Science
2021, cites this paper
Breaking the curse of dimensionality: hierarchical Bayesian network model for multi-view clustering
2021, cites this paper
Random projection-based auxiliary information can improve tree-based nearest neighbor search
2021, cites this paper
A Note on Graph-Based Nearest Neighbor Search
2020, cites this paper
PM-LSH
2020, cites this paper
R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces
2020, cites this paper
Algorithm Engineering for High-Dimensional Similarity Search Problems (Invited Talk)
2020, cites this paper
An Unsupervised Feature Selection Method for Data-Driven Anomaly Detection Systems
2020, cites this paper
Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement
2020, cites this paper
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
2019, cites this paper
Fast distributed video deduplication via locality-sensitive hashing with similarity ranking
2019, cites this paper
Improvement in Precision Value in Content Based Image Retrieval System using Hashing Method
2019, cites this paper
Kernel methods for high dimensional data analysis
2019, cites this paper
Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
2018, cites this paper
Large-scale retrieval for medical image analytics: A comprehensive review
2018, cites this paper
Towards An efficient unsupervised feature selection methods for high-dimensional data
2018, cites this paper
Graph Inference with Applications to Low-Resource Audio Search and Indexing
2017, cites this paper
Deep Convolutional Neural Networks and Maximum-Likelihood Principle in Approximate Nearest Neighbor Search
2017, cites this paper
Diversity Regularized Latent Semantic Match for Hashing
2017, cites this paper
Binary Adaptive Embeddings From Order Statistics of Random Projections
2017, cites this paper
Hash Bit Selection for Nearest Neighbor Search
2017, cites this paper
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
2017, cites this paper
On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness
2017, influential citation
Learning to Hash for Indexing Big Data - A Survey
2016, cites this paper
Fast Video Deduplication via Locality Sensitive Hashing with Similarity Ranking
2016, cites this paper
Structure Sensitive Hashing With Adaptive Product Quantization
2016, cites this paper
Near-Isometric Binary Hashing for Large-scale Datasets
2016, cites this paper
Multilinear Hyperplane Hashing
2016, cites this paper
Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement (v1.0)
2016, cites this paper
An Optimal Greedy Approximate Nearest Neighbor Method in Statistical Pattern Recognition
2015, cites this paper
Learning to Hash for Indexing Big Data - A Survey
2015, cites this paper
Transformed Residual Quantization for Approximate Nearest Neighbor Search
2015, cites this paper
Sublinear Partition Estimation
2015, cites this paper
Hubness-Based Clustering of High-Dimensional Data
2015, cites this paper
Learning Compact Binary Codes for Hash-Based Fingerprint Indexing
2015, cites this paper
Measurement to Intelligence: Feature Extraction, Modeling and Predictive Analysis of Asymmetric Conflict Events
2014, cites this paper
Mixed image-keyword query adaptive hashing over multilabel images
2014, cites this paper
A Sparse Embedding and Least Variance Encoding Approach to Hashing
2014, cites this paper
LSH vs Randomized Partition Trees: Which One to Use for Nearest Neighbor Search?
2014, influential citation
Hashing for Similarity Search: A Survey
2014, cites this paper
Which Space Partitioning Tree to Use for Search?
2013, cites this paper
Reciprocal Hash Tables for Nearest Neighbor Search
2013, cites this paper
Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis
2013, cites this paper
Action recognition in visual sensor networks: a data fusion perspective
2013, cites this paper
Hash Bit Selection: A Unified Solution for Selection Problems in Hashing
2013, cites this paper
Density Sensitive Hashing
2012, cites this paper
Optimal Parameters for Locality-Sensitive Hashing
2012, cites this paper
Submodular video hashing: a unified framework towards video pooling and indexing
2012, cites this paper
Learning to Hash for Indexing Big Data - A Survey
year unknown, cites this paper