A Fast Greedy Algorithm for Outlier Mining

Published 2005 in Pacific-Asia Conference on Knowledge Discovery and Data Mining

ABSTRACT

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the large remaining body of data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms at identifying true outliers, and (2) our algorithm can be an order of magnitude faster than LSA.
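
The abstract does not spell out the optimization model or the greedy procedure. As a rough illustration only, the sketch below assumes an entropy-based objective (choose k objects whose removal leaves the remaining data with the lowest summed per-attribute entropy), which is how the cited optimization-model paper is commonly described. The function names `total_entropy` and `greedy_outliers` are hypothetical, and the sketch recomputes entropy naively for clarity rather than using whatever incremental bookkeeping makes the published algorithm fast.

```python
from collections import Counter
from math import log2


def total_entropy(rows):
    """Sum of per-attribute Shannon entropies.

    Assumption: attributes are treated as independent, so the dataset
    entropy is approximated by adding up each column's entropy.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):
        for count in Counter(column).values():
            p = count / n
            total -= p * log2(p)
    return total


def greedy_outliers(rows, k):
    """Greedily pick up to k row indices as outliers.

    Each pass removes the row whose removal leaves the remaining rows
    with the lowest total entropy. This is a naive O(k * n^2 * m)
    illustration, not the paper's optimized implementation.
    """
    remaining = list(range(len(rows)))
    outliers = []
    for _ in range(min(k, len(rows))):
        best_idx, best_entropy = None, float("inf")
        for i in remaining:
            rest = [rows[j] for j in remaining if j != i]
            h = total_entropy(rest)
            if h < best_entropy:  # strict '<' keeps the earliest row on ties
                best_idx, best_entropy = i, h
        outliers.append(best_idx)
        remaining.remove(best_idx)
    return outliers


# Five identical records plus one record that differs in every attribute.
data = [("a", "x")] * 5 + [("b", "y")]
print(greedy_outliers(data, 1))  # → [5]
```

Under these assumptions the procedure makes k passes over the data, in contrast to a local-search method that may swap candidate outliers in and out until no further improvement is found.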

PUBLICATION RECORD

Publication year: 2005
Venue: Pacific-Asia Conference on Knowledge Discovery and Data Mining
Publication date: 2005-07-26
Fields of study: Computer Science
Identifiers: DOI 10.1007/11731139_67; arXiv cs/0507065
Source metadata: Semantic Scholar
