Novelty Detection in Sequential Data by Informed Clustering and Modeling

Published 2021 in Unknown venue

ABSTRACT

Novelty detection in discrete sequences is a challenging task, since deviations from the process generating the normal data are often small or intentionally hidden. Novelties can be detected by modeling normal sequences and measuring how far a new sequence deviates from the model's predictions. However, in many applications the data is generated by several distinct processes, so models trained on all of the data tend to over-generalize and novelties remain undetected. We propose to approach this challenge through decomposition: clustering the data breaks the problem down into a simpler modeling task in each cluster, which can be modeled more accurately. However, this comes with a trade-off, since the amount of training data per cluster is reduced. This is a particular problem for discrete sequences, where state-of-the-art models are data-hungry. The success of this approach thus depends on the quality of the clustering, i.e., on whether the individual learning problems are sufficiently simpler than the joint problem. While clustering discrete sequences automatically is a challenging and domain-specific task, it is often easy for human domain experts, given the right tools. In this paper, we adapt a state-of-the-art visual analytics tool for discrete sequence clustering to obtain informed clusters from domain experts and use LSTMs to model each cluster individually. Our extensive empirical evaluation indicates that this informed clustering outperforms automatic clusterings and that our approach outperforms state-of-the-art novelty detection methods for discrete sequences in three real-world application scenarios. In particular, decomposition outperforms a global model despite having less training data in each individual cluster.
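The decomposition idea in the abstract has three steps: partition the training sequences, fit one sequence model per cluster, and score a new sequence by how poorly the models predict it. The sketch below illustrates these steps under stated assumptions and is not the authors' implementation. The per-cluster LSTM is swapped for an add-alpha smoothed bigram (first-order Markov) model so the example needs no dependencies. The given cluster labels stand in for the expert-provided clusters. Scoring a sequence by its minimum average negative log-likelihood across cluster models is an assumption. `BigramModel`, `fit_per_cluster`, and `novelty_score` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

START, END, UNK = "<s>", "</s>", "<unk>"


class BigramModel:
    """Add-alpha smoothed first-order Markov model over discrete events.

    Hypothetical stand-in for the per-cluster LSTM: it predicts each event
    from the previous event only, instead of from the full history.
    """

    def __init__(self, alphabet, alpha=0.1):
        # Possible next events: the shared alphabet plus end-of-sequence
        # and a catch-all for symbols never seen in training.
        self.outcomes = set(alphabet) | {END, UNK}
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # counts[prev][next]

    def _normalize(self, seq):
        return [s if s in self.outcomes else UNK for s in seq]

    def fit(self, sequences):
        for seq in sequences:
            prev = START
            for sym in self._normalize(seq) + [END]:
                self.counts[prev][sym] += 1
                prev = sym
        return self

    def nll(self, seq):
        """Average negative log-likelihood per predicted event (higher = more surprising)."""
        steps = self._normalize(seq) + [END]
        total, prev = 0.0, START
        for sym in steps:
            row = self.counts.get(prev, Counter())  # unseen context -> uniform
            denom = sum(row.values()) + self.alpha * len(self.outcomes)
            total -= math.log((row[sym] + self.alpha) / denom)
            prev = sym
        return total / len(steps)


def fit_per_cluster(sequences, labels, alphabet):
    """Train one model per cluster label (labels stand in for expert clusters)."""
    groups = defaultdict(list)
    for seq, label in zip(sequences, labels):
        groups[label].append(seq)
    return {label: BigramModel(alphabet).fit(seqs) for label, seqs in groups.items()}


def novelty_score(models, seq):
    """Assumed rule: a sequence is novel if even its best-fitting cluster model finds it surprising."""
    return min(model.nll(seq) for model in models.values())


if __name__ == "__main__":
    # Toy data: two distinct "processes" with regular but different event orders.
    train = [list("abcabcabc")] * 5 + [list("xyzxyzxyz")] * 5
    labels = ["A"] * 5 + ["B"] * 5  # hypothetical expert-provided clusters
    alphabet = set("abcxyz")

    cluster_models = fit_per_cluster(train, labels, alphabet)
    global_model = BigramModel(alphabet).fit(train)

    normal, novel = list("abcabc"), list("abcxyz")  # novel mixes both processes
    print("per-cluster:", novelty_score(cluster_models, normal), novelty_score(cluster_models, novel))
    print("global:     ", global_model.nll(normal), global_model.nll(novel))
```

On this toy data, the per-cluster models separate the normal and mixed sequences more sharply than a single global model trained on all ten sequences. The global model has already seen every transition of the mixed sequence except c→x, which is a miniature version of the over-generalization the abstract describes. The sketch does not reproduce the data-per-cluster trade-off, which matters far more for data-hungry models such as LSTMs.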

PUBLICATION RECORD

Publication year
2021
Venue
Unknown venue
Publication date
2021-03-05
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2103.03943
Source metadata
Semantic Scholar

REFERENCES

LDA Ensembles for Interactive Exploration and Categorization of Behaviors
2020 · influential reference
Scalable auto-encoders for gravitational waves detection from time series data
2020 · cited by this paper
Classification-Based Anomaly Detection for General Data
2020 · cited by this paper
Time-Series Anomaly Detection Service at Microsoft
2019 · cited by this paper
Pattern-Based Anomaly Detection in Mixed-Type Time Series
2019 · cited by this paper
A comparative evaluation of novelty detection algorithms for discrete sequences
2019 · cited by this paper
Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
2019 · cited by this paper
Visual Analytics of Anomalous User Behaviors: A Survey
2019 · cited by this paper
Deep Learning for Anomaly Detection: A Survey
2019 · cited by this paper
A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data
2018 · cited by this paper
Visual analytics for event detection: Focusing on fraud
2018 · cited by this paper
Towards better analysis of machine learning models: A visual analytics perspective
2017 · cited by this paper
hdbscan: Hierarchical density based clustering
2017 · cited by this paper
Recurrent Neural Network Language Models for Open Vocabulary Event-Level Cyber Anomaly Detection
2017 · cited by this paper
A deep learning enabled subspace spectral ensemble clustering approach for web anomaly detection
2017 · cited by this paper
Identifying Suspicious User Behavior with Neural Networks
2017 · cited by this paper
LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems
2016 · cited by this paper
Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark
2015 · cited by this paper
A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks
2015 · cited by this paper
Special Section on Visual Analytics: Anomaly detection for visual analytics of power consumption data
2014 · cited by this paper
Density-Based Clustering Based on Hierarchical Density Estimates
2013 · cited by this paper
What Yelp Fake Review Filter Might Be Doing?
2013 · cited by this paper
Generation of a new IDS test dataset: Time to retire the KDD collection
2013 · cited by this paper
Anomaly Detection for Discrete Sequences: A Survey
2012 · cited by this paper
Multi-Domain Learning: When Do Domains Matter?
2012 · cited by this paper
User Modelling for Exclusion and Anomaly Detection: A Behavioural Intrusion Detection System
2010 · cited by this paper
Outside the Closed World: On Using Machine Learning for Network Intrusion Detection
2010 · cited by this paper
Latent Dirichlet Allocation
2009 · influential reference
Anomaly detection: A survey
2009 · cited by this paper
Visualizing Data using t-SNE
2008 · cited by this paper
Isolation Forest
2008 · cited by this paper
A Appendix: EM algorithm for mixture of maxent models
2008 · cited by this paper
Comparative Evaluation of Anomaly Detection Techniques for Sequence Data
2008 · cited by this paper
Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions
2007 · cited by this paper
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences
2006 · cited by this paper
Sequence Data Mining
2005 · cited by this paper
Efficient Modeling of Discrete Events for Anomaly Detection Using Hidden Markov Models
2005 · cited by this paper
Decomposition Methodology for Knowledge Discovery and Data Mining
2005 · cited by this paper
PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth
2001 · cited by this paper
Estimating the Support of a High-Dimensional Distribution
2001 · cited by this paper
Visualization of navigation patterns on a Web site using model-based clustering
2000 · cited by this paper
Linear and Order Statistics Combiners for Pattern Classification
1999 · cited by this paper
Multi-Net Systems
1999 · cited by this paper
Detecting intrusions using system calls: alternative data models
1999 · cited by this paper
Structurally adaptive modular networks for nonstationary environments
1999 · cited by this paper
Long Short-Term Memory
1997 · influential reference
Graphical models for discovering knowledge
1996 · cited by this paper
Use of an Artificial Neural Network for Data Analysis in Clinical Decision-Making: The Diagnosis of Acute Coronary Occlusion
1990 · cited by this paper
Evaluation of Adaptive Mixtures of Competing Experts
1990 · cited by this paper
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
1987 · cited by this paper
Applied Regression Analysis
1968 · cited by this paper

CITED BY

Novelty Detection of Text Using Spectral Graphs and Visualization
2023 · cites this paper
AA-forecast: anomaly-aware forecast for extreme events
2022 · cites this paper