Using the minimum description length to discover the intrinsic cardinality and dimensionality of time series

Bing Hu, T. Rakthanmanon, Yuan Hao, Scott Evans, Stefano Lonardi, Eamonn J. Keogh

Published 2014 in Data Mining and Knowledge Discovery

ABSTRACT

Many algorithms for data mining or indexing time series data do not operate directly on the raw data, but instead they use alternative representations that include transforms, quantization, approximation, and multi-resolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this work, we investigate the problem of discovering the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to automatically discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms, as characterizing data in such a manner is useful in its own right and an important sub-routine in algorithms for classification, clustering and outlier discovery. We will frame the discovery of these intrinsic features in the Minimum Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and more accurate than previous methods, and has the important advantage of being essentially parameter-free.

PUBLICATION RECORD

Publication year
2014
Venue
Data Mining and Knowledge Discovery
Publication date
2014-02-15
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1007/s10618-014-0345-2
Source metadata
Semantic Scholar

REFERENCES

A Similarity-Based Prognostics Approach for Remaining Useful Life Prediction
2014, influential reference
The impact of motion dimensionality and bit cardinality on the design of 3D gesture recognizers
2013, cited by this paper
MDL-based time series clustering
2012, cited by this paper
MDL-Based Analysis of Time Series at Multiple Time-Scales
2012, cited by this paper
iSAX 2.0: Indexing and Mining One Billion Time Series
2010, cited by this paper
Unsupervised Discovery of Abnormal Activity Occurrences in Multi-dimensional Time Series, with Applications in Wearable Systems
2010, cited by this paper
Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs
2010, cited by this paper
Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series.
2009, cited by this paper
Anomaly detection: A survey
2009, cited by this paper
Discretization of Time Series Dataset with a Genetic Search
2009, cited by this paper
Finding anomalous periodic time series
2009, influential reference
Learning rates and states from biophysical time series: a Bayesian approach to model selection and single-molecule FRET data.
2009, cited by this paper
Recurrent neural networks for remaining useful life estimation
2008, cited by this paper
Querying and mining of time series data: experimental comparison of representations and distance measures
2008, cited by this paper
Break Detection for a Class of Nonlinear Time Series Models
2008, cited by this paper
Knee Point Detection in BIC for Detecting the Number of Clusters
2008, influential reference
A similarity-based prognostics approach for Remaining Useful Life estimation of engineered systems
2008, cited by this paper
Streaming Time Series Summarization Using User-Defined Amnesic Functions
2008, influential reference
The TS-tree: efficient time series search and retrieval
2008, cited by this paper
A Comparison of Three Data-Driven Techniques for Prognostics
2008, cited by this paper
MicroRNA Target Detection and Analysis for Genes Related to Breast Cancer Using MDLcompress
2007, cited by this paper
Experiencing SAX: a novel symbolic representation of time series
2007, influential reference
MDL Histogram Density Estimation
2007, cited by this paper
Disk aware discord discovery: finding unusual time series in terabyte sized datasets
2007, influential reference
Surface melting derived from microwave radiometers: a climatic indicator in Antarctica
2007, cited by this paper
Intelligent Fault Diagnosis and Prognosis for Engineering Systems
2006, cited by this paper
A Better Alternative to Piecewise Linear Time Series Segmentation
2006, influential reference
Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising
2006, cited by this paper
The AAVSO Data Validation Project
2006, cited by this paper
Parameter-free spatial data mining using MDL
2005, cited by this paper
Finding outlier light curves in catalogues of periodic variable stars
2005, influential reference
A minimum description length principle for perception
2005, cited by this paper
Optimizing time series discretization for knowledge discovery
2005, influential reference
Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
2004, influential reference
Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity
2004, cited by this paper
Attribute-Value Selection Based on Minimum Description Length
2004, cited by this paper
A hidden Markov model segmentation procedure for hydrological and environmental time series
2004, influential reference
Segmenting time series with a hybrid neural networks - hidden Markov model
2002, cited by this paper
A Hidden Markov Model Segmentation Procedure for Hydrological and Environmental Time Series
2002, cited by this paper
Finding Motifs in Time Series
2002, cited by this paper
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
2002, cited by this paper
Theory & Methods: Tree‐based wavelet regression for correlated data using the minimum description length principle
2002, cited by this paper
An online algorithm for segmenting time series
2001, influential reference
A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases
2000, cited by this paper
Climate change science
2000, cited by this paper
Managing gigabytes (2nd ed.): compressing and indexing documents and images
1999, cited by this paper
Managing Gigabytes: Compressing and Indexing Documents and Images
1999, cited by this paper
Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series))
1997, cited by this paper
Ideal spatial adaptation via wavelet shrinkage
1994, cited by this paper
Ideal spatial adaptation by wavelet shrinkage
1994, cited by this paper
Density estimation by stochastic complexity
1992, cited by this paper
Some Experiments in Applying Inductive Inference Principles to Surface Reconstruction
1989, cited by this paper
Stochastic Complexity in Statistical Inquiry
1989, cited by this paper
Passive microwave images of the polar regions and research applications
1977, cited by this paper
An Information Measure for Classification
1968, cited by this paper
Advances in Minimum Description
year unknown, cited by this paper
Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL (2011 11th IEEE International Conference on Data Mining)
year unknown, cited by this paper

CITED BY

Properties and predicted functions of large genes and proteins of apicomplexan parasites
2024, cites this paper
Is My Neural Net Driven by the MDL Principle?
2023, cites this paper
Load oscillation pattern detection for NILM based on scale space decomposition
2023, cites this paper
Mobile behavior trusted certification based on multivariate behavior sequences
2021, cites this paper
The minimum description length principle for pattern mining: a survey
2020, cites this paper
Knowledge Transfer for Rotary Machine Fault Diagnosis
2020, cites this paper
Breakpoint detection in non-stationary runoff time series under uncertainty
2020, cites this paper
Unsupervised Idealization of Nano-Electronic Sensors Recordings with Concept Drifts: A Compressive Feature Learning Approach for Non-Stationary Single-Molecule Data Analysis
2020, cites this paper
Behavior Analysis for Electronic Commerce Trading Systems: A Survey
2019, cites this paper
An Integrated Event Summarization Approach for Complex System Management
2019, influential citation
Information-Theoretical Criteria for Characterizing the Earliness of Time-Series Data
2019, cites this paper
Optimizing dynamic time warping’s window width for time series data mining applications
2018, cites this paper
Dynamic Asset Allocation - Identifying Regime Shifts in Financial Time Series to Build Robust Portfolios
2018, cites this paper
Dynamic Asset Allocation
2018, cites this paper
Time Series Piecewise Linear Representation Based on Trend Feature Points
2017, cites this paper
Degree-Pruning Dynamic Programming Approaches to Central Time Series Minimizing Dynamic Time Warping Distance
2017, cites this paper
Design and Evaluation of Statistical Parametric Techniques in Expressive Text-To-Speech: Emotion and Speaking Styles Transplantation
2016, cites this paper
Greedy Gaussian segmentation of multivariate time series
2016, cites this paper
Robust and Accurate Anomaly Detection in ECG Artifacts Using Time Series Motif Discovery
2015, cites this paper