Many algorithms for data mining or indexing time series data do not operate directly on the raw data, but instead use alternative representations that include transforms, quantization, approximation, and multi-resolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this work, we investigate the problem of discovering the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to automatically discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms, as characterizing data in such a manner is useful in its own right and is an important sub-routine in algorithms for classification, clustering and outlier discovery. We frame the discovery of these intrinsic features in the Minimum Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and more accurate than previous methods, and has the important advantage of being essentially parameter-free.
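To make the MDL framing in the abstract concrete, the following is a minimal sketch of how a description-length score could be used to choose a dimensionality and cardinality. It is an illustration under stated assumptions, not the authors' implementation: the piecewise-constant model with equal-width segments, the entropy-based bit cost, the 8-bit value range, and all function names (`mdl_cost`, `best_parameters`, and so on) are hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy, in bits per symbol, of a sequence of integers."""
    counts = Counter(values)
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def description_length(values):
    """Assumed cost model: sequence length times its empirical entropy."""
    return len(values) * entropy_bits(values)


def piecewise_constant_model(series, d, c, lo, hi):
    """Approximate `series` with d equal-width segments and c value levels.

    Requires 1 <= d <= len(series) and c >= 2 (assumptions of this sketch).
    """
    m = len(series)
    step = (hi - lo) / (c - 1)
    model = []
    for i in range(d):
        start, end = i * m // d, (i + 1) * m // d
        mean = sum(series[start:end]) / (end - start)
        # Snap the segment mean to the nearest of c evenly spaced levels.
        level = lo + round((mean - lo) / step) * step
        model.extend([round(level)] * (end - start))
    return model


def mdl_cost(series, d, c, lo=0, hi=255):
    """Total bits: cost of the model plus cost of the residual errors."""
    model = piecewise_constant_model(series, d, c, lo, hi)
    residual = [t - h for t, h in zip(series, model)]
    model_bits = d * math.log2(c)  # d segment levels, log2(c) bits each
    return model_bits + description_length(residual)


def best_parameters(series, dims, cards):
    """Exhaustive search for the (d, c) pair with the smallest total cost."""
    return min(((d, c) for d in dims for c in cards),
               key=lambda dc: mdl_cost(series, *dc))


# Toy input, made up for illustration: four flat plateaus on an 8-bit scale.
toy = [0] * 32 + [85] * 32 + [170] * 32 + [255] * 32
print(best_parameters(toy, dims=[1, 2, 4, 8, 16], cards=[2, 4, 8, 16]))
```

On this noise-free toy series the search selects four segments and four levels, because that is the cheapest model that leaves an all-zero residual. No thresholds or tuning constants are involved beyond the candidate ranges, which is the sense in which an MDL-style search can be described as essentially parameter-free.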
Using the minimum description length to discover the intrinsic cardinality and dimensionality of time series
Bing Hu, T. Rakthanmanon, Yuan Hao, Scott Evans, Stefano Lonardi, Eamonn J. Keogh
Published 2014 in Data mining and knowledge discovery
PUBLICATION RECORD
- Publication year: 2014
- Venue: Data mining and knowledge discovery
- Publication date: 2014-02-15
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar