Asymptotics of discrete MDL for online prediction

Published 2005 in IEEE Transactions on Information Theory

ABSTRACT

Minimum description length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning processes which are independent and identically distributed (i.i.d.) by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e., observations come in one by one, and the predictor is allowed to update its state of mind after each time step. We identify two ways of predicting by MDL for this setup, namely, a static and a dynamic one. (A third variant, hybrid MDL, will turn out inferior.) We will prove that under the only assumption that the data is generated by a distribution contained in the model class, the MDL predictions converge to the true values almost surely. This is accomplished by proving finite bounds on the quadratic, the Hellinger, and the Kullback-Leibler loss of the MDL learner, which are, however, exponentially worse than for Bayesian prediction. We demonstrate that these bounds are sharp, even for model classes containing only Bernoulli distributions. We show how these bounds imply regret bounds for arbitrary loss functions. Our results apply to a wide range of setups, namely, sequence prediction, pattern classification, regression, and universal induction in the sense of algorithmic information theory among others.
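As an illustration of the static/dynamic distinction drawn in the abstract, the following is a minimal Python sketch under assumed definitions that the abstract itself does not spell out. The MDL estimator is taken to be the model maximizing prior weight times likelihood; static MDL predicts with that single model; dynamic MDL re-selects the best model for each candidate continuation and then normalizes. The three-element Bernoulli class, its weights, and all function names are hypothetical choices made for this example, not the paper's notation.

```python
import math

# Hypothetical countable (here: finite) model class of Bernoulli parameters,
# mapped to prior weights that sum to 1. These values are assumptions.
MODELS = {0.5: 0.5, 0.25: 0.25, 0.75: 0.25}


def log_score(theta, weight, bits):
    """Log of (prior weight * likelihood) for an i.i.d. Bernoulli(theta) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return math.log(weight) + ones * math.log(theta) + zeros * math.log(1 - theta)


def mdl_estimator(bits):
    """Two-part MDL estimate: the model with the largest weight * likelihood."""
    return max(MODELS, key=lambda theta: log_score(theta, MODELS[theta], bits))


def static_mdl(bits):
    """Static prediction: P(next = 1) under the single model chosen on `bits`."""
    return mdl_estimator(bits)


def dynamic_mdl(bits):
    """Dynamic prediction (normalized): pick the best model separately for
    each possible continuation, then normalize the two resulting scores."""
    best = []
    for symbol in (0, 1):
        extended = list(bits) + [symbol]
        best.append(max(log_score(t, w, extended) for t, w in MODELS.items()))
    p0, p1 = math.exp(best[0]), math.exp(best[1])
    return p1 / (p0 + p1)


if __name__ == "__main__":
    for data in ([1], [1, 1, 1, 1]):
        print(data, static_mdl(data), dynamic_mdl(data))
```

Under these assumptions, after the single observation [1] the static predictor returns 0.5 while the dynamic one returns 9/17 (about 0.529), so the two variants can already disagree on very short sequences.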

PUBLICATION RECORD

Publication year: 2005
Venue: IEEE Transactions on Information Theory
Publication date: 2005-06-08
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1109/TIT.2005.856956; arXiv cs/0506022
Source metadata: Semantic Scholar

REFERENCES

Convergence rates of posterior distributions for non-i.i.d. observations (2007, cited by this paper)
Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (2006, cited by this paper)
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification (2005, cited by this paper)
Adaptive Online Prediction by Following the Perturbed Leader (2005, cited by this paper)
Elements of Information Theory (2005, cited by this paper)
On the Convergence Speed of MDL Predictions for Bernoulli Sequences (2004, cited by this paper)
Suboptimal behavior of Bayes and MDL in classification under misspecification (2004, cited by this paper)
On the Convergence of MDL Density Estimation (2004, cited by this paper)
Universal Artificial Intelligence (2004, influential reference)
Convergence of Discrete MDL for Sequential Prediction (2004, cited by this paper)
Convergence and Loss Bounds for Bayesian Sequence Prediction (2003, cited by this paper)
Optimality of universal Bayesian prediction for general loss and alphabet (2003, cited by this paper)
Sequence Prediction Based on Monotone Complexity (2003, influential reference)
The similarity metric (2001, cited by this paper)
Convergence rates for density estimation with Bernstein polynomials (2001, cited by this paper)
Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities (2001, cited by this paper)
Limit Theorems of Probability Theory (2000, cited by this paper)
Minimum description length induction, Bayesianism, and Kolmogorov complexity (1999, cited by this paper)
New Error Bounds for Solomonoff Prediction (1999, cited by this paper)
Estimation of mixture models (1999, cited by this paper)
The minimum description length principle and reasoning under uncertainty (1998, cited by this paper)
The Minimum Description Length Principle in Coding and Modeling (1998, cited by this paper)
Fisher information and stochastic complexity (1996, cited by this paper)
Information and Randomness (1994, cited by this paper)
An Introduction to Kolmogorov Complexity and Its Applications (1993, influential reference)
How to use expert advice (1993, cited by this paper)
On The Limit Theorems of Probability Theory (1992, influential reference)
Rechnender Raum (1991, cited by this paper)
Minimum complexity density estimation (1991, influential reference)
Information-theoretic asymptotics of Bayes methods (1990, cited by this paper)
On the relation between descriptional complexity and algorithmic probability (1981, cited by this paper)
Modeling By Shortest Data Description (1978, cited by this paper)
Complexity-based induction systems: Comparisons and convergence theorems (1978, cited by this paper)
The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms (1970, cited by this paper)
An Information Measure for Classification (1968, cited by this paper)
A Formal Theory of Inductive Inference. Part II (1964, cited by this paper)
Probability theory (1963, cited by this paper)
Merging of Opinions with Increasing Information (1962, cited by this paper)

CITED BY

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning (2023, cites this paper)
Kalman Filter for Online Classification of Non-Stationary Data (2023, cites this paper)
Evaluating Representations with Readout Model Switching (2023, cites this paper)
ISG: I can See Your Gene Expression (2022, cites this paper)
Sequential Learning Of Neural Networks for Prequential MDL (2022, cites this paper)
Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies (2022, cites this paper)
Prequential MDL for Causal Structure Learning with Neural Networks (2021, cites this paper)
Predictions and Algorithmic Statistics for Infinite Sequences (2021, cites this paper)
Fully General Online Imitation Learning (2021, cites this paper)
Prediction and MDL for infinite sequences (2020, cites this paper)
Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine (2019, cites this paper)
Intuition, intelligence, data compression (2019, cites this paper)
Tractability of batch to sequential conversion (2018, cites this paper)
Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine (2018, cites this paper)
The Measure of All Minds: Evaluating Natural and Artificial Intelligence (2017, cites this paper)
Asymptotics of Discrete MDL for Online Prediction. erratum (2017, cites this paper)
Solomonoff Prediction and Occam’s Razor (2016, cites this paper)
Short Biography (2016, cites this paper)
Indefinitely Oscillating Martingales (2014, cites this paper)
Offline to Online Conversion (2014, influential citation)
Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC–BIC dilemma (2012, cites this paper)
Artificial curiosity for autonomous space exploration (2011, cites this paper)
When Data Compression and Statistics Disagree: Two Frequentist Challenges for the Minimum Description Length Principle (2010, cites this paper)
Discrete MDL Predicts in Total Variation (2009, influential citation)
Open Problems in Universal Induction & Intelligence (2009, cites this paper)
Consistency of discrete Bayesian learning (2008, cites this paper)
Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma (2008, cites this paper)
On the Limits of Learning with Computational Models (2008, cites this paper)
Minimum Description Length Model Selection (2008, cites this paper)
Transfer Learning Using the Minimum Description Length Principle with a Decision Tree Application (2007, cites this paper)
On the Consistency of Discrete Bayesian Learning (2007, cites this paper)
Recent Results in Universal and Non-Universal Induction (2006, influential citation)
The Missing Consistency Theorem for Bayesian Learning: Stochastic Model Selection (2006, influential citation)
Fundamental Research for Knowledge Federation (2006, cites this paper)
TCS Technical Report Potential Functions for Stochastic Model Selection (2006, influential citation)
TCS Technical Report Consistency Theorems for Discrete Bayesian Learning (2006, influential citation)
Erratum to "Asymptotics of Discrete MDL for Online Prediction" (2006, cites this paper)
Online Learning with Universal Model and Predictor Classes (2006, cites this paper)
No) Quadratic Loss Bounds for (2004, cites this paper)
Erratum to “Asymptotics of Discrete MDL for Online (year unknown, cites this paper)