Hype and Heavy Tails: A Closer Look at Data Breaches
Benjamin Edwards, S. Hofmeyr, S. Forrest
Published 2016 in Workshop on the Economics of Information Security
ABSTRACT
Recent widely publicized data breaches have exposed the personal information of hundreds of millions of people. Some reports point to alarming increases in both the size and frequency of data breaches, spurring institutions around the world to address what appears to be a worsening situation. But is the problem actually growing worse? In this paper, we study a popular public dataset and develop Bayesian Generalized Linear Models to investigate trends in data breaches. Analysis of the model shows that neither the size nor the frequency of data breaches has increased over the past decade. We find that the increases that have attracted attention can be explained by the heavy-tailed statistical distributions underlying the dataset. Specifically, we find that data breach size is log-normally distributed and that the daily frequency of breaches is described by a negative binomial distribution. These distributions may provide clues to the generative mechanisms responsible for the breaches. Additionally, our model predicts the likelihood of breaches of a particular size in the future. For example, we find that in the next year there is only a 31% chance of a breach of 10 million records or more in the US. Regardless of any trend, data breaches are costly, and we combine the model with two different cost models to project that in the next three years breaches could cost up to $55 billion.
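The abstract's yearly prediction rests on combining a log-normal model of breach size with a negative binomial model of daily breach counts. The following is a minimal sketch of how two such distributions can be combined into "the probability of at least one breach of T records or more in a year", assuming a trend-free model (constant parameters), independence between breach sizes and counts, and independent days. The function names and every parameter value are hypothetical placeholders, not the authors' fitted model or code; the paper's Bayesian GLM may be parameterized differently.

```python
"""Hypothetical sketch: P(at least one breach >= T records within a period).

Assumptions (illustrative only, not the paper's fitted model):
  * breach size ~ LogNormal(mu, sigma), i.e. log(size) ~ Normal(mu, sigma)
  * daily breach count ~ NegBin(r, p), counted as failures before r successes,
    with mean r * (1 - p) / p
  * days are independent and parameters do not change over time (no trend)
"""
import math
from statistics import NormalDist


def prob_size_at_least(threshold, mu, sigma):
    """P(size >= threshold) for a log-normal with log-mean mu and log-sd sigma."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(threshold))


def prob_at_least_one_large(threshold, mu, sigma, r, p, days=365):
    """P(at least one breach of size >= threshold within `days` days).

    The sum of `days` i.i.d. NegBin(r, p) counts is NegBin(days * r, p).
    Evaluating its generating function G(z) = (p / (1 - (1 - p) z)) ** n
    at z = 1 - q, where q = P(size >= threshold), gives
    P(no large breach) = (p / (p + (1 - p) * q)) ** (days * r).
    """
    q = prob_size_at_least(threshold, mu, sigma)
    return 1.0 - (p / (p + (1.0 - p) * q)) ** (days * r)


if __name__ == "__main__":
    # Placeholder parameters chosen only to exercise the code (hypothetical).
    mu, sigma = math.log(3_000), 2.5  # log-normal breach size
    r, p = 0.5, 0.3                   # negative binomial daily count
    print(prob_at_least_one_large(10_000_000, mu, sigma, r, p))
```

Thinning the negative binomial analytically avoids simulation; with fitted posterior samples for the parameters one would instead average this probability over the samples.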
PUBLICATION RECORD
- Publication year: 2016
- Venue: Workshop on the Economics of Information Security
- Publication date: 2016-12-01
- Fields of study: Computer Science, Engineering
- Source metadata: Semantic Scholar