Maximal Local Privacy Loss—A New Method for Privacy Evaluation of Synthetic Datasets
Sigrid Leithe, Bjørn Møller, B. Aagnes, Yngvar Nilssen, Paul C. Lambert, T. Myklebust
Published 2026 in Statistics in Medicine
ABSTRACT
Synthetic patient data has the potential to advance research in the medical field by providing privacy‐preserving access to data resembling sensitive personal data. Assessing the level of privacy offered is essential to ensure privacy compliance, but it is challenging in practice. Many common methods either fail to capture central aspects of privacy or result in excessive caution based on unrealistic worst‐case scenarios. We present a new approach to evaluating the privacy of synthetic datasets from known probability distributions based on the maximal local privacy loss. The strategy is based on measuring individual contributions to the likelihood of generating a specific synthetic dataset, to detect possibilities of reconstructing records in the original data. To demonstrate the method, we generate synthetic time‐to‐event data based on pancreatic and colon cancer data from the Cancer Registry of Norway using sequential regressions including a flexible parametric survival model. This illustrates the method's ability to measure information leakage at an individual level, which can be used to ensure acceptable privacy risks for every patient in the data.
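The abstract describes measuring each individual's contribution to the likelihood of generating a specific synthetic dataset. The sketch below is one possible reading of that idea, not the paper's definition of maximal local privacy loss: it assumes a plain exponential survival model without censoring (in place of the flexible parametric model the authors use) and a leave-one-out comparison of log-likelihoods. The function names and the toy data are hypothetical.

```python
import math

def exp_rate_mle(times):
    """Maximum-likelihood rate of an exponential model: n / sum(times)."""
    return len(times) / sum(times)

def exp_loglik(data, rate):
    """Log-likelihood of `data` under an exponential model with this rate."""
    return sum(math.log(rate) - rate * t for t in data)

def leave_one_out_loss(original, synthetic):
    """Assumed proxy for a per-record privacy loss (not the paper's formula):
    the absolute change in the log-likelihood of the synthetic data when
    record i is dropped before fitting the generator."""
    full = exp_loglik(synthetic, exp_rate_mle(original))
    losses = []
    for i in range(len(original)):
        reduced = original[:i] + original[i + 1:]
        loo = exp_loglik(synthetic, exp_rate_mle(reduced))
        losses.append(abs(full - loo))
    return losses

# Toy survival times (hypothetical); the fourth record is an outlier.
original = [2.0, 3.5, 1.2, 40.0, 2.8]
synthetic = [1.9, 3.1, 2.4, 5.0]

losses = leave_one_out_loss(original, synthetic)
worst = max(range(len(losses)), key=losses.__getitem__)
print(worst, max(losses))
```

In this toy setting the outlying record dominates the fitted rate, so it also has the largest per-record value. Taking the maximum over records is what would let such a measure bound the risk for every patient rather than only on average.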
PUBLICATION RECORD
- Publication year
2026
- Venue
Statistics in Medicine
- Publication date
2026-01-01
- Fields of study
Medicine, Computer Science
- Source metadata
Semantic Scholar, PubMed