Maximal Local Privacy Loss—A New Method for Privacy Evaluation of Synthetic Datasets

Sigrid Leithe, Bjørn Møller, B. Aagnes, Yngvar Nilssen, Paul C. Lambert, T. Myklebust

Published 2026 in Statistics in Medicine

ABSTRACT

Synthetic patient data has the potential to advance research in the medical field by providing privacy‐preserving access to data resembling sensitive personal data. Assessing the level of privacy offered is essential to ensure privacy compliance, but it is challenging in practice. Many common methods either fail to capture central aspects of privacy or result in excessive caution based on unrealistic worst‐case scenarios. We present a new approach to evaluating the privacy of synthetic datasets from known probability distributions based on the maximal local privacy loss. The strategy is based on measuring individual contributions to the likelihood of generating a specific synthetic dataset, to detect possibilities of reconstructing records in the original data. To demonstrate the method, we generate synthetic time‐to‐event data based on pancreatic and colon cancer data from the Cancer Registry of Norway using sequential regressions including a flexible parametric survival model. This illustrates the method's ability to measure information leakage at an individual level, which can be used to ensure acceptable privacy risks for every patient in the data.
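The abstract describes the method only at a high level. Each patient's contribution to the likelihood of the released synthetic dataset is measured, and the largest contribution flags records that might be reconstructed. The sketch below illustrates that idea under assumptions made here for illustration, not taken from the paper. First, the generator is a single normal model fitted by maximum likelihood rather than sequential regressions with a flexible parametric survival model. Second, the local privacy loss of record i is taken to be the log-likelihood ratio of the synthetic data under the model fitted to all records versus the model fitted with record i left out. The function names `fit_normal`, `normal_loglik`, `local_privacy_losses` and `maximal_local_privacy_loss` are hypothetical.

```python
import math
import random
import statistics


def fit_normal(data):
    """Maximum-likelihood estimates (mean, population standard deviation)."""
    return statistics.fmean(data), statistics.pstdev(data)


def normal_loglik(data, mu, sigma):
    """Log-likelihood of `data` under a Normal(mu, sigma**2) model."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)


def local_privacy_losses(original, synthetic):
    """Hypothetical leave-one-out definition: for each original record i,
    log p(synthetic | fit to all records) - log p(synthetic | fit without i)."""
    ll_full = normal_loglik(synthetic, *fit_normal(original))
    losses = []
    for i in range(len(original)):
        reduced = original[:i] + original[i + 1:]
        losses.append(ll_full - normal_loglik(synthetic, *fit_normal(reduced)))
    return losses


def maximal_local_privacy_loss(losses):
    """Largest absolute per-record loss and the index of the record attaining it."""
    idx = max(range(len(losses)), key=lambda i: abs(losses[i]))
    return abs(losses[idx]), idx


# Toy data: 50 typical values plus one extreme record at index 50.
rng = random.Random(2026)
original = [rng.gauss(0.0, 1.0) for _ in range(50)] + [10.0]

# "Release" a synthetic dataset sampled from the model fitted to all records.
mu_hat, sd_hat = fit_normal(original)
synthetic = [rng.gauss(mu_hat, sd_hat) for _ in range(200)]

losses = local_privacy_losses(original, synthetic)
max_loss, worst = maximal_local_privacy_loss(losses)
print(f"maximal local privacy loss = {max_loss:.2f} (record {worst})")
```

In this toy run the extreme record dominates. Removing it shrinks the fitted standard deviation far more than removing any typical record does, so the synthetic data become much less likely under the leave-one-out fit. Comparing such per-record losses against a threshold is one plausible reading of the abstract's aim of ensuring acceptable privacy risk for every patient. The paper's actual definition and thresholds may differ.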

PUBLICATION RECORD

Publication year: 2026
Venue: Statistics in Medicine
Publication date: 2026-01-01
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1002/sim.70376; PMID 41569604
Source metadata: Semantic Scholar, PubMed

REFERENCES

Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis (2025)
Synthetic data as external control arms in scarce single-arm clinical trials (2025)
"What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation (2024)
SYNDSURV: A simple framework for survival analysis with data distributed across multiple institutions (2024)
Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment (2024)
Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare (2024)
Synthetic data generation for a longitudinal cohort study – evaluation, method extension and reproduction of published data analysis results (2023)
Synthetic data generation: State of the art in health care domain (2023)
Synthetic data in health care: A narrative review (2023)
Membership Inference Attacks against Synthetic Data through Overfitting Detection (2023)
Comparison of Synthetic Data Generation Techniques for Control Group Survival Data in Oncology Clinical Trials: Simulation Study (2023)
The Inadequacy of Similarity-Based Privacy Metrics: Privacy Attacks Against “Truly Anonymous” Synthetic Datasets (2023)
On the Quality of Synthetic Generated Tabular Data (2023)
Improving communication of cancer survival statistics—feasibility of implementing model-based algorithms in routine publications (2023)
Synthetic data for privacy-preserving clinical risk prediction (2023)
A New Bound for Privacy Loss from Bayesian Posterior Sampling (2022)
Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions (2022)
A Unified Framework for Quantifying Privacy Risk in Synthetic Data (2022)
TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data (2022)
A General Framework for Auditing Differentially Private Machine Learning (2022)
Generating high-fidelity synthetic time-to-event datasets to improve data transparency and accessibility (2022; influential reference)
Federated Learning for Healthcare: Systematic Review and Architecture Proposal (2022)
Privacy-preserving healthcare informatics: a review (2021)
Membership Inference Attacks From First Principles (2021)
Generation and evaluation of synthetic patient data (2020)
Auditing Differentially Private Machine Learning: How Private is Private SGD? (2020)
Anonymization Through Data Synthesis Using Generative Adversarial Networks (ADS-GAN) (2020)
Differentially Private Generative Adversarial Network (2018)
Differential Correct Attribution Probability for Synthetic Data: An Exploration (2018)
Differential Privacy: A Primer for a Non-Technical Audience (2018; influential reference)
Accuracy First: Selecting a Differential Privacy Level for Accuracy Constrained ERM (2017)
PrivBayes (2017)
Protecting Privacy in Large Datasets—First We Assess the Risk; Then We Fuzzy the Data (2017)
synthpop: Bespoke Creation of Synthetic Data in R (2016)
XGBoost: A Scalable Tree Boosting System (2016)
PrivBayes: private data release via bayesian networks (2014)
Bayesian Estimation of Disclosure Risks for Multiply Imputed, Synthetic Data (2014)
Disclosure Risk Evaluation for Fully Synthetic Categorical Data (2014)
Membership privacy: a unifying framework for privacy definitions (2013)
Statistical Disclosure Risk: Separating Potential and Harm (2012)
Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data (2012)
Synthetic datasets for statistical disclosure control (2011)
Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation (2011)
How Much Is Enough? Choosing ε for Differential Privacy (2011)
No free lunch in data privacy (2011)
How Protective Are Synthetic Data? (2008)
Calibrating Noise to Sensitivity in Private Data Analysis (2006)
A General Coefficient of Similarity and Some of Its Properties (1971)