Differentially Private Shapley Values for Data Evaluation

Lauren Watson, R. Andreeva, Hao Yang, Rik Sarkar

Published 2022 in arXiv.org

ABSTRACT

The Shapley value has been proposed as a solution to many applications in machine learning, including for equitable valuation of data. Shapley values are computationally expensive and involve the entire dataset. The query for a point's Shapley value can also compromise the statistical privacy of other data points. We observe that in machine learning problems such as empirical risk minimization, and in many learning algorithms (such as those with uniform stability), a diminishing returns property holds, where marginal benefit per data point decreases rapidly with data sample size. Based on this property, we propose a new stratified approximation method called the Layered Shapley Algorithm. We prove that this method operates on small ($O(\mathrm{polylog}(n))$) random samples of data and small-sized ($O(\log n)$) coalitions to achieve the results with guaranteed probabilistic accuracy, and can be modified to incorporate differential privacy. Experimental results show that the algorithm correctly identifies high-value data points that improve validation accuracy, and that the differentially private evaluations preserve approximate ranking of data.
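
The idea of stratifying by coalition size and using only small coalitions can be illustrated with a minimal sketch. This is not the paper's Layered Shapley Algorithm: it is a generic stratified Monte Carlo estimator, truncated at a caller-chosen coalition size on the assumption that marginal contributions beyond that size are negligible (the diminishing returns property), with optional Laplace noise. All names here (`stratified_shapley`, `private_shapley`, `max_size`, `samples_per_size`, `sensitivity`) are hypothetical, and the noise gives a differential privacy guarantee only if `sensitivity` really bounds how much the estimate can change when one other data point changes.

```python
import random


def stratified_shapley(i, n, utility, max_size, samples_per_size, rng):
    """Estimate the Shapley value of point i among n points.

    One stratum per coalition size k = 0..max_size (max_size <= n - 1).
    Strata above max_size are dropped, which assumes their marginal
    contributions are negligible (an assumption, not a guarantee).
    """
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(max_size + 1):
        stratum_sum = 0.0
        for _ in range(samples_per_size):
            # Uniformly random coalition of size k that excludes i.
            coalition = frozenset(rng.sample(others, k))
            stratum_sum += utility(coalition | {i}) - utility(coalition)
        total += stratum_sum / samples_per_size
    # The exact Shapley value averages the n stratum means (k = 0..n-1).
    return total / n


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_shapley(i, n, utility, max_size, samples_per_size,
                    epsilon, sensitivity, rng):
    """Add Laplace noise with scale sensitivity / epsilon to the estimate.

    `sensitivity` is a caller-supplied assumption; this sketch does not
    derive or verify it.
    """
    estimate = stratified_shapley(i, n, utility, max_size,
                                  samples_per_size, rng)
    return estimate + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Toy additive utility: each point's exact Shapley value is its weight.
    weights = [0.5, 1.0, 2.0, 4.0]
    utility = lambda coalition: sum(weights[j] for j in coalition)
    rng = random.Random(0)
    print(stratified_shapley(2, 4, utility, 3, 5, rng))
    print(private_shapley(2, 4, utility, 3, 5, 1.0, 0.1, rng))
```

With an additive toy utility and all strata included, the estimator recovers each point's weight exactly. Lowering `max_size` trades accuracy for fewer and smaller utility evaluations, which is only reasonable when the utility actually exhibits diminishing returns.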

PUBLICATION RECORD

Publication year
2022
Venue
arXiv.org
Publication date
2022-06-01
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.48550/arXiv.2206.00511 arXiv 2206.00511
Source metadata
Semantic Scholar

REFERENCES

The Shapley Value in Machine Learning
2022, influential reference
GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning
2021, cited by this paper
Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index
2021, cited by this paper
Shapley values for feature selection: The good, the bad, and the axioms
2021, cited by this paper
Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data
2021, cited by this paper
The Shapley Value of Classifiers in Ensemble Games
2021, cited by this paper
CGA: a new feature selection model for visual human action recognition
2020, cited by this paper
Efficient computation and analysis of distributional Shapley values
2020, cited by this paper
Efficient nonparametric statistical inference on population feature importance using Shapley values
2020, cited by this paper
Problems with Shapley-value-based explanations as feature importance measures
2020, cited by this paper
A Distributional Framework for Data Valuation
2020, cited by this paper
Neuron Shapley: Discovering the Responsible Neurons
2020, cited by this paper
Estimating Training Data Influence by Tracking Gradient Descent
2020, cited by this paper
Interpretable feature subset selection: A Shapley value based approach
2020, cited by this paper
Improving KernelSHAP: Practical Shapley Value Estimation via Linear Regression
2020, cited by this paper
Efficient and Fair Data Valuation for Horizontal Federated Learning
2020, cited by this paper
Differential Privacy Has Disparate Impact on Model Accuracy
2019, cited by this paper
Towards Efficient Data Valuation Based on the Shapley Value
2019, cited by this paper
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
2019, cited by this paper
An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms
2019, influential reference
Data Valuation using Reinforcement Learning
2019, cited by this paper
Federated Learning: Challenges, Methods, and Future Directions
2019, cited by this paper
A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
2019, cited by this paper
Data Shapley: Equitable Valuation of Data for Machine Learning
2019, influential reference
Generalization Bounds for Uniformly Stable Algorithms
2018, cited by this paper
Finding Influential Training Samples for Gradient Boosted Decision Trees
2018, cited by this paper
Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences
2018, influential reference
Understanding Black-box Predictions via Influence Functions
2017, cited by this paper
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
2017, cited by this paper
A Unified Approach to Interpreting Model Predictions
2017, cited by this paper
FDVT: Data Valuation Tool for Facebook Users
2017, cited by this paper
Membership Inference Attacks Against Machine Learning Models
2016, cited by this paper
Train faster, generalize better: Stability of stochastic gradient descent
2015, cited by this paper
Influence in Classification via Cooperative Game Theory
2015, cited by this paper
Understanding Machine Learning - From Theory to Algorithms
2014, cited by this paper
Practical Differential Privacy via Grouping and Smoothing
2013, influential reference
Bounding the Estimation Error of Sampling-based Shapley Value Approximation With/Without Stratifying
2013, influential reference
Stability of Multi-Task Kernel Regression Algorithms
2013, cited by this paper
Feature evaluation and selection with cooperative game theory
2012, cited by this paper
Scikit-learn: Machine Learning in Python
2011, cited by this paper
Bounds on the sample complexity for private learning and private data release
2010, influential reference
Polynomial calculation of the Shapley value based on sampling
2009, cited by this paper
Et al
2008, cited by this paper
Sampling algorithms and coresets for ℓp regression
2007, cited by this paper
Feature Selection via Coalitional Game Theory
2007, cited by this paper
Calibrating Noise to Sensitivity in Private Data Analysis
2006, influential reference
Differential Privacy
2006, cited by this paper
Feature Selection Based on the Shapley Value
2005, cited by this paper
An Introduction to Variable and Feature Selection
2003, cited by this paper
Stability and Generalization
2002, influential reference
Detection of Influential Observation in Linear Regression
2000, cited by this paper
Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid
1996, cited by this paper
On the Complexity of Cooperative Solution Concepts
1994, cited by this paper
A Value for n-person Games
1988, cited by this paper

CITED BY

Private and Robust Contribution Evaluation in Federated Learning
2026, cites this paper
Challenges in Enabling Private Data Valuation
2026, cites this paper
Detect & Score: Privacy-Preserving Misbehavior Detection and Contribution Evaluation in Federated Learning
2025, cites this paper
On the Fragility of Contribution Score Computation in Federated Learning
2025, cites this paper
XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models
2025, cites this paper
A Comprehensive Study of Shapley Value in Data Analytics
2024, cites this paper
Incentives in Private Collaborative Machine Learning
2024, cites this paper
A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures
2024, influential citation
Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation
2023, cites this paper
LIA: Privacy-Preserving Data Quality Evaluation in Federated Learning Using a Lazy Influence Approximation
2022, cites this paper