Communication-Efficient Federated Group Distributionally Robust Optimization

Published 2024 in Neural Information Processing Systems

ABSTRACT

Federated learning faces challenges due to heterogeneity in data volumes and distributions across clients, which can compromise a model's ability to generalize across distributions. Existing approaches that address this issue through group distributionally robust optimization (GDRO) often incur high communication and sample complexity. To this end, this work introduces algorithms tailored for communication-efficient Federated Group Distributionally Robust Optimization (FGDRO). Our contributions are threefold. First, we introduce the FGDRO-CVaR algorithm, which optimizes the average top-K losses while reducing communication complexity to $O(1/\epsilon^4)$, where $\epsilon$ denotes the desired precision level. Second, our FGDRO-KL algorithm optimizes KL-regularized FGDRO, cutting communication complexity to $O(1/\epsilon^3)$. Third, we propose FGDRO-KL-Adam, which uses Adam-type local updates in FGDRO-KL; it not only maintains a communication cost of $O(1/\epsilon^3)$ but also shows potential to surpass SGD-type local steps in practical applications. We demonstrate the effectiveness of our algorithms on a variety of real-world tasks, including natural language processing and computer vision.
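As a rough illustration of the two objectives named in the abstract, the sketch below evaluates an average top-K loss and a KL-regularized robust loss on a fixed list of per-client losses. It is not the paper's algorithm: the function names, the regularization parameter `lam`, the uniform reference distribution, and the sample loss values are all assumptions made for this example, and the sketch only evaluates the objectives rather than optimizing them in a federated setting.

```python
import math


def average_top_k(losses, k):
    """Mean of the k largest per-client losses (a CVaR-style objective)."""
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and the number of clients")
    return sum(sorted(losses, reverse=True)[:k]) / k


def kl_regularized(losses, lam):
    """KL-regularized worst-case loss against a uniform reference.

    Uses the closed form lam * log(mean(exp(loss / lam))), shifted by the
    maximum loss for numerical stability.
    """
    m = max(losses)
    mean_exp = sum(math.exp((l - m) / lam) for l in losses) / len(losses)
    return m + lam * math.log(mean_exp)


def kl_weights(losses, lam):
    """Client weights attaining the KL-regularized maximum (softmax of loss / lam)."""
    m = max(losses)
    unnormalized = [math.exp((l - m) / lam) for l in losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical per-client losses, chosen only for illustration.
client_losses = [0.2, 1.5, 0.7, 3.0]
print(average_top_k(client_losses, 2))
print(kl_regularized(client_losses, 1.0))
print(kl_weights(client_losses, 1.0))
```

In this sketch, a smaller `lam` pushes the KL-regularized value toward the maximum client loss, while a larger `lam` pulls it toward the plain average; setting `k` to the number of clients makes the top-K objective coincide with the average.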

PUBLICATION RECORD

Publication year
2024
Venue
Neural Information Processing Systems
Publication date
2024-10-08
Fields of study
Mathematics, Computer Science
Identifiers
DOI: 10.48550/arXiv.2410.06369; arXiv: 2410.06369
Source metadata
Semantic Scholar

CITED BY

FedSDAF: Leveraging Source Domain Awareness for Enhanced Federated Domain Generalization
2025 (cites this paper)