GPU-based Private Information Retrieval for On-Device Machine Learning Inference
Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, V. Reddi, Gu-Yeon Wei, David Brooks, Edward Suh
Published 2023 in International Conference on Architectural Support for Programming Languages and Operating Systems
ABSTRACT
On-device machine learning (ML) inference lets applications use private user data on the user's device without revealing it to remote servers. However, a purely on-device approach to private ML inference is impractical for many applications that rely on embedding tables too large to store on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1 to 10 GB. To overcome this barrier, we propose using private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. Because off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an additional throughput improvement of more than 5× at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second, a more than 100× throughput improvement over a CPU-based baseline, while maintaining model accuracy.
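To make the idea of privately retrieving an embedding concrete, here is a minimal sketch of a textbook two-server XOR-based PIR scheme. The abstract does not say which PIR construction the paper uses, so this is an illustration under assumptions, not the authors' method: it assumes two non-colluding servers that hold identical copies of the table, and every name (`make_queries`, `server_answer`, `reconstruct`, the toy `table`) is hypothetical.

```python
import secrets


def make_queries(index, num_rows):
    """Split a one-hot selection of `index` into two bit vectors.

    Each vector on its own is uniformly random, so neither server
    learns which row the client wants (assuming they do not collude).
    """
    share_a = [secrets.randbelow(2) for _ in range(num_rows)]
    share_b = list(share_a)
    share_b[index] ^= 1  # the shares differ only at the wanted row
    return share_a, share_b


def server_answer(table, query):
    """XOR together every row whose query bit is set.

    The server walks the whole table for each query. That linear scan
    is what makes this kind of PIR expensive to serve.
    """
    acc = bytearray(len(table[0]))
    for row, bit in zip(table, query):
        if bit:
            for i, byte in enumerate(row):
                acc[i] ^= byte
    return bytes(acc)


def reconstruct(answer_a, answer_b):
    """XOR the two answers; all rows cancel except the wanted one."""
    return bytes(x ^ y for x, y in zip(answer_a, answer_b))


# Toy "embedding table": 8 rows of 4 bytes each (hypothetical data).
table = [bytes([r, r + 1, r + 2, r + 3]) for r in range(8)]

query_a, query_b = make_queries(5, len(table))
row = reconstruct(server_answer(table, query_a),
                  server_answer(table, query_b))
```

In this sketch each server does work proportional to the full table size per query, which gives some intuition for why PIR throughput is a bottleneck for latency-sensitive inference.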
PUBLICATION RECORD
- Publication year: 2023
- Venue: International Conference on Architectural Support for Programming Languages and Operating Systems
- Publication date: 2023-01-26
- Fields of study: Computer Science
- Source metadata: Semantic Scholar