CryptoGen: Secure Transformer Generation with Encrypted KV-Cache Reuse

He Zhang, Neusha Javidnia, Shweta Pardeshi, Qian Lou, F. Koushanfar

Published 2026 in Unknown venue

ABSTRACT

The widespread deployment of cloud-hosted generative models raises a fundamental challenge: enabling efficient autoregressive generation while preserving the privacy of both user prompts and model parameters in untrusted environments. We address this challenge in a client-server setting where an untrusted server hosts an autoregressive Transformer and the client requires cryptographic protection for both inputs and inference. We present CryptoGen, the first system to enable scalable privacy-preserving neural generation with persistent encrypted key-value (KV) cache reuse. Discriminative-task secure inference systems incur quadratic latency and memory growth when adapted to autoregressive decoding due to the lack of native encrypted KV-cache support. In contrast, CryptoGen achieves near-linear scaling by securely reusing and updating encrypted KV caches throughout generation. CryptoGen integrates homomorphic encryption and secret sharing to support both prefilling and generation. Key techniques include a unified encrypted KV-cache framework, heterogeneous SIMD encodings for different phases, optimized cipher-cipher matrix-matrix and matrix-vector operations, and efficient noise refresh and ciphertext concatenation mechanisms. Evaluation on generative Transformer models trained on WikiText-2, PTB, and LAMBADA shows that for input lengths of 128-512 tokens, CryptoGen achieves 4.4x-7.6x lower per-token latency than state-of-the-art discriminative secure inference systems, while maintaining near-linear latency and memory scaling, with advantages increasing for longer sequences. CryptoGen is released as an open-source library.
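The scaling contrast in the abstract can be illustrated with a simple operation count. The sketch below is a plaintext cost model, not CryptoGen's protocol or its measured costs: it only counts how many per-token key/value projections a decoder performs when every step re-encodes the whole sequence, versus when cached keys and values are reused. The function names and the example lengths (a 128-token prompt, 64 generated tokens) are assumptions chosen for illustration.

```python
def kv_projections_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Count K/V projections when each decoding step reprocesses the full prefix."""
    total = 0
    for step in range(new_tokens):
        # Step `step` sees the prompt plus every token generated so far.
        total += prompt_len + step
    return total


def kv_projections_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Count K/V projections when cached keys and values are reused."""
    if new_tokens == 0:
        return 0
    # Prefill projects the prompt once and yields the first new token;
    # every later step projects only the single most recent token.
    return prompt_len + (new_tokens - 1)


if __name__ == "__main__":
    n, t = 128, 64  # illustrative lengths, not taken from the paper's experiments
    print("without cache:", kv_projections_without_cache(n, t))
    print("with cache:   ", kv_projections_with_cache(n, t))
```

Without a cache the count grows as n*T + T*(T-1)/2, which is quadratic in the number of generated tokens T, while with a cache it grows as n + T - 1. In an encrypted setting each projection is far more expensive than in plaintext, so the same gap would be expected to matter more, although the actual constants depend on the cryptographic primitives used.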

PUBLICATION RECORD

Publication year: 2026
Venue: Unknown venue
Publication date: 2026-02-09
Fields of study: Computer Science
Identifiers: arXiv 2602.08798
Source metadata: Semantic Scholar

REFERENCES

THOR: Secure Transformer Inference with Homomorphic Encryption
2025, cited by this paper
DAHE: Parameter-Adaptive and Memory Efficient FPGA Acceleration of Homomorphic Encryption
2025, cited by this paper
A Survey of LLM-based Agents in Medicine: How far are we from Baymax?
2025, cited by this paper
TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation
2025, cited by this paper
CipherPrune: Efficient and Scalable Private Transformer Inference
2025, cited by this paper
BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-Wise Comparisons
2024, cited by this paper
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
2024, cited by this paper
OFHE: An Electro-Optical Accelerator for Discretized TFHE
2024, cited by this paper
BOLT: Privacy-Preserving, Accurate and Efficient Inference for Transformers
2024, influential reference
HEPrune: Fast Private Training of Deep Neural Networks With Encrypted Data Pruning
2024, cited by this paper
Trinity: A General Purpose FHE Accelerator
2024, cited by this paper
Primer: Fast Private Transformer Inference on Encrypted Data
2023, cited by this paper
HEBridge: Connecting Arithmetic and Logic Operations in FV-style HE Schemes
2023, cited by this paper
CryptoTrain: Fast Secure Training on Encrypted Dataset
2023, cited by this paper
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
2023, cited by this paper
BumbleBee: Secure Two-party Inference Framework for Large Transformers
2023, cited by this paper
PriML: An Electro-Optical Accelerator for Private Machine Learning on Encrypted Data
2023, cited by this paper
LLaMA: Open and Efficient Foundation Language Models
2023, cited by this paper
SecFloat: Accurate Floating-Point meets Secure 2-Party Computation
2022, cited by this paper
Iron: Private Inference on Transformers
2022, influential reference
CryptoLight: An Electro-Optical Accelerator for Fully Homomorphic Encryption
2022, cited by this paper
MPCFormer: fast, performant and private Transformer inference with MPC
2022, cited by this paper
coxHE: A software-hardware co-design framework for FPGA acceleration of homomorphic computation
2022, cited by this paper
MATCHA: A Fast and Energy-Efficient Accelerator for Fully Homomorphic Encryption over the Torus
2022, cited by this paper
SiRnn: A Math Library for Secure RNN Inference
2021, cited by this paper
I-BERT: Integer-only BERT Quantization
2021, influential reference
Fairness in Credit Scoring: Assessment, Implementation and Profit Implications
2021, cited by this paper
Membership Inference Attacks on Machine Learning: A Survey
2021, cited by this paper
HEMET: A Homomorphic-Encryption-Friendly Privacy-Preserving Mobile Neural Network Architecture
2021, cited by this paper
SAFENet: A Secure, Accurate and Fast Neural Network Inference
2021, cited by this paper
A Review of Interpretable ML in Healthcare: Taxonomy, Applications, Challenges, and Future Directions
2021, cited by this paper
AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference
2020, cited by this paper
CRYPTOGRU: Low Latency Privacy-Preserving Text Analysis With GRU
2020, cited by this paper
CrypTFlow2: Practical 2-Party Secure Inference
2020, cited by this paper
SHE: A Fast and Accurate Deep Neural Network for Encrypted Data
2019, cited by this paper
Review of HIPAA, Part 1: History, Protected Health Information, and Privacy and Security Rules
2019, cited by this paper
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2019, cited by this paper
EzPC: Programmable and Efficient Secure Two-Party Computation for Machine Learning
2019, cited by this paper
Language Models are Unsupervised Multitask Learners
2019, influential reference
Gazelle: A Low Latency Framework for Secure Neural Network Inference
2018, influential reference
Scalable and accurate deep learning with electronic health records
2018, cited by this paper
Attention is All you Need
2017, influential reference
Simple Encrypted Arithmetic Library-SEAL
2017, cited by this paper
Simple Encrypted Arithmetic Library - SEAL v2.1
2016, cited by this paper
Gaussian Error Linear Units (GELUs)
2016, cited by this paper
The LAMBADA dataset: Word prediction requiring a broad discourse context
2016, cited by this paper
Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
2015, cited by this paper
Algorithms in HElib
2014, cited by this paper
(Leveled) Fully Homomorphic Encryption without Bootstrapping
2014, cited by this paper
Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based
2013, cited by this paper
Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP
2012, cited by this paper
Somewhat Practical Fully Homomorphic Encryption
2012, cited by this paper
Fully homomorphic encryption using ideal lattices
2009, cited by this paper
Secure multi-party computation problems and their applications: a review and open problems
2001, cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993, cited by this paper
Protocols for secure computations
1982, cited by this paper

CITED BY

RobPI: Robust Private Inference against Malicious Client
2026, cites this paper