Optimizing Encrypted Neural Networks: Model Design, Quantization and Fine-Tuning Using FHEW/TFHE

Authors: Yu-Te Ku (Data Science Degree Program, National Taiwan University and Academia Sinica), Feng-Hao Liu (Washington State University), Chih-Fan Hsu (Inventec Corporation), Ming-Ching Chang (State University of New York, University at Albany), Shih-Hao Hung (Data Science Degree Program, National Taiwan University and Academia Sinica), I-Ping Tu (Data Science Degree Program, National Taiwan University and Academia Sinica), Wei-Chao Chen (Inventec Corporation)

Volume: 2025
Issue: 4
Pages: 1075–1091
DOI: https://doi.org/10.56553/popets-2025-0172

Abstract: Third-generation Fully Homomorphic Encryption (FHE), particularly the FHEW/TFHE schemes, is recognized for its balanced security requirements, small parameters, and low memory usage. However, current methods for Deep Neural Network (DNN) inference under these schemes still incur high computational costs, which limits their practical applicability. This work demonstrates how to improve the practicality of third-generation FHE for DNN tasks while preserving its key advantages. We make two main contributions. First, we develop a computational architecture called FHE-Neuron, which reconfigures the parameters and bootstrapping structure of traditional FHEW/TFHE Boolean operations. This architecture significantly reduces the cost of encrypted DNN inference by dynamically switching the precision of encrypted data during computation: high precision for cost-effective linear operations and low precision for computationally expensive nonlinear operations. Second, we introduce an FHE-aware Quantization and Fine-tuning framework that optimizes model parameters to align with FHE-Neuron’s constraints, ensuring high accuracy in encrypted inference. We validate our approach on various neural network models across several computing platforms. In our experiments, our method achieves an average single-image inference time of 4.5 milliseconds on MNIST and 17 milliseconds on Fashion-MNIST, with accuracy rates of 96.52% and 88.57%, respectively. On the CIFAR-10 dataset, our system completes single-image inference in 30 seconds with 90.5% accuracy.
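
As a rough plaintext illustration of the precision switch the abstract describes (a minimal sketch; the bit-widths, function names, and lookup-table activation below are assumptions for exposition, not the paper's actual FHE-Neuron implementation), the following Python/NumPy snippet keeps the linear accumulation at high precision and narrows values to a small message space before the nonlinear step, the point at which a TFHE-style programmable bootstrap would evaluate the activation as a table lookup:

import numpy as np

def quantize(x, bits):
    # Uniform symmetric quantization onto a signed `bits`-bit integer grid.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(int)
    return q, scale

def fhe_neuron_mock(x, w, b, act_bits=4):
    # Linear part: comparatively cheap under FHEW/TFHE, so it runs at
    # high precision (floating point stands in for a wide message space).
    acc = x @ w + b
    # Precision switch: narrow to a small message space before the
    # expensive nonlinear step.
    q, scale = quantize(acc, act_bits)
    # The nonlinearity (ReLU here) becomes a lookup over the small grid,
    # as a programmable bootstrap would evaluate it.
    lut = {v: max(v, 0) for v in range(-2 ** (act_bits - 1), 2 ** (act_bits - 1))}
    return np.array([lut[v] for v in q]) * scale

x = np.array([0.3, -1.2, 0.7])
w = np.random.default_rng(0).normal(size=(3, 2))
print(fhe_neuron_mock(x, w, b=np.zeros(2)))

In the paper's framework, the FHE-aware quantization and fine-tuning step adjusts the model parameters to tolerate exactly this kind of narrowing, which is how encrypted inference stays close to plaintext accuracy.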

Keywords: Encrypted inference, neural networks, Fully Homomorphic Encryption, FHEW/TFHE, bootstrapping, quantization, approximated computation, model fine-tuning

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.