Sharma Sagar, Chen Keke.
Abstract
With ever-growing data and the need to develop powerful machine learning models, data owners increasingly depend on various untrusted platforms (e.g., public clouds, edges, and machine learning service providers) for scalable processing or collaborative learning. Thus, sensitive data and models are in danger of unauthorized access, misuse, and privacy compromises. A relatively new body of research confidentially trains machine learning models on protected data to address these concerns. In this survey, we summarize notable studies in this emerging area of research. With a unified framework, we highlight the critical challenges and innovations in outsourcing machine learning confidentially. We focus on the cryptographic approaches for confidential machine learning (CML), primarily on model training, while also covering other directions such as perturbation-based approaches and CML in hardware-assisted computing environments. The discussion takes a holistic view, considering the rich context of related threat models, security assumptions, design principles, and the associated trade-offs among data utility, cost, and confidentiality.
Keywords: Confidential computing; Cryptographic protocols; Machine learning
Year: 2021 PMID: 34805760 PMCID: PMC8591683 DOI: 10.1186/s42400-021-00092-8
Source DB: PubMed Journal: Cybersecur (Singap) ISSN: 2523-3246
Fig. 1 A data owner outsources model learning to an untrusted cloud provider. Data contributors submit their encrypted data directly to the cloud; the cloud carries out the major, expensive computations over the encrypted data, while the data owner can assist with some lightweight work
Fig. 2 A data owner outsources data storage and machine learning tasks to the cloud. The Cryptographic Service Provider (CSP) manages the keys, decrypts intermediate results, and assists the cloud with other relatively lightweight computations
Fig. 3 The systematization framework for confidential machine learning (CML) approaches
Fig. 4 The decomposition-mapping-composition (DMC) process for constructing hybrid CML solutions
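Read procedurally, the DMC process in Fig. 4 decomposes a training algorithm into primitive operations, maps each operation to an adequate cryptographic primitive, and composes the mapped pieces with switching protocols. A toy Python sketch of this pipeline (the operation names, cost figures, and fixed switch cost below are hypothetical placeholders, not values from the paper):

```python
# Illustrative sketch of the decomposition-mapping-composition (DMC) idea.
# All cost figures and operation names are hypothetical placeholders.

# Step 1 (decomposition): the ML algorithm as a sequence of primitive ops.
ridge_regression = ["matrix_vector_mult", "matrix_add", "cholesky", "comparison"]

# Step 2 (mapping): assumed per-primitive costs for each op (smaller is better).
COSTS = {
    "matrix_vector_mult": {"AHE": 1.0, "SHE": 180.0, "GC": 480.0, "SecSh": 7.0},
    "matrix_add":         {"AHE": 0.001, "SHE": 0.0002, "GC": 0.036, "SecSh": 0.0},
    "cholesky":           {"GC": 0.2},    # division/square root: GC only
    "comparison":         {"GC": 0.037},  # infeasible with AHE or plain SecSh
}
SWITCH_COST = 0.05  # assumed fixed cost of converting between primitives

def map_and_compose(ops):
    """Greedy mapping: cheapest primitive per op, plus switching overhead."""
    plan, total, prev = [], 0.0, None
    for op in ops:
        prim = min(COSTS[op], key=COSTS[op].get)
        if prev is not None and prim != prev:
            total += SWITCH_COST        # primitive switch (e.g., AHE -> GC)
        total += COSTS[op][prim]
        plan.append((op, prim))
        prev = prim
    return plan, total

# Step 3 (composition): an end-to-end plan with explicit switch points.
plan, cost = map_and_compose(ridge_regression)
print(plan)  # e.g., [('matrix_vector_mult', 'AHE'), ..., ('comparison', 'GC')]
print(cost)
```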
Real cost comparison for confidential arithmetic and linear-algebra operations at 112-bit security, with vectors v of size 100×1 and matrices M of size 100×100 (Comp: computation time; Comm: communication volume)
| Operation | AHE (Paillier), Comp | SHE (RLWE), Comp | Garbled Circuits, Comp | Garbled Circuits, Comm | Secret Sharing, Comp | Secret Sharing, Comm |
|---|---|---|---|---|---|---|
| Addition/Subtraction | 0.01 ms | 0.2 ms | 37 ms | 2 KB | 0.0 ms | 0.0 KB |
| Multiplication | 0.05 ms | 39 ms | 138 ms | 40 KB | 1 s | 2 KB |
| Comparison | 429 h | 105 | 37 ms | 2 KB | - | - |
| Division | - | - | 208 ms | 46 KB | - | - |
| Vector Addition | 0.6 ms | 0.2 ms | 36 ms | 192 KB | 0.0 ms | 0.0 KB |
| Dot Product | 6 ms | 39 ms | 5 s | 4 MB | 7 s | 195 KB |
| Matrix-vector Multiplication | 1 s | 3 min | 8 min | 396 MB | 7 s | 290 KB |
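The asymmetry in this table follows from what each primitive natively supports. Under Paillier AHE, for instance, adding two ciphertexts and multiplying a ciphertext by a public scalar are cheap, which makes encrypted dot products with plaintext weights practical, while ciphertext-ciphertext multiplication and comparison are not supported at all. A minimal sketch using the third-party python-paillier (`phe`) package, which is an illustrative choice rather than a library used by the surveyed frameworks:

```python
# Sketch of Paillier additive homomorphism with the `phe` package
# (pip install phe). The key length is chosen for illustration only.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

x = [public_key.encrypt(v) for v in [1.5, 2.0, -3.25]]  # encrypted vector
w = [0.5, -1.0, 2.0]                                    # public plaintext weights

# Additions and plaintext-scalar multiplications are cheap under AHE:
enc_dot = sum(ci * wi for ci, wi in zip(x, w))          # encrypted dot product
print(private_key.decrypt(enc_dot))                     # -7.75

# Ciphertext * ciphertext multiplication and comparison are NOT supported;
# this is why hybrid frameworks switch to GC or secret sharing for them.
```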
Examples of primitive-switching strategies in hybrid compositions of CML frameworks
| Primitive Switch | Operation Switch | Justification |
|---|---|---|
| SHE → GC | Matrix-vector multiplication → Sign check | Sign checking is impractically expensive with SHE but tolerable with GC. |
| AHE → GC | Matrix additions → Cholesky decomposition | The division and square-root operations in Cholesky decomposition are not feasible with the AHE scheme. |
| AHE → GC | Matrix additions → Gradient descent | Gradient descent involves multiplications, additions, and subtractions that are not entirely feasible with the AHE scheme. |
| SecSh → GC | Matrix-vector multiplication → Comparison | Comparison is impossible over randomly shared secrets, necessitating the switch to garbled circuits. |
| GC → SecSh | Comparison → Vector subtraction | Garbled circuits were unavoidable for comparison, but continuing with GC for vector subtraction would incur excessive cost overhead. |
| SecSh → AHE/OT | Data at rest → Multiplication | Multiplication over random shares requires switching to either an AHE- or OT-based protocol involving the two parties in the framework. |
| SecSh → GC | Matrix-matrix multiplication → ReLU computation | Sign checking is impossible over randomly shared secrets, necessitating the switch to garbled circuits. |
| GC → SecSh | ReLU → Matrix-vector multiplication | Using garbled circuits for matrix-vector multiplication is impractical. |
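Several of these switches stem from a basic property of additive secret sharing: linear operations are local and essentially free (the 0.0 ms / 0.0 KB entries in the cost table), while comparison or sign checking on a single share is impossible because each share is uniformly random. A minimal two-party sketch (the modulus and helper names are illustrative assumptions):

```python
# Minimal two-party additive secret sharing over Z_Q (illustrative parameters).
import secrets

Q = 2**61 - 1  # assumed modulus; real frameworks fix this per protocol

def share(x):
    """Split x into two additive shares: x = (s0 + s1) mod Q."""
    s0 = secrets.randbelow(Q)
    return s0, (x - s0) % Q

def reconstruct(s0, s1):
    return (s0 + s1) % Q

# Vector/scalar addition is purely local -- no communication needed:
a0, a1 = share(20)
b0, b1 = share(22)
c0, c1 = (a0 + b0) % Q, (a1 + b1) % Q  # each party adds its own shares
assert reconstruct(c0, c1) == 42

# A single share is uniformly random, so neither party can compare or
# sign-check locally -- hence the SecSh -> GC switches in the table above.
# Multiplication over shares likewise needs interaction (e.g., an AHE- or
# OT-based protocol, as in the SecSh -> AHE/OT row).
print(a0, a1)  # looks random; reveals nothing about 20 on its own
```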
Example CML methods that replace expensive algorithmic components with crypto-friendly versions
| ML Algorithm | Original Component | Crypto-friendly Component | Benefits |
|---|---|---|---|
| Logistic Regression, Neural Networks | Sigmoid, Softmax | ReLU | Avoids inversion and limits expensive confidential divisions to one. |
| LMC, Fisher's LDA | Divisions | Multiplications with incorporated division factors | Avoids division costs and simplifies the protocol. |
| Ridge Linear Regression | LU decomposition | Cholesky decomposition | Halves the cost complexity. |
| Matrix Factorization | Cholesky decomposition | Sorting-based matrix factorization | Reduces the overall complexity from quadratic to within a polylogarithmic factor of the plaintext complexity. |
| Boosting | Decision stumps | Random linear classifiers | Fewer comparisons and simpler learning. |
| Logistic Regression | Exponentiation | Taylor expansion | Avoids the cost of multiple levels of multiplications. |
| Spectral Clustering | Eigendecomposition | Eigen-approximation by Lanczos and Nyström | Reduces the complexity of the problem from … |
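The Taylor-expansion replacement in the logistic-regression row is representative of the general pattern: substitute a transcendental function with a low-degree polynomial that uses only the additions and multiplications homomorphic schemes support. A hedged sketch (degree 3 around zero is an illustrative choice; the surveyed methods vary in degree and approximation interval):

```python
# Sketch: replacing sigmoid's exponentiation with a low-degree Taylor
# polynomial, which needs only the additions/multiplications that HE
# schemes support natively.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_taylor3(x):
    # Taylor expansion of sigmoid at 0: 1/2 + x/4 - x^3/48 + O(x^5)
    return 0.5 + x / 4.0 - x**3 / 48.0

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x={x:+.1f}  exact={sigmoid(x):.4f}  taylor3={sigmoid_taylor3(x):.4f}")

# The approximation is tight near 0 and degrades as |x| grows, which is why
# some CML methods clip or rescale inputs before applying the polynomial.
```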