Charlotte Bonte, Frederik Vercauteren.
Abstract
BACKGROUND: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy-preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption, and can be used to mitigate the privacy concerns raised by using a cloud service.
Keywords: Fixed Hessian; Homomorphic encryption; Logistic regression; Privacy
Year: 2018 PMID: 30309364 PMCID: PMC6180357 DOI: 10.1186/s12920-018-0398-y
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Performance for the financial dataset with 31 covariates, 700 training records, and 19,300 testing records
| # iterations | AUC SFH |
|---|---|
| 1 | 0.9418 |
| 5 | 0.9436 |
| 10 | 0.9448 |
| 20 | 0.9466 |
| 50 | 0.9517 |
| 100 | 0.9599 |
Comparing actual and predicted classes
| | | Actual class | |
|---|---|---|---|
| | | -1 | 1 |
| Predicted class | -1 | True negative (TN) | False negative (FN) |
| | 1 | False positive (FP) | True positive (TP) |
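The four cells of this comparison can be counted directly from label vectors. The following is a minimal sketch, assuming labels are encoded as -1 and 1 as in the table; the function name `confusion_counts` and the sample data are hypothetical and not taken from the paper.

```python
def confusion_counts(actual, predicted):
    """Count TN, FN, FP and TP for class labels in {-1, 1}."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Cell names keyed by (predicted, actual), matching the table layout.
    cell = {(-1, -1): "TN", (-1, 1): "FN", (1, -1): "FP", (1, 1): "TP"}
    counts = {"TN": 0, "FN": 0, "FP": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        counts[cell[(p, a)]] += 1
    return counts


# Hypothetical toy labels, for illustration only.
actual = [1, 1, -1, -1, 1, -1]
predicted = [1, -1, -1, 1, 1, -1]
counts = confusion_counts(actual, predicted)

# Rates that define one point on a ROC curve.
true_positive_rate = counts["TP"] / (counts["TP"] + counts["FN"])
false_positive_rate = counts["FP"] / (counts["FP"] + counts["TN"])
```

The true positive rate and false positive rate derived from these counts are the two coordinates plotted in a ROC curve.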
Fig. 1 ROC curve for the cancer detection scenario of iDASH with 1000 training records and 581 testing records, all with 20 covariates
Fig. 2 ROC curve for the financial fraud detection with 1000 training records and 19,000 testing records, all with 31 covariates
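The AUC values reported in the tables are areas under ROC curves of this kind. As a rough illustration of how such a value can be obtained, the sketch below sweeps a decision threshold over predicted scores and integrates the resulting curve with the trapezoidal rule. It is not the evaluation code used in the paper; the function names and the toy scores are assumptions.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one record of each class")
    # Visit records from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Records with equal scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels in {-1, 1} and model scores, for illustration only.
labels = [1, 1, -1, 1, -1, -1]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = auc(roc_points(labels, scores))
```

For this toy input, 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9.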
The parameters defining plaintext encoding
| | | w | t |
|---|---|---|---|
| Genomic data | (1) | 71 | 5179·5189·5197 |
| Financial data | (2) | 150 | 2237·2239 |
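Both values of t are written as products of pairwise coprime factors. This record does not say why; one common reason for such a choice, assumed here, is that arithmetic modulo t can then be carried out independently modulo each factor and recombined with the Chinese remainder theorem. The sketch below illustrates that idea on plain integers only. It involves no encryption, and the helper names are hypothetical.

```python
from itertools import combinations
from math import gcd, prod

# Factors of t as listed in the table.
GENOMIC_FACTORS = (5179, 5189, 5197)
FINANCIAL_FACTORS = (2237, 2239)


def crt_split(x, factors):
    """Reduce x modulo each factor."""
    return tuple(x % p for p in factors)


def crt_combine(residues, factors):
    """Recombine residues into the unique value modulo the product of the factors."""
    # The reconstruction below requires pairwise coprime factors.
    assert all(gcd(a, b) == 1 for a, b in combinations(factors, 2))
    t = prod(factors)
    total = 0
    for r, p in zip(residues, factors):
        q = t // p
        total += r * q * pow(q, -1, p)  # pow(q, -1, p) is the inverse of q mod p
    return total % t


t_genomic = prod(GENOMIC_FACTORS)
t_financial = prod(FINANCIAL_FACTORS)

# Multiply two numbers modulo t by working factor by factor.
a, b = 123456789, 987654321
ra = crt_split(a, GENOMIC_FACTORS)
rb = crt_split(b, GENOMIC_FACTORS)
product_residues = tuple((x * y) % p for x, y, p in zip(ra, rb, GENOMIC_FACTORS))
recombined = crt_combine(product_residues, GENOMIC_FACTORS)
```

The recombined value agrees with `(a * b) % t_genomic`, which is the property that makes the factor-wise computation valid. The code needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.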
Performance for the genomic dataset with a fixed number of covariates equal to 20
| # training records | Computation time | AUC SFH | AUC glmfit |
|---|---|---|---|
| 500 | 22 min | 0.6348 | 0.6287 |
| 600 | 26 min | 0.6298 | 0.6362 |
| 800 | 35 min | 0.6452 | 0.6360 |
| 1000 | 44 min | 0.6561 | 0.6446 |
For each row, the number of testing records equals the total number of input records (1581) minus the number of training records
Performance for the genomic dataset with a fixed number of training records equal to 500 and the number of testing records equal to 1081
| # covariates | Computation time | AUC SFH | AUC glmfit |
|---|---|---|---|
| 5 | 7 min | 0.65 | 0.6324 |
| 10 | 12 min | 0.6545 | 0.6131 |
| 15 | 17 min | 0.6446 | 0.6241 |
| 20 | 22 min | 0.6348 | 0.6272 |
Performance for the financial dataset with a fixed number of covariates equal to 31
| # training records | Computation time | AUC SFH | AUC glmfit |
|---|---|---|---|
| 700 | 30 min | 0.9416 | 0.9619 |
| 800 | 36 min | 0.9411 | 0.9616 |
| 900 | 40 min | 0.9409 | 0.9619 |
| 1000 | 45 min | 0.9402 | 0.9668 |
For each row, the number of testing records equals the total number of input records (20,000) minus the number of training records
Performance for the financial dataset with a fixed number of training records equal to 500 and the number of testing records equal to 19,500
| # covariates | Computation time | AUC SFH | AUC glmfit |
|---|---|---|---|
| 5 | 5 min | 0.8131 | 0.8447 |
| 10 | 8 min | 0.9403 | 0.9409 |
| 15 | 11 min | 0.9327 | 0.9492 |
| 20 | 15 min | 0.9401 | 0.9629 |