
PPCD: Privacy-preserving clinical decision with cloud support.

Hui Ma1, Xuyang Guo2, Yuan Ping1,3, Baocang Wang1,4, Yuehua Yang1, Zhili Zhang1, Jingxian Zhou3.   

Abstract

With the prosperity of machine learning and cloud computing, meaningful information can be mined from mass electronic medical data, which helps physicians make proper disease diagnoses for patients. However, using patients' medical data and disease information frequently raises privacy concerns. In this paper, based on the single-layer perceptron, we propose a scheme of privacy-preserving clinical decision with cloud support (PPCD), which securely conducts disease model training and prediction for the patient. Each party learns nothing about the others' private information. In PPCD, a lightweight secure multiplication is presented and introduced to improve the model training. Security analysis and experimental results on real data confirm the high accuracy of disease prediction achieved by the proposed PPCD without the risk of privacy disclosure.


Year:  2019        PMID: 31141561      PMCID: PMC6541381          DOI: 10.1371/journal.pone.0217349

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

With the sharp growth of electronic data, machine learning has impacted people's lifestyles by predicting human behavior and future trends in almost everything [1], [2], [3]. To overcome the limitations of storage and computing resources, outsourcing expensive machine learning tasks to the Cloud has attracted much attention; for instance, a client's data can be transmitted to the Cloud for either model training or prediction [4], [5], [6]. As a popular machine learning algorithm, the single-layer perceptron (SLP) is simple yet efficient and has been widely used in disease prediction [7], [8], [9]. It is more appropriate for real-time disease prediction than more complex techniques such as naïve Bayes [10], decision trees [2] and support vector machines (SVMs) [11], [12]. The clinical decision support system (CDSS), which uses various data mining techniques to help physicians make proper disease diagnoses and provide health services for patients, has received considerable attention [7], [13], [14], [15]. However, for privacy reasons, users do not want to submit their medical data to an unauthorized institution [16], [17], [18]. At the same time, since the classifier is considered an asset of the medical service provider, there is a risk in exposing the prediction model to a third party: the third party could use the model to make disease predictions for patients, which could damage the interests of the medical service provider. Therefore, the confidentiality of both the medical data and the disease model is crucial for the CDSS, and how to achieve secure disease prediction without compromising the accuracy of the result becomes a challenging issue. To protect the privacy of patients' medical data and the security of the prediction model, in this study we propose a privacy-preserving clinical decision scheme based on SLP with cloud support (PPCD). As shown in Fig 1, it comprises two phases: SLP model training and disease prediction.
In the model training, diagnosed patients encrypt their symptom data and outsource them, together with the corresponding diagnosed diseases, to the cloud. Meanwhile, the hospital generates random weights, which are then encrypted and sent to the cloud. After receiving both the encrypted medical data and the weights, the cloud trains the model with a few interactions with the hospital: the cloud selects an encrypted sample and evaluates the sign(·) function; if the returned value of sign(·) does not match the sample's label, the cloud updates the weights, and this repeats until the convergence criterion is satisfied or all the disease cases are matched. When a patient wants to check for a disease, he encrypts his symptom data and submits it to the hospital, which completes the analysis based on the disease model and sends back the encrypted diagnosis result and some medical advice.
Fig 1

Architecture of the proposed PPCD.

Towards tackling the privacy concerns in clinical decision support systems, PPCD provides disease model training and disease risk prediction for the patient in a privacy-preserving way, so that the Cloud learns nothing about the patient's medical information or the actual model. Specifically, the main contributions are: (1) The proposal of PPCD, which provides privacy-preserving clinical decisions based on SLP with cloud support. It helps the doctor predict diseases while the medical data and the diagnosis result remain in encrypted form; furthermore, the built disease diagnosis model is also protected as an asset of the hospital. (2) For privacy preservation in the model training phase, a specific lightweight secure multiplication (LSM) protocol is presented. By employing LSM, PPCD securely finishes the inner product in the encrypted domain (ED) in one round. (3) We implement PPCD in Java to check its performance in ED. Experimental results on several medical data sets confirm that PPCD achieves accuracies comparable with SLP in the plain domain (PD). The remainder of this paper is organized as follows: the next section briefly introduces the preliminaries; then PPCD is proposed along with LSM; the correctness & security analysis is detailed, followed by the performance evaluation; related works and conclusions are given in the last two sections.

Preliminaries

In this section, a brief overview of the Paillier cryptosystem, SLP and secure multiplication (SM) is given. Table 1 summarizes the key notations.
Table 1

Summary of notations.

Notation: Definition
PKh: Hospital's public key of the Paillier encryption scheme
SKh: Hospital's private key of the Paillier encryption scheme
PKup: Undiagnosed patient's public key of the Paillier encryption
SKup: Undiagnosed patient's private key of the Paillier encryption
EPKh(): The Paillier encryption function
ESKh(): The Paillier decryption function
sign(·): Activation function of SLP
xi: Symptom vector of patient i
Oi: Output value, Oi ∈ {−1, 1}
Dk: The k-th disease, k ∈ {1, …, m}
Cxi: Encrypted symptom vector of patient i
CWk: Weight ciphertext vector of the k-th disease
xij: The j-th symptom attribute of patient i
Cxi,j: Ciphertext of xij
Cwj: Ciphertext of wj
|xij|: The absolute value of xij
rxij, rwj: Random numbers, rxij, rwj ∈ ZN
EXP: Time cost of one exponentiation operation
MUL: Time cost of one multiplication operation
DIV: Time cost of one modular inverse operation
# (≠): Not equal to

Single-layer perceptron

Following [19], SLP learns the weight vector w, which is then multiplied with the input features to determine whether a sample belongs to one class or the other. We define an activation function sign(z) which takes the linear combination z of the input values x and weights w as input. If z is greater than a defined threshold θ, we predict 1, and -1 otherwise. To simplify the notation, we define w0 = −θ and x0 = 1, so that

z = w0 x0 + w1 x1 + ⋯ + wn xn = w · x,   (1)

where sign(z) = 1 if z ≥ 0 and −1 otherwise. For each training sample x, we calculate the output value and update w if the output does not match the target. The value for updating the weights at each increment is calculated by the learning rule

Δwj = η (target − output) xj,   (2)

where η is the learning rate (0 < η ≤ 1). It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable. If a linear decision boundary cannot separate the two classes, a maximum number of passes over the training dataset and/or a threshold for the number of tolerated misclassifications should be set.
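As a concrete illustration, the plaintext learning rule above can be sketched in Java (the paper's implementation language); the class and method names are our own, not taken from the paper's code:

```java
// Minimal plaintext single-layer perceptron following the update rule
// above: w_j <- w_j + eta * (target - output) * x_j, with w0 = -theta.
// Illustrative sketch only; names are not from the paper's code.
public class Perceptron {
    private final double[] w;    // w[0] is the bias weight w0 = -theta
    private final double eta;    // learning rate, 0 < eta <= 1

    public Perceptron(int dims, double eta) {
        this.w = new double[dims + 1];
        this.eta = eta;
    }

    public int predict(double[] x) {
        double z = w[0];                       // x0 = 1 by convention
        for (int j = 0; j < x.length; j++) z += w[j + 1] * x[j];
        return z >= 0 ? 1 : -1;                // sign(z)
    }

    // Train until every sample is matched or maxPasses is reached
    // (needed when the classes may not be linearly separable).
    public void train(double[][] xs, int[] targets, int maxPasses) {
        for (int pass = 0; pass < maxPasses; pass++) {
            int errors = 0;
            for (int i = 0; i < xs.length; i++) {
                int out = predict(xs[i]);
                if (out != targets[i]) {
                    errors++;
                    double delta = eta * (targets[i] - out);
                    w[0] += delta;
                    for (int j = 0; j < xs[i].length; j++)
                        w[j + 1] += delta * xs[i][j];
                }
            }
            if (errors == 0) break;            // all samples matched
        }
    }
}
```

On a linearly separable toy set this converges after a few passes; on real data the pass limit guards against non-separable classes.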

Paillier cryptosystem

The Paillier cryptosystem is an additively homomorphic cryptosystem [20]. It works as follows. Key generation: two large prime numbers p and q are randomly and independently chosen such that gcd(pq, (p − 1)(q − 1)) = 1, where |p| = |q|. Then we compute n = pq and λ = lcm(p − 1, q − 1), and select a random integer g in Z*_{n²}. By setting L(u) = (u − 1)/n and μ = (L(g^λ mod n²))^{−1} mod n, the public key (n, g) and the private key (λ, μ) are obtained. Encryption: let m be a message to be encrypted, where 0 ≤ m < n. With a randomly selected r, where 0 < r < n, the ciphertext is calculated by c = E(m) = g^m · r^n mod n². Decryption: let c ∈ Z*_{n²} be the ciphertext to decrypt; the plaintext message is recovered by m = D(c) = L(c^λ mod n²) · μ mod n. As an additively homomorphic scheme, it satisfies D(E(m1, r1) · E(m2, r2) mod n²) = (m1 + m2) mod n, and homomorphic multiplication by a plaintext constant k: D(E(m1, r1)^k mod n²) = k·m1 mod n.
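The scheme above can be sketched in Java with BigInteger. For brevity we use the common valid choice g = n + 1; the key size below is a toy value and nowhere near secure:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier implementation following the description above, with the
// simplification g = n + 1 (a valid choice of g). For illustration only;
// the key sizes used here are far too small for real use.
public class Paillier {
    final BigInteger n, n2, lambda, mu;

    public Paillier(int bits) {
        SecureRandom rnd = new SecureRandom();
        BigInteger p = BigInteger.probablePrime(bits, rnd);
        BigInteger q = BigInteger.probablePrime(bits, rnd);
        n = p.multiply(q);
        n2 = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));       // lcm(p-1, q-1)
        // mu = (L(g^lambda mod n^2))^-1 mod n, with L(u) = (u - 1) / n
        BigInteger g = n.add(BigInteger.ONE);
        mu = L(g.modPow(lambda, n2)).modInverse(n);
    }

    private BigInteger L(BigInteger u) { return u.subtract(BigInteger.ONE).divide(n); }

    public BigInteger encrypt(BigInteger m) {
        SecureRandom rnd = new SecureRandom();
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), rnd).mod(n); }
        while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));
        // c = g^m * r^n mod n^2
        return n.add(BigInteger.ONE).modPow(m, n2).multiply(r.modPow(n, n2)).mod(n2);
    }

    public BigInteger decrypt(BigInteger c) {
        // m = L(c^lambda mod n^2) * mu mod n
        return L(c.modPow(lambda, n2)).multiply(mu).mod(n);
    }
}
```

Multiplying two ciphertexts modulo n² decrypts to the sum of the plaintexts, and raising a ciphertext to a constant decrypts to the scaled plaintext, exactly the two identities stated above.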

Secure multiplication

Secure multiplication (SM) [21] supports multiplication in ED. Suppose Alice has two encrypted values E(x) and E(y), and Bob has the private key sk corresponding to the public key pk; the goal of SM is to compute E(x · y) without leaking x or y to Alice. The SM protocol is described as follows. Alice holds the ciphertexts E(x) and E(y), generates two random numbers rx, ry ∈ ZN, calculates x1 = E(x) · E(rx) and y1 = E(y) · E(ry), and sends x1 and y1 to Bob. After receiving x1 and y1, Bob decrypts them with sk to get hx = D(x1) and hy = D(y1), computes h = hx · hy mod N, encrypts h as H = E(h), and sends H to Alice. Alice then computes s1 = E(x)^{N−ry}, s2 = E(y)^{N−rx} and s3 = E(rx · ry)^{N−1}, and multiplies them together as E(x · y) = H · s1 · s2 · s3.
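A sketch of one SM round under the same toy-Paillier assumptions (g = n + 1, insecure key size); for brevity a single class plays both Alice's role (masking and unmasking) and Bob's role (the decrypt-multiply-encrypt step):

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Sketch of the SM protocol above on top of a toy Paillier (g = n + 1).
// "Alice" holds only ciphertexts and the public key; "Bob" holds the
// private key. Names are illustrative; key sizes are not secure.
public class SecureMult {
    static final SecureRandom RND = new SecureRandom();
    static BigInteger n, n2, lambda, mu;

    static void keygen(int bits) {
        BigInteger p = BigInteger.probablePrime(bits, RND);
        BigInteger q = BigInteger.probablePrime(bits, RND);
        n = p.multiply(q); n2 = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));
        mu = L(n.add(BigInteger.ONE).modPow(lambda, n2)).modInverse(n);
    }
    static BigInteger L(BigInteger u) { return u.subtract(BigInteger.ONE).divide(n); }
    static BigInteger enc(BigInteger m) {
        BigInteger r = new BigInteger(n.bitLength() - 1, RND).add(BigInteger.ONE);
        return n.add(BigInteger.ONE).modPow(m.mod(n), n2).multiply(r.modPow(n, n2)).mod(n2);
    }
    static BigInteger dec(BigInteger c) { return L(c.modPow(lambda, n2)).multiply(mu).mod(n); }

    // Alice's masking/unmasking plus Bob's single decrypt-multiply-encrypt round.
    static BigInteger secureMult(BigInteger cx, BigInteger cy) {
        BigInteger rx = new BigInteger(32, RND), ry = new BigInteger(32, RND);
        // Alice masks both inputs additively
        BigInteger x1 = cx.multiply(enc(rx)).mod(n2);   // E(x + rx)
        BigInteger y1 = cy.multiply(enc(ry)).mod(n2);   // E(y + ry)
        // Bob decrypts, multiplies in plaintext, re-encrypts
        BigInteger H = enc(dec(x1).multiply(dec(y1)).mod(n));
        // Alice strips the cross terms: h = xy + x*ry + y*rx + rx*ry
        BigInteger s1 = cx.modPow(n.subtract(ry), n2);  // E(-x*ry)
        BigInteger s2 = cy.modPow(n.subtract(rx), n2);  // E(-y*rx)
        BigInteger s3 = enc(rx.multiply(ry)).modPow(n.subtract(BigInteger.ONE), n2); // E(-rx*ry)
        return H.multiply(s1).multiply(s2).multiply(s3).mod(n2);
    }
}
```

The exponent N − r acts as −r on the plaintext, which is how the three mask terms are subtracted without decrypting.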

The proposed PPCD model

Model overview and requirements

Model overview

To employ SLP for model training and disease prediction while protecting privacy, the proposed PPCD model contains four parties, which are described in Table 2. They collaboratively conduct SLP model training and disease prediction. The CS trains a disease prediction model based on the DPs' disease data. To check for a disease, a UP submits his symptom data to the hospital, which predicts the corresponding disease based on the trained model. Fig 1 depicts the detailed procedure.
Table 2

Description of the attended four parties.

Party: Description
Diagnosed Patient (DP): Encrypts the symptom data with the hospital's public key PKh together with the diagnosed result, which are used for training the disease model, and then outsources the data to the Cloud server.
Undiagnosed Patient (UP): Provides the encrypted disease symptom data for the hospital to make decisions.
Hospital: As a medical service provider, the hospital is a trusted party in charge of generating, distributing and managing the public and private keys. Meanwhile, the hospital performs model training together with the cloud server and disease prediction for UP based on the patient's symptoms.
Cloud Server (CS): With almost unlimited storage, CS trains the disease model from the outsourced medical data. The trained model is securely stored in the hospital.

Privacy requirements

In PPCD, DPs are trustworthy: they provide correct medical data to the Cloud server. Meanwhile, CS and UP are honest-but-curious [22]. CS strictly follows the privacy-preserving SLP learning protocol performed in the system, but it wants to learn the hospital's sensitive medical data and UP's medical information whenever the opportunity arises; UP is interested in the trained disease model. The hospital is honest. At the same time, an outside adversary is curious about all data transferred in the system and may eavesdrop on them. Privacy preservation is therefore critical for successfully diagnosing the patient's disease, and the security requirements of PPCD are listed as follows. UP's privacy: in the disease diagnosis, the sensitive symptom data of UP should not be leaked to untrusted parties during transmission. Furthermore, the diagnosis result is confidential to the patient and cannot be exposed to any other entity; in short, UP's privacy should be preserved. DP's privacy: generally, DP holds some historical medical information, e.g., the diagnosed disease and the confirmed symptom data. This information is highly sensitive and must not be obtained by unauthorized entities; otherwise, DP will be unwilling to provide the historical disease data for model training due to privacy concerns. Hospital's privacy: in PPCD, the hospital trains the disease model on the historical medical data with the help of the Cloud. As an asset of the hospital, the disease model cannot be leaked to UP or other parties during disease diagnosis.

Design goal

Based on the above scenarios and security requirements, the system should realize model training and disease diagnosis in a privacy-preserving and efficient way. The particular goals are as follows. Privacy-preserving requirements: the flourishing of clinical decision support hinges upon information security and privacy preservation. If the model's privacy requirements are not considered, the patients' sensitive data and the disease model will be exposed to unauthorized parties; historical patients will then be more unwilling to share their medical data with PPCD, the accuracy of the trained model cannot be ensured, and the diagnosis service will degrade. Therefore, the system should protect the privacy of both historical and undiagnosed patients. Confidentiality and accuracy of the disease model: the disease model is a valuable asset of the hospital, which may be reluctant to reveal its details. Simultaneously, it is crucial that applying privacy preservation does not compromise the accuracy of the prediction model.


Privacy-preserving training

This section shows how to construct PPCD, train the disease model and predict diseases based on the model in a privacy-preserving way.

(1) System setting

Key generation: the Paillier encryption algorithm is run by the hospital to generate keys for both UP and the hospital. Given the security parameter k, the hospital chooses two large prime numbers p and q at random which satisfy |q| = |p| = k, and generates the public key (n, g) and the corresponding private key (λ, μ), where n = pq and λ = lcm(p − 1, q − 1). Data encryption: raw medical data are encrypted and submitted to the Cloud for storage and model training. The Cloud stores the disease patterns, each of which represents a disease sample (x, O), where x is an n-dimensional vector whose elements represent confirmed symptoms, and O ∈ {−1, 1} is the associated desired output: 1 represents suffering from the disease and -1 represents not. We suppose the medical data have been preprocessed, so the format of the data is suitable for PPCD. In the system, the disease output is stored in the cloud server in plaintext, because leaking the disease output alone does not damage patients' privacy. The encrypted patients' medical data are stored in the cloud as in Table 3.
Table 3

Medical data for the k-th disease.

Medical sample: Medical data; Desired output
x1: {Cx1,1, Cx1,2, ⋯, Cx1,n}; O1
x2: {Cx2,1, Cx2,2, ⋯, Cx2,n}; O2
⋯: ⋯; ⋯
xn: {Cxn,1, Cxn,2, ⋯, Cxn,n}; On
Meanwhile, the disease prediction model is sensitive data which should be encrypted. At the beginning of model training, the hospital generates a random weight vector w = (w1, w2, ⋯, wn) and encrypts it, then sends the ciphertext of the weights to the Cloud server.

(2) Lightweight secure multiplication protocol

SM can be used to calculate the inner product of two encrypted vectors: given Cx = {Cx1, ⋯, Cxn} and Cw = {Cw1, ⋯, Cwn}, E(x · w) can be calculated by running SM n times. To compute the inner product of two encrypted vectors more efficiently, we propose, based on SM, a lightweight secure multiplication (LSM) protocol which achieves the inner product on ciphertexts in one round. Considering two parties C1 and C2, LSM is detailed in Algorithm 1.

Algorithm 1: Lightweight Secure Multiplication (LSM)
Require: C1 has Cx = {Cx1, ⋯, Cxn} and Cw = {Cw1, ⋯, Cwn}; C2 has sk
Step 1: C1
 (1) Chooses 2n random numbers rx1, ⋯, rxn, rw1, ⋯, rwn ∈ ZN
 (2) Crxj ← E(rxj)
 (3) Crwj ← E(rwj)
 For each Cxj and Cwj:
 (4) Xj = Cxj · Crxj
 (5) Wj = Cwj · Crwj; sends X = {X1, ⋯, Xn} and W = {W1, ⋯, Wn} to C2
Step 2: C2
 (1) Receives X and W from C1
 (2) hxj = D(Xj) for each Xj
 (3) hwj = D(Wj) for each Wj
 (4) h = Σj hxj · hwj mod N
 (5) H = E(h); sends H to C1
Step 3: C1
 (1) Receives H
 (2) s1 = Πj Cxj^{N−rwj}
 (3) s2 = Πj Cwj^{N−rxj}
 (4) s3 = E(Σj rxj · rwj)^{N−1}
 (5) E(x · w) = H · s1 · s2 · s3

(3) Model training

In the system setting phase, DPs encrypt their medical information and outsource it to the Cloud, which collects the medical data of m diseases, where k indexes the k-th disease. To train the prediction model wk of the k-th disease, the Cloud selects the disease samples labelled with the k-th disease. Privacy-preserving disease model training is described by Algorithm 2.

Algorithm 2: Privacy-Preserving Model Training Based on SLP
1: Input: n input samples per disease, 1 ≤ k ≤ m, iteration threshold iterationmax, learning rate η, sign function sign(·)
2: Output: prediction model wk, 1 ≤ k ≤ m
3: DP: for 1 ≤ k ≤ m do
4:  for 1 ≤ i ≤ n do
5:   DP encrypts the symptom data as Cxi and submits it to the cloud
6:  endfor
7: endfor
8: for 1 ≤ k ≤ m do
9:  Hospital: chooses a random initialization of wk.
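Setting the encryption aside, the unmasking algebra that Step 3 of LSM relies on can be checked with plain integers; this sketch (our own illustration, not the paper's code) only verifies the identity, since in the real protocol C1 removes the mask terms homomorphically without ever seeing x or w:

```java
// Plain-arithmetic check of the masking identity behind LSM: from
//   h = sum_i (x_i + rx_i) * (w_i + rw_i),
// the inner product is recovered as
//   <x, w> = h - sum_i x_i*rw_i - sum_i w_i*rx_i - sum_i rx_i*rw_i,
// which is what C1 computes in the encrypted domain via s1, s2, s3.
public class LsmIdentity {
    public static long innerViaMasks(long[] x, long[] w, long[] rx, long[] rw) {
        long h = 0, xrw = 0, wrx = 0, rr = 0;
        for (int i = 0; i < x.length; i++) {
            h   += (x[i] + rx[i]) * (w[i] + rw[i]); // what C2 decrypts and sums
            xrw += x[i] * rw[i];                    // stripped via Cx_j^(N - rw_j)
            wrx += w[i] * rx[i];                    // stripped via Cw_j^(N - rx_j)
            rr  += rx[i] * rw[i];                   // known to C1 in the clear
        }
        return h - xrw - wrx - rr;                  // equals <x, w>
    }
}
```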
10:  for iteration = 1, 2, …, iterationmax do
11:   for 1 ≤ i ≤ n do
12:    Hospital: encrypts wk and uploads it to the cloud
13:    Cloud: chooses a medical sample and executes LSM to get R = E(xi · wk)
14:     and sends R to the hospital
15:    Hospital: decrypts R, calculates Si = sign(D(R)) and sends Si to the cloud
16:    Cloud: if Si ≠ Oi and Oi = 1, exp = η
17:     if Si ≠ Oi and Oi = −1, exp = n − η
18:     for j = 1, …, d do
19:      u = Cxi,j^exp
20:      Cwj = Cwj · u
21:     endfor
22:    endfor
23:   endfor
24: return wk, 1 ≤ k ≤ m

Lines 3–7: DP encrypts the symptom data and submits it to the cloud. Lines 8–12: The hospital randomly generates a weight vector in which not all elements are 0 and encrypts it with its own public key pk, then sends the weight ciphertext {Cw1, ⋯, Cwn} to the Cloud. Lines 13–14: The Cloud chooses a disease sample {Cxi, Oi} and 2n random numbers rxj, rwj ∈ ZN, then executes LSM to compute R = E(xi · wk), where the cloud server acts as C1 and the hospital as C2; finally, it sends R to the hospital. Line 15: After receiving R, the hospital decrypts R with the private key sk, executes the sign(·) function as Si = sign(D(R)), and sends Si to the cloud. Lines 16–20: The Cloud compares Si with Oi. If Si ≠ Oi and Oi = 1, let exp = η; if Si ≠ Oi and Oi = −1, let exp = n − η. Then the Cloud computes u = Cxi,j^exp and updates Cwj as Cwj = Cwj · u. Line 24: If all the disease samples are matched or the training count exceeds the convergence criterion, the hospital terminates the training and wk is taken as the prediction model for Dk; otherwise, lines 13–14 are repeated. After obtaining the k-th disease model, the Cloud selects the next disease and repeats lines 8–24. After all medical samples are trained, the hospital obtains prediction models for all diseases.

Disease prediction

In this phase, we assume the prediction models have been trained and stored in the hospital. The hospital can predict whether a patient suffers from the k-th disease using the k-th disease model. When an undiagnosed patient submits his encrypted symptom information to the hospital, the prediction is executed as follows. Step 1: When the ciphertext of the symptom information arrives, the hospital decrypts it and gets the plaintext symptom data x = {x1, ⋯, xn}. Step 2: Let s = 0; for each xj and wj, the hospital accumulates s = s + xj · wj, obtaining s = x · w. Step 3: Compute S = sign(s). If S = 1, the patient suffers from the disease; otherwise not. Step 4: The hospital encrypts the prediction result with UP's public key and returns it to the patient.

Correctness & security analysis

In this section, we analyze the correctness and security of the proposed PPCD scheme. Notably, we focus on how PPCD achieves privacy preservation of the patients' medical information and the disease model.

(1) Correctness analysis of LSM

The correctness of LSM can be illustrated as follows.
In Step 1: Xj = Cxj · Crxj = E(xj) · E(rxj) = E(xj + rxj), and Wj = Cwj · Crwj = E(wj + rwj).
In Step 2: h = Σj D(Xj) · D(Wj) = Σj (xj + rxj)(wj + rwj) = Σj (xj wj + xj rwj + wj rxj + rxj rwj) mod N.
In Step 3: H · s1 · s2 · s3 = E(h) · Πj Cxj^{N−rwj} · Πj Cwj^{N−rxj} · E(Σj rxj rwj)^{N−1} = E(h − Σj xj rwj − Σj wj rxj − Σj rxj rwj) = E(Σj xj wj) = E(x · w).
From the above derivation, LSM calculates E(x · w) in a single round.

(2) Correctness analysis of training model

The correctness of PPCD can be illustrated as follows. In Step 3, the hospital decrypts R with the private key sk and computes s = D(R) = xi · w, so s is consistent with that in Eq (1). In Step 4, the Cloud updates Cwj as Cwj = Cwj · u, where u = Cxi,j^exp. If Si ≠ Oi and Oi = 1, then exp = η and Cwj · u = E(wj) · E(xi,j)^η = E(wj + η xi,j). If Si ≠ Oi and Oi = −1, then exp = n − η and Cwj · u = E(wj) · E(xi,j)^{n−η} = E(wj − η xi,j mod n). In both cases the new ciphertext equals E(wj + η Oi xi,j), so Cwj is also consistent with the learning rule in Eq (2). From the above calculation, PPCD trains the correct disease model in the cloud; namely, the accuracy of the prediction model is preserved.

(3) Security of patient’s medical data

To have diseases predicted, DP and UP encrypt the medical information x = {x1, x2, …, xn} with the hospital's public key PKh and upload the ciphertext Cx = {Cx1, Cx2, …, Cxn} to the Cloud. During transmission, all the medical information is encrypted to prevent an outside attacker from eavesdropping; an adversary cannot decrypt the ciphertext without the hospital's private key SKh. The symptom data are encrypted with Paillier, which is semantically secure against chosen-plaintext attack. So the medical information stored in the Cloud is secure, since the Cloud can neither identify the corresponding contents nor obtain the plaintext of the symptom data.

(4) Security of training disease model

During training of the prediction model, all the computations are done over ciphertexts. E(x · w) is calculated by using LSM, in which each party learns nothing from the protocol. The initial model is generated randomly by the hospital, updated over ciphertexts during training, and the hospital's SKh is well protected. u = Cxi,j^exp and Cwj = Cwj · u = E(wj + ηOxi,j) can be computed easily over ciphertexts thanks to the additive homomorphism of Paillier. Even if the encrypted disease model were leaked to UP or the Cloud, they would not be able to recover w without the private key SKh.

(5) Security of predicting result

When a patient wants to identify his disease, he submits the ciphertext of his symptom data to the hospital. After finishing the disease prediction, the diagnosis result is encrypted with UP's public key PKup and returned to UP. If an attacker captures the prediction result, he cannot recover the corresponding contents without UP's private key SKup.

Performance evaluation

Complexity analysis

Computational complexity

To analyze the complexity of the proposed PPCD, Table 4 illustrates the computational cost of each step. For simplicity, we use EXP to denote the time complexity of one exponentiation operation on ciphertext in the Paillier cryptosystem. Similarly, the time complexities of one multiplication operation on ciphertext and one modular inverse operation in the decryption algorithm are represented by MUL and DIV, respectively. In Step 1 of the disease learning phase, n exponentiations and n multiplications are required by the hospital to encrypt the initial weights. In Step 2, the Cloud uses (2n+3) exponentiations and (4n+7) multiplications, and the hospital executes 2n exponentiations and 4n multiplications to obtain R. In Step 3, one exponentiation and one modular inverse are consumed to get S. In Step 4, to update the weights, the Cloud performs n exponentiations and n multiplications. Finally, (n−1) multiplications, one exponentiation and one modular inverse are executed to predict the disease risk, after which the encrypted diagnosis result is sent to UP.
Table 4

Summary of computational cost for x in PPCD.

Phase, Step (Entity): Computational cost
Disease learning, Step 1 (Hospital): n(EXP+MUL)
Disease learning, Step 2 (Cloud): (2n+3)EXP + (4n+7)MUL
Disease learning, Step 2 (Hospital): 2n(EXP+2MUL)
Disease learning, Step 3 (Hospital): EXP + DIV
Disease learning, Step 4 (Cloud): n(EXP+MUL)
Disease prediction, Step 1 (Hospital): (n−1)MUL + EXP + DIV

Communication complexity

Assume there are N samples with n dimensions, and the length of a ciphertext is p. In the proposed PPCD system, outsourcing the encrypted symptom data to the Cloud to train the classifier costs O(N(np+L)). In model training, the hospital transmits the encrypted initial weights, which requires O(np+L). To compute R, the cost of transferring data is O(3np+2p+L). In disease prediction, the hospital sends the encrypted prediction result to UP, which costs O(np+L). The communication complexities of the proposed PPCD are detailed in Table 5.
Table 5

Summary of communication overhead in PPCD.

Phase, Step: Communication overhead
Outsourcing DP's data: N(np+L)
Disease learning, Step 1: np+L
Disease learning, Step 2: 2np+2p
Disease learning, Step 4: np+L
Disease prediction: np+L

Experimental results

To fairly evaluate the performance, the proposed PPCD is implemented by Java on Windows 7-X64. The Cloud is a computer with Intel Quad core 3.4GHz and 16GB available RAM, the hospital runs a machine with Intel Quad core 3.4GHz and 8GB available RAM, and the patient uses a laptop with Intel Dual core 2.0GHz and 8GB available RAM.

Data sets

In the experiment, we use the Wisconsin breast cancer dataset (WBCD), the heart disease dataset (HDD) and the acute inflammations dataset (AID) from the UCI machine learning repository [23] to test the performance of SLP based on our PPCD scheme. Table 6 shows the statistical information of the employed three datasets.
Table 6

Description of the benchmark data sets.

Data set (size; dims; #classes): attributes
WBCD (683; 9; 2): clump thickness; uniformity of cell size; uniformity of cell shape; marginal adhesion; single epithelial cell size; bare nuclei; bland chromatin; normal nucleoli; mitoses
HDD (297; 13; 2): age; sex; cp; trestbpl; chol; fbs; restecg; thalach; exang; oldpeak; slope; ca; thal
AID (120; 6; 2): temperature; occurrence of nausea; lumbar pain; urine pushing; micturition pains; burning of urethra, itch, swelling of urethra outlet
WBCD contains 683 instances, and each instance includes 9 attributes ranging from 1 to 10. In WBCD, each instance belongs to one of two possible classes: benign or malignant. HDD has 297 instances, and each instance consists of 13 attributes with two classes. Except for sex, trestbpl, chol and thalach, the other 9 attributes range from 1 to 10. AID contains 120 instances, and each instance includes 6 attributes with two decisions, i.e., inflammation of urinary bladder (IUB) and nephritis of renal pelvis origin (NRPO). Except for the temperature, the other attributes are either 1 (Yes) or 0 (No). In reality, the raw medical data may be decimal; however, Paillier can only encrypt integers. To resolve this problem, an approximation and expansion (A&E) method is adopted. Following the suggestion of [12], we expand each piece of medical data by multiplying it by 10^4 and rounding off all the values after the decimal point. The resulting integer symptom values x and weight components w = (w1, w2, …, wn) are then encrypted using Paillier, yielding the ciphertexts Cx and Cw, respectively.
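A minimal sketch of this A&E encoding, under the assumption (consistent with the |xij| notation of Table 1) that a negative value v is represented in ZN as N − |v|; names are illustrative:

```java
import java.math.BigInteger;

// Approximation-and-expansion (A&E) sketch: scale a decimal by 10^4, drop
// the remaining fraction, and map negatives into Z_N as N - |v| so that
// Paillier (which encrypts non-negative integers) can handle the result.
// Illustrative only; N stands for any Paillier modulus.
public class AEncode {
    public static BigInteger encode(double value, BigInteger N) {
        BigInteger v = BigInteger.valueOf(Math.round(value * 10_000));
        return v.signum() >= 0 ? v : N.add(v);   // negative -> N - |v|
    }

    public static double decode(BigInteger encoded, BigInteger N) {
        // by convention, residues above N/2 represent negative values
        BigInteger v = encoded.compareTo(N.shiftRight(1)) > 0
                ? encoded.subtract(N) : encoded;
        return v.doubleValue() / 10_000.0;
    }
}
```

The half-modulus convention in decode is what lets the hospital interpret negative weights after decryption.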

Results and analysis

We run PPCD with a predefined iteration threshold of 100, and then use the trained classifier and the three real data sets to evaluate its performance in terms of accuracy. For each data set, the ratio of training samples to testing samples is 7:3. Experimental results are detailed in Tables 7–10. For breast cancer, the overall accuracy achieved by SLP is 96.2% while PPCD reaches 95.6%. For heart disease, SLP obtains an overall accuracy of 94.6%, and PPCD 93.9%. On AID, SLP gets an accuracy of 93.3% for IUB while PPCD achieves a comparable 92.5%. For NRPO in AID, the accuracy of SLP is 93.3% while PPCD gets 91.7%. Overall, PPCD reaches disease analysis results comparable with those of SLP.
Table 7

Accuracy comparisons of SLP in PD and PPCD in ED on WBCD.

Output/Target: Class 1; Class 2; Overall
SLP(PD), Class 1: 426 (62.3%); 18 (2.6%); 96.0%
SLP(PD), Class 2: 8 (1.2%); 231 (33.8%); 96.7%
SLP(PD), Overall: 98.2%; 92.8%; 96.2%
PPCD(ED), Class 1: 423 (61.9%); 21 (3.1%); 95.3%
PPCD(ED), Class 2: 9 (1.3%); 230 (33.7%); 96.2%
PPCD(ED), Overall: 97.9%; 91.6%; 95.6%
Table 10

Accuracy comparisons of SLP in PD and PPCD in ED for NRPO of AID.

Output/Target: Class 1; Class 2; Overall
SLP(PD), Class 1: 48 (40.0%); 2 (1.7%); 96.0%
SLP(PD), Class 2: 6 (5.0%); 64 (53.3%); 91.4%
SLP(PD), Overall: 88.9%; 97.0%; 93.3%
PPCD(ED), Class 1: 46 (38.3%); 4 (3.3%); 92.0%
PPCD(ED), Class 2: 6 (5.0%); 64 (53.3%); 91.4%
PPCD(ED), Overall: 88.5%; 94.1%; 91.7%
In terms of efficiency, Table 11 gives the runtime of PPCD on the three data sets. For breast cancer, it takes 6.125 s for the historical patients to encrypt all the symptoms. In the training phase, it takes 2993.1 s for the Cloud to train the classifier. In the prediction phase, it takes 0.098 s for the hospital to compute an undiagnosed patient's disease risk (including 0.013 s for UP to encrypt all the symptoms). For heart disease and AID, the time costs of data encryption, model training and disease prediction decrease with the number of sample cases. For the sake of simplicity, multicore programming has not been adopted in the evaluation.
Table 11

Runtime comparisons of PPCD in ED and SLP in PD.

Dataset, Phase: PPCD (s); SLP (s)
Breast cancer, Data encryption: 6.125; ---
Breast cancer, Model training: 2993.100; 0.012
Breast cancer, Disease predicting: 0.098; 0.005
Heart disease, Data encryption: 3.259; ---
Heart disease, Model training: 1860.505; 0.010
Heart disease, Disease predicting: 0.145; 0.002
AID (IUB), Data encryption: 1.564; ---
AID (IUB), Model training: 743.875; 0.010
AID (IUB), Disease predicting: 0.143; 0.001
AID (NRPO), Data encryption: 1.467; ---
AID (NRPO), Model training: 683.387; 0.080
AID (NRPO), Disease predicting: 0.148; 0.001

Note: "---" means not available.


Related work

Without sufficient storage, computation or knowledge of clinical decision making, clients frequently prefer outsourcing their data to the Cloud for model training and disease prediction. Ledley and Lusted [24] first proposed a clinical decision support system to help physicians solve diagnostic problems. Later, a large number of disease prediction systems based on various data mining techniques were presented. For example, a fast disease prediction system based on SVM was proposed by [25] to predict the risk of progression of adolescent idiopathic scoliosis. Wang et al. [26] gave a risk assessment for individuals with a family history of pancreatic cancer using Bayesian classification. By introducing SVM, Huang et al. [27] designed a prediction model for breast cancer diagnosis, while Barakat et al. [28] focused on the diagnosis of diabetes mellitus. For heart disease analysis, Anooj et al. [29] used specific fuzzy rules. Although various prediction models have been developed, the privacy of patients' medical information is not taken into account, which will impede further progress of CDSS. To address this challenge, secure disease prediction schemes [1], [7], [8], [9], [11], [12], [14], which diagnose patients' diseases without leaking the medical data or the prediction model, have been widely studied. Wang et al. [14] proposed the HEALER framework based on somewhat homomorphic encryption; it uses a small sample size to facilitate secure rare-variant analysis and obtains the final results by decrypting ciphertexts at a trusted party. A privacy-preserving CDSS based on naïve Bayesian classification was proposed by Liu et al. [5], which can help a clinician diagnose a patient's disease risk in a privacy-preserving way. Wang et al. [9] proposed a secure SLP learning model for e-Healthcare, but it only protects the privacy of patients' medical information; the disease model is not protected. In [11], Zhu et al. proposed an efficient and privacy-preserving medical pre-diagnosis framework using SVM, which protects sensitive personal health information from disclosure with lightweight multi-party random masking and polynomial aggregation. Recently, Kuo et al. [30] proposed a decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks, which integrates privacy-preserving online machine learning with a private blockchain network, applies transaction metadata to disseminate partial models, and designs a new proof-of-information algorithm to determine the order of the online learning process; each participating site contributes to model parameter estimation without revealing any patient health information. Zhang et al. [1] proposed a secure disease prediction scheme based on matrices and SLP, which builds on new medical data encryption, disease learning and disease prediction algorithms utilizing random matrices. Liu et al. [7] proposed a hybrid privacy-preserving clinical decision support system in fog-cloud computing, in which a fog server uses SLP to securely monitor patients' health conditions in real time, and newly detected abnormal symptoms can be further sent to the cloud server for high-accuracy prediction in a privacy-preserving way. Compared with more sophisticated machine learning algorithms such as naïve Bayes, SVM and deep learning classifiers, SLP is simple and efficient.

Conclusions

In this paper, we proposed a privacy-preserving disease prediction system based on SLP which can help physicians make a proper diagnosis and provide health services for patients anytime, anywhere, in a privacy-preserving way. In PPCD, DPs' historical medical data are used to train an SLP in ED, and the hospital uses the trained model to predict diseases for a UP. To ease DPs' privacy concerns, we adopt an additively homomorphic encryption scheme, also for simplicity and generality. The unavoidable multiplications in SLP motivate us to introduce LSM into PPCD, so that users' medical information and the trained model remain secret from the cloud. The results PPCD achieves, comparable with plain SLP, suggest that sacrificing some data precision to improve efficiency is feasible in practical use. Although PPCD enables privacy-preserving diagnosis, the balance between security and efficiency should be considered first. Therefore, how to optimize the model training with mini-batches for efficiency, and how to introduce other advanced machine learning methods to build privacy-preserving disease prediction systems, are worthy of investigation.
Table 8

Accuracy comparisons of SLP in PD and PPCD in ED on HDD.

            Output\Target    Class 1        Class 2        Overall
SLP (PD)    Class 1          155 (52.2%)    5 (1.7%)       96.9%
            Class 2          11 (3.7%)      126 (42.4%)    92.0%
            Overall          93.4%          96.2%          94.6%
PPCD (ED)   Class 1          155 (52.2%)    5 (1.7%)       96.9%
            Class 2          13 (4.4%)      124 (41.8%)    90.5%
            Overall          92.3%          96.1%          93.9%
Table 9

Accuracy comparisons of SLP in PD and PPCD in ED for IUB of AID.

            Output\Target    Class 1        Class 2        Overall
SLP (PD)    Class 1          57 (47.5%)     2 (1.7%)       96.7%
            Class 2          6 (5.0%)       55 (45.8%)     90.2%
            Overall          90.5%          96.5%          93.3%
PPCD (ED)   Class 1          55 (45.8%)     4 (3.3%)       93.2%
            Class 2          5 (4.2%)       56 (46.7%)     91.8%
            Overall          91.7%          93.3%          92.5%
References (12 in total; 10 shown)

1.  Privacy-preserving clinical decision support system using Gaussian kernel-based classification.

Authors:  Yogachandran Rahulamathavan; Suresh Veluru; Raphael C-W Phan; Jonathon A Chambers; Muttukrishnan Rajarajan
Journal:  IEEE J Biomed Health Inform       Date:  2014-01       Impact factor: 5.772

2.  Efficient and Privacy-Preserving Online Medical Prediagnosis Framework Using Nonlinear SVM.

Authors:  Hui Zhu; Xiaoxia Liu; Rongxing Lu; Hui Li
Journal:  IEEE J Biomed Health Inform       Date:  2016-03-29       Impact factor: 5.772

3.  Privacy-Preserving Patient-Centric Clinical Decision Support System on Naïve Bayesian Classification.

Authors:  Ximeng Liu; Rongxing Lu; Jianfeng Ma; Le Chen; Baodong Qin
Journal:  IEEE J Biomed Health Inform       Date:  2016-03       Impact factor: 5.772

Review 4.  Computer-assisted decision support for the diagnosis and treatment of infectious diseases in intensive care units.

Authors:  C A M Schurink; P J F Lucas; I M Hoepelman; M J M Bonten
Journal:  Lancet Infect Dis       Date:  2005-05       Impact factor: 25.071

5.  Intelligible support vector machines for diagnosis of diabetes mellitus.

Authors:  Nahla H Barakat; Andrew P Bradley; Mohamed Nabil H Barakat
Journal:  IEEE Trans Inf Technol Biomed       Date:  2010-01-12

6.  HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS.

Authors:  Shuang Wang; Yuchen Zhang; Wenrui Dai; Kristin Lauter; Miran Kim; Yuzhe Tang; Hongkai Xiong; Xiaoqian Jiang
Journal:  Bioinformatics       Date:  2015-10-06       Impact factor: 6.937

7.  PancPRO: risk assessment for individuals with a family history of pancreatic cancer.

Authors:  Wenyi Wang; Sining Chen; Kieran A Brune; Ralph H Hruban; Giovanni Parmigiani; Alison P Klein
Journal:  J Clin Oncol       Date:  2007-04-10       Impact factor: 44.544

8.  Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery.

Authors:  Yongan Zhao; Xiaofeng Wang; Xiaoqian Jiang; Lucila Ohno-Machado; Haixu Tang
Journal:  J Am Med Inform Assoc       Date:  2014-10-28       Impact factor: 4.497

9.  Differentially private genome data dissemination through top-down specialization.

Authors:  Shuang Wang; Noman Mohammed; Rui Chen
Journal:  BMC Med Inform Decis Mak       Date:  2014-12-08       Impact factor: 2.796

10.  A community assessment of privacy preserving techniques for human genomes.

Authors:  Xiaoqian Jiang; Yongan Zhao; Xiaofeng Wang; Bradley Malin; Shuang Wang; Lucila Ohno-Machado; Haixu Tang
Journal:  BMC Med Inform Decis Mak       Date:  2014-12-08       Impact factor: 2.796

Cited by (1 in total)

Review 1.  Looking beyond the hype: Applied AI and machine learning in translational medicine.

Authors:  Tzen S Toh; Frank Dondelinger; Dennis Wang
Journal:  EBioMedicine       Date:  2019-08-26       Impact factor: 8.143

