Literature DB >> 35913921

Secure deep learning for distributed data against maliciouscentral server.

Abstract

In this paper, we propose a secure system for performing deep learning with distributed trainers connected to a central parameter server. Our system has the following two distinct features: (1) the distributed trainers can detect malicious activities in the server; (2) the distributed trainers can perform both vertical and horizontal neural network training. In the experiments, we apply our system to medical data including magnetic resonance and X-ray images and obtain approximate or even better area-under-the-curve scores when compared to the existing scores.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35913921 PMCID： PMC9342767 DOI： 10.1371/journal.pone.0272423

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

1 Introduction

Deep learning is a set of machine learning techniques that has gained much interest in recent years, owing to its potential applications in many fields. In the field of medicine, deep learning applications such as the classification of skin cancer [1], detection of diabetic retinopathy [2], and detection of pneumonia [3] and COVID-19 [4] using chest X-rays, have recently shown considerable potential to improve the quality of healthcare for patients worldwide. Medical data are by nature vastly distributed, as they often originate from several hospitals and medical centers throughout the world. It is reported in [5] that thousands of exabytes (1018 bytes) of medical data will be generated. To accelerate the availability and accuracy of deep learning in healthcare through the use of this kind of big data, deep learning systems using distributed data should be designed and examined. However, because medical data are personal and sensitive, serious attention should be paid to securely protect these data. Pioneering efforts for designing deep learning systems with distributed data have been in the works of Recht et al. [6] and Dean et al. [7], whose systems are then considered and expanded in the security and privacy domain by Shokri and Shmatikov [8]. All of these works assume a central parameter server that is responsible for updating the parameters of the neural network model in the deep learning systems. Specifically, in [6, 7], the server is assumed to be completely honest in operation, whereas in [8] the server is semi-honest (i.e. honest-but-curious) which signifies that it is honest in operation but curious in extracting sensitive information. Subsequent works with security improvements include [9, 10] which also assume a semi-honest parameter server. The systems of federated learning in [11, 12] have been experimentally demonstrated to be able to handle non independent and identically distributed (non-iid) data experimentally; however the systems are not for vertical training due to the average calculation of neural network weights over the central parameter server.

1.1 Our contributions

In this work, we propose a secure system for deep learning using distributed datasets. Our system has the following features: Detection of malicious activities in the parameter server: Our system is designed so that a malicious parameter server is detected with overwhelming probability. This is made possible by a novel use of authenticated encryption, in which the encryption part protects communication secrecy whereas the authentication part detects any changes in the communication. Both vertical and horizontal training: Our system can handle both vertical and horizontal training by design. For vertical training, we mean a training model produced by one trainer on a dataset can be re-used by another trainer on an entirely different dataset after proper modifications. For horizontal training, we mean a shared model is trained in a distributed manner using local datasets of the trainers. It is also worth noting that, besides improving the utility of the system, the combination of vertical and horizontal model training can produce robustness with respect to noisy labels as discussed in Section 3.3. Experimentation with distributed medical data: On chest X-ray images [13] and magnetic resonance imaging (MRI) images [14, 15], we demonstrate that our securely distributed system either approximates or outperforms existing results in the literature in which the data had to be centralized. Indeed, as showed in Table 1, the learning utility scores of our system in terms of area-under-the-curve (AUC) are very close to (or better than) the best known scores in non-distributed (centralized) training, as seen in Table 1. More detailed comparisons of AUC scores are given in Tables 3 and 4, again showing that the AUC scores of our system are very similar and in some cases superior to the best known scores.

Table 1

Comparison of area-under-the-curve (AUC) scores.

Learning utility AUC score (→) Dataset (↓)	Our system (securely distributed)	Best known (centralized)
MRI (Stanford, Croatia)	0.924	0.911
X-ray (ChestX-ray14)	0.839	0.841

1.2 Related works

All existing systems in [8-12] cannot detect malicious activity of the parameter server due to the fact that plaintexts or malleable ciphertexts are directly handled by the server. For example, in [9], a homomorphic ciphertext of a gradient vector G encrypted by a homomorphic encryption scheme is sent to the parameter server. If the server is malicious, it can modify that ciphertext by the following homomorphic calculation where ϵ is a vector intentionally selected by the server. In turn, the distributed trainers obtain G + ϵ instead of G without noticing, which is undesirable. Likewise, the system in [10] cannot detect malicious activities in the server due to the use of symmetric malleable encryption such as the Cipher Block Chaining (CBC) mode with the Advanced Encryption Standard algorithm. Indeed, the encryption of a vector W in the first block of the CBC mode by a symmetric key K is of the form and the malicious server can modify that ciphertext into where r is selected by the server. The decryption of the modified ciphertext is which is obtained by a distributed trainer instead of W without any awareness. This subsequently affects other blocks in the decryption and the entire training process, and a distributed trainer cannot identify whether the malfunctions originate from the server or other trainers. The proposed system in this paper extends [10] in the following directions of both security and learning utility: (1) we introduce authenticated encryption into the system to handle the malicious parameter server; (2) we make vertical training possible at each distributed trainer; (3) we perform experiments on medical imaging data to demonstrate the learning utility of the system in terms of AUC scores. However, it significantly deviates from [9] which is based on [7] (whose system is later restructured into TensorFlow [16].) Techniques for differential privacy [17-21] or anonymous transmission [22] can be used locally at each distributed trainer in our system to protect the privacy or the origin of the transmitted weights. Likewise, each distributed trainer can deploy preventive measures such as in [23] if necessary. These techniques are useful for protecting the weight privacy to the greatest extent possible while maintaining the learning utility of the system. It is also worth noting that requiring the weight to contain no information on the data can be fulfilled if each trainer continues transmitting uniformly random weights; however this kind of “perfectly private” system has no learning utility at all. Requiring the neural network weight sent from an honest trainer to contain no information on the data, while maintaining the learning utility of the system, is impossible in the setting of collaborative training [24]. Model weight inversion attacks such as in [25, 26] have limited impacts and they do not necessarily entail a privacy breach as discussed in [27]. Similarly, the use of generative adversarial networks for attack on collaborative training systems [28] has been reported to be unrealistic in [29]. In addition, it is known in the literature that attacks on neural network weights are apparently more difficult than neural network gradients, on which various attacks and corresponding defenses exist (e.g. [9, 28, 30, 31]). In contrast, weights can be viewed as a large aggregation of gradients and are thus more resistant to attacks as observed in [10, 32]. Secure linear models have been studied in several works [33-38] with threat models ranging from semi-honest to malicious adversaries. For example, the system in Zheng et al. [34] utilizes threshold homomorphic encryption, zero knowledge proofs, and malicious multi-party computation to deal with malicious adversaries. Aiming to achieve both secrecy and differential privacy, Aono et al. [39, 40] have designed systems for privacy-preserving linear and logistic regression, in which a semi-honest central server is used to handle homomorphic ciphertexts. Semantic security with homomorphism allows their system to achieve data secrecy (with respect to the central server) and differential privacy (with respect to publishing the final result) simultaneously. However, their technique of polynomial approximation of non-linear functions as in [40] appears to have limitations when applied to deep neural networks with multiple layers. Using two non-colluding servers on the cloud, Mohassel and Zhang [41] have proposed protocols for privacy-preserving linear regression, logistic regression, and multilayer perceptron in which secure-computation-friendly activation functions are employed. Subsequently, Mohassel and Rindal have also considered a three-server model in [42], in which data owners secretly share their data among three servers that train and evaluate models on the combined datasets using three-party computation. Several works [43-48], especially in the framework of secure outsourced computation, have examined the problem of secure neural network prediction in which predicted probabilities for individual data items can be obtained in a secure manner. This vein of research on secure prediction is orthogonal to the topic in this paper which focuses on securely distributed training. Chang et al. [49] have proposed a system for distributed deep learning and experimented with medical datasets, without a central parameter server. The system and the experiments are designed for horizontal training. Gupta and Raskar [50] have designed a method for distributed training where pieces of information such as data labels and neural network gradients are transmitted among distributed trainers. McClure et al. [51] have considered distributed training with a specific neural network only. These works have no explicit security considerations. Various machine learning algorithms involving multiple parties can be securely operated over completely trusted hardware. However, even in such setting, care should still be taken to guard the algorithms from memory access patterns that depend on data, as examined in [52]. Techniques for federated learning (e.g., [12, 53]) and subsequent works (e.g., [54-58]) can be used for distributed data, but they do not consider malicious central server as in our setting. Generic secure multiparty computation (MPC) using secret sharing [59, 60] can securely compute any function represented as arithmetic circuits. The known weakness of such protocols is in the communication costs [11]. To address the issue, a dedicated protocol for secure aggregation in federated learning has been also proposed in [11]. In works such as [11] or subsequent [61], the server learns the full or partial sum of the trainers’ inputs; which is orthogonal to our work in which the server cannot learn that kind of information. Works combining differential privacy with MPC (e.g., [62]), often admitting accuracy degradation due to noise addition, are also orthogonal to our work.

2 Preliminaries

We recall a few preliminaries on cryptography and machine learning in this section.

Authenticated encryption

Symmetric encryption schemes consist of the following (possibly probabilistic) polynomial-time algorithms: takes a security parameter κ and generates secret key K; , also written as , produces c which is the ciphertext of message m; and or returns message m encrypted in c. The security notion of ciphertext integrity (INT-CTXT) [63] requires that it be computationally infeasible to produce a ciphertext not previously produced by the holder of key K. In addition, ciphertext indistinguishability against chosen plaintext attacks (IND-CPA) ensures that no information is leaked from ciphertexts. Our system employs symmetric encryption with both ciphertext integrity and ciphertext indistinguishability. A generic construction that achieves INT-CTXT and IND-CPA simultaneously is the composition of encrypt-then-mac, where mac refers to message authentication code. Namely, an authenticated encryption scheme can be constructed as follows where where is an encryption algorithm in an IND-CPA-secure symmetric encryption scheme, and is a message authentication code. The keys for encryption and authentication K and K must be independent and generated uniformly at random by the key generation algorithm . It has been proved in [63] that when the message authentication code is strongly unforgeable then the encrypt-then-mac composition satisfies both INT-CTXT and IND-CPA notions of security. It should be noted that a weaker notion of integrity called plaintext integrity in [63] can also be used if one only needs to determine whether the plaintext (i.e. neural network weight) inside the ciphertext has been modified. This weaker notion of integrity leads to broader compositions of cryptographic primitives for authenticated encryption that can be used in our proposed system.

Neural networks

In each distributed trainer is a neural network. The neuron (including the bias) nodes are connected via weight variables W, which can be considered a real vector. In a deep learning neural network structure, there can be multiple layers each containing thousands of neurons. Each neuron node (except for the bias node) is associated with an activation function f. Typical examples of f can be f(x) = max{0, x} (rectified linear), and (sigmoid). The nonlinearity of these activation functions is important for the network to learn complex data distributions. Given a training dataset, the learning task is to determine these weight variables to minimize a predefined cost function such as the cross-entropy cost function detailed later in each experiment in Section 4.

3 Our system

3.1 System description

The proposed system makes use of an authenticated encryption scheme (, , ) with ciphertext integrity [63], where κ is a security parameter and K is a symmetric key generated by the key generation algorithm . A figurative and algorithmic illustration is provided in Fig 1. The system follows and generalizes the system in [10] in the following ways: (1) we introduce authenticated encryption into the system to address a malicious parameter server; (2) we insert vertical training into the distributed trainers so that they can handle more types of datasets.

Fig 1

Our system of deep learning for both horizontal and vertical training that can detect malicious activities in the server.

Below, M ∘ M denotes a modification of model M by distributed trainer i. Specifically, if M has a inputs and b outputs, and M has b inputs and c outputs, then M ∘ M is the composed model of a inputs and c outputs. Mathematically, if and then as the composed model. A visualization is given in the neural network model of trainers in Fig 1. Initialization (for all distributed trainers): common model and cryptographic key setup. This can be done by Trainer 0 in the system. Generate a common neural network model M. Generate a cryptographically symmetric key K using . Share model M and key K to all distributed trainers (but not the parameter server) via a secure channel. Central parameter server: When receiving ciphertext E from a distributed trainer, store it. When receiving a request from a distributed trainer, send E to that trainer. It is also possible that the server decides which trainer to send E. Initially, when E does not exist, send ⊥. If E has been sent, wait for the encrypted post-trained weight from the requested trainer. Each distributed trainer Generate neural network model M ∘ M where M is a model generated by trainer i. Obtain encrypted weight E from the central parameter server. If E = ⊥, initialize the weight for M ∘ M. If E ≠ ⊥, decrypt E to obtain the pre-trained weight W ∘ V, namely . Starting from W ∘ V, train the neural network model M ∘ M using the local data of trainer i to obtain the post-trained weight W′ ∘ V′. Set W ∘ V ← W′ ∘ V′. Send the encrypted to the central parameter server. Particular usage of our system: Vertical then horizontal training. The initial trainer (e.g., Trainer 0) queries the server and obtains E = ⊥, so it proceeds to train M ∘ M0 where M0 is null so that the model M ∘ M0 = M, namely the model is unchanged. Therefore Trainer 0 initializes weight W and trains model M with W on its data to obtain the post-trained weight W′. Trainer 0 sets W ← W′ and sends to the central parameter server. The next trainer (e.g. Trainer 1) queries the server and gets . It then decrypts the ciphertext and obtains weight W as the pre-trained weight for model M. Trainer 1 generates M1 and composes M ∘ M1. It then initializes V and feeds W ∘ V into M ∘ M1. Starting from W ∘ V, Trainer 1 trains model M ∘ M1 using its local data to obtain W′ ∘ V′. It then sets W ∘ V ← W′ ∘ V′ and sends to the central parameter server so that other trainers can continue the training process. Other trainers (Trainers 2, …, N) behave similarly to Trainer 1.

3.2 Security considerations for our system

Our system has a stronger security guarantee against the central parameter server than previous systems in [9, 10]. Details are provided below. Detecting a malicious server by any trainer. By a malicious server, we mean a server interested in extracting information about the data of the trainers. To accomplish that goal of information extraction, the server may even try to modify the incoming ciphertext before sending it to another trainer. In our system, if the central parameter server maliciously modifies ciphertexts uploaded by the trainers, the trainers can detect the malicious activity. This is by design; because the ciphertexts have integrity, thus it is computationally infeasible to produce a ciphertext not previously produced by the trainers. Specifically, if the mode of encrypt-then-mac [63] is used, then the message authentication code (e.g. HMAC [64]) can detect whether a ciphertext has been changed or not. Let K = (K, K) consist of the keys for symmetric encryption K and message authentication code K. As described in Section 2, for the weight vector W ∘ V. As a result, any change to C and weight vector W ∘ V can be detected by the distributed trainers with the common authentication key K of the . Security for a trainer against malicious trainers and server, and their collusion. This scenario is identical to that in [10] black (Section IV); thus our proposed system inherits the security results in [10]. In particular, our system ensures security in terms of onewayness for any honest trainer to the greatest extent possible. As mentioned in Section 1.2, requiring that the neural network weight sent from an honest trainer possesses no information on the data, while maintaining the learning utility of the system, is infeasible in the setting of collabo-rative training [24]. Regarding this point, various defenses have been discussed in [10], including the use of differential privacy (e.g. [20]), anonymous transmission (e.g. [22]), and adversarial regularization [23] to protect the weight (and its origin) of the honest trainer. The honest trainer does not send individual gradients of small batch sizes; thus it can resist attacks on gradients such as in [9, 28, 30, 31]. It is also worth remarking that, if a malicious trainer injects noise into the training process, then the noise can also be manually detected by an honest trainer by locally observing training indicators such as training loss and AUC scores. Nonetheless, it should be noted that our system is in the cross-silo scenario, in which trainers are large organizations such as medical or financial institutions with certain responsibilities required by regulations. Therefore, we expect that the case of malicious trainers (and server), and their collusion, is less likely to happen than the case of a malicious server alone.

3.3 Learning utility robustness via vertical training

The vertical training of Trainer 0 on a dataset with clean labels can improve model robustness against noisy labels of subsequent trainers. This is because deep neural networks have the ability to memorize patterns in the initial epochs (i.e. the vertical training phase in our context) as observed in [65, 66]. This is particularly true in our experiments in which Trainer 0 employs the ImageNet (http://www.image-net.org/) dataset, and other trainers use medical datasets that may have a portion (e.g. approximately 10% in the ChestX-ray14 dataset [13]) of inaccurate labels due to the process of automatic labeling from texts via natural language processing. To the best of our knowledge, this property of learning utility robustness has not been achieved in previous works [8-12] whose systems assume that labels are clean and accurate.

4 Experiments with medical data

All experiments employ a machine with Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz and GPU NVIDIA P-100; with Python 3.7.2 distributed in Anaconda 4.5.11. We assume a standard 1 Gbps channel between the trainers and the server. For authenticated encryption, the encrypt-then-mac method [63] is employed in which AES-256-CBC encryption is for the encryption part and HMAC-SHA512 (in OpenSSL 1.1.1a) is for the message authentication code part. Using more dedicated modes or hardware for authenticated encryption can improve the speed of encryption and decryption.

4.1 Experiment with MRI datasets

Trainers and datasets. In this experiment we suppose 3 distributed trainers: Vertical trainer 0 with the ImageNet dataset, Horizontal trainer 1 with an MRI dataset collected from Stanford University Medical Center [14], and Horizontal trainer 2 with an MRI dataset from Clinical Hospital Centre Rijeka (Croatia) [15]. The MRI dataset from Stanford contains 1130 exams, of which 208 exams have anterior cruciate ligament (ACL) tear, whereas the others (1130−208 = 922) do not. In addition, while each exam contains various types of images, only sagittal ones are compatible with the sagittal images from Croatia, and thus selected for distributed training in our system. The Croatia training dataset contains 552 exams with sagittal series of images, of which 139 has label 1 and 413 has label 0. The Croatia validation dataset contains 38 exams of label 1 and 143 exams of label 0. The test dataset has 50 exams of label 1 and 134 exams of label 0. These distributions of labels are summarized in Table 2.

Table 2

Label distribution in MRI datasets.

	Label 1 quantity	Label 0 quantity
Stanford dataset [14]	208	922
Croatia dataset [15]	139	413

Neural network model. Following [14], we employ AlexNet [67] as the base neural network model in our system. Trainer 0 trains AlexNet using the ImageNet dataset. The trained weight from Trainer 0 is sent securely to Trainer 1 via the central parameter server. Trainer 1 and 2 modify M as follows: each MRI series of images s × 3 × 224 × 224 is passed through a feature extractor based on AlexNet (= M) to obtain a s × 256 × 6 × 6 tensor; a global average pooling layer and max pooling are then applied sequentially to reduce that tensor to s × 256 tensor and 256 real numbers respectively; the last layer has one node fully connected with 256 nodes of the previous layer. These neural network models of Trainer 1 and 2, denoted as M ∘ M1 and M ∘ M2, contains 61,101,097 trainable parameters, having an approximate size of 234 MB when saved to disk. Authenticated encryption of model weights. We use encrypt-then-mac method which is proved to be authenticated encryption [63], whose running time is less than 3 seconds when applied on a model weight of size 234 MB. The ciphertext is also of 234 MB when saved to disk, and needs less than 3 seconds to be transmitted to the central parameter server. It is worth noting that the running times of encrypt-then-mac (3 seconds) and encrypted weight transmission (3 seconds) are relatively small when compared with the time for training (feedforward and backpropagation on GPU), which are approximately 13 seconds for one epoch on the Croatia training dataset, and 39 seconds for one epoch on the Stanford training dataset. Loss function for training. Trainers 1 and 2 use the same loss function of binary cross entropy with weights depending on the number of labels in the (joint) training set. More precisely, for a single data item (X, y) in the training set, the loss function is defined as in which because 139 (out of 552) and 208 (out of 1130) are the numbers of 1 in the Croatia and Stanford datasets respectively. Training details. Trainer 1 and 2 follow the training procedure in [14] to optimize the above loss function for 20 central epochs. Each trainer executes at most two local epochs on its data before encrypting and sending the trained weight. For example the order of training can be: (1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2) in which (1, 1) or (2, 2) means the trainer performs the training for 2 subsequent local epochs; and does only 1 local epoch in the case of (1, 2) or (2, 1). The Adam optimizer is used with an initial learning rate of 10−5, weight decay of 10−2 at each trainer, as in [14]. The learning rate is reduced on a plateau after 5 central epochs with a factor of 0.3. The trainers save and test every checkpoint of the model on the test dataset of Croatia. If the validation dataset of Croatia can be shared among the trainers, they can only save the checkpoint with the smallest validation loss to save disk space, if necessary. The AUC scores on the Croatia test set are given in Table 3. The scores demonstrate that our system outperforms previous results, which confirms the merits of greater quantities of data when using a deep learning approach. Additional experiments have also been done with Adam variants (AMSGrad [68], AdamX [69]), yielding similar AUC scores approximately 0.924 and all are superior to the previous best AUC score of 0.911 on the Croatia test set. The entire training time of our proposed system is less than 20 minutes, and the communication (including upload and download) of the encrypted weight from each distributed trainer with the server is approximately .

Table 3

Area-under-the-curve (AUC) scores of learning methods on MRI datasets.

Paper	Method	AUC score
Stajduhar et al. [15]	Support Vector Machine	0.894
Bien et al. [14]	Neural Network	0.824
Bien et al. [14]	Neural Network	0.911
This work (our system)	Neural Network	0.924

4.2 Experiment with ChestX-ray14 dataset

ChestX-ray14 dataset and its partition. The ChestX-ray14 dataset [13] contains 112,120 frontal-view chest X-ray images individually labeled with 14 different thoracic diseases: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, Hernia. Following previous works [3, 13, 70, 71], this dataset is split into three partitions of training, validation, and test datasets with a ratio of 70:10:20 approximately. The number of images are 78468 (of 21528 patients), 11219 (of 3090 patients), 22433 (of 6187 patients) respectively in the datasets. There is no patient overlap between the sets. Let us set the number of distributed trainers to N = 5, for concrete discussion. Trainer 0 has the ImageNet dataset. Other trainers (1, …, 4) possess approximately 78468/4 images from the training dataset described above. Neural network models. Vertical trainer 0 trains DenseNet-121 [72] as the model M with the ImageNet dataset. In addition, horizontal trainers (1, …, 4) utilize the code given in [71] which also employs DenseNet-121 as the common neural network model M. However, the last layer of DenseNet-121 of 1000 neural nodes (for ImageNet) is replaced by 14 nodes equipped with sigmoid nonlinearity as described in Section 2, corresponding to the predicted probabilities of the 14 thoracic diseases listed above. These replacements in DenseNet-121 form the neural network models M ∘ M1, M ∘ M2, M ∘ M3, M ∘ M4 of the 4 distributed trainers. These models have 6,968,206 trainable parameters, of size 28 MB when saved to disk. Authenticated encryption of weights. We use encrypt-then-mac method which is proved to be authenticated encryption [63], whose running time is less than 0.2 seconds when applied to a model weight of 28 MB. The ciphertext is also of 28 MB when saved to disk, and needs less than 1 second to be transmitted to the central parameter server. It is worth noting that the running time of encrypt-then-mac (0.2 seconds) and encrypted weight transmission (1 second) is negligible when compared with the time for training (feedforward and backpropagation on GPU) of approximately 60 seconds. Therefore, the overhead added by cryptographic operations and communications can be very small. Early sharing for improved accuracy. Because each trainer has unbalanced classes, and the data are not independent and not identically distributed (non-iid), each trainer decides not to train on its entire local dataset but train on a part of the dataset before sending out the weight. This helps improve accuracy because the trained weight is expectedly not biased toward a particular local dataset. In particular, each trainer in our system uniformly at random splits its local data into 20 parts (each of which has approximately 78468/(4 × 20) = 980 images), trains the neural network on a partition each time and sends the weight out after one pass over that partition. Loss function for training. The distributed Trainers 1, …, 4 use the same loss function of binary cross entropy. More precisely, for a single data item (X, y) in the training set, the loss function is defined as where y = (y1, …, y14) is a label, Pr[Y = 1|X] is the predicted probability that the image contains pathology c given X, and Pr[Y = 0|X] is the predicted probability that the image does not contain pathology c given X. Training details. Each trainer uses a batch size of 8 images, selects a random parti-tion of 980 images of its local data, and makes one pass (of feedforward and backpropagation) over that partition, which requires approximately 60 seconds. Each image is downscaled to a size of 224 × 224, and normalized based on the mean ([0.485, 0.456, 0.406]) and standard deviation ([0.229, 0.224, 0.225]) of images in the ImageNet training set. Data augmentation is applied as follows: each image is horizontally flipped and randomly rotated by at most 45∘. The stochastic gradient descent (SGD) optimizer is used with a momentum of 0.9, initial learning rate of 0.01, and weight decay of 10−4. The number of central epochs is set to 15, which is the number of times passing through all (i.e. 78468) original training images. The learning rate in each central epoch is decreased by the following rule, for , where is the integer part of . Our system with the above distributed trainers produces an AUC score of 0.8397, which is smaller than that in [3] but larger than those in [13, 70, 71] as reported in Table 4. It should be noted that our AUC score is based on distributed training while the others are based only on centralized training. The total running time and ciphertext communication of our system are approximately 20 hours and 34 GB for 15 central epochs.

Table 4

Area-under-the-curve (AUC) scores of learning methods on ChestX-ray14.

	Wang et al. [13]	Yao et al. [70]	Zech [71]	Our system	CheXNet [3]
Atelectasis	0.716	0.772	0.8161	0.8176	0.8094
Cardiomegaly	0.807	0.904	0.9105	0.9143	0.9248
Effusion	0.784	0.859	0.8839	0.8842	0.8638
Infiltration	0.609	0.695	0.7077	0.7098	0.7345
Mass	0.706	0.792	0.8308	0.8494	0.8676
Nodule	0.671	0.717	0.7748	0.7829	0.7802
Pneumonia	0.633	0.713	0.7651	0.7675	0.7680
Pneumothorax	0.806	0.841	0.8739	0.8762	0.8887
Consolidation	0.708	0.788	0.8008	0.8077	0.7901
Edema	0.835	0.882	0.8979	0.8931	0.8878
Emphysema	0.815	0.829	0.9227	0.9340	0.9371
Fibrosis	0.769	0.767	0.8293	0.8258	0.8047
Pleural Thickening	0.708	0.765	0.7860	0.7851	0.8062
Hernia	0.767	0.914	0.9010	0.9087	0.9164
Average	0.7381	0.8027	0.8358	0.8397	0.8414
Securely distributed training?	no	no	no	yes	no

5 Conclusion

In this paper, we design a secure system for distributed learning with the following features: (1) distributed trainers can detect malicious activities in the server via authenticated encryption; (2) distributed trainers can perform both vertical and horizontal neural network training. We conduct experiments with datasets of MRI and X-ray images and obtain promising AUC scores for our proposed system when training with the datasets.

9 in total

1. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors: Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal: JAMA Date: 2016-12-13 Impact factor: 56.272

2. Secure Outsourced Matrix Computation and Application to Neural Networks.

Authors: Xiaoqian Jiang; Miran Kim; Kristin Lauter; Yongsoo Song
Journal: Conf Comput Commun Secur Date: 2018-10

3. Semi-automated detection of anterior cruciate ligament injury from MRI.

Authors: Ivan Štajduhar; Mihaela Mamula; Damir Miletić; Gözde Ünal
Journal: Comput Methods Programs Biomed Date: 2016-12-15 Impact factor: 5.428

4. Dermatologist-level classification of skin cancer with deep neural networks.

Authors: Andre Esteva; Brett Kuprel; Roberto A Novoa; Justin Ko; Susan M Swetter; Helen M Blau; Sebastian Thrun
Journal: Nature Date: 2017-01-25 Impact factor: 49.962

5. Distributed Weight Consolidation: A Brain Segmentation Case Study.

Authors: Patrick McClure; Jakub R Kaczmarzyk; Satrajit S Ghosh; Peter Bandettini; Charles Y Zheng; John A Lee; Dylan Nielson; Francisco Pereira
Journal: Adv Neural Inf Process Syst Date: 2018-12

6. Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing.

Authors: Matthew Fredrikson; Eric Lantz; Somesh Jha; Simon Lin; David Page; Thomas Ristenpart
Journal: Proc USENIX Secur Symp Date: 2014-08

7. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet.

Authors: Nicholas Bien; Pranav Rajpurkar; Robyn L Ball; Jeremy Irvin; Allison Park; Erik Jones; Michael Bereket; Bhavik N Patel; Kristen W Yeom; Katie Shpanskaya; Safwan Halabi; Evan Zucker; Gary Fanton; Derek F Amanatullah; Christopher F Beaulieu; Geoffrey M Riley; Russell J Stewart; Francis G Blankenberg; David B Larson; Ricky H Jones; Curtis P Langlotz; Andrew Y Ng; Matthew P Lungren
Journal: PLoS Med Date: 2018-11-27 Impact factor: 11.069

8. Distributed deep learning networks among institutions for medical imaging.

Authors: Ken Chang; Niranjan Balachandar; Carson Lam; Darvin Yi; James Brown; Andrew Beers; Bruce Rosen; Daniel L Rubin; Jayashree Kalpathy-Cramer
Journal: J Am Med Inform Assoc Date: 2018-08-01 Impact factor: 7.942

9. Automated detection of COVID-19 through convolutional neural network using chest x-ray images.

Authors: Rubina Sarki; Khandakar Ahmed; Hua Wang; Yanchun Zhang; Kate Wang
Journal: PLoS One Date: 2022-01-21 Impact factor: 3.240

9 in total