Mohammed Adnan1,2,3, Shivam Kalra1,2, Jesse C Cresswell4, Graham W Taylor2,3, Hamid R Tizhoosh5,6,7.
Abstract
The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
Year: 2022 PMID: 35121774 PMCID: PMC8816913 DOI: 10.1038/s41598-022-05539-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The proposed federated learning algorithm to train a MEM model[33] for WSI (disease) classification across multiple hospitals. Each client in FL is represented by a blue rectangle. Each client first transforms its local WSIs into mosaics (sets of representative patches). The patches in each mosaic are converted to feature vectors using a DenseNet model[34]. Finally, the sets of feature vectors are classified using a MEM model. A shared central MEM model is trained using FedAvg[6] among multiple clients (mimicking hospitals). Furthermore, DP-SGD[22] is used for training the central MEM model with strict privacy bounds.
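The aggregation in Figure 1 is FedAvg[6]: after each round of local training, the server replaces the central model with a dataset-size-weighted average of the client models. A minimal sketch of that averaging step, with toy parameter vectors standing in for the MEM weights (`fed_avg` and the example numbers are illustrative, not the paper's actual training code):

```python
# One FedAvg aggregation step (McMahan et al.): the server averages
# client parameter vectors, weighting each client by its dataset size.

def fed_avg(client_weights, client_sizes):
    """Weighted average of same-length client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Example: two clients with 2-parameter models.
w_a = [1.0, 3.0]   # client A, 100 samples
w_b = [3.0, 5.0]   # client B, 300 samples
central = fed_avg([w_a, w_b], [100, 300])
print(central)  # [2.5, 4.5] -- pulled toward the larger client B
```

In the paper's setting the vectors would be the flattened MEM model parameters, and the step repeats for multiple communication rounds.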
Figure 2. Illustration of a sample WSI and its mosaic extracted using the approach in Kalra et al.[35].
Figure 3. Schematic of a MEM model used for the classification of WSI mosaics. X is an input sequence containing n f-dimensional vectors. (a) The memory block is a sequence-to-sequence model that takes X and returns another sequence. The output is a permutation-invariant representation of X. A bijective transformation model (an autoencoder) converts the input X to a permutation-equivariant sequence C. The weighted sum of C is computed over different probability distributions p from memory units. The hyper-parameters of a memory block are (1) the dimension of the bijective transformation h, and (2) the number of memory units m. (b) The memory unit has A, a trainable embedding matrix that transforms elements of X to a d-dimensional space (memories). The output p is a probability distribution over the input X, also known as attention. The memory unit has a single hyper-parameter d, i.e. the dimension of the embedding space[33] (* represents learnable parameters).
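The attention described in (b) can be sketched as follows. The dot-product scoring and softmax normalization used here are an assumed concrete choice for how a memory (a row of the embedding matrix A) produces the distribution p over the n inputs; the function name and example vectors are illustrative:

```python
import math

# One memory unit's attention: score every input vector against a
# single memory vector, then softmax-normalize the scores into a
# probability distribution p over the n inputs.

def memory_unit_attention(X, memory):
    scores = [sum(x_i * m_i for x_i, m_i in zip(x, memory)) for x in X]
    mx = max(scores)                      # subtract max for stability
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 inputs, f = 2
p = memory_unit_attention(X, [1.0, 0.0])
# p sums to 1; inputs aligned with the memory receive more attention.
```

Because p is normalized over the set of inputs, the subsequent weighted sum of the permutation-equivariant sequence C under p is invariant to reordering the inputs, which is what the memory block exploits.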
Source hospitals for the train/test and external datasets and their data distribution.
| Dataset type | Source hospital (clients) | LUAD images | LUSC images | Total |
|---|---|---|---|---|
| Train/test | International Genomics Consortium | 189 | 78 | 267 |
| | Indivumed | 94 | 117 | 211 |
| | Asterand | 90 | 117 | 207 |
| | Johns Hopkins | 121 | 78 | 199 |
| External | Christiana Healthcare | 169 | 54 | 223 |
| | Roswell Park | 35 | 75 | 110 |
| | Princess Margaret Hospital (Canada) | 0 | 52 | 52 |
Evaluation on different data distributions.
| Data distribution | Number of clients | Accuracy (without FL) | Accuracy (with FL) | Accuracy (centralized) |
|---|---|---|---|---|
| IID | 4 | 0 | 0 | 0 |
| | 8 | 0 | 0 | |
| | 16 | 0 | 0 | |
| | 32 | 0 | 0 | |
| Non-IID | 4 | 0 | 0 | 0 |
| | 8 | 0 | 0 | |
| | 16 | 0 | 0 | |
| | 32 | 0 | 0 | |
Centralized accuracy denotes the accuracy when the data is centralized. The accuracy without FL is the mean and standard deviation of accuracy values across multiple clients without any collaboration. The accuracy with FL is the mean and standard deviation of the central model trained at the end of FL evaluated on each client dataset.
Figure 4. Comparison of the mean accuracy across clients versus the accuracy of the central model trained with FL for the fabricated clients (not the real hospitals). The accuracy is computed for two data distribution settings across clients: IID and non-IID.
Figure 5. Visualisation of IID and non-IID distribution of data among client models.
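The two settings in Figure 5 differ only in how the pooled TCGA slides are dealt out to the fabricated clients: an IID split shuffles before dealing, while a non-IID split groups by class first so each client sees a skewed label mix. The sketch below is illustrative (`split_clients` and the 0/1 labels, standing in for LUAD/LUSC, are not the paper's exact partitioning code):

```python
import random

def split_clients(labels, n_clients, iid=True, seed=0):
    """Return per-client lists of sample indices."""
    idx = list(range(len(labels)))
    if iid:
        random.Random(seed).shuffle(idx)   # mix classes uniformly
    else:
        idx.sort(key=lambda i: labels[i])  # group by class -> skewed shards
    shard = len(idx) // n_clients
    return [idx[k * shard:(k + 1) * shard] for k in range(n_clients)]

labels = [0] * 50 + [1] * 50              # balanced two-class pool
non_iid = split_clients(labels, 4, iid=False)
print([sum(labels[i] for i in c) for c in non_iid])  # [0, 0, 25, 25]
```

Under the non-IID split, two clients hold only class 0 and two hold only class 1, which is the pathological case FL aggregation must cope with.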
Ablation study of DP hyperparameters (gradient clipping and noise multiplier).
| Gradient clipping | Noise multiplier | Privacy budget (ε) | Test accuracy | External accuracy |
|---|---|---|---|---|
| 1.0 | 4.0 | 2.90 | 0.815 | 0.740 |
| 1.5 | 4.0 | 3.26 | 0.759 | 0.719 |
| 2.0 | 4.0 | 3.89 | 0.765 | 0.732 |
| 1.0 | 6.0 | 2.34 | 0.832 | 0.737 |
| 1.0 | 2.0 | 10.01 | 0.782 | 0.748 |
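The two hyperparameters ablated above are the knobs of a DP-SGD[22] update: each per-example gradient is clipped to norm C (gradient clipping), the clipped gradients are averaged, and Gaussian noise with standard deviation σ·C/n (σ the noise multiplier) is added. A minimal sketch with toy two-dimensional gradients; the function name and numbers are illustrative, not from the paper:

```python
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to clip_norm, average, add noise."""
    clipped = []
    for g in per_example_grads:
        norm = sum(v * v for v in g) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([v * scale for v in g])
    n = len(clipped)
    dim = len(clipped[0])
    mean = [sum(g[i] for g in clipped) / n for i in range(dim)]
    std = noise_multiplier * clip_norm / n   # Gaussian noise scale
    return [m + rng.gauss(0.0, std) for m in mean]

grads = [[3.0, 4.0], [0.6, 0.8]]   # norms 5.0 and 1.0
noisy = dp_sgd_step(grads, clip_norm=1.0,
                    noise_multiplier=4.0, rng=random.Random(0))
```

A larger C clips less (more signal, but more noise since std scales with C), and a larger σ strengthens privacy (smaller ε) at the cost of accuracy, matching the trends in the ablation.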
Evaluation of collaborative and non-collaborative learning on Test and External Datasets using DP-SGD, achieving privacy parameter ε = 2.90 for δ = 0.0001.
| Source hospital | Non-collaborative training | | DP-FL training | | FL training | | Combined training | |
|---|---|---|---|---|---|---|---|---|
| | Test | External | Test | External | Test | External | Test | External |
| International Genomics Consortium | 0.654 | 0.631 | 0.823 ± 0.01 | 0.707 ± 0.01 | 0.823 ± 0.01 | 0.741 ± 0.01 | 0.839 ± 0.01 | 0.768 ± 0.003 |
| Indivumed | 0.648 | 0.556 | | | | | | |
| Asterand | 0.709 | 0.701 | | | | | | |
| Johns Hopkins | 0.681 | 0.600 | | | | | | |
For FL and combined training we report the mean accuracy and standard deviation across the clients' test datasets. On the external dataset we ran the experiments using three random initializations, and report the mean accuracy and standard deviation across them.