Ming Y Lu, Richard J Chen, Dehan Kong, Jana Lipkova, Rajendra Singh, Drew F K Williamson, Tiffany Y Chen, Faisal Mahmood.
Abstract
Deep learning-based computational pathology algorithms have demonstrated a profound ability to excel in a wide array of tasks that range from characterization of well-known morphological phenotypes to predicting non-human-identifiable features from histology, such as molecular alterations. However, the development of robust, adaptable and accurate deep learning-based models often relies on the collection and time-costly curation of large, high-quality annotated training data that should ideally come from diverse sources and patient populations to cater for the heterogeneity that exists in such datasets. Multi-centric and collaborative integration of medical data across multiple institutions can naturally help overcome this challenge and boost model performance, but it is limited by privacy concerns, among other difficulties that may arise in the complex data-sharing process as models scale towards using hundreds of thousands of gigapixel whole slide images. In this paper, we introduce privacy-preserving federated learning for gigapixel whole slide images in computational pathology using weakly-supervised attention multiple instance learning and differential privacy. We evaluated our approach on two different diagnostic problems using thousands of histology whole slide images with only slide-level labels. Additionally, we present a weakly-supervised learning framework for survival prediction and patient stratification from whole slide images and demonstrate its effectiveness in a federated setting. Our results show that using federated learning, we can effectively develop accurate weakly-supervised deep learning models from distributed data silos without direct data sharing and its associated complexities, while also preserving differential privacy using randomized noise generation. We also make available an easy-to-use federated learning for computational pathology software package: http://github.com/mahmoodlab/HistoFL.
Keywords: Computational pathology; Federated learning; Pathology; Split learning; Whole slide imaging
Year: 2021 PMID: 34911013 PMCID: PMC9340569 DOI: 10.1016/j.media.2021.102298
Source DB: PubMed Journal: Med Image Anal ISSN: 1361-8415 Impact factor: 13.828
Fig. 1. Overview of weakly-supervised multiple instance learning in a federated learning framework. At each client site, for each WSI, the tissue regions are first automatically segmented and image patches are extracted from the segmented foreground regions. All patches are then embedded into a low-dimensional feature representation using a pretrained CNN as the encoder. Each client site trains a model using weakly-supervised learning on local data (requiring only slide-level or patient-level labels) and sends the model weights to a central server each epoch. Random noise can be added to the weight parameters before communicating with the central server for differential privacy preservation. On the central server, the global model is updated by averaging the model weights retrieved from all client sites. After the federated averaging, the updated weights of the global model are sent to each client model for synchronization prior to starting the next federated round.
Privacy-preserving federated learning using attention-based multiple instance learning for multi-site histology-based classification and survival prediction.
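| Input: |
| I. WSI data and weak annotation (e.g. patient diagnosis or prognosis) scattered among the participating client sites |
| II. Neural network models on local clients |
| III. Noise generator |
| IV. Number of training epochs or federated rounds |
| V. Optimizers |
| VI. Weight coefficient for each client during federated averaging |
| 1. initialize all model weights |
| 2. for each federated round do |
| 3. for each client site do |
| 4. for each training iteration on local data do |
| 5. update the local model weights with the local optimizer |
| 6. end for |
| 7. end for |
| 8. update the global model by the weighted average of the local model weights, with noise from the noise generator added before they are shared |
| 9. for each client site do |
| 10. synchronize the local model weights with the global model |
| 11. end for |
| 12. end for |
| 13. return global model |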
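The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HistoFL implementation: model weights are flat lists of floats, `local_update` is a hypothetical hook standing in for one epoch of weakly-supervised training at a site, the averaging coefficients are assumed proportional to local dataset size, and the noise standard deviation is assumed to be `alpha` times the standard deviation of the client's own weights.

```python
import random
import statistics


def add_gaussian_noise(weights, alpha, rng):
    """Perturb weights with zero-mean Gaussian noise before sharing them.

    Assumption: noise std = alpha * std of the client's weights; alpha = 0
    disables the noise entirely.
    """
    if alpha == 0:
        return list(weights)
    scale = alpha * statistics.pstdev(weights)
    return [w + rng.gauss(0.0, scale) for w in weights]


def federated_average(client_weights, coefficients):
    """Weighted average of client weight vectors (coefficients sum to 1)."""
    n_params = len(client_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coefficients, client_weights))
        for i in range(n_params)
    ]


def federated_round(global_weights, local_update, site_sizes, alpha, rng):
    """One round: synchronize, train locally, upload noisy weights, average."""
    total = sum(site_sizes)
    coefficients = [n / total for n in site_sizes]  # size-proportional (assumed)
    uploads = []
    for site_index in range(len(site_sizes)):
        # Each site starts from a copy of the current global weights.
        local = local_update(list(global_weights), site_index)
        uploads.append(add_gaussian_noise(local, alpha, rng))
    return federated_average(uploads, coefficients)


# Toy stand-in for training: each site pulls every weight toward its own target.
TOY_TARGETS = [1.0, 2.0, 3.0, 4.0]


def toy_update(weights, site_index):
    return [w + 0.5 * (TOY_TARGETS[site_index] - w) for w in weights]


weights = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _ in range(20):
    weights = federated_round(weights, toy_update, [211, 314, 531, 1070], 0.01, rng)
print(weights)
```

The site sizes in the toy loop are the BRCA slide counts per site; everything else in the demo is invented for illustration.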
Partition for BRCA subtyping (number of WSIs).
| | ILC | IDC | Total |
|---|---|---|---|
| TCGA Site 1 | 56 | 155 | 211 |
| TCGA Site 2 | 46 | 268 | 314 |
| TCGA Site 3 | 109 | 422 | 531 |
| BWH | 158 | 912 | 1070 |
| Total | 369 | 1757 | 2126 |
Partition for CCRCC survival prediction (number of cases).
| | Uncensored | Censored | Total |
|---|---|---|---|
| TCGA Site 1 | 16 | 88 | 104 |
| TCGA Site 2 | 27 | 49 | 76 |
| TCGA Site 3 | 128 | 203 | 331 |
| Total | 171 | 340 | 511 |
Fig. 2. Performance, comparative analysis and loss curves. a-c, d-f: Classification performance and loss curves for BRCA histologic subtyping and RCC histologic subtyping, respectively. Top: ROC curves are generated on the test sets for models trained using a centralized database, federated learning (with different levels of Gaussian random noise added during federated weight averaging) and training data local to each institution individually. The AUC score (mean ± s.d. over 5-fold cross-validation) is reported for each experiment; macro-averaging is used for the multi-class classification of RCC subtyping. Using multi-institutional data and federated learning, we achieved a mean test AUC between 0.833 and 0.862 on BRCA histologic subtyping and between 0.974 and 0.976 on RCC histologic subtyping. Middle: Balanced accuracy score and the sensitivity (recall) for each class (IDC: Invasive Ductal Carcinoma, ILC: Invasive Lobular Carcinoma for BRCA subtyping; CHRCC: Chromophobe Renal Cell Carcinoma, CCRCC: Clear Cell Renal Cell Carcinoma, PRCC: Papillary Renal Cell Carcinoma for RCC subtyping) are plotted for all experiments to assess model performance when accounting for class imbalance in the respective test set. Error bars show s.d. from 5-fold cross-validation. Bottom: For each experiment, the training loss and validation loss are monitored over each epoch before early stopping is triggered (see Section 3.2). Loss curves are shown for a single cross-validation fold from each task. Federated learning is observed to converge to a higher training and validation loss value in both tasks.
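To make the role of balanced accuracy under class imbalance concrete, here is a small sketch that computes per-class sensitivity (recall) and their unweighted mean. The function names and the toy labels are illustrative assumptions and are not taken from the paper's code or data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Sensitivity for each class: correctly predicted / total true cases."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Invented toy example: a classifier that always predicts the majority class.
y_true = ["IDC"] * 8 + ["ILC"] * 2
y_pred = ["IDC"] * 10
print(per_class_recall(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy would be 80%, while balanced accuracy exposes that the minority class is never detected.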
Fig. 3. Interpretability and visualization for weakly-supervised federated classification. To interpret and validate the morphological features learned by the model for RCC and BRCA histologic subtype classification, for randomly selected WSIs in the respective test set, the model trained with privacy-preserving federated learning (α = 0.01) is used to generate attention heatmaps using 256 × 256 tissue patches tiled at 20× magnification with a 90% spatial overlap. For each WSI, the attention scores predicted for all patches in the slide are normalized to the range [0, 1] by converting them to percentiles. For subtype classification, patches with high attention refer to image regions of high diagnostic relevance used for class prediction. The normalized scores are then mapped to their respective spatial locations in the slide. Finally, an RGB colormap is applied (red: high attention, blue: low attention), and the heatmap is overlaid on top of the original H&E image for display. For BRCA, patches of the most highly attended regions (red border) exhibited well-known tumor morphology of invasive ductal carcinoma (round cells with varying degrees of polymorphism arranged in tubules, nests, or papillae) and invasive lobular carcinoma (round and signet-ring cells with intracellular lumina and targetoid cytoplasmic mucin arranged in a single-file or trabecular pattern). For RCC, highly attended regions exhibited well-known tumor morphology of chromophobe RCC (large, round to polygonal cells with abundant, finely-reticulated to granular cytoplasm and perinuclear halos), clear cell RCC (large, round to polygonal cells with clear cytoplasm and distinct, but delicate cell borders), and papillary RCC (round to cuboidal cells with prominent papillary or tubulopapillary architecture with fibrovascular cores). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
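The percentile normalization of attention scores described in this caption might look as follows. This is a sketch under assumptions: the exact percentile convention is not stated in the text, so the version here uses rank divided by (n − 1) with tied scores sharing their average rank, and `attention_to_percentiles` is a hypothetical name.

```python
def attention_to_percentiles(scores):
    """Map raw attention scores to [0, 1] by percentile rank within one slide."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2  # ties share their average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


# Raw attention scores can be on any scale; only their ordering matters.
print(attention_to_percentiles([0.2, 5.0, -1.0]))
```

Because only the ordering of scores matters, heatmaps from different slides become comparable even when the raw attention values are on different scales.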
BRCA subtyping test performance reported as five-fold mean (±s.d.).
| | AUC ↑ | Error ↓ | bACC ↑ | F1 ↑ | mAP ↑ | Cohen’s κ |
|---|---|---|---|---|---|---|
| Site 1 only | 0.819±0.018 | 0.169±0.015 | 0.667±0.054 | 0.453±0.092 | 0.508±0.026 | 0.359±0.083 |
| Site 2 only | 0.752±0.066 | 0.178±0.018 | 0.684±0.053 | 0.478±0.074 | 0.454±0.075 | 0.373±0.079 |
| Site 3 only | 0.698±0.070 | 0.180±0.015 | 0.639±0.055 | 0.405±0.107 | 0.386±0.065 | 0.305±0.105 |
| Site 4 only | 0.788±0.017 | 0.190±0.029 | 0.691±0.009 | 0.490±0.013 | 0.441±0.050 | 0.375±0.031 |
| Centralized | 0.919±0.013 | 0.104±0.012 | 0.792±0.025 | 0.684±0.032 | 0.761±0.043 | 0.623±0.037 |
| Federated | 0.862±0.025 | 0.149±0.023 | 0.736±0.023 | 0.575±0.043 | 0.610±0.076 | 0.485±0.057 |
| Federated, | 0.836±0.021 | 0.166±0.023 | 0.744±0.016 | 0.568±0.032 | 0.537±0.049 | 0.467±0.045 |
| Federated, | 0.833±0.023 | 0.173±0.028 | 0.739±0.021 | 0.557±0.036 | 0.535±0.048 | 0.451±0.052 |
| Federated, | 0.842±0.022 | 0.159±0.021 | 0.756±0.026 | 0.585±0.036 | 0.550±0.053 | 0.488±0.048 |
| Federated, | 0.657±0.033 | 0.426±0.210 | 0.582±0.051 | 0.337±0.046 | 0.294±0.038 | 0.127±0.076 |
RCC subtyping test performance reported as five-fold mean (±s.d.).
| | AUC ↑ | Error ↓ | bACC ↑ | F1 ↑ | mAP ↑ | Cohen’s κ |
|---|---|---|---|---|---|---|
| Site 1 only | 0.947±0.017 | 0.165±0.013 | 0.802±0.033 | 0.813±0.021 | 0.903±0.026 | 0.704±0.029 |
| Site 2 only | 0.939±0.008 | 0.185±0.012 | 0.812±0.036 | 0.805±0.023 | 0.898±0.019 | 0.685±0.027 |
| Site 3 only | 0.894±0.024 | 0.219±0.038 | 0.772±0.049 | 0.762±0.050 | 0.817±0.035 | 0.625±0.063 |
| Site 4 only | 0.943±0.015 | 0.163±0.032 | 0.825±0.037 | 0.815±0.039 | 0.899±0.028 | 0.715±0.057 |
| Centralized | 0.985±0.004 | 0.081±0.018 | 0.912±0.023 | 0.910±0.019 | 0.970±0.009 | 0.856±0.033 |
| Federated | 0.976±0.007 | 0.106±0.012 | 0.890±0.028 | 0.881±0.015 | 0.956±0.015 | 0.815±0.025 |
| Federated, | 0.976±0.007 | 0.107±0.025 | 0.896±0.034 | 0.883±0.030 | 0.956±0.014 | 0.814±0.046 |
| Federated, | 0.976±0.006 | 0.105±0.017 | 0.896±0.027 | 0.885±0.021 | 0.954±0.015 | 0.818±0.033 |
| Federated, | 0.974±0.007 | 0.101±0.010 | 0.900±0.020 | 0.891±0.009 | 0.953±0.014 | 0.823±0.020 |
| Federated, | 0.789±0.062 | 0.553±0.180 | 0.402±0.090 | 0.266±0.102 | 0.661±0.077 | 0.068±0.071 |
Fig. 4. Patient stratification and interpretability for weakly-supervised federated survival prediction. Patients in the test set were stratified into high-risk and low-risk groups using the median (50th percentile) of the model’s predicted risk score distribution as the cutoff, and the log-rank test was used to assess the statistical significance of the difference between the survival distributions of the resulting risk groups. Top: increasing α by over two orders of magnitude for stronger guarantees on differential privacy did not eliminate the model’s ability to stratify patients into statistically significantly (p-value < 0.05) different risk groups. Bottom: exemplars of Clear Cell Renal Cell Carcinoma patients predicted as high-risk and low-risk respectively by the model, showing the original H&E (left), attention-based heatmap (center), and highest-attention patches (right). In contrast to the subtyping classification problem, since survival analysis is an ordinal regression problem, the high-attention patches correspond to regions with high prognostic relevance in stratifying patients into low-risk versus high-risk groups. The highest-attention patches for the high-risk case focus predominantly on the tumor cells themselves, while the highest-attention patches for the low-risk case focus predominantly on lymphocytes within the stroma and directly interfacing with tumor cells, which is consistent with the known prognostic relevance of tumor-immune co-localization in pathology.
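The stratification step in this caption (median split followed by a log-rank test) can be sketched as below. This is an illustrative standard two-group log-rank test written with the standard library only, not the authors' code; the function names are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption.

```python
import math
import statistics


def median_split(risk_scores):
    """True marks the high-risk group (score strictly above the median)."""
    cutoff = statistics.median(risk_scores)
    return [score > cutoff for score in risk_scores]


def logrank_p_value(times, events, high_risk):
    """Two-group log-rank test; events[i] is 1 if observed, 0 if censored."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk over the distinct times at which at least one event was observed.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, high_risk) if tt >= t]
        n = len(at_risk)
        n_high = sum(at_risk)
        deaths = [g for tt, e, g in zip(times, events, high_risk) if tt == t and e]
        d = len(deaths)
        d_high = sum(deaths)
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2))


# Invented toy cohort in which higher predicted risk means earlier death.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
risk = [8, 7, 6, 5, 4, 3, 2, 1]
groups = median_split(risk)
print(logrank_p_value(times, events, groups))
```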
CCRCC survival prediction test performance reported as five-fold mean (±s.d.).
| | c-Index | AUC | P-Value |
|---|---|---|---|
| Site 1 only | 0.502±0.018 | 0.513±0.032 | 0.937 |
| Site 2 only | 0.506±0.017 | 0.520±0.022 | 0.662 |
| Site 3 only | 0.645±0.064 | 0.674±0.077 | 9.14 × 10⁻⁴ |
| Centralized | 0.692±0.043 | 0.729±0.046 | 1.39 × 10⁻⁸ |
| Federated, | 0.683±0.064 | 0.719±0.070 | 2.86 × 10⁻⁸ |
| Federated, | 0.639±0.090 | 0.664±0.103 | 1.52 × 10⁻⁵ |
| Federated, | 0.648±0.099 | 0.676±0.111 | 2.39 × 10⁻⁵ |
| Federated, | 0.647±0.085 | 0.672±0.098 | 2.52 × 10⁻⁹ |
| Federated, | 0.508±0.036 | 0.504±0.044 | 0.805 |
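For readers unfamiliar with the c-index column, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a textbook formulation rather than the authors' evaluation code; pairs with tied survival times are skipped and tied risk scores count as half-concordant, both of which are assumptions about conventions the table does not state.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j. Higher risk should mean shorter time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored case cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: the last patient is censored at time 3.
print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
```

A value near 0.5 corresponds to random ranking, which is how the c-index values close to 0.5 in the table should be read.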
BRCA subtyping performance tested on intra vs. inter-site test data, reported as five-fold mean (±s.d.).
| | Site 1 | Site 2 | Site 3 | Site 4 | All (Macro-avg) | All (Micro-avg) |
|---|---|---|---|---|---|---|
| Centralized | 0.929±0.034 | 0.883±0.055 | 0.887±0.060 | 0.938±0.022 | 0.909±0.013 | 0.919±0.013 |
| Site 1 only | 0.841±0.026 | 0.776±0.063 | 0.786±0.074 | 0.853±0.023 | 0.814±0.022 | 0.819±0.018 |
| Site 2 only | 0.703±0.143 | 0.739±0.050 | 0.782±0.045 | 0.847±0.039 | 0.768±0.052 | 0.752±0.066 |
| Site 3 only | 0.620±0.108 | 0.713±0.132 | 0.772±0.084 | 0.798±0.077 | 0.726±0.085 | 0.698±0.070 |
| Site 4 only | 0.806±0.026 | 0.704±0.048 | 0.828±0.045 | 0.853±0.042 | 0.798±0.031 | 0.788±0.017 |
| Federated | 0.859±0.046 | 0.838±0.077 | 0.837±0.041 | 0.919±0.024 | 0.863±0.024 | 0.862±0.025 |
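As a quick worked check of the "All (Macro-avg)" column, the sketch below assumes the macro-average is the unweighted mean of the four per-site AUCs; under that assumption the reported means are reproduced from the per-site values in the table. The micro-average, presumably computed over the pooled test slides, cannot be recomputed from the table alone.

```python
# Per-site mean AUCs copied from the BRCA table above (Sites 1 to 4).
site_auc = {
    "Centralized": [0.929, 0.883, 0.887, 0.938],
    "Federated": [0.859, 0.838, 0.837, 0.919],
}


def macro_average(values):
    """Unweighted mean across sites, so every site counts equally."""
    return sum(values) / len(values)


for name, values in site_auc.items():
    print(name, round(macro_average(values), 3))
```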
RCC subtyping performance tested on intra vs. inter-site test data, reported as five-fold mean (±s.d.).
| | Site 1 | Site 2 | Site 3 | Site 4 | All (Macro-avg) | All (Micro-avg) |
|---|---|---|---|---|---|---|
| Centralized | 0.992±0.007 | 0.978±0.012 | 0.982±0.015 | 0.983±0.005 | 0.984±0.007 | 0.985±0.004 |
| Site 1 only | 0.981±0.019 | 0.928±0.023 | 0.975±0.016 | 0.947±0.018 | 0.958±0.012 | 0.947±0.017 |
| Site 2 only | 0.932±0.032 | 0.976±0.021 | 0.872±0.021 | 0.950±0.015 | 0.933±0.009 | 0.939±0.008 |
| Site 3 only | 0.943±0.020 | 0.846±0.057 | 0.980±0.021 | 0.877±0.050 | 0.911±0.026 | 0.894±0.024 |
| Site 4 only | 0.958±0.021 | 0.914±0.032 | 0.922±0.036 | 0.984±0.006 | 0.945±0.016 | 0.943±0.015 |
| Federated | 0.990±0.008 | 0.967±0.013 | 0.971±0.016 | 0.985±0.004 | 0.978±0.007 | 0.976±0.007 |
CCRCC survival prediction performance tested on intra vs. inter-site test data, reported as five-fold mean (±s.d.).
| | Site 1 | Site 2 | Site 3 | All (Micro-avg) | All (Macro-avg) |
|---|---|---|---|---|---|
| Centralized | 0.577±0.185 | 0.653±0.148 | 0.709±0.067 | 0.692±0.043 | 0.646±0.068 |
| Site 1 only | 0.463±0.132 | 0.449±0.070 | 0.522±0.039 | 0.502±0.018 | 0.478±0.055 |
| Site 2 only | 0.475±0.076 | 0.566±0.062 | 0.491±0.029 | 0.506±0.017 | 0.511±0.028 |
| Site 3 only | 0.651±0.119 | 0.573±0.119 | 0.685±0.067 | 0.645±0.064 | 0.636±0.061 |
| Federated | 0.594±0.200 | 0.596±0.177 | 0.729±0.062 | 0.683±0.064 | 0.640±0.097 |
Survival prediction performance for different choices of R and comparison with existing approaches, reported as five-fold mean (±s.d.).
| | c-Index | AUC | P-Value |
|---|---|---|---|
| Grade | 0.648±0.047 | 0.668±0.058 | 0.272 |
| Grade + Age + Gender | 0.693±0.050 | 0.716±0.065 | 0.193 |
| | 0.681±0.031 | 0.708±0.044 | 1.28 × 10⁻⁸ |
| | 0.685±0.020 | 0.723±0.022 | 6.38 × 10⁻⁶ |
| | 0.678±0.033 | 0.713±0.033 | 7.58 × 10⁻⁸ |
| | 0.692±0.043 | 0.732±0.048 | 1.39 × 10⁻⁸ |
Federated learning performance for different communication paces.

A. BRCA subtyping performance for different communication paces
| | | | | |
|---|---|---|---|---|
| AUC | 0.862±0.025 | 0.869±0.024 | 0.865±0.027 | 0.867±0.024 |

B. RCC subtyping performance for different communication paces
| | | | | |
|---|---|---|---|---|
| AUC | 0.976±0.007 | 0.975±0.006 | 0.975±0.007 | 0.973±0.009 |

C. CCRCC survival prediction performance for different communication paces
| | | | | |
|---|---|---|---|---|
| c-index | 0.683±0.064 | 0.686±0.053 | 0.664±0.083 | 0.655±0.074 |
Fig. 5. Performance comparison between simple averaging vs. weighted aggregation. Performance in terms of ROC AUC for classification and c-index for survival prediction is shown for federated averaging across different levels of α. Error bars show s.d. from 5-fold cross-validation.
Partition for RCC subtyping (number of WSIs).
| | CCRCC | PRCC | CHRCC | Total |
|---|---|---|---|---|
| TCGA Site 1 | 108 | 120 | 39 | 267 |
| TCGA Site 2 | 78 | 100 | 31 | 209 |
| TCGA Site 3 | 333 | 77 | 51 | 461 |
| BWH | 184 | 40 | 23 | 247 |
| Total | 703 | 337 | 144 | 1184 |