Micah J Sheller1, Brandon Edwards1, G Anthony Reina1, Jason Martin1, Sarthak Pati2,3, Aikaterini Kotrotsou4,5, Mikhail Milchenko6, Weilin Xu1, Daniel Marcus6, Rivka R Colen4,5,7,8, Spyridon Bakas9,10,11.
Abstract
Several studies underscore the potential of deep learning in identifying complex patterns, leading to diagnostic and prognostic biomarkers. Sufficiently large and diverse datasets, required for training, are a significant challenge to identify in medicine and can rarely be found in individual institutions. Multi-institutional collaborations based on centrally-shared patient data face privacy and ownership challenges. Federated learning is a novel paradigm for data-private multi-institutional collaborations, where model-learning leverages all available data without sharing data between institutions, by distributing the model-training to the data-owners and aggregating their results. We show that federated learning among 10 institutions results in models reaching 99% of the model quality achieved with centralized data, and evaluate generalizability on data from institutions outside the federation. We further investigate the effects of data distribution across collaborating institutions on model quality and learning patterns, indicating that increased access to data through data-private multi-institutional collaborations can benefit model quality more than the errors introduced by the collaborative method harm it. Finally, we compare with other collaborative-learning approaches, demonstrating the superiority of federated learning, and discuss practical implementation considerations. Clinical adoption of federated learning is expected to lead to models trained on datasets of unprecedented size, and hence to have a catalytic impact towards precision/personalized medicine.
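The aggregation step described in the abstract (each data-owner trains locally, then the results are combined) can be sketched as follows. This is a minimal illustration only, assuming a sample-count weighted average in the style of federated averaging; the abstract does not specify the aggregation rule, and the names `federated_average`, `federated_round`, and the `train` callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]  # a flattened model, for illustration only


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Combine per-institution weights, weighted by local sample count.

    ASSUMPTION: sample-count weighting; the abstract only says that
    institutional results are aggregated.
    """
    if not updates:
        raise ValueError("need at least one institutional update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one training sample")
    aggregated = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            aggregated[i] += w * n / total
    return aggregated


def federated_round(
    global_weights: Weights,
    institutions: Sequence[Tuple[Callable[[Weights], Weights], int]],
) -> Weights:
    """One round: every institution trains on its own data, then aggregate.

    Each entry is (train, n_samples), where `train` is a hypothetical
    callable that takes the current global weights and returns locally
    updated weights. Only weights leave an institution, never patient data.
    """
    updates = [(train(list(global_weights)), n) for train, n in institutions]
    return federated_average(updates)


if __name__ == "__main__":
    # Two toy "institutions" that shift every weight by a fixed amount.
    inst_a = (lambda w: [x + 1.0 for x in w], 1)
    inst_b = (lambda w: [x + 3.0 for x in w], 3)
    print(federated_round([0.0, 0.0], [inst_a, inst_b]))
```

The institution with more samples pulls the aggregate further toward its own update, which is the design choice behind weighting by sample count.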
Year: 2020 PMID: 32724046 PMCID: PMC7387485 DOI: 10.1038/s41598-020-69250-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. System architectures of collaborative learning approaches for multi-institutional collaborations. The current paradigm for multi-institutional collaborations, based on Centralized Data Sharing, is shown in (a), whereas in (b) we note the proposed paradigm, based on Federated Learning. Panels (c) and (d) offer schematics for alternative data-private collaborative learning approaches evaluated in this study, namely Institutional Incremental Learning and Cyclic Institutional Incremental Learning, respectively.
Figure 3. Model quality results from single institution training, CDS, FL, IIL, and CIIL. CDS, FL, CIIL mean model Dice against the Original Institution group single institution held-out validation data over multiple runs of collaborative cross validation, as well as the average of single institutional results under the same scheme (AVG SIM). The AVG 1–10 column provides the average performance of each collaboration method across single institution validation sets. For CIIL, ‘best local’ and ‘random local’ are two methods we introduce for final model selection during CIIL (more details are given in the “Methods: Final Model Selection” section). Note that the color scale here differs from that used in Fig. 2.
Figure 4. Learning curves of collaborative learning methods on Original Institution data. Mean global validation Dice every epoch by collaborative learning method on the Original Institution group over multiple runs of collaborative cross validation. Confidence intervals are min, max. An epoch for CDS is defined as a single training pass over all of the centralized data. An epoch for FL is defined as a parallel training pass of every institution over its training data, and an epoch during CIIL and IIL is defined as a single institution's training pass over its data.
Figure 2. Single Original Institution Validation Results. Single institution mean final model qualities (based on the Dice Similarity Coefficient [34]) for the Original Institution group (y-axis) measured against all single institution held-out validation sets (x-axis) using multiple runs of five-fold collaborative cross validation. The Y axis represents models trained on a single institutional dataset, and the X axis represents the validation dataset of each independent institution (Local Validation Dataset). “AVG” indicates the average of each institution's mean model performance over all institutions in the group other than itself; “W-AVG” denotes the same, but with a weighted average according to each institution’s contribution to the validation set size. The diagonal entries indicate how well each institution’s final models scored against their own validation set, and they are represented as the Single Institutional Model (SIM) results reported in Fig. 3.
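For readers unfamiliar with the metric, the Dice Similarity Coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # ASSUMPTION: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / size_sum


if __name__ == "__main__":
    # One overlapping voxel out of two labelled voxels in each mask.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

A score of 1 means the two masks coincide exactly, and 0 means they share no labelled voxel.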
Model quality results from single institution training, CDS, and all data-private methods.
| Model | BTest | WashU | MDACC | Global val | LOO |
|---|---|---|---|---|---|
| Avg single inst | 0.732 ± 0.054 | 0.666 ± 0.045 | 0.705 ± 0.033 | 0.733 | – |
| CDS | 0.863 ± 0.008 | 0.782 ± 0.009 | 0.828 ± 0.007 | 0.862 ± 0.007 | 0.84 ± 0.006 |
| FL | 0.858 ± 0.004 | 0.771 ± 0.008 | 0.82 ± 0.003 | 0.857 ± 0.007 | 0.835 ± 0.006 |
| CIIL “best local” | 0.855 ± 0.007 | 0.775 ± 0.013 | 0.82 ± 0.009 | 0.853 ± 0.006 | 0.831 ± 0.012 |
| CIIL “rand. local” | 0.84 ± 0.021 | 0.758 ± 0.021 | 0.808 ± 0.014 | 0.824 ± 0.035 | 0.804 ± 0.031 |
| IIL “smallest first” | 0.833 ± 0.006 | 0.751 ± 0.007 | 0.781 ± 0.009 | 0.825 ± 0.007 | 0.785 ± 0.023 |
| Institution 1 | 0.826 | 0.731 | 0.773 | 0.824 | – |
| Institution 2 | 0.614 | 0.572 | 0.651 | 0.628 | – |
| Institution 3 | 0.700 | 0.635 | 0.718 | 0.702 | – |
| Institution 4 | 0.751 | 0.680 | 0.701 | 0.747 | – |
| Institution 5 | 0.753 | 0.685 | 0.691 | 0.733 | – |
| Institution 6 | 0.708 | 0.621 | 0.668 | 0.709 | – |
| Institution 7 | 0.721 | 0.674 | 0.712 | 0.732 | – |
| Institution 8 | 0.755 | 0.687 | 0.720 | 0.755 | – |
| Institution 9 | 0.745 | 0.691 | 0.715 | 0.755 | – |
| Institution 10 | 0.751 | 0.687 | 0.700 | 0.745 | – |
Mean ± standard deviation of Dice for all collaboration methods on the Original Institution group under multiple runs of collaborative cross validation, as well as the mean of single institutional results under the same scheme. The LOO results are a weighted average over institutional LOO tests, weighted by test institution contribution. The ‘–’ entries in the LOO column indicate single-institution tests, where the LOO method did not apply.
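Two pieces of arithmetic behind the table can be checked with a short sketch. The first compares the FL and CDS global validation Dice values copied from the table; the second shows the form of a contribution-weighted average such as the one used for the LOO column. The helper `weighted_mean` and its example inputs are hypothetical, since the per-institution LOO scores and weights are not listed here.

```python
from typing import Sequence


def weighted_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Average of scores, each weighted by its institution's contribution."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Global validation Dice values taken from the table above.
FL_GLOBAL_VAL = 0.857
CDS_GLOBAL_VAL = 0.862
fl_to_cds_ratio = FL_GLOBAL_VAL / CDS_GLOBAL_VAL

if __name__ == "__main__":
    print(f"FL reaches {fl_to_cds_ratio:.1%} of CDS global validation Dice")
    # Made-up scores and case counts, only to show the weighting.
    print(weighted_mean([0.80, 0.90], [1, 3]))
```

The ratio comes out at roughly 0.994, in line with the abstract's statement that federated models reach 99% of the quality achieved with centralized data.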