Jie Xu, Benjamin S Glicksberg, Chang Su, Peter Walker, Jiang Bian, Fei Wang.
Abstract
With the rapid development of computer software and hardware technologies, more and more healthcare data are becoming readily available from clinical institutions, patients, insurance companies, and pharmaceutical industries, among others. This access provides an unprecedented opportunity for data science technologies to derive data-driven insights and improve the quality of care delivery. Healthcare data, however, are usually fragmented and private, making it difficult to generate robust results across populations. For example, different hospitals own the electronic health records (EHR) of different patient populations, and these records are difficult to share across hospitals because of their sensitive nature. This creates a major barrier to developing effective, generalizable analytical approaches, which require diverse "big data." Federated learning, a mechanism for training a shared global model with a central server while keeping all the sensitive data in the local institutions where the data belong, holds great promise for connecting fragmented healthcare data sources while preserving privacy. The goal of this survey is to review federated learning technologies, particularly within the biomedical space. In particular, we summarize the general solutions to the statistical challenges, system challenges, and privacy issues in federated learning, and point out the implications and potential in healthcare. © Springer Nature Switzerland AG 2020.
Keywords: Federated learning; Healthcare; Privacy
Year: 2020 PMID: 33204939 PMCID: PMC7659898 DOI: 10.1007/s41666-020-00082-4
Source DB: PubMed Journal: J Healthc Inform Res ISSN: 2509-498X
Fig. 1 Schematic of the federated learning framework. The model is trained in a distributed manner: the institutions periodically communicate their local updates to a central server to learn a global model; the central server aggregates the updates and sends back the parameters of the updated global model
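The training loop in Fig. 1 can be illustrated with a minimal sketch of federated averaging, one common aggregation rule. The record itself does not say which rule the figure depicts, so the sample-size weighting, the toy linear model, the synthetic data, and every function name below are assumptions made for illustration only.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: SGD on y ~ w*x + b using only its own data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(wts[i] * n for wts, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    )

def train(clients, rounds=50):
    """Alternate local training and server aggregation; raw data never leave a client."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_client(rng, n, lo, hi):
    """Hypothetical client holding n noisy samples of y = 2x + 1 with x in [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
clients = [
    make_client(rng, 20, 0.0, 1.0),
    make_client(rng, 30, 0.5, 1.5),
    make_client(rng, 10, -1.0, 0.0),
]
w, b = train(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

Each simulated institution sees a different range of inputs, yet the server only ever receives model parameters, which is the property the schematic emphasizes.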
Fig. 2 Communication-efficient federated learning methods. Existing research on improving communication efficiency can be categorized into (a) model compression, (b) client selection, (c) update reduction, and (d) peer-to-peer learning
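As one hedged illustration of the model compression category in Fig. 2, a client could send only the largest-magnitude entries of its update. Top-k sparsification is chosen here as an assumption; the caption does not name a specific compression technique, and the function names are hypothetical.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries as (index, value) pairs."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = sorted(ranked[:k])
    return [(i, update[i]) for i in kept]

def densify(sparse, length):
    """Server side: rebuild a dense vector, treating dropped entries as zero."""
    dense = [0.0] * length
    for i, value in sparse:
        dense[i] = value
    return dense

update = [0.01, -0.9, 0.05, 0.4, -0.02, 0.0]
sparse = top_k_sparsify(update, k=2)      # what the client would transmit
restored = densify(sparse, len(update))   # what the server would reconstruct
print(sparse, restored)
```

The trade-off is fewer transmitted values per round in exchange for a lossy update.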
Fig. 3 Privacy-preserving schemes. (a) Secure multi-party computation. In secret sharing, secret values (blue and yellow pie) are split into any number of shares that are distributed among the computing nodes. During the computation, no single computing node can recover the original value or learn anything about the output (green pie). The nodes can combine their shares to reconstruct the original value. (b) Differential privacy. It guarantees that anyone seeing the result of a differentially private analysis will make essentially the same inference whether or not any one individual's data are included (answer 1 and answer 2 are nearly indistinguishable)
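Both panels of Fig. 3 can be sketched in a few lines. The example below assumes additive secret sharing over a prime field for panel (a) and the Laplace mechanism on a counting query for panel (b); the caption does not specify either construction, and the modulus, function names, and parameters are illustrative. A seeded `random.Random` is used only to make the sketch reproducible; a real deployment would need a cryptographically secure generator and a noise sampler hardened against floating-point attacks.

```python
import random

PRIME = 2**61 - 1  # assumed field modulus for the shares

def share(secret, n_shares, rng):
    """Split a secret into n additive shares; fewer than all n reveal nothing."""
    shares = [rng.randrange(PRIME) for _ in range(n_shares - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Combine all shares to recover the original value."""
    return sum(shares) % PRIME

def secure_sum(values, rng):
    """Each party shares its value; parties add the shares they hold locally."""
    n = len(values)
    all_shares = [share(v, n, rng) for v in values]
    # Party j only ever sees the j-th share of every input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial_sums)

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query, whose sensitivity is 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(1)
total = secure_sum([12, 30, 7], rng)              # three hypothetical sites
noisy = private_count(120, epsilon=0.5, rng=rng)  # smaller epsilon, more noise
print(total, noisy)
```

In the secure sum, only the aggregate is revealed after reconstruction; in the private count, the added noise masks whether any single record contributed to the answer.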
Summary of recent work on federated learning for healthcare
| Problem | ML method | No. of clients | Data |
|---|---|---|---|
| Patient similarity learning | Hashing | 3 | MIMIC-III |
| Patient similarity learning | Hashing | 20 | MIMIC-III |
| Phenotyping | TF | 1–5 | MIMIC-III, UCSD |
| Phenotyping | NLP | 10 | MIMIC-III |
| Representation learning | PCA | 10–100 | ADNI, UK Biobank, PPMI, MIRIAD |
| Mortality prediction | Autoencoder | 5–50 | eICU Collaborative Research Database |
| Hospitalization prediction | SVM | 5, 10 | Boston Medical Center |
| Preterm-birth prediction | RNN | 50 | Cerner Health Facts |
| Mortality prediction | LR, NN | 31 | eICU Collaborative Research Database |
| Mortality prediction | LR, MLP | 2 | MIMIC-III |
| Activity recognition | CNN | 5 | UCI Smartphone |
| Adverse drug reaction prediction | SVM, MLP, LR | 10 | LCED, MIMIC |
| Arrhythmia detection | NN | 16, 32, 64 | PhysioNet Dataset |
| Disease prediction | NN | 5, 10 | Pima Indians Diabetes Dataset |
| Imaging data analysis | VAE | 4 | MNIST, Brain Imaging Data |
| Mortality prediction | LRR, MLP, LASSO | 5 | Mount Sinai COVID-19 Dataset |
TF: tensor factorization; MLP: multi-layer perceptron; VAE: variational autoencoder; LCED: Limited MarketScan Explorys Claims-EMR Data (https://www.ibm.com/downloads/cas/6KNYVVQ2)
Popular tools for federated learning research
| Project name | Developer | Description |
|---|---|---|
| PySyft | OpenMined | Decouples private data from model training using federated learning, differential privacy (DP), and multi-party computation (MPC) within PyTorch. TensorFlow bindings are also available. |
| TFF | Google | With TFF, TensorFlow provides users with a flexible and open framework through which they can simulate distributed computing locally. |
| FATE | WeBank | FATE supports the Federated AI ecosystem, in which a secure computing protocol is implemented based on homomorphic encryption and MPC. |
| Tensor/IO | Dow et al. | Tensor/IO is a lightweight cross-platform library for on-device machine learning, bringing the power of TensorFlow and TensorFlow Lite to iOS, Android, and React Native applications. |