Li Huang, Yifeng Yin, Zeng Fu, Shifa Zhang, Hao Deng, Dianbo Liu.
Abstract
Intensive care data are valuable for improving health care, policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a major challenge for regulatory, operational and security reasons. One potential solution is federated machine learning, a method that sends machine learning algorithms simultaneously to all data sources, trains a model at each source and aggregates the learned models. This strategy allows valuable data to be used without moving them. One challenge in applying federated machine learning is that data from diverse sources may follow different distributions. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated learning performance in independent and identically distributed (IID) and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.
Year: 2020; PMID: 32302316; PMCID: PMC7164603; DOI: 10.1371/journal.pone.0230706
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. Communication between the clients and the server under FedAvg.
Fig 2. FedAvg complemented by the data-sharing strategy: Distribute shared data to the clients at initialization.
Fig 3. Communication between the clients and the server under LoAdaBoost FedAvg.
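The abstract describes federated learning as training a model at each data source and aggregating the learned models, and Fig 1 refers to the client-server communication under FedAvg. A minimal sketch of one such communication round follows. It assumes the standard FedAvg formulation, in which client weights are averaged in proportion to local sample counts, and it uses logistic regression as a stand-in model; the function names `local_update`, `aggregate` and `fedavg_round` are hypothetical and are not taken from the paper. The additional step that LoAdaBoost applies is not described in this record and is therefore not sketched.

```python
import math


def local_update(weights, rows, labels, lr=0.1, epochs=5):
    """Train a logistic-regression stand-in on one client's data.

    Runs `epochs` passes of full-batch gradient descent starting from the
    weights broadcast by the server and returns the updated weights.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / len(rows)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def aggregate(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def fedavg_round(global_w, clients, lr=0.1, epochs=5):
    """One round: broadcast weights, train on each client, aggregate."""
    updates = [
        local_update(global_w, rows, labels, lr, epochs)
        for rows, labels in clients
    ]
    sizes = [len(labels) for _, labels in clients]
    return aggregate(updates, sizes)
```

Only the weight vectors cross the client boundary in this sketch; the rows and labels stay with each client, which is the property the abstract highlights.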
Summary of the evaluation dataset.
| representation | count |
|---|---|
| integer: IDs ranging from 2 to 99,999 | 30,000 |
| binary: 0 for female and 1 for male | 17,284/12,716 |
| binary: 0 for ages less than or equal to 65 and 1 for greater | 13,947/16,053 |
| binary: 0 for survival and 1 for expired | 20,841/9,159 |
| binary: 0 for not prescribed to patients and 1 for prescribed | 2,814 dimensions |
Example rows and columns of DRUGS.
| SUBJECT_ID | D5W | Heparin Sodium | Nitro-glycerine | Docusate Sodium | Insulin | Atropine Sulphate | … |
|---|---|---|---|---|---|---|---|
| … | … | … | … | … | … | … | … |
| 9 | 1 | 0 | 0 | 0 | 1 | 0 | … |
| 10 | 0 | 0 | 0 | 0 | 1 | 0 | … |
| 11 | 0 | 0 | 0 | 1 | 1 | 0 | … |
| 12 | 1 | 0 | 0 | 0 | 1 | 0 | … |
| 13 | 1 | 1 | 1 | 1 | 1 | 1 | … |
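The DRUGS table represents each patient as a binary vector with one position per drug (0 for not prescribed, 1 for prescribed). A minimal sketch of how such an encoding could be built from a list of prescription records is shown here. The input format of `(subject_id, drug_name)` pairs and the function name `encode_prescriptions` are assumptions made for illustration; the record does not say how the authors constructed the table. The example values mirror the rows for subjects 9 and 11 above.

```python
def encode_prescriptions(prescriptions, drug_vocab):
    """Turn (subject_id, drug_name) pairs into one binary row per subject.

    Each row has one entry per drug in `drug_vocab`: 1 if the subject was
    prescribed that drug at least once, 0 otherwise. Drugs outside the
    vocabulary are ignored.
    """
    index = {drug: j for j, drug in enumerate(drug_vocab)}
    table = {}
    for subject_id, drug in prescriptions:
        row = table.setdefault(subject_id, [0] * len(drug_vocab))
        if drug in index:
            row[index[drug]] = 1
    return table


# Column names taken from the example table; the records are hypothetical input.
vocab = ["D5W", "Heparin Sodium", "Nitro-glycerine",
         "Docusate Sodium", "Insulin", "Atropine Sulphate"]
records = [(9, "D5W"), (9, "Insulin"), (11, "Docusate Sodium"), (11, "Insulin")]
encoded = encode_prescriptions(records, vocab)
```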
Fig 4. Performance gap between IID and non-IID data.
Fig 5. Comparison of FedAvg and LoAdaBoost on IID data.
LoAdaBoost converged slightly slower than FedAvg, but to a higher test AUC.
IID scenario: 10-fold cross validation results with varying C and E.
| C | E | FedAvg AUC | FedAvg average epochs | LoAdaBoost AUC | LoAdaBoost average epochs | |
|---|---|---|---|---|---|---|
| 10% | 5 | 0.7891 ± 0.0002 | 75 | 0.7940 ± 0.0001 | 68 | 0.03 |
| 10% | 10 | 0.7876 ± 0.0010 | 100 | 0.7900 ± 0.0007 | 73 | 0.03 |
| 10% | 15 | 0.7897 ± 0.0006 | 75 | 0.7907 ± 0.0010 | 52 | 0.03 |
| 20% | 5 | 0.7905 ± 0.0003 | 75 | 0.7971 ± 0.0005 | 69 | 0.03 |
| 50% | 5 | 0.7903 ± 0.0003 | 80 | 0.7932 ± 0.0005 | 75 | 0.03 |
| 100% | 5 | 0.7888 ± 0.0002 | 75 | 0.7887 ± 0.0003 | 72 | 0.78 |
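The results here are reported as an AUC with a spread over 10-fold cross validation. A minimal sketch of how a per-fold AUC can be computed and then summarized across folds follows. It assumes the spread is a sample standard deviation, which this record does not state, and the function names `auc` and `summarize` are hypothetical.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Return (mean, sample standard deviation) of per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; a rank-based implementation would be preferable for datasets of the size summarized above.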
Fig 6. Comparison of FedAvg and LoAdaBoost on non-IID data with data-sharing strategy.
Non-IID scenario: 10-fold cross validation results with varying α and β.
| | | FedAvg with data sharing: AUC | FedAvg with data sharing: average epochs | LoAdaBoost with data sharing: AUC | LoAdaBoost with data sharing: average epochs | |
|---|---|---|---|---|---|---|
| 1% | 10% | 0.7842 ± 0.0016 | 40 | 0.7916 ± 0.0015 | 36 | 0.03 |
| 1% | 20% | 0.7954 ± 0.0012 | 40 | 0.8016 ± 0.0015 | 35 | 0.03 |
| 1% | 30% | 0.8167 ± 0.0011 | 40 | 0.8203 ± 0.0011 | 34 | 0.03 |
| 2% | 10% | 0.7913 ± 0.0010 | 40 | 0.7984 ± 0.0008 | 35 | 0.03 |
| 3% | 10% | 0.8033 ± 0.0010 | 40 | 0.8063 ± 0.0010 | 34 | 0.03 |
Non-IID scenario: 10-fold cross validation results with varying C.
| C | FedAvg with data sharing: AUC | FedAvg with data sharing: average epochs | LoAdaBoost with data sharing: AUC | LoAdaBoost with data sharing: average epochs | |
|---|---|---|---|---|---|
| 10% | 0.7842 ± 0.0016 | 40 | 0.7916 ± 0.0015 | 36 | 0.03 |
| 20% | 0.7869 ± 0.0008 | 50 | 0.7893 ± 0.0005 | 46 | 0.03 |
| 50% | 0.7831 ± 0.0005 | 40 | 0.7877 ± 0.0006 | 35 | 0.03 |
| 100% | 0.7609 ± 0.0004 | 40 | 0.7900 ± 0.0003 | 35 | 0.03 |
Summary of the eICU dataset.
| representation | count |
|---|---|
| integer: six-digit patient ID | 22,500 |
| integer: hospital IDs ranging from 63 to 458 | 45 |
| binary: 0 for survival and 1 for expired | 21,393/1,107 |
| binary: 0 for not prescribed to patients and 1 for prescribed | 1,399 dimensions |
Fig 7. Comparison of FedAvg and LoAdaBoost FedAvg on eICU data.
Evaluation on eICU data: 10-fold cross validation results.
| data distribution | method | AUC | average epochs | |
|---|---|---|---|---|
| IID | FedAvg | 0.5693 ± 0.0057 | 400 | 0.03 |
| IID | LoAdaBoost | 0.6057 ± 0.0077 | 262 | |
| non-IID | FedAvg | 0.6512 ± 0.0043 | 300 | 0.03 |
| non-IID | LoAdaBoost | 0.6548 ± 0.0048 | 271 | |
| non-IID | FedAvg with data-sharing | 0.6253 ± 0.0088 | 350 | 0.03 |
| non-IID | LoAdaBoost with data-sharing | 0.6412 ± 0.0065 | 272 | |