A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.

Jin Li1, Yu Tian1, Yan Zhu1, Tianshu Zhou1, Jun Li2, Kefeng Ding2, Jingsong Li3.   

Abstract

BACKGROUND: The accuracy of a prognostic prediction model has become an essential aspect of the quality and reliability of the health-related decisions made by clinicians in modern medicine. Unfortunately, individual institutions often lack sufficient samples, which might not provide sufficient statistical power for models. One mitigation is to expand data collection from a single institution to multiple centers to collectively increase the sample size. However, sharing sensitive biomedical data for research involves complicated issues. Machine learning models such as random forests (RF), though commonly used and able to achieve good performance for prognostic prediction, usually perform worse in multicenter privacy-preserving data mining scenarios than a centrally trained version.
METHODS AND MATERIALS: In this study, a multicenter random forest prognosis prediction model is proposed that enables federated clinical data mining from horizontally partitioned datasets. By using a novel data enhancement approach based on a differentially private generative adversarial network customized to clinical prognosis data, the proposed model is able to provide a multicenter RF model with performance on par with, or even better than, a centrally trained RF, but without the need to aggregate the raw data. Moreover, our model also incorporates an importance ranking step designed for feature selection without sharing patient-level information.
RESULTS: The proposed model was evaluated on colorectal cancer datasets from the US and China. Two groups of datasets with different levels of heterogeneity within the collaborative research network were selected. First, we compare the performance of the distributed random forest model under different privacy parameters with different percentages of enhancement datasets and validate the effectiveness and plausibility of our approach. Then, we compare the discrimination and calibration ability of the proposed multicenter random forest with a centrally trained random forest model and other tree-based classifiers as well as some commonly used machine learning methods. The results show that the proposed model can provide better prediction performance in terms of discrimination and calibration ability than the centrally trained RF model or the other candidate models while following the privacy-preserving rules in both groups. Additionally, good discrimination and calibration ability are shown on the simplified model based on the feature importance ranking in the proposed approach.
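The abstract does not name the metrics behind "discrimination" and "calibration". As a minimal sketch, assuming the area under the ROC curve (AUC) for discrimination and the Brier score for calibration (both common choices, but an assumption here rather than the authors' stated method), the two quantities can be computed from outcomes and predicted probabilities as follows. The function names are illustrative.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Discrimination: AUC via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration (assumed metric): mean squared error of the probabilities."""
    if len(y_true) == 0:
        raise ValueError("empty input")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy outcomes and predicted risks, invented purely for illustration.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(auc(outcomes, risks), brier_score(outcomes, risks))
```

A higher AUC indicates better separation of patients with and without the outcome, while a lower Brier score indicates predicted risks closer to the observed outcomes.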
CONCLUSION: The proposed random forest model exhibits ideal prediction capability using multicenter clinical data and overcomes the performance limitation arising from privacy guarantees. It can also provide feature importance ranking across institutions without pooling the data at a central site. This study offers a practical solution for building a prognosis prediction model in the collaborative clinical research network and solves practical issues in real-world applications of medical artificial intelligence.
Copyright © 2020 Elsevier B.V. All rights reserved.
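The abstract states that feature importance can be ranked across institutions without pooling data, but it does not describe the mechanism. One simple way such a step could work, offered only as a hypothetical sketch and not as the authors' algorithm, is for each site to send a coordinator its local per-feature importance scores and its sample count, and for the coordinator to compute a sample-size-weighted average. The names `SiteSummary` and `rank_features` and the feature names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (number of local patients, local importance score per feature).
# Only these aggregate summaries leave a site, never patient-level rows.
SiteSummary = Tuple[int, Dict[str, float]]


def rank_features(site_summaries: List[SiteSummary]) -> List[Tuple[str, float]]:
    """Rank features by sample-size-weighted mean of local importances."""
    total = sum(n for n, _ in site_summaries)
    if total <= 0:
        raise ValueError("sites must report a positive total sample count")
    pooled: Dict[str, float] = {}
    for n, importances in site_summaries:
        weight = n / total
        for feature, value in importances.items():
            # A feature missing at a site contributes zero for that site.
            pooled[feature] = pooled.get(feature, 0.0) + weight * value
    # Highest pooled importance first; feature name breaks exact ties.
    return sorted(pooled.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented summaries from two hypothetical sites.
sites = [
    (100, {"age": 0.5, "stage": 0.3, "cea": 0.2}),
    (300, {"age": 0.2, "stage": 0.6, "cea": 0.2}),
]
ranking = rank_features(sites)
print([name for name, _ in ranking])
```

A simplified model could then be refit at each site on the top-ranked features only. Weighting by sample count is one design choice among several; equal weighting per site would give small institutions more influence.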

Keywords:  Clinical decision support; Distributed privacy-preserving modeling; Ensemble learning; Generative adversarial networks; Variable importance ranking

Year:  2020        PMID: 32143809     DOI: 10.1016/j.artmed.2020.101814

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  6 in total

1.  Development of machine learning models to prognosticate chronic shunt-dependent hydrocephalus after aneurysmal subarachnoid hemorrhage.

Authors:  Giovanni Muscas; Tommaso Matteuzzi; Eleonora Becattini; Simone Orlandini; Francesca Battista; Antonio Laiso; Sergio Nappini; Nicola Limbucci; Leonardo Renieri; Biagio R Carangelo; Salvatore Mangiafico; Alessandro Della Puppa
Journal:  Acta Neurochir (Wien)       Date:  2020-07-08       Impact factor: 2.216

2.  Alternative stopping rules to limit tree expansion for random forest models.

Authors:  Mark P Little; Philip S Rosenberg; Aryana Arsham
Journal:  Sci Rep       Date:  2022-09-06       Impact factor: 4.996

Review 3.  OMOP CDM Can Facilitate Data-Driven Studies for Cancer Prediction: A Systematic Review.

Authors:  Najia Ahmadi; Yuan Peng; Markus Wolfien; Michéle Zoch; Martin Sedlmayr
Journal:  Int J Mol Sci       Date:  2022-10-05       Impact factor: 6.208

4.  Machine learning-based gray-level co-occurrence matrix signature for predicting lymph node metastasis in undifferentiated-type early gastric cancer.

Authors:  Xin Wei; Xue-Jiao Yan; Yu-Yan Guo; Jie Zhang; Guo-Rong Wang; Arsalan Fayyaz; Jiao Yu
Journal:  World J Gastroenterol       Date:  2022-09-28       Impact factor: 5.374

5.  Preoperative Heart Rate Variability During Sleep Predicts Vagus Nerve Stimulation Outcome Better in Patients With Drug-Resistant Epilepsy.

Authors:  Xi Fang; Hong-Yun Liu; Zhi-Yan Wang; Zhao Yang; Tung-Yang Cheng; Chun-Hua Hu; Hong-Wei Hao; Fan-Gang Meng; Yu-Guang Guan; Yan-Shan Ma; Shu-Li Liang; Jiu-Luan Lin; Ming-Ming Zhao; Lu-Ming Li
Journal:  Front Neurol       Date:  2021-07-07       Impact factor: 4.003

Review 6.  Applications of Artificial Intelligence in Screening, Diagnosis, Treatment, and Prognosis of Colorectal Cancer.

Authors:  Hang Qiu; Shuhan Ding; Jianbo Liu; Liya Wang; Xiaodong Wang
Journal:  Curr Oncol       Date:  2022-03-07       Impact factor: 3.677
