Stroke Prediction with Machine Learning Methods among Older Chinese.

Yafei Wu, Ya Fang.

Abstract

Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73-0.83) for RF and 0.72 (95% CI, 0.71-0.73) for RLR. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. 
With RLR as the reference in each data set, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. In addition, drinking was included in RLR, age and high-sensitivity C-reactive protein level in SVM, and low-density lipoprotein cholesterol level in RF. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
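
The three balancing techniques named in the abstract, and the reason sensitivity collapses on the raw data, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state which software was used, the two numeric features are made-up toy data, and the function names and the neighbour count `k=5` are assumptions chosen for illustration. Only the class counts (56 stroke, 1075 non-stroke) come from the abstract.

```python
import random

def split_indices(y, minority=1):
    """Return (minority row indices, majority row indices)."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    return min_idx, maj_idx

def random_over_sample(X, y, seed=0):
    """ROS: duplicate random minority rows until the classes are equal in size."""
    rng = random.Random(seed)
    min_idx, maj_idx = split_indices(y)
    extra = [rng.choice(min_idx) for _ in range(len(maj_idx) - len(min_idx))]
    keep = maj_idx + min_idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def random_under_sample(X, y, seed=0):
    """RUS: keep a random subset of majority rows the size of the minority class."""
    rng = random.Random(seed)
    min_idx, maj_idx = split_indices(y)
    keep = rng.sample(maj_idx, len(min_idx)) + min_idx
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X_min, n_new, k=5, seed=0):
    """SMOTE-style sketch: interpolate between a minority row and one of its
    k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data with the class counts reported in the abstract.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1131)]
y = [1] * 56 + [0] * 1075

X_ros, y_ros = random_over_sample(X, y)   # 1075 rows per class
X_rus, y_rus = random_under_sample(X, y)  # 56 rows per class
X_min = [row for row, label in zip(X, y) if label == 1]
X_syn = smote(X_min, n_new=1075 - 56)     # synthetic stroke rows to reach parity

# A model that always predicts "no stroke" looks accurate but finds no cases.
always_negative = [0] * len(y)
sens, spec = sensitivity_specificity(y, always_negative)
accuracy = sum(t == p for t, p in zip(y, always_negative)) / len(y)
```

With only 56 of 1131 participants in the stroke class (4.95%), the always-negative predictor reaches an accuracy of about 0.95 with a sensitivity of 0, which matches the pattern the abstract reports for the imbalanced data set. In practice, balancing would be applied to the training split only, so that evaluation still reflects the original class ratio.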

Keywords:  imbalanced data; machine learning; prediction; stroke

Year:  2020        PMID: 32178250     DOI: 10.3390/ijerph17061828

Source DB:  PubMed          Journal:  Int J Environ Res Public Health        ISSN: 1660-4601            Impact factor:   3.390


  7 in total

1.  Rapid triage for ischemic stroke: a machine learning-driven approach in the context of predictive, preventive and personalised medicine.

Authors:  Yulu Zheng; Zheng Guo; Yanbo Zhang; Jianjing Shang; Leilei Yu; Ping Fu; Yizhi Liu; Xingang Li; Hao Wang; Ling Ren; Wei Zhang; Haifeng Hou; Xuerui Tan; Wei Wang
Journal:  EPMA J       Date:  2022-05-27       Impact factor: 8.836

2.  Construction of Xinjiang metabolic syndrome risk prediction model based on interpretable models.

Authors:  Yan Zhang; Jaina Razbek; Deyang Li; Lei Yang; Liangliang Bao; Wenjun Xia; Hongkai Mao; Mayisha Daken; Xiaoxu Zhang; Mingqin Cao
Journal:  BMC Public Health       Date:  2022-02-08       Impact factor: 3.295

3.  Machine Learning Models for Predicting Influential Factors of Early Outcomes in Acute Ischemic Stroke: Registry-Based Study.

Authors:  Po-Yuan Su; Yi-Chia Wei; Hung-Yu Wei; Tsong-Hai Lee; Hao Luo; Chi-Hung Liu; Wen-Yi Huang; Kuan-Fu Chen; Ching-Po Lin
Journal:  JMIR Med Inform       Date:  2022-03-25

4.  Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults.

Authors:  Xiao Huang; Tianyu Cao; Liangziqian Chen; Junpei Li; Ziheng Tan; Benjamin Xu; Richard Xu; Yun Song; Ziyi Zhou; Zhuo Wang; Yaping Wei; Yan Zhang; Jianping Li; Yong Huo; Xianhui Qin; Yanqing Wu; Xiaobin Wang; Hong Wang; Xiaoshu Cheng; Xiping Xu; Lishun Liu
Journal:  Front Cardiovasc Med       Date:  2022-05-06

5.  Using random forest algorithm for glomerular and tubular injury diagnosis.

Authors:  Wenzhu Song; Xiaoshuang Zhou; Qi Duan; Qian Wang; Yaheng Li; Aizhong Li; Wenjing Zhou; Lin Sun; Lixia Qiu; Rongshan Li; Yafeng Li
Journal:  Front Med (Lausanne)       Date:  2022-07-28

6.  Machine Learning Prediction Models for Postoperative Stroke in Elderly Patients: Analyses of the MIMIC Database.

Authors:  Xiao Zhang; Ningbo Fei; Xinxin Zhang; Qun Wang; Zongping Fang
Journal:  Front Aging Neurosci       Date:  2022-07-18       Impact factor: 5.702

7.  A Genomic-Clinicopathologic Nomogram for the Prediction of Lymph Node Invasion in Prostate Cancer.

Authors:  Zongtai Zheng; Shiyu Mao; Zhuoran Gu; Ruiliang Wang; Yadong Guo; Wentao Zhang; Xudong Yao
Journal:  J Oncol       Date:  2021-05-26       Impact factor: 4.375
