Xuemeng Li, Di Bian, Jinghui Yu, Mei Li, Dongsheng Zhao.
Abstract
BACKGROUND: Stroke has a high incidence, prevalence and mortality in China and places a heavy burden on families and society. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens for stroke and its risk factors and delivers interventions to high-risk people aged over 40 across China. In this program, the stroke risk factors are hypertension, diabetes, dyslipidemia, smoking, lack of exercise, being apparently overweight and a family history of stroke. People with more than two risk factors, or with a history of stroke or transient ischemic attack (TIA), are considered high-risk. However, this criterion cannot assign a stroke risk level to people whose risk-factor fields contain unknown values. The missing risk levels reduce the efficiency of stroke interventions and make national-level statistics inaccurate. In this paper, we use the 2017 national stroke screening data to develop stroke risk classification models based on machine learning algorithms to improve the classification efficiency.
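The screening criterion described above can be sketched in a few lines of Python. This is only an illustration of why unknown fields block the rule, not the program's actual implementation. The field names, the use of `None` for an unknown value, and the decision to return "undetermined" only when the unknown fields could change the outcome are all assumptions made for this sketch.

```python
from typing import Mapping, Optional

# Risk factors listed in the abstract (field names are hypothetical).
RISK_FACTORS = (
    "hypertension",
    "diabetes",
    "dyslipidemia",
    "smoking",
    "lack_of_exercise",
    "overweight",
    "family_history_of_stroke",
)
# History fields that make a person high-risk on their own (hypothetical names).
HISTORY_FIELDS = ("history_of_stroke", "history_of_tia")


def classify_risk(record: Mapping[str, Optional[bool]]) -> Optional[bool]:
    """Return True (high-risk), False (not high-risk) or None (undetermined).

    A missing key or a None value is treated as an unknown field.
    """
    history = [record.get(name) for name in HISTORY_FIELDS]
    if any(h is True for h in history):
        return True

    values = [record.get(name) for name in RISK_FACTORS]
    known_positive = sum(v is True for v in values)
    unknown = sum(v is None for v in values)

    # "More than two risk factors" is read here as a count strictly above 2.
    if known_positive > 2:
        return True

    # Assumption: if an unknown field could still push the person over the
    # threshold, the rule cannot decide and the risk level stays missing.
    if any(h is None for h in history) or known_positive + unknown > 2:
        return None
    return False
```

A record with two known risk factors and one unknown field returns `None`, which is the situation the machine learning models in this paper are meant to resolve.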
Keywords: Machine learning models; National Stroke Screening; Risk level classification
Year: 2019 PMID: 31822270 PMCID: PMC6902572 DOI: 10.1186/s12911-019-0998-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The choice of hyperparameters of each model
| Machine learning models | Hyperparameters | Values to be selected | Optimum Value |
|---|---|---|---|
| Decision tree (C4.5) | confidence factor used for pruning (C); minimum number of instances per leaf (N) | | |
| Neural network | the size of the network (number of hidden nodes, H); gradient descent (D) | | |
| Random forest | the depth of the tree (T); number of tree models (N) | | |
| Bagging with C4.5 decision tree | the sampling ratio (P); number of sub-classifiers (N) | | |
| Boosting with C4.5 decision tree | the number of sub-classifiers (N) | | |
Evaluation results of each model using test set A
| Learning method | Precision (95% CI) | Recall (95% CI) | F1-score (95% CI) | AUC (95% CI) |
|---|---|---|---|---|
| Logistic regression | 91.84% [91.81,91.87%] | 97.82% [97.76,97.88%] | 94.74% [94.69,94.78%] | 99.14% [99.09,99.19%] |
| Naïve Bayesian | 69.48% [69.42,69.54%] | 97.35% [97.31,97.39%] | 81.09% [81.03,81.14%] | 98.44% [98.42,98.46%] |
| Bayesian network | 69.66% [69.62,69.70%] | 97.55% [97.53,97.57%] | 81.28% [81.24,81.31%] | 98.41% [98.38,98.44%] |
| Decision tree(C4.5) | 92.25% [92.21,92.29%] | 99.83% [99.78,99.88%] | 95.89% [95.85,95.94%] | 99.92% [99.90,99.94%] |
| Neural network | 92.19% [92.14,92.24%] | 99.72% [99.68,99.76%] | 95.81% [95.76,95.85%] | 99.15% [99.11,99.19%] |
| Random forest | 98.44% [98.41,98.47%] | |||
| Bagging with C4.5 decision tree | 92.25% [92.22,92.28%] | 99.74% [99.71,99.77%] | 95.85% [95.82,95.88%] | 99.93% [99.92,99.94%] |
| Voting | 94.34% [94.32,94.36%] | 99.66% [99.63,99.69%] | 96.93% [96.91,96.95%] | |
| Boosting with C4.5 decision tree | 95.51% [95.48,95.54%] | 97.67% [97.64,97.70%] |
*Bold values in the table mark the maximum for each evaluation metric
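As a consistency check on the table above, the F1-score can be recomputed from the reported precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below does this for three rows copied from the test set A table. The function name `f1_score` is chosen for this illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the test set A table.
rows = {
    "Logistic regression": (91.84, 97.82, 94.74),
    "Naïve Bayesian": (69.48, 97.35, 81.09),
    "Decision tree (C4.5)": (92.25, 99.83, 95.89),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

For these three rows the recomputed values agree with the reported F1-scores to two decimal places.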
Evaluation results of each model using test set B
| Learning method | Precision (95% CI) | Recall (95% CI) | F1-score (95% CI) | AUC (95% CI) |
|---|---|---|---|---|
| Logistic regression | 31.54% [31.50,31.58%] | 94.52% [94.48,94.56%] | 47.30% [47.25,47.35%] | 71.85% [71.82,71.88%] |
| Naïve Bayesian | 41.98% [41.92,42.04%] | 83.44% [83.40,83.48%] | 55.86% [55.80,55.92%] | 82.37% [82.33,82.41%] |
| Bayesian network | 42.95% [42.91,42.99%] | 84.12% [84.08,84.16%] | 56.87% [56.82,56.91%] | 83.06% [83.04,83.08%] |
| Decision tree(C4.5) | 33.18% [33.14,33.22%] | 95.55% [95.51,95.59%] | 49.26% [49.21,49.31%] | 71.15% [71.12,71.18%] |
| Neural network | 32.72% [32.69,32.75%] | 94.86% [94.84,94.88%] | 48.66% [48.62,48.69%] | 80.33% [80.31,80.35%] |
| Random forest | 92.81% [92.78,92.84%] | 82.52% [82.49,82.55%] | ||
| Bagging with C4.5 decision tree | 33.06% [33.04,33.08%] | 94.57% [94.52,94.62%] | 48.99% [48.96,49.02%] | 71.02% [70.98,71.06%] |
| Voting | 39.66% [39.62,39.70%] | 91.08% [91.03,91.13%] | 55.26% [55.21,55.31%] | |
| Boosting with C4.5 decision tree | 36.35% [36.30,36.40%] | 52.71% [52.65,52.76%] | 80.27% [80.25,80.29%] |
*Bold values in the table mark the maximum for each evaluation metric
Evaluation results of each model using screening data in 2016
| Learning method | Precision (95% CI) | Recall (95% CI) | F1-score (95% CI) | AUC (95% CI) |
|---|---|---|---|---|
| Logistic regression | 90.56% [90.52,90.60%] | 96.35% [96.31,96.39%] | 93.37% [93.33,93.41%] | 97.96% [99.09,99.19%] |
| Naïve Bayesian | 66.96% [66.93,66.99%] | 94.99% [94.95,95.03%] | 78.55% [78.51,78.58%] | 96.64% [96.62,96.66%] |
| Bayesian network | 67.50% [67.47,67.53%] | 93.85% [93.80,93.90%] | 78.52% [78.49,78.56%] | 96.86% [96.82,96.90%] |
| Decision tree(C4.5) | 91.95% [91.90,92.00%] | 98.12% [98.09,98.15%] | 94.93% [94.89,94.98%] | 99.36% [99.33,99.39%] |
| Neural network | 91.82% [91.78,91.86%] | 98.52% [98.49,98.55%] | 95.05% [95.02,95.09%] | 99.23% [99.20,99.26%] |
| Random forest | 95.76% [95.74,95.78%] | 96.32% [96.30,96.35%] | ||
| Bagging with C4.5 decision tree | 92.21% [92.19,92.23%] | 98.86% [98.83,98.89%] | 95.42% [95.39,95.44%] | 99.39% [99.92,99.94%] |
| Voting | 92.12% [92.07,92.17%] | 98.98% [98.96,99.00%] | 95.43% [95.39,95.46%] | 99.39% [99.36,99.42%] |
| Boosting with C4.5 decision tree | 94.89% [94.85,94.93%] |
*Bold values in the table mark the maximum for each evaluation metric
Estimated results of supplementing the current screening methods
| Learning method | Increased number of identified high-risk people | Number of misidentified high-risk people |
|---|---|---|
| Logistic regression | 5586 | 16,492 |
| Naïve Bayesian | 4931 | 13,977 |
| Bayesian network | 4971 | 13,743 |
| Decision tree(C4.5) | 5647 | 16,097 |
| Neural network | 5606 | 16,208 |
| Random forest | 5485 | 11,722 |
| Bagging with C4.5 decision tree | 5589 | 16,126 |
| Voting | 5383 | 14,536 |
| Boosting with C4.5 decision tree | 5663 | 15,333 |