Eman M Alanazi¹,², Aalaa Abdou³, Jake Luo⁴.
Abstract
BACKGROUND: Stroke, a cerebrovascular disease, is one of the major causes of death. It causes significant health and financial burdens for both patients and health care systems. One of the important risk factors for stroke is health-related behavior, which is becoming an increasingly important focus of prevention. Many machine learning models have been built to predict the risk of stroke or to automatically diagnose stroke, using predictors such as lifestyle factors or radiological imaging. However, no models have been built using lab test data.
Keywords: lab tests; machine learning technology; predictive analytics; stroke
Year: 2021 PMID: 34860663 PMCID: PMC8686476 DOI: 10.2196/23440
Source DB: PubMed Journal: JMIR Form Res ISSN: 2561-326X
Figure 1. Flow diagram of the study methodology. NHANES: National Health and Nutrition Examination Survey.
Figure 2. Participant selection and prevalence of stroke in the National Health and Nutrition Examination Survey (NHANES).
List of the data attributes.
| Featureᵃ | Units |
| --- | --- |
| Age | Years |
| Gender | N/Aᵇ |
| Albumin, urine | μg/mL |
| Creatinine, urine | mg/dL |
| White blood cell count | 1000 cells/μL |
| Lymphocytes | 1000 cells/μL |
| Monocytes | 1000 cells/μL |
| Segmented neutrophils | 1000 cells/μL |
| Eosinophils | 1000 cells/μL |
| Basophils | 1000 cells/μL |
| Red blood cell count | Million cells/μL |
| Hemoglobin | g/dL |
| Hematocrit | % |
| Mean cell volume | fL |
| Mean cell hemoglobin | pg |
| Mean corpuscular hemoglobin concentration | g/dL |
| Red cell distribution width | % |
| Platelet count | 1000 cells/μL |
| Mean platelet volume | fL |
| Cotinine, serum | ng/mL |
| Red blood cell folate | ng/mL |

ᵃAll data types were numeric, except for "gender," which was nominal.
ᵇN/A: not applicable; this type of data did not have units.
Results of three data analysis techniques.
| Technique and classifier | Accuracy | Sensitivity | Specificity | PPVᵃ | NPVᵇ | AUCᶜ |
| --- | --- | --- | --- | --- | --- | --- |
| | | | | | | |
| Naïve Bayes | 0.82 | 0.34 | 0.88 | 0.27 | 0.91 | 0.76 |
| BayesNet | 0.82 | 0.38 | 0.89 | 0.37 | 0.90 | 0.88 |
| Decision tree | 0.83 | 0.33 | 0.87 | 0.14 | 0.95 | 0.73 |
| Random forest | 0.86 | 0.55 | 0.86 | 0.01 | 0.99 | 0.87 |
| | | | | | | |
| Naïve Bayes | 0.81 | 0.32 | 0.88 | 0.25 | 0.91 | 0.74 |
| BayesNet | 0.86 | 0.53 | 0.92 | 0.54 | 0.92 | 0.85 |
| Decision tree | 0.88 | 0.61 | 0.91 | 0.46 | 0.95 | 0.74 |
| Random forest | 0.90 | 0.89 | 0.90 | 0.33 | 0.99 | 0.85 |
| | | | | | | |
| Naïve Bayes | 0.82 | 0.33 | 0.88 | 0.29 | 0.90 | 0.74 |
| BayesNet | 0.87 | 0.53 | 0.93 | 0.57 | 0.92 | 0.85 |
| Decision tree | 0.93 | 0.76 | 0.95 | 0.72 | 0.96 | 0.86 |
| Random forest | 0.96 | 0.97 | 0.96 | 0.75 | 0.99 | 0.97 |

ᵃPPV: positive predictive value.
ᵇNPV: negative predictive value.
ᶜAUC: area under the curve.
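The six metrics in the results table follow the standard confusion-matrix definitions for a binary classifier. The sketch below computes them, plus AUC using the Mann-Whitney pairwise-ranking formulation. It assumes stroke is coded as the positive class (1) and that each classifier outputs a numeric score per participant. The function names and the counts and scores in the usage example are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; stroke is assumed to be the positive class."""
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke participants predicted as stroke
    tn: int  # non-stroke participants predicted as non-stroke
    fn: int  # stroke cases predicted as non-stroke


def binary_metrics(cm: ConfusionMatrix) -> dict:
    """Return accuracy, sensitivity, specificity, PPV and NPV as fractions."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # true positive rate (recall)
        "specificity": cm.tn / (cm.tn + cm.fp),  # true negative rate
        "ppv": cm.tp / (cm.tp + cm.fp),          # precision
        "npv": cm.tn / (cm.tn + cm.fn),
    }


def roc_auc(scores, labels) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U divided by n_pos * n_neg).
    The pairwise loop is O(n_pos * n_neg): fine for illustration, slow for big data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, not taken from the study.
    cm = ConfusionMatrix(tp=30, fp=20, tn=90, fn=10)
    print(binary_metrics(cm))
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # prints 0.75
```

Unlike sensitivity and specificity, PPV and NPV also depend on the proportion of positive cases in the data being evaluated, so they can shift when the class balance of the evaluation set changes.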
Figure 3. Performance comparison among three data selection techniques for the decision tree model. AUC: area under the curve; NPV: negative predictive value; PPV: positive predictive value.
Figure 4. Performance comparison among three data selection techniques for the random forest model. AUC: area under the curve; NPV: negative predictive value; PPV: positive predictive value.
Pearson correlation coefficient values of independent predictors.
| Independent predictor of stroke | Pearson correlation coefficient (r) |
| --- | --- |
| Age | 0.26 |
| Gender | 0.13 |
| Red cell distribution width (%) | 0.18 |
| Lymphocytes (%) | 0.15 |
| Red blood cell folate (ng/mL) | 0.13 |
| Segmented neutrophils (%) | 0.12 |
| Hemoglobin (g/dL) | 0.11 |
| Red blood cell count (million cells/μL) | 0.11 |
| Hematocrit (%) | 0.09 |
| Lymphocytes (1000 cells/μL) | 0.08 |
| Segmented neutrophils (1000 cells/μL) | 0.07 |
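The correlation table lists predictors in order of their Pearson correlation with stroke. A minimal sketch of that computation follows. It assumes stroke status and gender are coded 0/1 (with a 0/1 outcome, Pearson's r equals the point-biserial correlation) and that predictors are ranked by the absolute value of r. The coding, the ranking rule, the function names, and the toy data are assumptions for illustration, not the study's actual pipeline.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(features, outcome):
    """Return (name, r) pairs sorted by |r|, strongest association first.

    features: mapping of predictor name -> list of values, one per participant
    outcome:  list of 0/1 stroke labels in the same participant order
    """
    scored = [(name, pearson_r(values, outcome)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; these are not NHANES values.
    stroke = [0, 0, 1, 1]
    features = {
        "age": [40, 50, 60, 70],
        "gender": [1, 1, 0, 1],  # hypothetical 0/1 coding of a nominal variable
    }
    for name, r in rank_predictors(features, stroke):
        print(f"{name}: {r:.2f}")
```

Ranking by |r| treats strong negative and strong positive associations alike; a signed ranking would be an equally valid choice if only risk-increasing predictors are of interest.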