Tianzhou Yang, Li Zhang, Liwei Yi, Huawei Feng, Shimeng Li, Haoyu Chen, Junfeng Zhu, Jian Zhao, Yingyue Zeng, Hongsheng Liu.
Abstract
BACKGROUND: Early diabetes screening can effectively reduce the burden of disease. However, natural population-based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes.
Keywords: machine learning; non-invasive attributes; screening; type 2 diabetes
Year: 2020 PMID: 32554386 PMCID: PMC7333074 DOI: 10.2196/15431
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. The data cleaning and feature selection process. Note that the feature selection process was run only in the NHANES 2011-2014 dataset. n: number of cases. p: number of features.
Factors associated with diabetes used to build the models.
| Feature | Crude^a OR^b (95% CI) | Adjusted^c OR (95% CI) | P value |
| --- | --- | --- | --- |
| Age | 1.05 (1.05-1.06) | 1.05 (1.04-1.06) | <.001 |
| Sex | 0.82 (0.70-0.97) | 0.62 (0.50-0.76) | <.001 |
| Waistline | 1.04 (1.03-1.05) | 0.99 (0.97-1.01) | .27 |
| Sagittal abdominal diameter | 1.20 (1.18-1.22) | 1.16 (1.09-1.24) | <.001 |
| Relative leg length | 0.70 (0.66-0.74) | 0.85 (0.79-0.91) | <.001 |
| 60 second pulse | 1.02 (1.01-1.02) | 1.02 (1.01-1.03) | <.001 |
| Smoking | 0.74 (0.63-0.88) | 1.13 (0.92-1.38) | .26 |
| Alcohol | 1.43 (1.19-1.72) | 1.31 (1.04-1.66) | .02 |
| Hypertension | 3.26 (2.72-3.90) | 1.02 (0.82-1.27) | .86 |
| Family history | 0.28 (0.24-0.34) | 0.32 (0.26-0.39) | <.001 |
| General health condition | 2.05 (1.88-2.24) | 1.59 (1.44-1.76) | <.001 |
| Control or loss of weight | 0.42 (0.35-0.51) | 0.55 (0.44-0.69) | <.001 |

^a Crude: 1-way logistic regression.
^b OR: odds ratio.
^c Adjusted: multiple logistic regression.
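
As a reading aid for the odds ratios above, the following minimal sketch shows two standard pieces of arithmetic: scaling a per-unit odds ratio to a larger change, and checking whether a 95% CI excludes 1. The function names are hypothetical, and the unit of age is assumed to be years, which the table does not state.

```python
import math


def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the OR for a 1-unit change.

    In logistic regression OR = exp(beta), so a k-unit change gives
    exp(k * beta) = OR ** k.
    """
    return math.exp(units * math.log(or_per_unit))


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when a confidence interval for an OR does not contain 1."""
    return lower > 1.0 or upper < 1.0


# Adjusted OR for age from the table (unit assumed to be one year).
ten_unit_or = scale_odds_ratio(1.05, 10)
print(f"OR for a 10-unit increase in age: {ten_unit_or:.2f}")

# Adjusted CIs from the table: alcohol (P = .02) and waistline (P = .27).
print("Alcohol CI excludes 1:", ci_excludes_one(1.04, 1.66))
print("Waistline CI excludes 1:", ci_excludes_one(0.97, 1.01))
```

The CI check agrees with the P values listed for those two rows: the interval that excludes 1 is the one with P below .05.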
Average results (SD) of the 5-fold cross-validation of the models in the training set.
| Method | AUC^a | Sensitivity | Specificity | Accuracy | PPV^b |
| --- | --- | --- | --- | --- | --- |
| Simple method | | | | | |
| Linear discriminant analysis | 0.844 (0.016) | 0.741 (0.035) | 0.795 (0.015) | 0.787 (0.013) | 0.402 (0.020) |
| Random forest | 0.823 (0.016) | 0.862 (0.029) | 0.612 (0.019) | 0.651 (0.015) | 0.292 (0.011) |
| Support vector machine | 0.808 (0.015) | 0.692 (0.035) | 0.811 (0.017) | 0.792 (0.014) | 0.405 (0.023) |
| Easy ensemble method | | | | | |
| EE^c linear discriminant analysis | 0.845 (0.016) | 0.797 (0.032) | 0.735 (0.016) | 0.745 (0.014) | 0.358 (0.017) |
| EE random forest | 0.834 (0.016) | 0.784 (0.033) | 0.732 (0.016) | 0.740 (0.014) | 0.352 (0.016) |
| EE support vector machine | 0.842 (0.016) | 0.787 (0.034) | 0.748 (0.017) | 0.754 (0.014) | 0.367 (0.018) |

^a AUC: area under the curve.
^b PPV: positive predictive value.
^c EE: easy ensemble method.
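
The threshold-based columns in the table above follow the usual confusion-matrix definitions. The sketch below computes them from counts; the class name and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # cases correctly flagged as positive
    fp: int  # non-cases incorrectly flagged as positive
    tn: int  # non-cases correctly left unflagged
    fn: int  # cases the model missed

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for an imbalanced sample (100 cases, 900 non-cases).
cm = ConfusionMatrix(tp=80, fp=200, tn=700, fn=20)
for name in ("sensitivity", "specificity", "accuracy", "ppv"):
    print(f"{name}: {getattr(cm, name):.3f}")
```

With these made-up counts the PPV is far lower than the sensitivity and specificity, the same qualitative pattern the table shows when cases are a minority of the sample.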
Figure 2. Comparison of the top 50 models with the easy ensemble method and the simple method with different machine learning methods and 5-fold cross-validation in the training set. AUC: area under the curve. LDA: linear discriminant analysis. RF: random forest. SVM: support vector machine.
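
The general idea behind an easy ensemble can be sketched as follows: repeatedly undersample the majority class to the size of the minority class, fit one learner per balanced subset, and average the learners' votes. This is an illustration only and not the authors' implementation. The base learner is a toy one-feature threshold rule rather than LDA, RF, or SVM, and the data, the number of subsets, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_threshold_learner(negatives, positives):
    """Toy base learner on one numeric feature.

    Votes 1.0 when x lies on the positive-class side of the midpoint
    between the two class means, otherwise 0.0.
    """
    mu_neg, mu_pos = mean(negatives), mean(positives)
    midpoint = (mu_neg + mu_pos) / 2
    sign = 1.0 if mu_pos >= mu_neg else -1.0
    return lambda x: 1.0 if sign * (x - midpoint) > 0 else 0.0


def fit_easy_ensemble(majority, minority, n_subsets, rng):
    """Fit one learner per balanced subset of the data."""
    learners = []
    for _ in range(n_subsets):
        # Undersample the majority class to the minority class size.
        subset = rng.sample(majority, len(minority))
        learners.append(fit_threshold_learner(subset, minority))
    return learners


def ensemble_score(learners, x):
    """Average vote across learners, a value between 0 and 1."""
    return mean(learner(x) for learner in learners)


# Synthetic, imbalanced one-feature data (900 negatives, 100 positives).
rng = random.Random(0)
majority = [rng.gauss(50, 10) for _ in range(900)]
minority = [rng.gauss(65, 10) for _ in range(100)]

learners = fit_easy_ensemble(majority, minority, n_subsets=10, rng=rng)
high_score = ensemble_score(learners, 80.0)
low_score = ensemble_score(learners, 30.0)
print(high_score, low_score)
```

Each learner sees a balanced sample, so no single fit is dominated by the majority class, while the ensemble as a whole still draws on many of the majority examples.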
Performance of the simple and ensemble methods in the test and validation sets.
| Method | AUC^a | Sensitivity | Specificity | Accuracy | PPV^b |
| --- | --- | --- | --- | --- | --- |
| Test set, simple method | | | | | |
| Linear discriminant analysis | 0.864 | 0.697 | 0.829 | 0.808 | 0.429 |
| Random forest | 0.836 | 0.830 | 0.648 | 0.676 | 0.303 |
| Support vector machine | 0.796 | 0.630 | 0.864 | 0.827 | 0.460 |
| Test set, easy ensemble method | | | | | |
| EE^c linear discriminant analysis | 0.867 | 0.758 | 0.777 | 0.774 | 0.385 |
| EE random forest | 0.850 | 0.776 | 0.770 | 0.771 | 0.383 |
| EE support vector machine | 0.861 | 0.752 | 0.783 | 0.778 | 0.390 |
| Validation set, simple method | | | | | |
| Linear discriminant analysis | 0.846 | 0.759 | 0.762 | 0.761 | 0.418 |
| Random forest | 0.828 | 0.888 | 0.594 | 0.648 | 0.331 |
| Support vector machine | 0.811 | 0.720 | 0.789 | 0.776 | 0.435 |
| Validation set, easy ensemble method | | | | | |
| EE^c linear discriminant analysis | 0.849 | 0.819 | 0.709 | 0.730 | 0.389 |
| EE random forest | 0.836 | 0.813 | 0.713 | 0.731 | 0.390 |
| EE support vector machine | 0.848 | 0.824 | 0.714 | 0.734 | 0.394 |

^a AUC: area under the curve.
^b PPV: positive predictive value.
^c EE: easy ensemble method.
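
Sensitivity, specificity, accuracy, and PPV are linked through the share of positive cases in the evaluated sample. The sketch below is a consistency check, not a result reported by the authors: it assumes all four values in a row come from one confusion matrix, backs out the implied case share from the first linear discriminant analysis row of the table above (AUC 0.864), and recomputes PPV. The function names are hypothetical.

```python
def implied_prevalence(sensitivity: float, specificity: float, accuracy: float) -> float:
    """Solve accuracy = sens * p + spec * (1 - p) for the case share p.

    Unstable when sensitivity and specificity are nearly equal, because
    the denominator then approaches zero.
    """
    return (specificity - accuracy) / (specificity - sensitivity)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values copied from the first linear discriminant analysis row above.
sens, spec, acc = 0.697, 0.829, 0.808
p = implied_prevalence(sens, spec, acc)
ppv = ppv_from_rates(sens, spec, p)
print(f"implied case share: {p:.3f}, recomputed PPV: {ppv:.3f}")
```

The recomputed PPV comes out near the tabulated 0.429. A small gap is expected because the inputs are rounded to three decimals and the denominator in `implied_prevalence` amplifies that rounding.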