Xiaohan Yuan1, Shuyu Chen2, Chuan Sun1, Lu Yuwen1.
Abstract
Chronic diseases are among the most severe health problems worldwide because of their clinical characteristics: a long onset cycle, insidious symptoms, and various complications. Machine learning has recently become a promising technique for assisting the early diagnosis of chronic diseases. However, existing works ignore two problems in chronic disease datasets: feature hiding and imbalanced class distribution. In this paper, we present a universal and efficient diagnostic framework that alleviates both problems so that chronic diseases can be diagnosed promptly and accurately. Specifically, we first propose a network-limited polynomial neural network (NLPNN) algorithm to efficiently capture high-level features hidden in chronic disease datasets; it acts as data augmentation in the feature space and also helps avoid over-fitting. Then, to alleviate the class imbalance problem, we further propose an attention-empowered NLPNN (AEPNN) algorithm to improve diagnostic accuracy for sick cases; it acts as data augmentation in the sample space. We evaluate the proposed framework on nine public and two real chronic disease datasets, some of which are class-imbalanced. Extensive experimental results demonstrate that the proposed diagnostic algorithms outperform state-of-the-art machine learning algorithms in terms of accuracy, recall, F1, and G_mean. The proposed framework can help diagnose chronic diseases promptly and accurately at an early stage.
Year: 2022 PMID: 35597855 PMCID: PMC9123399 DOI: 10.1038/s41598-022-12574-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flowchart of the proposed algorithms: (a) NLPNN; (b) AEPNN.
The composition details of chronic disease datasets.
| Datasets | Features | Samples | Positive:Negative | Missing |
|---|---|---|---|---|
| CKD | 24 | 400 | 1:0.6 | No |
| PIDD | 8 | 768 | 1:1.87 | No |
| T2DM | 40 | 5642 | 1:11.19 | Yes |
| CVD | 11 | 70000 | 1:1.001 | No |
| Heart | 13 | 1025 | 1:0.95 | No |
| GDM | 83 | 1000 | 1:1.13 | Yes |
| Fra_Heart | 15 | 4240 | 1:5.58 | Yes |
| Hep | 19 | 155 | 1:0.24 | Yes |
| BCW | 10 | 699 | 1:1.9 | Yes |
| Pri_hyper | 33 | 9091 | 1:1.13 | No |
| Pri_diab | 28 | 14525 | 1:12.78 | No |
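The Positive:Negative column also fixes the accuracy that a trivial classifier would reach by always predicting the larger class, which is a useful reference point when reading accuracy on the imbalanced datasets. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and do not come from the paper.

```python
def positive_fraction(neg_per_pos: float) -> float:
    """Share of positive samples for a Positive:Negative ratio of 1:neg_per_pos."""
    return 1.0 / (1.0 + neg_per_pos)


def majority_baseline_accuracy(neg_per_pos: float) -> float:
    """Accuracy obtained by always predicting the larger class."""
    p = positive_fraction(neg_per_pos)
    return max(p, 1.0 - p)


# Ratios copied from the table above (1:x, where x = negatives per positive).
ratios = {"T2DM": 11.19, "Fra_Heart": 5.58, "Pri_diab": 12.78, "CVD": 1.001}

for name, x in ratios.items():
    print(f"{name}: positives={positive_fraction(x):.3f}, "
          f"majority-class accuracy={majority_baseline_accuracy(x):.3f}")
```

For T2DM this gives roughly 8% positives and a majority-class accuracy near 0.92, so a high accuracy on that dataset can coexist with a recall close to zero.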
The bijective relationship between the index and the (depth, regularization factor) pair.
| Depth | Regularization factor | | | | |
|---|---|---|---|---|---|
| 2 | 1 | 2 | 3 | 4 | 5 |
| 3 | 6 | 7 | 8 | 9 | 10 |
| 4 | 11 | 12 | 13 | 14 | 15 |
| 5 | 16 | 17 | 18 | 19 | 20 |
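Read row by row, the table assigns one integer to each combination of depth and regularization-factor column: depths 2 to 5 and five columns give the indices 1 to 20. A minimal Python sketch of that mapping and its inverse follows. It assumes the columns are simply numbered 1 to 5 from left to right, since the regularization-factor values themselves are not shown in the table; the function names are illustrative.

```python
DEPTHS = range(2, 6)  # depths 2, 3, 4, 5 (one table row each)
N_FACTORS = 5         # regularization-factor columns per row


def to_index(depth: int, col: int) -> int:
    """Map (depth, column number 1..5) to the table index 1..20."""
    if depth not in DEPTHS or not 1 <= col <= N_FACTORS:
        raise ValueError("pair lies outside the tabulated grid")
    return (depth - DEPTHS.start) * N_FACTORS + col


def from_index(index: int) -> tuple[int, int]:
    """Inverse mapping: table index 1..20 back to (depth, column number)."""
    if not 1 <= index <= len(DEPTHS) * N_FACTORS:
        raise ValueError("index lies outside the tabulated grid")
    row, col0 = divmod(index - 1, N_FACTORS)
    return DEPTHS.start + row, col0 + 1


# Every index maps back to the pair it came from, so the mapping is a bijection.
assert all(to_index(*from_index(i)) == i for i in range(1, 21))
```

Collapsing the two hyperparameters into one index lets a single axis sweep over all twenty configurations.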
Figure 2. Training and test performance versus on eleven chronic disease datasets.
Optimal parameter settings for different datasets.
| Dataset | Network parameter | |
|---|---|---|
| CKD | | |
| PIDD | | |
| T2DM | | |
| CVD | | |
| Heart | | |
| GDM | | |
| Fra_Heart | | |
| Hep | | |
| BCW | | |
| Pri_hyper | | |
| Pri_diab | | |
Performance of different algorithms.
| Dataset | Algorithm | Acc | Re | F1_score | Dataset | Algorithm | Acc | Re | F1_score |
|---|---|---|---|---|---|---|---|---|---|
| CKD | SVM | 0.9875 | 0.9800 | 0.9899 | Fra_Heart | SVM | 0.8349 | 0.0000 | 0.0000 |
| | LR | 0.9875 | 0.9800 | 0.9899 | | LR | 0.8420 | 0.1000 | |
| | KNN | 0.9500 | 0.9200 | 0.9583 | | KNN | 0.8337 | 0.0357 | 0.0662 |
| | DT | 0.9750 | 0.9600 | 0.9796 | | DT | 0.8361 | 0.0643 | 0.1146 |
| | MLP | 0.9875 | 0.9800 | 0.9899 | | MLP | 0.8314 | 0.2099 | |
| | NLPNN | | | | | NLPNN | | 0.0614 | 0.1148 |
| PIDD | SVM | 0.7792 | 0.5517 | 0.6531 | Hep | SVM | 0.8000 | 1.0000 | 0.8846 |
| | LR | 0.7987 | 0.6207 | 0.6990 | | LR | 0.8333 | 0.9565 | 0.8979 |
| | KNN | 0.7662 | 0.4827 | 0.6086 | | KNN | 0.8000 | 0.9130 | 0.8749 |
| | DT | 0.7468 | 0.4310 | 0.5618 | | DT | 0.8333 | 1.0000 | 0.9019 |
| | MLP | 0.6559 | 0.2414 | 0.3457 | | MLP | 0.8000 | 0.9130 | 0.8749 |
| | NLPNN | | | | | NLPNN | | 1.0000 | |
| T2DM | SVM | 0.9179 | 0.0000 | 0.0000 | BCW | SVM | 0.9714 | 0.9545 | 0.9545 |
| | LR | 0.9202 | | | | LR | 0.9714 | 0.9545 | 0.9545 |
| | KNN | 0.9164 | 0.0092 | 0.0176 | | KNN | 0.9714 | 0.9545 | 0.9545 |
| | DT | 0.9187 | 0.0092 | 0.0182 | | DT | 0.9500 | 0.9545 | 0.9231 |
| | MLP | 0.9179 | 0.0092 | 0.0180 | | MLP | 0.3143 | 1.0000 | 0.4783 |
| | NLPNN | | 0.0097 | 0.0192 | | NLPNN | | 1.0000 | |
| CVD | SVM | 0.7244 | 0.6401 | 0.6977 | Pri_hyper | SVM | 0.7372 | 0.6162 | 0.6859 |
| | LR | 0.7249 | 0.6858 | 0.7125 | | LR | 0.7350 | 0.6494 | 0.6953 |
| | KNN | 0.6354 | 0.5389 | 0.5950 | | KNN | 0.7570 | | 0.7216 |
| | DT | 0.7247 | 0.6730 | 0.7085 | | DT | 0.7224 | 0.4663 | 0.6100 |
| | MLP | 0.5385 | | 0.6789 | | MLP | 0.7009 | 0.6210 | 0.6591 |
| | NLPNN | | 0.6847 | | | NLPNN | | 0.6706 | |
| Heart | SVM | 0.8780 | 0.9307 | 0.8826 | Pri_diab | SVM | 0.9280 | 0.0000 | 0.0000 |
| | LR | 0.8780 | 0.9109 | 0.8804 | | LR | 0.9315 | 0.0478 | 0.0913 |
| | KNN | 0.9024 | 0.8911 | 0.9000 | | KNN | 0.9294 | | |
| | DT | 0.8488 | 0.8317 | 0.8442 | | DT | 0.9325 | 0.0718 | 0.1327 |
| | MLP | 0.8878 | 0.8911 | 0.8866 | | MLP | 0.9322 | 0.0861 | 0.1545 |
| | NLPNN | | | | | NLPNN | | 0.0053 | 0.0106 |
| GDM | SVM | 0.6500 | 0.5591 | 0.5977 | | | | | |
| | LR | 0.6150 | 0.5376 | 0.5649 | | | | | |
| | KNN | 0.6200 | 0.4194 | 0.5064 | | | | | |
| | DT | 0.6950 | 0.5269 | 0.6164 | | | | | |
| | MLP | 0.6250 | 0.4839 | 0.5455 | | | | | |
| | NLPNN | | | | | | | | |
The best results for each dataset are marked in bold.
Figure 3. ROC curves of different algorithms with the corresponding AUC values on chronic disease datasets.
Figure 4. Test performance versus number of iterations on the Fra_Heart dataset: (a) generalization performance; (b) performance growth rate.
Figure 5. Test performance versus number of iterations on the T2DM dataset: (a) generalization performance; (b) performance growth rate.
Figure 6. Test performance versus number of iterations on the Pri_diab dataset: (a) generalization performance; (b) performance growth rate.
Figure 7. Test performance versus number of iterations on the CVD dataset: (a) generalization performance; (b) performance growth rate.
Performance of two proposed algorithms on four datasets with class imbalance.
| Dataset | Algorithm | Acc | Re | F1 | G_mean |
|---|---|---|---|---|---|
| T2DM | NLPNN | 0.9232 | 0.0097 | 0.0192 | 0.0985 |
| | AEPNN (10/10) | 0.8095 | 0.3883 | 0.2402 | 0.5728 |
| CVD | NLPNN | 0.7265 | 0.6847 | 0.7161 | 0.7256 |
| | AEPNN (10/10) | 0.7279 | 0.6810 | 0.7160 | 0.7267 |
| Fra_Heart | NLPNN | 0.8726 | 0.0614 | 0.1148 | 0.2476 |
| | AEPNN (6/10) | 0.8219 | 0.2456 | 0.2705 | 0.4731 |
| Pri_diab | NLPNN | 0.9360 | 0.0053 | 0.0106 | 0.0731 |
| | AEPNN (10/10) | 0.9098 | 0.3048 | 0.3032 | 0.5385 |
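The four metrics in this table can all be derived from a binary confusion matrix. The sketch below is a minimal Python illustration, not the authors' evaluation code. It assumes that G_mean is the geometric mean of recall and specificity, which is the usual definition but is not spelled out in this record, and the confusion counts are hypothetical values chosen only to show how the metrics move.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, recall, F1 and G_mean from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "Acc": (tp + tn) / total,
        "Re": recall,
        "F1": f1,
        # Assumed definition: geometric mean of recall and specificity.
        "G_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts (not from the paper): 100 positives, 900 negatives.
always_negative = binary_metrics(tp=0, fp=0, tn=900, fn=100)
recall_oriented = binary_metrics(tp=60, fp=150, tn=750, fn=40)

for name, m in [("always negative", always_negative),
                ("recall oriented", recall_oriented)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

In this toy case the classifier that never predicts the positive class has the higher accuracy but zero recall, F1 and G_mean, which is the same trade-off visible between the NLPNN and AEPNN rows for T2DM, Fra_Heart and Pri_diab.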