Abstract
Chronic disease prediction is a critical task in healthcare. Existing studies fulfil this requirement by employing machine learning techniques based on patient features, but they suffer from high dimensional data problems and a high level of bias. We propose a framework for predicting chronic disease based on Graph Neural Networks (GNNs) to address these issues. We begin by projecting a patient-disease bipartite graph to create a weighted patient network (WPN) that extracts the latent relationship among patients. We then use GNN-based techniques to build prediction models. These models use features extracted from WPN to create robust patient representations for chronic disease prediction. We compare the output of GNN-based models to machine learning methods by using cardiovascular disease and chronic pulmonary disease. The results show that our framework enhances the accuracy of chronic disease prediction. The model with attention mechanisms achieves an accuracy of 93.49% for cardiovascular disease prediction and 89.15% for chronic pulmonary disease prediction. Furthermore, the visualisation of the last hidden layers of GNN-based models shows the pattern for the two cohorts, demonstrating the discriminative strength of the framework. The proposed framework can help stakeholders improve health management systems for patients at risk of developing chronic diseases and conditions.
Year: 2021 PMID: 34799627 PMCID: PMC8604920 DOI: 10.1038/s41598-021-01964-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
ICD-9-AM and ICD-10-AM codes for cardiovascular disease (CVD) and chronic pulmonary disease (CPD).
| | CVD | CPD |
|---|---|---|
| ICD-9-AM codes | 398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93, 428.x, 426.10, 426.11, 426.13, 426.2–426.53, 426.6–426.28, 427.0, 427.2, 427.31, 427.60, 427.9, 785.0, V45.0, V53.3, 093.2, 394.0–397.1, 424.0–424.91, 746.3–746.6, V42.2, V43.3, 416.x, 417.9, 440.x, 441.2, 441.4, 441.7, 441.9, 443.1–443.9, 447.1, 557.1, 557.9, V43.4 | 416.8, 416.9, 490.x–505.x, 506.4, 508.1, 508.9 |
| ICD-10-AM codes | I09.9, I11.0, I13.0, I13.2, I25.5, I42.0, I42.5–I42.9, I43.x, I50.x, P29.0, I44.1–I44.3, I45.6, I45.9, I47.x, R00.0, R00.1, R00.8, T82.1, Z45.0, Z95.0, A52.0, I05.x–I08.x, I09.1, I09.8, I34.x–I39.x, Q23.0–Q23.3, Z95.2–Z95.4, I26.x, I27.x, I28.0, I28.8, I28.9, I70.x, I71.x, I73.1, I73.8, I73.9, I77.1, I79.0, I79.2, K55.1, K55.8, K55.9, Z95.8, Z95.9 | I27.8, I27.9, J40.x–J47.x, J60.x–J67.x, J68.4, J70.1, J70.3 |
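A code list like this is typically used to flag patients whose records contain a matching diagnosis. The following is a minimal sketch of how the `.x` wildcard entries might be matched, assuming that `.x` means "any subdivision of this category". The function names and the five-entry subset are hypothetical, and range entries such as `J40.x–J47.x` are deliberately not handled here.

```python
def matches(code, pattern):
    """Return True if a diagnosis code matches one list entry.

    Assumption: an entry ending in '.x' matches the bare category
    (e.g. 'I70') and any of its subdivisions (e.g. 'I70.2').
    Range entries (with an en dash) are not supported in this sketch.
    """
    code = code.strip().upper()
    pattern = pattern.strip().upper()
    if pattern.endswith(".X"):
        stem = pattern[:-2]
        return code == stem or code.startswith(stem + ".")
    return code == pattern


def has_condition(patient_codes, patterns):
    """True if any of the patient's codes matches any list entry."""
    return any(matches(c, p) for c in patient_codes for p in patterns)


# Illustrative subset of the CVD ICD-10-AM entries in the table above.
CVD_SUBSET = ["I47.x", "I70.x", "I71.x", "R00.0", "Z95.0"]

print(has_condition(["E11.9", "I70.2"], CVD_SUBSET))  # True
print(has_condition(["E11.9", "R00.1"], CVD_SUBSET))  # False
```

A production implementation would also need to expand the range entries and agree on how codes are normalised (case, missing dots) before matching.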
Figure 1. (a) Illustration of the process for constructing a weighted patient network. (b) The workflow of graph neural network (GNN)-based models for disease prediction. X1, X2, …, Xn are the input features, and Z1, Z2, …, Zn are the output of the last layer in the GNN-based model. (c) Block diagram of the proposed GNN-based framework.
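The projection step in panel (a) can be illustrated with a small sketch. It assumes that the weight of an edge between two patients is the number of diseases they share. This record does not state the weighting rule, so that choice, the function name `project_patients`, and the toy data are all assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def project_patients(patient_diseases):
    """Project a patient-disease bipartite graph onto the patient side.

    Returns {(p, q): weight} with p < q, where weight is the number of
    diseases shared by p and q (an assumed weighting rule).
    """
    # Invert the mapping: disease -> set of patients diagnosed with it.
    patients_by_disease = defaultdict(set)
    for patient, diseases in patient_diseases.items():
        for disease in set(diseases):
            patients_by_disease[disease].add(patient)

    # Every pair of patients sharing a disease gains one unit of weight.
    weights = defaultdict(int)
    for patients in patients_by_disease.values():
        for p, q in combinations(sorted(patients), 2):
            weights[(p, q)] += 1
    return dict(weights)


# Toy bipartite graph (hypothetical patients and diseases).
toy = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"diabetes", "hypertension", "asthma"},
    "p3": {"asthma"},
    "p4": {"gout"},
}
wpn = project_patients(toy)
print(wpn)  # {('p1', 'p2'): 2, ('p2', 'p3'): 1}
```

Patients with no shared disease (here `p4`) end up without edges. Whether such isolated nodes are kept in the network is a design decision that this sketch leaves open.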
Characteristics of the patient network.
| Characteristics | WPN for CVD | WPN for CPD |
|---|---|---|
| Number of nodes | 2537 | 989 |
| Number of edges | 138,108 | 31,174 |
| Average degree | 108.875 | 63.041 |
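The average degree follows from the node and edge counts. As a quick check, assuming each WPN is an undirected graph without self-loops, the average degree is 2E/N:

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge touches two nodes."""
    return 2 * num_edges / num_nodes


cvd = average_degree(2537, 138_108)
cpd = average_degree(989, 31_174)
print(f"CVD: {cvd:.3f}, CPD: {cpd:.3f}")  # CVD: 108.875, CPD: 63.041
```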
Performance of models based on CVD test data. TP, TN, FP and FN stand for True Positive, True Negative, False Positive and False Negative, respectively.
| Features | Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|
| With network features | LR | 76.25 | 76.5 | 76.28 | 76.2 | 210 | 188 | 75 | 49 |
| | SVM | 77.97 | 78.5 | 77.97 | 77.88 | 219 | 188 | 75 | 40 |
| | RF | 86.59 | 86.69 | 86.59 | 86.58 | 217 | 235 | 28 | 42 |
| | ANN | 85.06 | 85.8 | 85.06 | 84.97 | 201 | 243 | 20 | 58 |
| | GCN | 90.80 | 90.85 | 90.80 | 90.80 | 239 | 235 | 28 | 20 |
| | GAT | 92.34 | 92.70 | 92.34 | 92.32 | 251 | 231 | 32 | 8 |
| Without network features | LR | 71.26 | 71.54 | 71.26 | 71.19 | 198 | 174 | 89 | 61 |
| | SVM | 67.82 | 69.74 | 67.82 | 67.08 | 215 | 139 | 124 | 44 |
| | RF | 65.52 | 65.62 | 65.52 | 65.52 | 171 | 171 | 92 | 88 |
| | ANN | 71.84 | 71.97 | 71.84 | 71.82 | 195 | 180 | 83 | 64 |
| | GCN | 90.04 | 90.05 | 90.04 | 90.04 | 231 | 239 | 24 | 28 |
| | GAT | 93.49 | 93.86 | 93.49 | 93.48 | 254 | 234 | 29 | 5 |
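The percentage columns can be recomputed from the confusion-matrix counts. The reported precision, recall and F1 values appear consistent with support-weighted averages over the two classes. That is an inference from the numbers, not something stated in this record, and the function name `weighted_metrics` is hypothetical. A minimal sketch under that assumption:

```python
def weighted_metrics(tp, tn, fp, fn):
    """Accuracy plus support-weighted precision, recall and F1, in percent.

    Assumes both classes are predicted at least once and have at least
    one true example; otherwise a division by zero occurs.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # (precision, recall, support) for the positive and negative class.
    classes = [
        (tp / (tp + fp), tp / (tp + fn), tp + fn),
        (tn / (tn + fn), tn / (tn + fp), tn + fp),
    ]
    precision = sum(p * n for p, _, n in classes) / total
    recall = sum(r * n for _, r, n in classes) / total
    f1 = sum(2 * p * r / (p + r) * n for p, r, n in classes) / total

    values = {"accuracy": accuracy, "precision": precision,
              "recall": recall, "f1": f1}
    return {name: round(100 * v, 2) for name, v in values.items()}


# GAT without network features on the CVD test data (last table row).
gat = weighted_metrics(tp=254, tn=234, fp=29, fn=5)
print(gat["accuracy"], gat["precision"])  # 93.49 93.86
```

With support weighting, recall always equals accuracy, which matches the pattern seen in most rows of the table.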
Performance of models based on CPD test data. TP, TN, FP and FN stand for True Positive, True Negative, False Positive and False Negative, respectively.
| Features | Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|
| With network features | LR | 66.98 | 67.23 | 66.98 | 67.05 | 79 | 63 | 32 | 38 |
| | SVM | 66.04 | 66.06 | 66.04 | 65.16 | 93 | 47 | 48 | 24 |
| | RF | 76.42 | 77.39 | 76.42 | 76.47 | 84 | 78 | 17 | 33 |
| | ANN | 72.17 | 72.14 | 72.17 | 72.16 | 88 | 65 | 30 | 29 |
| | GCN | 85.38 | 86.57 | 85.38 | 85.07 | 112 | 69 | 26 | 5 |
| | GAT | 89.15 | 89.47 | 89.15 | 89.06 | 111 | 78 | 17 | 6 |
| Without network features | LR | 58.96 | 63.77 | 58.96 | 57.59 | 48 | 77 | 18 | 69 |
| | SVM | 61.32 | 60.95 | 61.32 | 60.32 | 88 | 42 | 53 | 29 |
| | RF | 60.38 | 60.98 | 60.38 | 60.48 | 69 | 59 | 36 | 48 |
| | ANN | 61.79 | 61.47 | 61.79 | 60.72 | 83 | 48 | 47 | 34 |
| | GCN | 87.26 | 88.31 | 87.26 | 87.03 | 113 | 72 | 23 | 4 |
| | GAT | 89.15 | 90.04 | 89.15 | 88.99 | 114 | 75 | 20 | 3 |
Accuracy of the different models for CVD and CPD when edge weights in the patient network are not considered.
| Features | Method | Accuracy for CVD (%) | Accuracy for CPD (%) |
|---|---|---|---|
| With network features (i.e., degree centrality, eigenvector centrality etc.) | LR | 76.63 | 69.34 |
| | SVM | 79.89 | 66.51 |
| | RF | 86.59 | 79.25 |
| | ANN | 83.91 | 73.11 |
| | GCN | 87.26 | 85.85 |
| | GAT | 89.15 | 88.21 |
| Without network features | GCN | 89.08 | 83.02 |
| | GAT | 93.49 | 89.15 |
Figure 2. The distribution of edge weights for CVD and CPD.
Figure 3. (a) t-SNE visualisation of GNN-based model embeddings for CVD. (b) t-SNE visualisation of GNN-based model embeddings for CPD. Node colours denote labels.