Musa Uba Muhammad, Ren Jiadong, Noman Sohail Muhammad, Munawar Hussain, Irshad Muhammad.
Abstract
Diabetes mellitus, a chronic disease, is assuming epidemic proportions worldwide, so understanding its prevalence is important. Researchers have introduced various methods, but classification techniques still need improvement. This paper combines a data mining approach with principal component analysis (PCA) on a single platform for the polytomous variable-based classification of diabetes mellitus and some selected chronic diseases. The PCA result reports the eigenvalues and the share of total variance explained by each principal component (PC). A total of twelve attributes were analyzed in order to capture the pattern of correlations with as few factors as possible. Usually, factors with large eigenvalues are retained. The first five components have eigenvalues large enough to be retained; they explain 18.9%, 14.0%, 13.6%, 10.3%, and 8.6% of the variance, respectively, or approximately 65.4% of the total variance together. We further applied K-means clustering using the first two PCs. In addition, the correlation results between diabetes mellitus and the selected diseases reveal that diabetes patients are more likely to have kidney problems and hypertension. The study therefore validates the proposed polytomous method for classification. Such a study is important for better assessment in regions of low socio-economic status around the globe.
Keywords: PCA; cardiovascular problem; classification; correlation coefficient; data mining; diabetes mellitus; eigenvalues; hypertension; variance
Year: 2019 PMID: 31557898 PMCID: PMC6801713 DOI: 10.3390/ijerph16193593
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Variable categorization platform.
Figure 2. (a–g) Data flow for the categorized variables of interest.
Statistics.
| Diabetic condition | Instances | Weight (kg) | Age (years) | Instances: GTD | Instances: IND | Instances: NID |
|---|---|---|---|---|---|---|
| GTD | 11 | 65.6 ± 8.96 | <20 | 0 | 1 | 0 |
| IND | 7 | 60.5 ± 7.56 | 20–40 | 6 | 1 | 45 |
| NID | 263 | 62.4 ± 12.85 | 40–65 | 5 | 3 | 136 |
| | | | ≥65 | 0 | 2 | 82 |
Eigenvalues and explained variances for the principal components analysis (PCA) result.
| DIM | EV | VP | CVP |
|---|---|---|---|
| Dim 1 | 2.2637060 | 18.86 | 18.86 |
| Dim 2 | 1.6831058 | 14.03 | 32.89 |
| Dim 3 | 1.6337142 | 13.61 | 46.50 |
| Dim 4 | 1.2350124 | 10.29 | 56.79 |
| Dim 5 | 1.0306272 | 8.59 | 65.38 |
| Dim 6 | 0.9044255 | 7.54 | 72.92 |
| Dim 7 | 0.8012641 | 6.68 | 79.60 |
| Dim 8 | 0.7679526 | 6.40 | 86.00 |
| Dim 9 | 0.5675913 | 4.73 | 90.73 |
| Dim 10 | 0.5353058 | 4.46 | 95.19 |
| Dim 11 | 0.4093681 | 3.41 | 98.60 |
| Dim 12 | 0.1679269 | 1.40 | 100.00 |
Legends: DIM: Dimension; EV: Eigenvalues; VP: Variance percentage; CVP: Cumulative variance percentage.
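The VP and CVP columns can be reproduced from the EV column alone. The sketch below is only an illustration: it divides each eigenvalue by the sum of all twelve and then applies an eigenvalue-greater-than-one cutoff (the Kaiser criterion). That cutoff is an assumption made here; the source says only that components with large eigenvalues are retained.

```python
# Eigenvalues (EV column) copied from the table above.
eigenvalues = [
    2.2637060, 1.6831058, 1.6337142, 1.2350124, 1.0306272, 0.9044255,
    0.8012641, 0.7679526, 0.5675913, 0.5353058, 0.4093681, 0.1679269,
]

total = sum(eigenvalues)  # close to 12, the number of standardized attributes

# Variance percentage (VP) and cumulative variance percentage (CVP).
variance_pct = [100 * ev / total for ev in eigenvalues]
cumulative_pct = []
running = 0.0
for vp in variance_pct:
    running += vp
    cumulative_pct.append(running)

# Assumed retention rule (Kaiser criterion): keep components with EV > 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for dim, (ev, vp, cvp) in enumerate(zip(eigenvalues, variance_pct, cumulative_pct), start=1):
    print(f"Dim {dim:2d}  EV={ev:.7f}  VP={vp:5.2f}  CVP={cvp:6.2f}")
print("Retained dimensions:", retained)
```

Under that assumed cutoff the first five dimensions are kept, which matches the five components named in the abstract.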
Figure 3. Scree plot for the percentage of explained variances by each component.
Figure 4. (a,b) Bar plots for the individual variables' contributions to PCs in Dim-1 and Dim-2.
Figure 5. The data reduction process flow.
Figure 6. K-means cluster assessment for the attribute residential suburb. (Legends: VL: Village; TW: Town; CY: City).
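To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) run on two-dimensional scores, as one would obtain by projecting patients onto the first two PCs. Everything specific in it is an assumption for illustration: the nine `scores` are invented rather than taken from the study, `kmeans` is a hypothetical helper, and three clusters are used only to mirror the three suburb categories in the Figure 6 legend.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical (PC1, PC2) scores for nine patients; not data from the study.
scores = [
    (-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
    (0.2, 1.9), (-0.1, 2.3), (0.4, 2.0),
    (1.9, -1.5), (2.2, -1.1), (1.7, -1.8),
]

# Deterministic start: every third point serves as an initial centroid.
cluster_labels, cluster_centroids = kmeans(scores, centroids=scores[::3])
print(cluster_labels)
```

Fixing the initial centroids keeps the run reproducible; in practice a randomized start with several restarts is more common.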
Correlation matrix for the attributes.
| | AGE | GLU | DBP | BMI | WGT | OCP | SEX | DIT | MST | RSB | LOE | DCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AGE | 1.00 | 0.52 | 0.17 | −0.07 | 0.01 | 0.02 | −0.01 | −0.09 | −0.03 | −0.03 | −0.02 | −0.06 |
| GLU | | 1.00 | 0.18 | −0.11 | 0.04 | 0.00 | −0.03 | 0.00 | −0.05 | −0.01 | 0.11 | 0.06 |
| DBP | | | 1.00 | −0.01 | 0.09 | 0.02 | −0.08 | −0.11 | −0.01 | −0.09 | −0.03 | 0.01 |
| BMI | | | | 1.00 | 0.81 | 0.05 | −0.09 | 0.19 | 0.22 | 0.22 | 0.05 | 0.06 |
| WGT | | | | | 1.00 | 0.14 | −0.01 | 0.17 | 0.29 | 0.24 | 0.11 | 0.08 |
| OCP | | | | | | 1.00 | 0.39 | 0.08 | 0.33 | 0.02 | 0.32 | 0.08 |
| SEX | | | | | | | 1.00 | −0.01 | 0.14 | −0.07 | 0.30 | 0.03 |
| DIT | | | | | | | | 1.00 | 0.02 | 0.14 | 0.26 | −0.02 |
| MST | | | | | | | | | 1.00 | 0.07 | −0.02 | 0.02 |
| RSB | | | | | | | | | | 1.00 | 0.06 | −0.05 |
| LOE | | | | | | | | | | | 1.00 | 0.05 |
| DCD | | | | | | | | | | | | 1.00 |
Legends: AGE: Patient's age; GLU: Patient's glucose level; DBP: Patient's diastolic blood pressure; BMI: Patient's body mass index; WGT: Patient's weight; OCP: Patient's occupation status; SEX: Patient's sex; DIT: Diet taken by patient; MST: Patient's marital status; RSB: Patient's residential suburb; LOE: Patient's level of education; DCD: Patient's diabetic condition.
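One way to read the matrix is to rank the off-diagonal coefficients by absolute value. The sketch below transcribes the upper triangle from the table and lists attribute pairs with |r| ≥ 0.30. The 0.30 threshold is an arbitrary choice made for illustration and does not come from the source.

```python
names = ["AGE", "GLU", "DBP", "BMI", "WGT", "OCP",
         "SEX", "DIT", "MST", "RSB", "LOE", "DCD"]

# Upper triangle of the table: row i holds r(i, i), r(i, i+1), ..., r(i, 11).
upper = [
    [1.00, 0.52, 0.17, -0.07, 0.01, 0.02, -0.01, -0.09, -0.03, -0.03, -0.02, -0.06],
    [1.00, 0.18, -0.11, 0.04, 0.00, -0.03, 0.00, -0.05, -0.01, 0.11, 0.06],
    [1.00, -0.01, 0.09, 0.02, -0.08, -0.11, -0.01, -0.09, -0.03, 0.01],
    [1.00, 0.81, 0.05, -0.09, 0.19, 0.22, 0.22, 0.05, 0.06],
    [1.00, 0.14, -0.01, 0.17, 0.29, 0.24, 0.11, 0.08],
    [1.00, 0.39, 0.08, 0.33, 0.02, 0.32, 0.08],
    [1.00, -0.01, 0.14, -0.07, 0.30, 0.03],
    [1.00, 0.02, 0.14, 0.26, -0.02],
    [1.00, 0.07, -0.02, 0.02],
    [1.00, 0.06, -0.05],
    [1.00, 0.05],
    [1.00],
]

# Collect every off-diagonal pair as (|r|, first attribute, second attribute, r).
pairs = []
for i, row in enumerate(upper):
    for offset, r in enumerate(row[1:], start=1):
        pairs.append((abs(r), names[i], names[i + offset], r))

THRESHOLD = 0.30  # assumed cutoff, chosen only for illustration
strong = sorted((p for p in pairs if p[0] >= THRESHOLD), reverse=True)

for _, a, b, r in strong:
    print(f"{a}-{b}: r = {r:+.2f}")
```

With this cutoff the list is led by BMI-WGT (0.81) and AGE-GLU (0.52). No coefficient in the DCD column exceeds 0.08 in absolute value.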
Figure 7. Correlation Matrix.
Figure 8. Correlation Circle.
Correlation results between selected symptoms of diabetes mellitus and symptoms of other chronic diseases.
| | SFH | VMT | FTG | SCE | DRV | EYP | PCJ | SOB | SWG | NRV | HIT | SBF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EXT | 0.22 | 0.21 | 0.28 | −0.06 | −0.09 | 0.03 | 0.16 | 0.29 | 0.23 | 0.02 | −0.09 | 0.08 |
| FRU | 0.16 | 0.07 | 0.30 | 0.22 | 0.08 | 0.14 | 0.01 | 0.27 | 0.21 | 0.07 | −0.10 | −0.05 |
| WLG | −0.07 | −0.02 | 0.06 | 0.23 | 0.01 | −0.03 | 0.11 | 0.02 | −0.08 | 0.22 | 0.11 | 0.27 |
| FLS | 0.09 | 0.20 | 0.43 | 0.15 | 0.14 | 0.11 | 0.06 | 0.42 | 0.14 | 0.04 | −0.04 | 0.04 |
| BRV | −0.01 | 0.01 | 0.03 | 0.37 | 0.01 | 0.09 | 0.11 | 0.05 | −0.02 | 0.23 | 0.20 | 0.21 |
| IRT | 0.19 | 0.09 | 0.10 | 0.27 | 0.23 | 0.21 | 0.14 | 0.06 | 0.15 | 0.21 | 0.18 | 0.04 |
| SHC | 0.18 | 0.16 | 0.12 | −0.02 | 0.21 | 0.16 | 0.08 | 0.11 | 0.20 | −0.07 | −0.08 | −0.02 |
| TLF | 0.31 | 0.18 | 0.02 | 0.08 | 0.09 | 0.17 | 0.25 | 0.04 | 0.19 | 0.30 | 0.16 | 0.24 |
| RIG | 0.13 | 0.28 | 0.01 | −0.04 | 0.03 | 0.03 | 0.08 | 0.08 | 0.07 | 0.28 | 0.29 | 0.34 |
| RIV | 0.25 | 0.23 | 0.04 | −0.01 | 0.02 | 0.05 | 0.25 | 0.06 | 0.08 | 0.20 | 0.27 | 0.15 |
Legends: EXT: Patients suffering from excessive thirst; FRU: Patients suffering from frequent urination; WLG: Patients suffering from unexplained weight loss or gain; FLS: Patients suffering from flulike symptoms; BRV: Patients suffering from blurred vision; IRT: Patients suffering from irritability; SHC: Patients suffering from slow healing on cut or bruise; TLF: Patients suffering from tingling or loss of feeling in hand or feet; RIG: Patients suffering from recurring infection on gum or skin; RIV: Patients suffering from recurring vaginal/bladder infection; SFH: Patients suffering from swelling on ankle, feet, or hand; VMT: Patients suffering from vomiting; FTG: Patients suffering from fatigue; SCE: Patients suffering from spiders, cobwebs, or tiny specks in eye; DRV: Patients suffering from dark streaks or a red that blocks vision; EYP: Patients suffering from eye pain; PCJ: Patients suffering from pain in chest, jaw, or arm; SOB: Patients suffering from shortness of breath; SWG: Patients suffering from swelling (edema); NRV: Patients suffering from nervousness; HIT: Patients suffering from heat intolerance; SBF: Patients suffering from slowing in body function.