Mauricio Barrios1,2, Miguel Jimeno2, Pedro Villalba3, Edgar Navarro4.
Abstract
Metabolic Syndrome (MetS) is a cluster of risk factors that increase the likelihood of heart disease and diabetes mellitus. Early diagnosis is crucial for taking preventive measures, especially for patients in locations without proper access to laboratories and medical consultations. This work presents a new methodology for diagnosing diseases with data mining that thoroughly documents every phase, so that the resulting models can be improved further. We used the methodology to create a new model that diagnoses the syndrome without biochemical variables. We compared similar classification models from the literature, using their reported variables and previously obtained data from a study in Colombia. We then built a new model and compared it with the previous models using the holdout and random subsampling validation methods to obtain performance indicators for each model. Our resulting ANN model used three hidden layers and only Hip Circumference, dichotomous Waist Circumference, and dichotomous Blood Pressure as variables. It gave an Area Under the Curve (AUC) of 87.75% under the IDF and 85.12% under the HMS MetS diagnosis criteria, higher than the previous models. With the new methodology, diagnosis models can be thoroughly documented for appropriate future comparisons, thus benefiting the diagnosis of the studied diseases.
Keywords: SEMMA; artificial neural networks; decision tree; design methodology; diabetes mellitus; heart disease; holdout; metabolic syndrome; principal component logistic regression; random subsampling
Year: 2019 PMID: 31731612 PMCID: PMC6963320 DOI: 10.3390/diagnostics9040192
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Diagnosis criteria.
| Risk Factors | ATP III(2001) | IDF(2006) and HMS(2009) |
|---|---|---|
| Central | Male: | Country/ethnic-specific |
| TG | >150 mg/dL | >150 mg/dL |
| FG | >100 mg/dL | >100 mg/dL |
| HDL-C | Male:<40 mg/dL | Male:<40 mg/dL |
| BP, SBP, DBP | SBP ≥ 130 mmHg | SBP ≥ 130 mmHg |
Figure 1. Phases of the SEMMA methodology [26,27].
Figure 2. RAMAD methodology.
Figure 3. Selection criteria for choosing classification models for MetS diagnosis from the literature.
Criteria established for choosing the articles.
| Criteria for the Selection of Articles |
|---|
| Can the variables be obtained through first-level medical care? Y/N |
| What diagnostic criteria do authors use? |
| What data mining technique do authors use? |
| What validation method do authors use? |
| What performance indicators do authors use? |
Variables and technique used by the authors.
| Authors | Kroon et al. [ | Romero et al. [ | Chen et al. [ | Kupusinac et al. [ |
|---|---|---|---|---|
| Age | E | E | ||
| Sex | I | E | E | E |
| Weight | I | E | I | I |
| Height | I | E | I | I |
| WHR: Waist to | E | |||
| WSR: Waist | E | |||
| BMI: Body | E | E | E | E |
| HC: Hip | E | |||
| WC: Waist | I | E | E | I |
| WCD: WC | E | |||
| BPD: Blood Pressure | E | |||
| SBP: Systolic | I | E | E | |
| DBP: Diastolic | I | E | E | |
| Technique | DT | ANN | PCLR, ANN | ANN |
DT: Decision Tree; PCLR: Principal Component Logistic Regression; ANN: Artificial Neural Network; E: explicit use of the variable; I: implicit use of the variable.
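The explicit/implicit distinction in the table can be illustrated with the derived indices: a model that takes BMI as an input uses weight and height only implicitly. The following is a minimal sketch, assuming the standard definitions BMI = weight / height², WHR = waist / hip, and WSR = waist / height; the function name and the choice of units are hypothetical and not taken from the cited models.

```python
def derived_indices(weight_kg, height_cm, waist_cm, hip_cm):
    """Return BMI, WHR and WSR from raw anthropometric measurements.

    Assumed definitions (not stated in the source table):
      BMI = weight [kg] / height [m]^2
      WHR = waist / hip
      WSR = waist / height
    """
    if min(weight_kg, height_cm, waist_cm, hip_cm) <= 0:
        raise ValueError("all measurements must be positive")
    height_m = height_cm / 100.0
    return {
        "BMI": weight_kg / height_m ** 2,
        "WHR": waist_cm / hip_cm,
        "WSR": waist_cm / height_cm,
    }


# Example with made-up measurements (not study data).
indices = derived_indices(weight_kg=70, height_cm=175, waist_cm=90, hip_cm=100)
```

A model fed only `indices["BMI"]` would be marked E for BMI and I for weight and height in the table's notation.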
Figure 4. Process of analysis of the classification models.
Figure 5. Basic structure of the artificial neural network [41,42].
Figure 6. Basic artificial neural.
Assessment rules of AUC [38].
| AUC | Discrimination Ability |
|---|---|
| AUC=0.5 | No discrimination |
| 0.5 < AUC < 0.7 | Regular |
| 0.7 ≤ AUC < 0.8 | Acceptable |
| 0.8 ≤ AUC < 0.9 | Excellent |
| AUC ≥ 0.9 | Outstanding |
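The assessment rules above translate directly into a lookup. This sketch assumes AUC is expressed as a fraction in [0, 1]; the function name is hypothetical, and values below 0.5 return `None` because the table does not assign them a label.

```python
def auc_discrimination(auc):
    """Map an AUC value (as a fraction) to the table's discrimination label."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc >= 0.9:
        return "Outstanding"
    if auc >= 0.8:
        return "Excellent"
    if auc >= 0.7:
        return "Acceptable"
    if auc > 0.5:
        return "Regular"
    if auc == 0.5:
        return "No discrimination"
    return None  # AUC < 0.5 is not covered by the table


# The abstract reports an AUC of 87.75% under the IDF criteria.
label = auc_discrimination(0.8775)
```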
Relationship between IDF and HMS in the database.
| Gender | HMS: No MetS | HMS: MetS | IDF: No MetS | IDF: MetS |
|---|---|---|---|---|
| Men | 154 | 113 | 147 | 120 |
| Women | 209 | 139 | 206 | 142 |
| Total | 363 | 252 | 353 | 262 |
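As a quick consistency check on the counts above, the share of participants classified as MetS under each criterion can be computed directly from the table. This is only an arithmetic sketch; the dictionary layout and function name are hypothetical.

```python
# Counts copied from the table: (No MetS, MetS) per gender and criterion.
COUNTS = {
    "HMS": {"Men": (154, 113), "Women": (209, 139)},
    "IDF": {"Men": (147, 120), "Women": (206, 142)},
}


def mets_share(criterion, gender=None):
    """Fraction of participants with MetS, overall or for one gender."""
    rows = COUNTS[criterion]
    selected = [rows[gender]] if gender else list(rows.values())
    no_mets = sum(r[0] for r in selected)
    mets = sum(r[1] for r in selected)
    return mets / (no_mets + mets)


for criterion in COUNTS:
    print(f"{criterion}: {mets_share(criterion):.2%} of participants with MetS")
```

Both criteria sum to the same 615 participants (267 men, 348 women), so the two shares are directly comparable.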
Statistical description of the study variables for the full data set.
| Variables | MetS | No MetS | Total |
|---|---|---|---|
| AGE m(SD) | 47.62(17.49) | 38.89(15.96) | 42.61(17.17) |
| WC m(SD) | 99.81(11.33) | 87.24(11.91) | 92.59(13.21) |
| HC: m(SD) | 105.51(10.56) | 93.73(12.50) | 98.75(13.07) |
| BMI: m(SD) | 29.09(5.31) | 25.26(4.74) | 26.89(5.33) |
| WHR: m(SD) | 0.94(0.05) | 0.93(0.09) | 0.94(0.08) |
| WSR: m(SD) | 0.61(0.67) | 53.79(7.43) | 0.56(0.79) |
| SBP: m(SD) | 128.52(18.46) | 112.91(12.61) | 119.55(17.19) |
| DBP: m(SD) | 78.48(11.13) | 71.18(9.21) | 74.29(10.69) |
| SEX (Women/Men) | (142/120) | (206/147) | (348/267) |
m: mean; SD: standard deviation; SEX is given as counts (Women/Men).
Figure 7. Dichotomous variables.
Parameters of the ANN [22].
| Parameter | Value |
|---|---|
| Training Function | Levenberg-Marquardt backpropagation |
| min_grad | 10 |
| mu | 10 |
| mu_dec | 0.1 |
| mu_inc | 10 |
| mu_max | 10 |
| HL function | hyperbolic tangent sigmoid |
| Out function | linear |
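To make the two activation choices in the table concrete, the sketch below runs a forward pass through a small feed-forward network with hyperbolic tangent hidden layers and a linear output layer. It is illustrative only: the weights are made up, the layer sizes are hypothetical, and the Levenberg-Marquardt training step is not shown.

```python
import math


def forward(x, layers):
    """Forward pass: tanh on every hidden layer, identity on the last layer.

    `layers` is a list of (weights, biases) pairs, where `weights` holds one
    row of input weights per neuron.
    """
    activation = list(x)
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [
            sum(w * a for w, a in zip(row, activation)) + b
            for row, b in zip(weights, biases)
        ]
        # Linear output layer, tanh elsewhere.
        activation = z if i == last else [math.tanh(v) for v in z]
    return activation


# Hypothetical toy network: 2 inputs -> 2 hidden neurons -> 1 output.
toy_layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
output = forward([0.0, 0.0], toy_layers)
```

Because the output layer is linear, the raw output is not bounded to [0, 1]; a threshold on it would be needed to obtain a MetS / No MetS decision.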
Correlations between obesity-related variables in the Universidad del Norte data.
| | BMI | WC | HC | WSR | WHR |
|---|---|---|---|---|---|
| BMI | 1.00 | 0.78 | 0.79 | 0.79 | 0.06 |
| WC | 0.78 | 1 | 0.86 | 0.92 | 0.32 |
| HC | 0.79 | 0.86 | 1 | 0.84 | −0.20 |
| WSR | 0.79 | 0.92 | 0.84 | 1 | 0.21 |
| WHR | 0.06 | 0.32 | −0.20 | 0.21 | 1 |
Correlation among HC, BPD and WCD.
| | HC | BPD | WCD |
|---|---|---|---|
| HC | 1.0000 | 0.3015 | 0.6459 |
| BPD | 0.3015 | 1.0000 | 0.2275 |
| WCD | 0.6459 | 0.2275 | 1.0000 |
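The source does not state which correlation coefficient was used for the two tables above. Assuming the Pearson coefficient (which, for a 0/1 variable such as BPD or WCD, coincides with the point-biserial correlation), each entry could be computed as in this sketch; the function name and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: hip circumference (cm) against a dichotomous 0/1 variable.
r = pearson([92.0, 98.5, 104.0, 110.5], [0, 0, 1, 1])
```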
Performance indicators by technique using holdout validation with the IDF criteria.
| Indicator | DT [ | ANN25 [ | PCLR [ | ANN5 [ |
|---|---|---|---|---|
| SS | 80.27% | 75.25% | 59.02% | 76.71% |
| SP | 74.17% | 58.62% | 0% | 71.67% |
| PPV | 82.92% | 66.97% | 100% | 84.82% |
| NPV | 70.63% | 68% | 0% | 59.72% |
| ACC | 77.89% | 67.39% | 59.02% | 75% |
| AUC | 76.78% | 76.06% | 49.86% | 80.95% |
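All indicators in the table except AUC follow from a binary confusion matrix. The sketch below assumes the usual reading of the abbreviations (SS: sensitivity, SP: specificity, PPV and NPV: positive and negative predictive value, ACC: accuracy); the function name and the example counts are hypothetical and are not study data.

```python
def indicators(tp, fp, tn, fn):
    """Compute SS, SP, PPV, NPV and ACC from confusion-matrix counts."""

    def ratio(num, den):
        # An empty denominator leaves the indicator undefined.
        return num / den if den else float("nan")

    return {
        "SS": ratio(tp, tp + fn),    # true positives among actual MetS
        "SP": ratio(tn, tn + fp),    # true negatives among actual No MetS
        "PPV": ratio(tp, tp + fp),   # correct among predicted MetS
        "NPV": ratio(tn, tn + fn),   # correct among predicted No MetS
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
    }


# Made-up counts for illustration.
result = indicators(tp=40, fp=10, tn=30, fn=20)
```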
Performance indicators by technique using random subsampling validation with the IDF criteria.
| Indicator | ANN96 [ | ANN85 [ | ANN25 [ | ANN5 [ | ANN3 |
|---|---|---|---|---|---|
| SS | 74.15% | 73.81% | 72.64% | 76.47% | 76.39% |
| SP | 66.03% | 65.89% | 63.78% | 70.22% | 82.52% |
| PPV | 77.58% | 77.65% | 76.21% | 80.41% | 90.24% |
| NPV | 60.8% | 60.18% | 58.47% | 64.31% | 59.46% |
| ACC | 70.61% | 70.34% | 68.79% | 73.7% | 77.46% |
| AUC | 76.04% | 75.8% | 77.13% | 81.75% | 87.75% |
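Random subsampling repeats the holdout procedure over several independent random train/test splits and averages the resulting indicator. The sketch below shows the general scheme only; the number of repeats, the test fraction, and the `evaluate` callback are hypothetical, since the source tables do not report the values used.

```python
import random


def random_subsampling(data, evaluate, repeats=10, test_fraction=0.3, seed=0):
    """Average a score over repeated random train/test splits.

    `evaluate(train, test)` is a caller-supplied function that fits a model
    on `train` and returns one score (for example AUC) measured on `test`.
    With repeats=1 this reduces to a single holdout split.
    """
    rng = random.Random(seed)
    n_test = max(1, round(len(data) * test_fraction))
    scores = []
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores), scores
```

Keeping the individual scores alongside the mean makes it possible to inspect their spread across splits, not just the average.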
Performance indicators by technique using holdout validation with the HMS criteria.
| Indicator | DT [ | ANN25 [ | PCLR [ | ANN5 [ |
|---|---|---|---|---|
| SP | 74.17% | 59.46% | 0% | 68.52% |
| PPV | 82.44% | 71.96% | 100% | 84.40% |
| NPV | 67.94% | 57.14% | 0% | 49.33% |
| ACC | 76.26% | 65.76% | 57.40% | 70.11% |
| AUC | 75.19% | 75.48% | 50.19% | 78.97% |
Performance indicators by technique using random subsampling validation with the HMS criteria.
| Indicator | ANN96 [ | ANN85 [ | ANN25 [ | ANN5 [ | ANN3 |
|---|---|---|---|---|---|
| SS | 71.62% | 72.22% | 69.44% | 75.37% | 75.08% |
| SP | 66.95% | 66.25% | 63.73% | 72.54% | 80.15% |
| PPV | 77.4% | 76.05% | 75.74% | 81.38% | 87.17% |
| NPV | 58.67% | 60.80% | 55.34% | 64.21% | 59.94% |
| ACC | 69.33% | 69.42% | 68.88% | 73.91% | 75.35% |
| AUC | 74.94% | 74.79% | 74.8% | 81.74% | 85.13% |
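The AUC row differs from the other indicators in that it is computed from the model's continuous output, not from thresholded predictions. One standard way to obtain it, shown here as a sketch and not necessarily the procedure used in the study, is the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen MetS case scores higher than a randomly chosen non-MetS case, with ties counted as one half.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for MetS and 0 for No MetS; ties count as half a win.
    The pairwise loop is quadratic, which is fine for small samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up network outputs and true labels.
value = auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```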
Figure 8. AUC distribution for each ANN to diagnose using the IDF criteria.
Figure 9. AUC distribution for each ANN to diagnose using the HMS criteria.