Valeria Maeda-Gutiérrez1, Carlos E Galván-Tejada1, Miguel Cruz2, Jorge I Galván-Tejada1, Hamurabi Gamboa-Rosales1, Alejandra García-Hernández1, Huizilopoztli Luna-García1, Irma Gonzalez-Curiel3, Mónica Martínez-Acuña3.
Abstract
One of the main microvascular complications in the Mexican population is diabetic retinopathy (DR), which affects 27.50% of individuals with type 2 diabetes. The purpose of this study is therefore to construct a predictive model that identifies the risk factors for this complication. The dataset contained a total of 298 subjects and included clinical and paraclinical features. The analysis used machine learning techniques, with Boruta as the feature selection method and random forest as the classification algorithm. The model was evaluated using sensitivity, specificity, area under the curve (AUC), and the receiver operating characteristic (ROC) curve. The model obtained an AUC of 0.69. Moreover, a risk evaluation was incorporated to assess the impact of the predictors. The proposed method identifies creatinine, lipid treatment, glomerular filtration rate, waist hip ratio, total cholesterol, and high density lipoprotein as risk factors in Mexican subjects. The odds ratio increases by 3.5916 times for control patients who have high levels of cholesterol. It is possible to conclude that the proposed methodology is a preliminary computer-aided diagnosis tool to support clinical decisions in the diagnosis of DR.
Keywords: diabetic retinopathy; feature selection; random forest; risk factors
Year: 2021 PMID: 34945799 PMCID: PMC8705564 DOI: 10.3390/jpm11121327
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Flowchart representing the proposed methodology.
Features gathered from the Mexican dataset.
| Feature Category | Feature |
|---|---|
| Basic information | EDU, SAL, SEX, AGE, AGE DX, WHR, BMI, SBP, DBP |
| Biochemical indicators | GLU, UREA, CRE, CHOL, HDL, LDL, TG, TCHOLU, |
| Additional information | GB, MF, PG, RG, AB, INS, LIPIDS TX, HA-TX |
| Output | DIABETIC RETINOPATHY |
Note: EDU (Education, studies concluded by the subject); SAL (Salary, monthly income); SEX (subject sex); AGE (subject age); AGE DX (age at diagnosis of diabetes); WHR (waist hip ratio); BMI (body mass index); SBP (systolic blood pressure); DBP (diastolic blood pressure); USBP (systolic blood pressure uncorrected by treatment); UDBP (diastolic blood pressure uncorrected by treatment); GLU (blood glucose levels); UREA (waste product resulting from the breakdown of protein in the subject body); CRE (waste product produced by muscles as part of regular activity); CHOL (fat-like substance that is found in all cells of the subject body); HDL (high density lipoprotein); LDL (low density lipoprotein); TG (type of fat found in the subject body); TCHOLU (total cholesterol uncorrected); UHDL (high density lipoprotein uncorrected by treatment); ULDL (low density lipoprotein uncorrected by treatment); UTG (triglycerides uncorrected by treatment); HBA1C (glycated hemoglobin); GFR (glomerular filtration rate); GB (drug treatment); MF (drug treatment); PG (drug treatment); RG (drug treatment); AB (drug treatment); INS (drug treatment); LIPIDS TX (lipids treatment); HA-TX (hypertension treatment).
General characteristics of all the subjects included.
| Feature | Cases (n = 149) | Controls (n = 149) |
|---|---|---|
| Education | | |
| Elementary school | 25 (16.77%) | 39 (26.17%) |
| Salary | | |
| Less than $2000.00 | 35 (23.48%) | 37 (24.83%) |
| Sex | | |
| Female | 75 (50.33%) | 98 (65.77%) |
| Age (years) | 57 ± 9.96 | 55 ± 9.42 |
| Age DX (years) | 44.09 ± 7.55 | 45 ± 7.06 |
| WHR (cm/cm) | 0.94 ± 0.07 | 0.91 ± 0.07 |
| BMI (kg/m2) | 29.55 ± 5.00 | 30.02 ± 5.07 |
| SBP (mmHg) | 125 ± 15.88 | 123.41 ± 14.95 |
| DBP (mmHg) | 81.93 ± 11.09 | 83.82 ± 11.11 |
| USBP (mmHg) | 122.32 ± 15.01 | 120.06 ± 13.92 |
| UDBP (mmHg) | 80.27 ± 10.65 | 82.14 ± 10.62 |
Biochemical indicators of all the subjects included.
| Feature | Cases (n = 149) | Controls (n = 149) |
|---|---|---|
| Glucose (mg/dL) | 162.02 ± 70.44 | 165.51 ± 68.67 |
| Urea (mg/dL) | 38.14 ± 22.83 | 34.45 ± 16.91 |
| Creatinine (mg/dL) | 0.96 ± 0.59 | 0.84 ± 0.28 |
| Cholesterol (mg/dL) | 209.54 ± 46.38 | 219.55 ± 52.56 |
| HDL (mg/dL) | 39.24 ± 12.88 | 43.95 ± 13.34 |
| LDL (mg/dL) | 148.50 ± 39.13 | 154.86 ± 41.55 |
| Triglycerides (mg/dL) | 231.49 ± 157.77 | 214.16 ± 112.94 |
| TCHOLU (mg/dL) | 184.71 ± 40.29 | 201.58 ± 46.47 |
| UHDL (mg/dL) | 41.71 ± 12.40 | 45.64 ± 13.37 |
| ULDL (mg/dL) | 125.43 ± 33.15 | 138.04 ± 34.55 |
| UTG (mg/dL) | 208.07 ± 152.53 | 198.26 ± 111.44 |
| HBA1C (mmol/L) | 7.74 ± 3.15 | 7.47 ± 2.58 |
| GFR (mL/min) | 98.31 ± 41.27 | 101.88 ± 33.65 |
Additional information of all the subjects included.
| Feature | Cases (n = 149) | Controls (n = 149) |
|---|---|---|
| Glibenclamide | | |
| 0 | 74 (49.66%) | 87 (58.38%) |
| Metformin | | |
| 0 | 35 (23.48%) | 32 (21.41%) |
| Pioglitazone | | |
| 0 | 147 (98.65%) | 144 (96.64%) |
| Rosiglitazone | | |
| 0 | 149 (100%) | 149 (100%) |
| Acarbose | | |
| 0 | 147 (98.65%) | 148 (99.32%) |
| Insulin | | |
| 0 | 103 (69.12%) | 116 (77.85%) |
| HA.TX | | |
| 0 | 100 (67.11%) | 99 (66.44%) |
| Lipids TX | | |
| 0 | 75 (50.33%) | 96 (64.42%) |
p-value of each feature.
| Feature | | |
|---|---|---|
| Education | <2.2 × | <2.2 × |
| Salary | <2.2 × | <2.2 × |
| Sex | <2.2 × | <2.2 × |
| Age | <2.2 × | <2.2 × |
| Age DX | <2.2 × | <2.2 × |
| WHR | 4.054 × | <2.2 × |
| BMI | <2.2 × | <2.2 × |
| SBP | <2.2 × | <2.2 × |
| DBP | <2.2 × | <2.2 × |
| USBP | <2.2 × | <2.2 × |
| UDBP | <2.2 × | <2.2 × |
| Glucose | <2.2 × | <2.2 × |
| Urea | <2.2 × | <2.2 × |
| Creatinine | 0.515 | <2.2 × |
| Cholesterol | <2.2 × | <2.2 × |
| HDL | <2.2 × | <2.2 × |
| LDL | <2.2 × | <2.2 × |
| Triglycerides | <2.2 × | <2.2 × |
| TCHOLU | <2.2 × | <2.2 × |
| UHDL | <2.2 × | <2.2 × |
| ULDL | <2.2 × | <2.2 × |
| UTG | <2.2 × | <2.2 × |
| HBA1C | <2.2 × | <2.2 × |
| GFR | <2.2 × | <2.2 × |
| Glibenclamide | <2.2 × | <2.2 × |
| Metformin | 3.87 × | <2.2 × |
| Pioglitazone | <2.2 × | 0.02484 |
| Rosiglitazone | NA | NA |
| Acarbose | <2.2 × | 0.3189 |
| Insulin | <2.2 × | 1.224 × |
| HA.TX | <2.2 × | 8.107 × |
| Lipids TX | <2.2 × | 8.086 × |
Figure 2. Feature selection process according to Boruta.
Feature selection results—Confirmed attributes.
| No. | Attribute | Feature Selection-Boruta (Norm Hits) |
|---|---|---|
| 1 | Creatinine | 0.6355 |
| 2 | Lipids TX | 0.7054 |
| 3 | GFR | 0.7615 |
| 4 | WHR | 0.8316 |
| 5 | TCHOLU | 0.8537 |
| 6 | HDL | 0.8777 |
Figure 3. ROC curve obtained for the model based on the key features identified by Boruta.
Performance metrics of RF classifier.
| Classifier | Sensitivity | Specificity | AUC |
|---|---|---|---|
| Random Forest | 0.6422 | 0.6169 | 0.69 |
Confusion matrix of the RF classifier.
| Prediction | Reference: 0 | Reference: 1 | Class error |
|---|---|---|---|
| 0 | 91 | 58 | 0.3892 |
| 1 | 55 | 94 | 0.3691 |
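The last column of the confusion matrix is consistent with a per-row error rate: the off-diagonal count divided by the row total. The following is a minimal sketch of that arithmetic, assuming the four counts are read row by row as printed. The names `matrix`, `row_error`, and `accuracy` are illustrative and do not come from the paper.

```python
# Counts as printed in the confusion matrix:
# row label -> (count in column 0, count in column 1)
matrix = {0: (91, 58), 1: (55, 94)}


def row_error(label, row):
    """Fraction of a row's counts that fall outside the matching column."""
    return 1 - row[label] / sum(row)


# Per-row error rate (assumed meaning of the last column of the table)
errors = {label: row_error(label, row) for label, row in matrix.items()}

# Overall fraction of counts on the diagonal
total = sum(sum(row) for row in matrix.values())
accuracy = sum(row[label] for label, row in matrix.items()) / total

for label, err in errors.items():
    print(f"row {label}: error = {err:.4f}")
print(f"overall accuracy = {accuracy:.4f}")
```

The sensitivity and specificity reported for the classifier do not follow directly from these four counts, so they were presumably computed on a different split or averaged over resamples; that reading is an assumption, not something the tables state.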
Participant groups divided by condition.
| Lipids Profile | Cases | Controls | Total |
|---|---|---|---|
| Lipids TX | 74 | 53 | 127 |
| GFR | 66 | 59 | 125 |
| WHR | 72 | 56 | 128 |
| TCHOLU | 45 | 69 | 114 |
| HDL | 139 | 133 | 272 |
| Creatinine | 37 | 24 | 61 |
| Total | 433 | 394 | 827 |
Risk Ratio with 95% C.I.
| Lipids Profile | Estimate | Lower | Upper |
|---|---|---|---|
| Lipids TX | 1 | NA | NA |
| GFR | 1.1310 | 0.8575 | 1.4916 |
| WHR | 1.0483 | 0.7889 | 1.3930 |
| TCHOLU | 1.4503 | 1.1257 | 1.8686 |
| HDL | 1.1716 | 0.9228 | 1.4876 |
| Creatinine | 0.9427 | 0.6490 | 1.3693 |
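The risk ratio estimates and intervals above can be reproduced, up to rounding in the last digit, from the counts in the table of participant groups. The sketch below assumes a Wald interval on the log scale and treats "Lipids TX" as the reference row. The tabulated values are matched when the second (Controls) column is used as the outcome column; this is an observation from the arithmetic, not a statement about the software the authors used. The function name `risk_ratio` and the variable names are illustrative.

```python
import math

# (cases, controls) per row, copied from the table of participant groups.
counts = {
    "Lipids TX": (74, 53),   # reference row
    "GFR": (66, 59),
    "WHR": (72, 56),
    "TCHOLU": (45, 69),
    "HDL": (139, 133),
    "Creatinine": (37, 24),
}


def risk_ratio(row, ref, z=1.96):
    """Ratio of second-column proportions with a Wald interval on the log scale."""
    a, n1 = row[1], sum(row)   # second-column count and total of the row
    b, n0 = ref[1], sum(ref)   # same quantities for the reference row
    rr = (a / n1) / (b / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)  # SE of log(rr)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


reference = counts["Lipids TX"]
results = {
    name: risk_ratio(row, reference)
    for name, row in counts.items()
    if name != "Lipids TX"
}

for name, (rr, lower, upper) in results.items():
    print(f"{name}: {rr:.4f} ({lower:.4f}, {upper:.4f})")
```

Only the TCHOLU interval excludes 1, which matches the table.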
Odds Ratio with 95% C.I.
| Lipids Profile | Estimate | Lower | Upper |
|---|---|---|---|
| Lipids TX | 1 | NA | NA |
| GFR | 1.2466 | 0.7571 | 2.0575 |
| WHR | 1.0854 | 0.6596 | 1.7877 |
| TCHOLU | 2.1316 | 1.2759 | 3.5916 |
| HDL | 1.3342 | 0.8726 | 2.0494 |
| Creatinine | 0.9073 | 0.4815 | 1.6910 |
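For comparison, the odds ratio can be sketched from the same counts with the simple cross-product estimate and a Wald interval. This is an assumption about the method: for TCHOLU the cross-product gives about 2.14 rather than the tabulated 2.1316, which suggests the table was produced with a different estimator (for example a conditional or median-unbiased one). The sketch illustrates the calculation and does not reproduce the authors' procedure. The function name `odds_ratio` is illustrative.

```python
import math

# (cases, controls) per row, copied from the table of participant groups.
counts = {
    "Lipids TX": (74, 53),   # reference row
    "GFR": (66, 59),
    "WHR": (72, 56),
    "TCHOLU": (45, 69),
    "HDL": (139, 133),
    "Creatinine": (37, 24),
}


def odds_ratio(row, ref, z=1.96):
    """Cross-product odds ratio with a Wald interval on the log scale."""
    a, b = ref   # reference row: first and second column counts
    c, d = row   # compared row: first and second column counts
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(est)
    return est, est * math.exp(-z * se), est * math.exp(z * se)


reference = counts["Lipids TX"]
odds = {
    name: odds_ratio(row, reference)
    for name, row in counts.items()
    if name != "Lipids TX"
}

for name, (est, lower, upper) in odds.items():
    print(f"{name}: {est:.4f} ({lower:.4f}, {upper:.4f})")
```

Under this estimator the qualitative picture is the same as in the table: the TCHOLU interval lies entirely above 1, while the intervals for the other rows include 1.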
Results for DR classification.
| Model | Sensitivity | Specificity | AUC |
|---|---|---|---|
| 32 features + RF | 0.5709 | 0.5634 | 0.61 |
| Boruta approach + RF | 0.6422 | 0.6169 | 0.69 |
Comparison of machine learning models.
| Classifier | Sensitivity | Specificity | AUC |
|---|---|---|---|
| RF | 0.6422 | 0.6169 | 0.69 |
| LR | 0.6714 | 0.5858 | 0.68 |
| SVM | 0.6647 | 0.5882 | 0.67 |