Jushuang Li1,2,3, Chengnan Guo1,2, Tao Wang1,2, Yixi Xu1,2, Fang Peng1,2, Shuzhen Zhao1,2, Huihui Li1,2, Dongzhen Jin1,2, Zhezheng Xia1,2, Mingzhu Che1,2, Jingjing Zuo4, Chao Zheng5, Honglin Hu6, Guangyun Mao7,8,9.
Abstract
OBJECTIVE: Early identification of diabetic retinopathy (DR) is key to prioritizing therapy and preventing permanent blindness. This study proposes a machine learning model for early DR diagnosis based on metabolomics and clinical indicators.
Year: 2022 PMID: 35931671 PMCID: PMC9355962 DOI: 10.1038/s41387-022-00216-0
Source DB: PubMed Journal: Nutr Diabetes ISSN: 2044-4052 Impact factor: 4.725
Fig. 1 Data preprocessing and selection of machine learning models.
Metabolomic data preprocessing workflow (A), accuracy heat map of the machine learning models (B), hyper-parameter learning curve of the decision tree (C), and the decision tree (D). Notes: C, the maximum depth parameter (max_depth) of the decision tree model was selected using hold-out and 10-fold cross-validation based on the hyper-parameter learning curve; D, a decision tree model based on the training set to distinguish the healthy control group, DM group, and DR group. Abbreviations: QC quality control, CV coefficient of variation, KNN K-Nearest Neighbors, GNB Gaussian Naive Bayes, LR Logistic Regression, DT Decision Tree, RF Random Forest, XGB XGBoost, DNN Neural Networks, SVM Support Vector Machine, MEDP545 2-pyrrolidinone, MEDN430 thiamine triphosphate, Control healthy control group, DR diabetic retinopathy group, DM diabetes mellitus without DR group.
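The max_depth selection described in the notes for panel C can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn and a synthetic three-class data set standing in for the metabolomic features; the data, the depth range of 1 to 10, and all variable names are assumptions rather than the authors' code, and the hold-out step is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training set (3 classes: control, DM, DR).
X, y = make_classification(
    n_samples=147, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# Hyper-parameter learning curve: mean 10-fold CV accuracy per max_depth.
depths = list(range(1, 11))  # assumed search range
cv_accuracy = []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10, scoring="accuracy")
    cv_accuracy.append(scores.mean())

# Keep the shallowest depth that reaches the best mean accuracy.
best_depth = depths[cv_accuracy.index(max(cv_accuracy))]
print(best_depth, round(max(cv_accuracy), 3))
```

Plotting `cv_accuracy` against `depths` gives a curve of the kind shown in panel C; preferring the shallowest best depth is one common way to limit overfitting in a small sample.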
Clinical and demographic characteristics of the study population.
| Variables | Training set: Control | Training set: DM | Training set: DR | Testing set: Control | Testing set: DM | Testing set: DR |
|---|---|---|---|---|---|---|
| Age, years | 56.0 (54.0,62.0) | 53.0 (48.0,59.0) | 57.0 (54.0,65.0) | 58.3 ± 7.0 | 56.0 ± 12.0 | 57.0 ± 10.0 |
| Male, # (%) | 36 (73.5) | 29 (59.2) | 26 (53.1) | 17 (85.0) | 9 (45.0) | 10 (50.0) |
| BMI, kg/m² | 24.5 (23.4,26.7) | 23.9 (22.1,27.4) | 24.1 (22.4,27.0) | 24.7 (23.2,26.9) | 24.8 (23.1,26.1) | 25.2 (22.4,26.2) |
| FPG, mmol/L | 5.4 (5.0,5.7) | 8.3 (6.9,12.0) | 8.9 (6.7,10.9) | 5.3 (5.1,6.4) | 8.6 (6.8,11.6) | 7.3 (5.7,9.1) |
| HbA1c, % | 5.7 (5.4,6.0) | 9.9 (8.2,12.0) | 9.7 (8.7,10.9) | 5.9 ± 0.8 | 9.9 ± 2.0 | 10.3 ± 1.9 |
| LDL, mmol/L | 2.9 ± 0.8 | 2.7 ± 0.9 | 2.6 ± 1.1 | 2.7 ± 0.7 | 2.4 ± 1.2 | 2.5 ± 1.0 |
| HDL, mmol/L | 1.1 (1.0,1.3) | 1.1 (0.8,1.4) | 1.1 (0.9,1.4) | 1.1 (1.0,1.2) | 0.9 (0.8,1.2) | 1.0 (0.8,1.2) |
| TG, mmol/L | 2.0 (1.5,3.0) | 1.6 (1.0,2.1) | 1.4 (1.0,1.7) | 1.9 (1.4,2.7) | 1.8 (1.5,2.4) | 1.5 (1.1,2.3) |
| TC, mmol/L | 5.0 ± 0.9 | 4.8 ± 1.1 | 4.6 ± 1.5 | 4.9 ± 0.9 | 4.5 ± 1.3 | 4.3 ± 1.3 |
| Hypertension, # (%) | 17 (34.7) | 12 (24.5) | 21 (42.9) | 2 (10.0) | 10 (50.0) | 10 (50.0) |
| Systolic BP, mm Hg | | 124.8 ± 14.4 | 136.8 ± 19.7 | | 134.5 ± 17.6 | 140.1 ± 24.8 |
| Diastolic BP, mm Hg | | 78.0 (73.0,86.0) | 76.0 (70.0,80.0) | | 82.6 ± 10.3 | 80.6 ± 11.5 |
| Duration of diabetes, years | | 8.1 ± 6.2 | 12.2 ± 6.1 | | 9.9 ± 6.6 | 12.1 ± 7.8 |
| Education, # (%) | | | | | | |
| Junior high school or below | | 25 (53.2) | 23 (51.1) | | 10 (52.6) | 13 (65.0) |
| High school or above | | 22 (46.8) | 22 (48.9) | | 9 (47.4) | 7 (35.0) |
| Occupation, # (%) | | | | | | |
| Manual workers | | 23 (48.9) | 22 (50.0) | | 8 (44.4) | 12 (60.0) |
| Mental workers | | 10 (21.3) | 8 (18.2) | | 5 (27.8) | 3 (15.0) |
| Both | | 14 (29.8) | 14 (31.8) | | 5 (27.8) | 5 (25.0) |
| History of diabetes, # (%) | | 17 (34.7) | 25 (51.0) | | 10 (50.0) | 8 (40.0) |
| Smoking habits, # (%) | | | | | | |
| Non-smokers | | 29 (61.7) | 23 (51.1) | | 12 (63.2) | 13 (65.0) |
| Current smokers | | 13 (27.7) | 16 (35.6) | | 6 (31.6) | 5 (25.0) |
| Ex-smokers | | 5 (10.6) | 6 (13.3) | | 1 (5.3) | 2 (10.0) |
| Alcohol consumption, # (%) | | | | | | |
| Non-drinkers | | 22 (46.8) | 20 (44.4) | | 11 (57.9) | 9 (45.0) |
| Current drinkers | | 23 (48.9) | 19 (42.2) | | 7 (36.8) | 8 (40.0) |
| Ex-drinkers | | 2 (4.3) | 6 (13.3) | | 1 (5.3) | 3 (15.0) |
BMI body mass index, FPG fasting plasma glucose, HbA1c glycated hemoglobin, HDL high-density lipoprotein, LDL low-density lipoprotein, TG triglyceride, TC total cholesterol, Systolic BP systolic blood pressure, Diastolic BP diastolic blood pressure, Control healthy control group, DM T2DM participants without DR, DR T2DM participants with DR.
Continuous data that were normally or approximately normally distributed are described as mean ± standard deviation (SD), and randomized block design analysis of variance or the paired t-test was used to compare three or two groups, respectively. Otherwise, data are described as median (1st quartile, 3rd quartile) and compared with the Friedman M or Wilcoxon signed-rank test. Categorical data are presented as number of cases (%), and the McNemar-Bowker test was used to compare the groups.
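For readers unfamiliar with the McNemar-Bowker test named above, the following is a minimal sketch of the Bowker symmetry statistic for a square table of paired categorical counts. The function name and the example table are hypothetical and do not come from the study; the p-value, not computed here, would be taken from a chi-square distribution with the returned degrees of freedom (for example via `scipy.stats.chi2.sf`).

```python
def bowker_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for Bowker's test.

    `table` is a square k x k contingency table of paired counts, where
    table[i][j] counts pairs in category i for one member of the pair
    and category j for the other.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    statistic = 0.0
    df = 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total == 0:
                continue  # empty off-diagonal pair adds no information
            statistic += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return statistic, df


# Hypothetical 3 x 3 table of matched pairs (not data from the study).
example = [
    [10, 4, 2],
    [1, 12, 3],
    [2, 1, 9],
]
stat, df = bowker_statistic(example)
print(stat, df)
```

For a 2 x 2 table the statistic reduces to the McNemar statistic without continuity correction, (b - c)² / (b + c).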
Fig. 2 Development and validation of the nomogram model.
Developed nomogram for diabetic retinopathy (A), and the ROC curves and decision curve analysis curves of the nomogram model, Rhee et al. model, Aspelund et al. model, Hippisley-Cox and Coupland model, and Dagliati et al. model in the training set (B, C) and testing set (D, E). Notes: nomogram model, thiamine triphosphate, systolic blood pressure, duration of diabetes; Rhee et al. model, glutamine/glutamate ratio; Aspelund et al. model, gender, systolic blood pressure, duration of diabetes and glycated hemoglobin; Hippisley-Cox and Coupland model, age, BMI, systolic blood pressure, cholesterol/high-density lipoprotein ratio, glycated hemoglobin; Dagliati et al. model, age, gender, duration of diabetes, BMI, glycated hemoglobin, hypertension, smoking; none, net benefit when all patients are considered as not having the outcome (diabetic retinopathy); all, net benefit when all patients are considered as having the outcome. The preferred model is the model with the highest net benefit at any given threshold. Abbreviations: MEDN430 thiamine triphosphate, sBp systolic blood pressure, DM_duration duration of diabetes.
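The net benefit plotted in the decision curves can be illustrated with the standard definition, net benefit = TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability. The sketch below assumes that definition; the function names and the toy labels and probabilities are hypothetical, not study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # relative harm of a false positive
    return tp / n - fp / n * weight


def net_benefit_all(y_true, threshold):
    """Net benefit of the 'all' strategy: every patient treated as having DR."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# The 'none' strategy has net benefit 0 at every threshold.
# Toy example (hypothetical labels and predicted probabilities).
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
nb_model = net_benefit(y, p, 0.5)
nb_all = net_benefit_all(y, 0.5)
print(nb_model, nb_all)
```

Evaluating these functions over a grid of thresholds and plotting the results against the 'all' and 'none' lines yields a decision curve of the kind shown in panels C and E.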
Comparison of the predictive ability of the Nomogram model and models constructed in previous studies.
| Model | AUC | AUC (95% CI) | Sensitivity (%) | Specificity (%) | Precision (%) | Positive predictive value (%) | Negative predictive value (%) | Youden’s index |
|---|---|---|---|---|---|---|---|---|
| Training set | ||||||||
| Nomogram model | 0.99 | 0.97, 1.00 | 97.96 | 93.88 | 95.92 | 94.12 | 97.87 | 0.92 |
| Rhee et al. model | 0.64 | 0.53, 0.76 | 87.76 | 48.98 | 68.37 | 63.24 | 80.00 | 0.37 |
| Aspelund et al. model | 0.70 | 0.59, 0.80 | 83.67 | 51.02 | 67.35 | 63.08 | 75.76 | 0.35 |
| Hippisley-Cox and Coupland model | 0.67 | 0.57, 0.78 | 61.22 | 73.47 | 67.35 | 69.77 | 65.45 | 0.35 |
| Dagliati et al. model | 0.69 | 0.59, 0.80 | 55.10 | 79.59 | 67.35 | 72.97 | 63.93 | 0.35 |
| Testing set | ||||||||
| Nomogram model | 0.99 | 0.96, 1.00 | 95.00 | 100.00 | 97.50 | 100.00 | 95.24 | 0.95 |
| Rhee et al. model | 0.76 | 0.61, 0.92 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 0.50 |
| Aspelund et al. model | 0.73 | 0.57, 0.88 | 55.00 | 80.00 | 67.50 | 73.33 | 64.00 | 0.35 |
| Hippisley-Cox and Coupland model | 0.77 | 0.62, 0.92 | 80.00 | 75.00 | 77.50 | 76.19 | 78.95 | 0.55 |
| Dagliati et al. model | 0.78 | 0.63, 0.92 | 80.00 | 65.00 | 72.50 | 69.57 | 76.47 | 0.45 |
Nomogram model contains thiamine triphosphate, systolic blood pressure, and duration of diabetes; Rhee et al. model contains glutamine/glutamic acid ratio; Aspelund et al. model contains sex, systolic blood pressure, duration of diabetes, and glycated hemoglobin; Hippisley-Cox and Coupland model contains sex, BMI, systolic blood pressure, cholesterol/high-density lipoprotein ratio, and glycated hemoglobin; Dagliati et al. model contains age, sex, duration of diabetes, BMI, glycated hemoglobin, hypertension, and smoking.
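The threshold-based columns of the table follow from a 2 x 2 confusion matrix. The sketch below assumes hypothetical counts for 49 DR and 49 DM participants (TP = 48, FN = 1, TN = 46, FP = 3); these counts are not stated in the source and were chosen only because they reproduce the training-set row of the nomogram model. Under these assumed counts, the value in the column headed Precision coincides with overall accuracy, (TP + TN) / N.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Common diagnostic metrics from confusion-matrix counts (as fractions)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "youden": sensitivity + specificity - 1,
    }


# Assumed counts, back-calculated for illustration only.
m = diagnostic_metrics(tp=48, fn=1, tn=46, fp=3)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```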