Ludovica Ilari, Agnese Piersanti, Christian Göbl, Laura Burattini, Alexandra Kautzky-Willer, Andrea Tura, Micaela Morettini.
Abstract
Gestational diabetes mellitus (GDM) is a type of diabetes that usually resolves at the end of pregnancy but exposes women to a higher risk of developing type 2 diabetes mellitus (T2DM). This study aimed to identify, using machine-learning techniques, which factors among those that quantify specific metabolic processes determine progression to T2DM. Women who progressed to T2DM (labeled PROG, n = 19) were classified against those who did not (labeled NON-PROG, n = 59) using Orange software, through a data analysis procedure applied to a generated data set including anthropometric data and a total of 34 features extracted through mathematical modeling/methods procedures. Feature selection was performed with a decision tree algorithm; Naïve Bayes and penalized (L2) logistic regression were then used to evaluate the ability of the selected features to solve the classification problem. Performance was evaluated in terms of area under the receiver operating characteristic curve (AUC), classification accuracy (CA), precision, sensitivity, specificity, and F1. Feature selection yielded six features. Based on them, the tree algorithm, Naïve Bayes, and penalized logistic regression achieved, respectively, AUC of 0.795, 0.831, and 0.884; CA of 0.827, 0.813, and 0.840; precision of 0.830, 0.854, and 0.834; sensitivity of 0.827, 0.813, and 0.840; specificity of 0.700, 0.821, and 0.662; and F1 of 0.828, 0.824, and 0.836. Fasting glucose, age, and body mass index, together with features describing insulin action and secretion, may predict the development of T2DM in women with a history of GDM.
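The penalized (L2) logistic regression named in the abstract can be illustrated with a minimal sketch. The study itself used Orange software; the plain-Python gradient descent below is a stand-in, and the function names, learning rate, penalty values, and toy data are all assumptions made for illustration, not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l2_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is left unpenalized (an assumption of this sketch).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 penalty adds lam * w to the gradient of the weights.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


# Hypothetical standardized features (not data from the study); 1 = PROG.
X_toy = [[-1.2, -0.8], [-0.9, -1.1], [-0.4, -0.3],
         [0.3, 0.5], [0.9, 0.7], [1.3, 1.0]]
y_toy = [0, 0, 0, 1, 1, 1]

w_weak, b_weak = fit_l2_logistic(X_toy, y_toy, lam=0.01)
w_strong, b_strong = fit_l2_logistic(X_toy, y_toy, lam=1.0)
```

A larger penalty shrinks the weights toward zero, which is the usual motivation for L2 regularization when the sample is small relative to the number of features.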
Keywords: disease prediction; logistic regression; mathematical model; pathophysiology; predictive biomarker; statistical learning
Year: 2022 PMID: 35250610 PMCID: PMC8892139 DOI: 10.3389/fphys.2022.789219
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
Description of all the features included in the generated data set.
| Name | Acronym | Units |
| Age | age | years |
| Body weight | BW | kg |
| Height | h | cm |
| Body mass index | BMI | kg⋅m−2 |
| Basal glucose | gb | mg⋅dL−1 |
| Mean area under the glucose curve | GMEAN | mmol⋅L−1 |
| Mean area under the insulin curve | IMEAN | pmol⋅L−1 |
| Mean area under the C-peptide curve | CpMEAN | pmol⋅L−1 |
| Area under the insulin curve during the 1st phase of test | AUCINS–1P | pmol⋅L−1⋅min−1 |
| Area under the insulin curve during the 2nd phase of test | AUCINS–2P | pmol⋅L−1⋅min−1 |
| Disappearance rate of glucose before insulin injection | KG1 | %/min |
| Disappearance rate of glucose after insulin injection | KG2 | %/min |
| Insulin sensitivity | SI | 10−4⋅min−1/(μU⋅mL−1) |
| Glucose effectiveness | SG | min−1 |
| Distribution volume of glucose | V | L |
| Basal insulin effect of glucose effectiveness | BIE | 10−3⋅min−1 |
| Glucose effectiveness at zero insulin | GEZI | 10−2⋅min−1 |
| Mean of suprabasal insulin in the time interval 3–8 min | AIR | pmol⋅L−1 |
| Mean of suprabasal C-peptide in the time interval 3–8 min | ACPR | pmol⋅L−1 |
| Disposition index | DI | min−1 |
| Basal secretion rate | BSR | pmol⋅L−1⋅min−1 |
| β-cell responsivity to glucose | Φ1c | (pmol⋅L−1⋅min−1)/(mg⋅dL−1) |
| Area under the secretion curve during the entire test | AUCSECR | pmol |
| Area under the secretion curve during the 1st phase of test | AUCSECR–1P | pmol |
| Area under the secretion curve during the 2nd phase of test | AUCSECR–2P | pmol |
| Mean insulin clearance during the entire test | CLMEAN | L⋅min−1 |
| Mean insulin clearance during the 1st phase of test | CLMEAN–1P | L⋅min−1 |
| Mean insulin clearance during the 2nd phase of test | CLMEAN–2P | L⋅min−1 |
| Extra-hepatic insulin clearance | CLP | L⋅min−1 |
| Hepatic insulin clearance | FEL | % |
| Peak insulin after glucose injection | IPEAK–FIRST | μU⋅mL−1 |
| Peak insulin after insulin injection | IPEAK–INJECT | μU⋅mL−1 |
| Peak C-peptide | CPEAK | ng⋅dL−1 |
| Glucose dose injected | DOSE | g |
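The feature list mixes units: basal glucose (gb) is given in mg⋅dL−1 while the mean glucose (GMEAN) is in mmol⋅L−1, and BMI follows from body weight in kg and height in cm. A minimal sketch of the two conversions is shown here; the function names are hypothetical, and the conversion factor is derived from the molar mass of glucose (about 180.16 g/mol), not from the source.

```python
# 1 mmol/L of glucose corresponds to about 18.016 mg/dL
# (molar mass of glucose is roughly 180.16 g/mol).
GLUCOSE_MG_DL_PER_MMOL_L = 18.016


def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 from body weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def glucose_mg_dl_to_mmol_l(mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return mg_dl / GLUCOSE_MG_DL_PER_MMOL_L


# Hypothetical subject, not taken from the data set.
example_bmi = body_mass_index(70.0, 170.0)
example_glucose = glucose_mg_dl_to_mmol_l(90.0)
```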
FIGURE 1. Orange workflow for data analysis.
Characteristics of the preprocessed data set.
| Characteristics | NON-PROG | PROG | p-value |
| age | 33.3 ± 4.2 | 36.6 ± 4.7 | |
| BW | 65.8 [15.0] | 75.0 [18.8] | |
| h | 164 [12] | 158 [12] | |
| BMI | 25.4 ± 3.9 | 30.6 ± 6.4 | |
| gb | 84 [8] | 96 [11.50] | |
| GMEAN | 5.1 [0.7] | 5.9 [0.8] | |
| IMEAN | 201.8 [78.3] | 211.9 [102.8] | n.s. |
| CpMEAN | 182.4 [97.2] | 240.9 [65.4] | |
| AUCINS–1P | 2125.5 [1053.2] | 1520.9 [1120.0] | |
| AUCINS–2P | 34468.1 [1417.8] | 36992.9 [18872.1] | n.s. |
| KG1 | 1.93 [0.88] | 1.56 [0.63] | |
| KG2 | 4.7 ± 1.8 | 3.4 ± 1.8 | |
| SI | 4.7 [2.7] | 3.1 [2.0] | |
| SG | 0.022 [0.005] | 0.018 [0.008] | |
| V | 13.4 [1.36] | 13.9 [1.00] | n.s. |
| BIE | 3 [2.4] | 2.4 [1.5] | n.s. |
| GEZI | 1.9 [0.7] | 1.7 [0.6] | n.s. |
| AIR | 194.4 [126.5] | 132.8 [103.0] | |
| ACPR | 254.0 [138.8] | 158.2 [150.0] | |
| DI | 1.36 [1.25] | 0.52 [0.76] | |
| BSR | 31.8 [10.4] | 39.6 [10.9] | |
| Φ1c | 67.80 [31.90] | 42.02 [33.31] | |
| AUCSECR | 24734.9 [15607.7] | 32324.0 [11317.6] | |
| AUCSECR–1P | 7146.2 [3591.8] | 6252.9 [3285.6] | |
| AUCSECR–2P | 17450.7 [15298.7] | 27324.0 [9849.6] | |
| CLMEAN | 0.69 [0.42] | 0.78 [0.47] | |
| CLMEAN–1P | 3.30 [0.92] | 4.17 [1.76] | n.s. |
| CLMEAN–2P | 0.53 [0.39] | 0.68 [0.35] | |
| CLP | 0.39 ± 0.42 | 0.56 ± 0.53 | n.s. |
| FEL | 0.53 [0.14] | 0.50 [0.13] | n.s. |
| IPEAK–FIRST | 53.2 [39.5] | 36.0 [39.3] | |
| IPEAK–INJECT | 493 [281] | 559 [150] | n.s. |
| CPEAK | 465 [197] | 380 [140] | |
| DOSE | 19.7 [4.5] | 22.5 [5.6] | |
Data are presented as mean ± standard deviation or median [interquartile range]. Significance level: p-values < 0.05. n.s., not significant. Bold values indicate significant differences.
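The two summary formats used in the table, mean ± standard deviation and median [interquartile range], can be produced as in the following sketch. The helper names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption; the source does not state which quartile definition, or which rule for choosing between the two formats, was used.

```python
from statistics import mean, median, quantiles, stdev


def mean_sd(values, digits=1):
    """Format a sample as 'mean ± SD' (sample standard deviation)."""
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"


def median_iqr(values, digits=1):
    """Format a sample as 'median [IQR]', with IQR = Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.{digits}f} [{q3 - q1:.{digits}f}]"


# Hypothetical values, not taken from the data set.
sample = [1, 2, 3, 4, 5]
summary_parametric = mean_sd(sample)
summary_robust = median_iqr(sample)
```

The median and interquartile range are typically preferred for skewed variables, which may explain why most features in the table are reported that way.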
FIGURE 2. Best predictive features according to decision tree classification algorithm.
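How a decision tree can surface the most predictive features may be illustrated by ranking each feature on the information gain of its best single threshold split. This is only a sketch of the general idea: the splitting criterion used by the software in the study is not stated here, and the feature names and values below are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = (labels.count(v) for v in set(labels))
    return -sum((c / n) * math.log2(c / n) for c in counts)


def best_split_gain(values, labels):
    """Largest information gain over all midpoints between distinct values."""
    base = entropy(labels)
    best = 0.0
    cuts = sorted(set(values))
    for lo, hi in zip(cuts, cuts[1:]):
        thr = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= thr]
        right = [l for v, l in zip(values, labels) if v > thr]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(labels)
        best = max(best, base - weighted)
    return best


def rank_features(columns, labels):
    """Return feature names sorted by decreasing best-split gain."""
    gains = {name: best_split_gain(vals, labels)
             for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
toy_columns = {
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [5, 1, 4, 2, 6, 3],
}
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(toy_columns, toy_labels)
```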
FIGURE 3. Confusion matrix for decision tree, Naïve Bayes, and logistic regression.
FIGURE 4. Receiver operating characteristics (ROC) for decision tree (purple), Naïve Bayes (green), and logistic regression (orange).
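The area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes AUC directly from that definition; the function name and the example scores are hypothetical and are not outputs of the study's classifiers.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: a perfect ranking, then one with a misordered pair.
auc_perfect = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
auc_partial = roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0])
```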
Performance of the three classification algorithms.
| Performance measures | Tree | Naïve Bayes | Logistic regression |
| AUC | 0.795 | 0.831 | 0.884 |
| CA | 0.827 | 0.813 | 0.840 |
| Precision | 0.830 | 0.854 | 0.834 |
| Sensitivity | 0.827 | 0.813 | 0.840 |
| Specificity | 0.700 | 0.821 | 0.662 |
| F1 | 0.828 | 0.824 | 0.836 |
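In the table above, the sensitivity row repeats the CA row exactly. One reading consistent with this, stated here as an assumption rather than a documented setting of the software, is that per-class scores were averaged with weights proportional to class size: support-weighted recall is then algebraically equal to accuracy. The sketch below uses hypothetical confusion-matrix counts to show the identity.

```python
def per_class_metrics(tp, fp, fn, tn):
    """Precision, recall, specificity, and F1 for one reference class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def weighted_report(tp, fp, fn, tn):
    """Average the per-class metrics, weighting each class by its size."""
    n_pos, n_neg = tp + fn, tn + fp
    total = n_pos + n_neg
    pos = per_class_metrics(tp, fp, fn, tn)
    # Treat the negative class as the reference by swapping the roles.
    neg = per_class_metrics(tp=tn, fp=fn, fn=fp, tn=tp)
    report = {k: (n_pos * pos[k] + n_neg * neg[k]) / total for k in pos}
    report["accuracy"] = (tp + tn) / total
    return report


# Hypothetical counts, not the confusion matrices of the study.
report = weighted_report(tp=8, fp=5, fn=4, tn=33)
```

Under this averaging, the specificity reported for a two-class problem is likewise a weighted mix of the two per-class specificities rather than the true-negative rate of a single class.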
Comparison of the performance measures among classification algorithms through Bayesian interpretation of the pairwise Student’s t-tests.
| AUC | Tree | Naïve Bayes | Logistic regression |
| Tree | | 26.2 | 11.3 |
| Naïve Bayes | 73.8 | | 28.3 |
| Logistic regression | 88.7 | 71.7 | |
| CA | Tree | Naïve Bayes | Logistic regression |
| Tree | | 57.5 | 42.5 |
| Naïve Bayes | 42.5 | | 38.0 |
| Logistic regression | 57.5 | 62.0 | |
| Precision | Tree | Naïve Bayes | Logistic regression |
| Tree | | 35.9 | 42.0 |
| Naïve Bayes | 64.1 | | 54.0 |
| Logistic regression | 58.0 | 46.0 | |
| Sensitivity | Tree | Naïve Bayes | Logistic regression |
| Tree | | 57.5 | 42.5 |
| Naïve Bayes | 42.5 | | 38.0 |
| Logistic regression | 57.5 | 62.0 | |
| Specificity | Tree | Naïve Bayes | Logistic regression |
| Tree | | 13.3 | 57.6 |
| Naïve Bayes | 86.7 | | 86.6 |
| Logistic regression | 42.4 | 13.4 | |
| F1 | Tree | Naïve Bayes | Logistic regression |
| Tree | | 52.0 | 47.2 |
| Naïve Bayes | 48.0 | | 46.0 |
| Logistic regression | 52.8 | 54.0 | |
Each cell reports the probability (%) that the score of the classification algorithm in the row is higher than that of the classification algorithm in the column.
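The source does not describe how the Bayesian reading of the t-test is computed. One standard reading, assumed here, is that under a noninformative prior the posterior of the mean paired score difference follows a Student t distribution with n − 1 degrees of freedom, centered on the observed mean difference with scale s/√n, so the probability that the row algorithm scores higher is that distribution's mass above zero. The sketch below uses hypothetical per-fold scores and omits any correction for overlapping cross-validation training sets that a given tool may apply.

```python
import math
from statistics import mean, stdev


def student_t_cdf(x, dof, steps=2000):
    """Student t CDF via Simpson integration of the density from 0 to |x|."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))

    def pdf(t):
        return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def prob_row_beats_column(row_scores, col_scores):
    """Posterior probability that the mean paired difference is above zero."""
    diffs = [r - c for r, c in zip(row_scores, col_scores)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        m = mean(diffs)
        return 1.0 if m > 0 else 0.0 if m < 0 else 0.5
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return student_t_cdf(t_stat, n - 1)


# Hypothetical per-fold scores for two classifiers (not from the study).
scores_a = [0.80, 0.85, 0.78, 0.90, 0.82]
scores_b = [0.79, 0.80, 0.80, 0.86, 0.80]
p_a_over_b = prob_row_beats_column(scores_a, scores_b)
p_b_over_a = prob_row_beats_column(scores_b, scores_a)
```

By construction the two probabilities for a pair of algorithms are complementary, which matches the table, where each pair of mirrored cells sums to 100.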
FIGURE 5. Venn diagrams with (A) correct classifications and (B) misclassifications.