Lin Li, Chuang-Chung Lee, Fang Liz Zhou, Cliona Molony, Zoran Doder, Evgeny Zalmover, Kristen Sharma, Juhaeri Juhaeri, Chuntao Wu.
Abstract
PURPOSE: To assess the performance of different machine learning (ML) approaches in identifying risk factors for diabetic ketoacidosis (DKA) and predicting DKA.
Keywords: AUC; diabetic ketoacidosis; least absolute shrinkage and selection operator; logistic regression; machine learning; prediction model
Year: 2021 PMID: 33480091 PMCID: PMC8049019 DOI: 10.1002/pds.5199
Source DB: PubMed Journal: Pharmacoepidemiol Drug Saf ISSN: 1053-8569 Impact factor: 2.890
FIGURE 1. Data processing flowchart. DKA, diabetic ketoacidosis; LASSO, least absolute shrinkage and selection operator; T1D, type 1 diabetes
Study population attrition process
| Patient counts (January 01, 2007–September 30, 2018) | ||
|---|---|---|
| Individuals in Optum® de‐identified Electronic Health Record database | 95 823 300 | |
| Individuals with diabetes | 7 153 077 | |
| Individuals with type 1 diabetes | 169 779 | |
| Individuals aged ≥18 years and within an Integrated Delivery Network | 130 052 | |
| Individuals with at least 1 HbA1c measurement and at least 1 year of clinical activity any time | 105 816 | |
| DKA case and control selection | ||
| Potential candidates | Potential DKA cases | Potential controls |
| After application of the study criteria on potential DKA cases before matching: (1) type 1 diabetes for at least 365 days before index date; (2) treated with insulin within 365 days before index date; (3) at least 1 HbA1c measurement within 183 days before index date; (4) without pregnancy within 365 days before index date; (5) without off‐label use of antihyperglycemic agents indicated for type 2 diabetes only (except for metformin) within 365 days before or on index date | 3400 | NA |
| Control selection via incidence density sampling based on 1:10 matching ratio | 3400 | 34 000 |
| After application of the same study criteria defined above on controls | 3400 | 11 780 |
Abbreviations: DKA, diabetic ketoacidosis; HbA1c, hemoglobin A1c.
A control could not develop DKA before or at the matched index date but could become a case after the index date.
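The control selection described above (incidence density sampling with a 1:10 ratio) can be illustrated with a minimal sketch. This is not the authors' code: the `Patient` record, its `entry_day` and `dka_day` fields, and the function name are hypothetical, and the sketch ignores any additional matching variables the study may have used. It only shows the core idea that, for each case, controls are drawn from patients who are in the cohort and still DKA-free at the case's index date, so a control may later become a case.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    pid: int
    entry_day: int           # hypothetical: day the patient entered the cohort
    dka_day: Optional[int]   # hypothetical: day of first DKA, None if never observed


def incidence_density_sample(patients, ratio=10, seed=0):
    """For each case, sample up to `ratio` controls from its risk set."""
    rng = random.Random(seed)
    matches = {}
    for case in patients:
        if case.dka_day is None:
            continue  # not a case
        index_day = case.dka_day
        # Risk set: already in the cohort and no DKA before or at the index date.
        risk_set = [
            p for p in patients
            if p.pid != case.pid
            and p.entry_day <= index_day
            and (p.dka_day is None or p.dka_day > index_day)
        ]
        k = min(ratio, len(risk_set))
        matches[case.pid] = [p.pid for p in rng.sample(risk_set, k)]
    return matches


# Toy cohort: patient 2 serves as a control for patient 1, then becomes a case.
cohort = [
    Patient(1, 0, 100),
    Patient(2, 0, 300),
    Patient(3, 50, None),
    Patient(4, 200, None),
]
matches = incidence_density_sample(cohort, ratio=10)
```

In the toy cohort, patient 4 is excluded from the first risk set because cohort entry happens after that case's index date, and patient 1 is excluded from the second because DKA had already occurred.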
Selected characteristics of DKA cases and controls
| Characteristic | DKA cases | Controls | p value |
|---|---|---|---|
| Calendar year of T1D cohort entry (%) | |||
| 2007 | 554 (16.3) | 1831 (15.5) | 0.111 |
| 2008 | 578 (17.0) | 1910 (16.2) | |
| 2009 | 367 (10.8) | 1175 (10.0) | |
| 2010 | 414 (12.2) | 1469 (12.5) | |
| 2011 | 376 (11.1) | 1286 (10.9) | |
| 2012 | 370 (10.9) | 1257 (10.7) | |
| 2013 | 283 (8.3) | 1009 (8.6) | |
| 2014 | 228 (6.7) | 847 (7.2) | |
| 2015 | 131 (3.9) | 532 (4.5) | |
| 2016 | 80 (2.4) | 361 (3.1) | |
| 2017 | 19 (0.6) | 103 (0.9) | |
| Age, years, mean (SD) | 42.9 (16.5) | 45.4 (16.4) | <0.001 |
| Sex (%) | |||
| Female | 1818 (53.5) | 5594 (47.5) | <0.001 |
| Male | 1579 (46.4) | 6184 (52.5) | |
| Unknown | 3 (0.1) | 2 (0.0) | |
| Race (%) | |||
| Caucasian | 2929 (86.1) | 10 653 (90.4) | <0.001 |
| African American | 346 (10.2) | 584 (5.0) | |
| Asian | 12 (0.4) | 71 (0.6) | |
| Other/Unknown | 113 (3.3) | 472 (4.0) | |
| Annual household income, $, mean (SD) | 41 773 (8302) | 42 935 (8938) | <0.001 |
| Insurance type (%) | |||
| Commercial | 1306 (38.4) | 6342 (53.8) | <0.001 |
| Medicare | 621 (18.3) | 1519 (12.9) | |
| Medicaid | 514 (15.1) | 727 (6.2) | |
| Other payor type | 134 (3.9) | 353 (3.0) | |
| Uninsured | 202 (5.9) | 255 (2.2) | |
| Unknown | 623 (18.3) | 2584 (21.9) | |
| Geographic region (%) | |||
| Midwest | 2042 (60.1) | 6792 (57.7) | <0.001 |
| Northeast | 315 (9.3) | 1389 (11.8) | |
| South | 646 (19.0) | 2237 (19.0) | |
| West | 269 (7.9) | 1008 (8.6) | |
| Other/Unknown | 128 (3.8) | 354 (3.0) | |
| Lifestyle risk factors within 365 days before index date (%) | |||
| Alcohol abuse | 175 (5.1) | 226 (1.9) | <0.001 |
| Controlled substance abuse | 523 (15.4) | 454 (3.9) | <0.001 |
| Health service use within 365 days before index date (%) | |||
| Visit to endocrinologist | 1789 (52.6) | 7129 (60.5) | <0.001 |
| Visit to primary care | 1781 (52.4) | 4959 (42.1) | <0.001 |
| Chronic comorbidities any time between study start date and index date (%) | |||
| Cardiovascular disease | 2042 (60.1) | 5927 (50.3) | <0.001 |
| Diabetic microvascular complications | 1981 (58.3) | 5160 (43.8) | <0.001 |
| Chronic liver disease | 244 (7.2) | 435 (3.7) | <0.001 |
| Chronic kidney disease | 859 (25.3) | 1328 (11.3) | <0.001 |
| Dementia | 137 (4.0) | 217 (1.8) | <0.001 |
| Psychiatric disorder | 1743 (51.3) | 3580 (30.4) | <0.001 |
| Autoimmune disorders | 432 (12.7) | 1577 (13.4) | 0.315 |
| Cancer | 377 (11.1) | 1264 (10.7) | 0.575 |
| Acute medical conditions (%) | |||
| Infection within 7 days before index date | 230 (6.8) | 96 (0.8) | <0.001 |
| Major surgery within 7 days before index date | 45 (1.3) | 6 (0.1) | <0.001 |
| Non‐DKA hospitalization within 30 days before index date | 610 (17.9) | 168 (1.4) | <0.001 |
| Treatments (%) | |||
| Insulin pump within 7 days before index date | 154 (4.5) | 153 (1.3) | <0.001 |
| Insulin type within 7 days before index date | |||
| Intermediate/long‐acting insulin | 218 (6.4) | 309 (2.6) | <0.001 |
| Rapid/short‐acting insulin | 285 (8.4) | 512 (4.3) | <0.001 |
| Premixed insulin | 10 (0.3) | 15 (0.1) | 0.061 |
| Other medications within 30 days before index date | |||
| Systemic steroids | 81 (2.4) | 101 (0.9) | <0.001 |
| Diuretics | 132 (3.9) | 221 (1.9) | <0.001 |
| Antipsychotics | 60 (1.8) | 57 (0.5) | <0.001 |
| Laboratory test results or vital signs within 183 days before index date, mean (SD) | |||
| HbA1c, % | 9.3 (1.8) | 8.0 (1.4) | <0.001 |
| Random blood glucose level, mg/dl | 194.6 (60.6) | 169.9 (66.3) | <0.001 |
| eGFR, ml/min/1.73m2 | 83.1 (36.9) | 96.4 (30.5) | <0.001 |
| Total cholesterol, mg/dl | 178.5 (46.7) | 173.3 (38.3) | <0.001 |
| Systolic blood pressure, mm Hg | 126.8 (15.6) | 124.4 (13.8) | <0.001 |
| BMI, kg/m2 | 26.4 (5.8) | 27.9 (5.7) | <0.001 |
| Height, cm | 169.6 (10.2) | 171.1 (10.2) | <0.001 |
| White blood cell count, x103 per microliter | 9.0 (3.3) | 7.6 (2.7) | <0.001 |
| Platelet count, x103 per microliter | 268.4 (79.7) | 254.5 (72.4) | <0.001 |
| Temperature, °C | 36.7 (0.3) | 36.7 (0.3) | <0.001 |
| Pulse rate, beats per minute | 85.0 (12.8) | 78.7 (11.9) | <0.001 |
| Respiratory rate, breaths per minute | 17.4 (2.1) | 16.7 (2.1) | <0.001 |
| Hemoglobin, g/dl | 12.6 (2.1) | 13.4 (1.9) | <0.001 |
| Oxygen saturation, SpO2 (pulse oximetry) | 97.6 (1.5) | 97.6 (1.5) | 0.264 |
Abbreviations: BMI, body mass index; DKA, diabetic ketoacidosis; eGFR, estimated glomerular filtration rate; HbA1c, hemoglobin A1c; SD, standard deviation.
p values are based on univariate analysis.
Based on non‐missing values.
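As a worked example of reading the table, the counts for a binary characteristic can be turned into a crude (unadjusted) odds ratio. The sketch below uses the row "Infection within 7 days before index date" (230 of 3400 cases, 96 of 11 780 controls). It is an illustration only: it ignores the matched design, the Woolf (Wald) confidence interval is an assumption chosen for simplicity, and the source does not state which univariate test produced the reported p values.

```python
import math


def crude_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Unadjusted odds ratio with a Woolf (Wald) 95% confidence interval."""
    a = exposed_cases
    b = total_cases - exposed_cases
    c = exposed_controls
    d = total_controls - exposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Infection within 7 days before index date: 230/3400 cases vs 96/11780 controls.
or_infection, ci_low, ci_high = crude_odds_ratio(230, 3400, 96, 11780)
print(f"crude OR = {or_infection:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The crude odds ratio is roughly 8.8, which fits the large difference in percentages (6.8% vs 0.8%) shown in the table.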
Performance of the study models with the full set of features in the test data set
| Models | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|
| Logistic regression | 0.821 (0.804–0.837) | 0.827 (0.814–0.839) | 0.409 (0.321–0.497) | 0.947 (0.925–0.969) |
| LASSO | 0.821 (0.805–0.838) | 0.827 (0.814–0.839) | 0.407 (0.318–0.496) | 0.948 (0.925–0.970) |
| XGBoost | 0.819 (0.802–0.836) | 0.825 (0.813–0.837) | 0.414 (0.311–0.518) | 0.944 (0.916–0.971) |
| DRF | 0.817 (0.799–0.834) | 0.827 (0.815–0.839) | 0.420 (0.319–0.522) | 0.944 (0.917–0.971) |
| Feedforward network | 0.817 (0.799–0.834) | 0.825 (0.812–0.837) | 0.400 (0.291–0.508) | 0.947 (0.920–0.975) |
Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; DRF, distributed random forest; LASSO, least absolute shrinkage and selection operator.
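The reported accuracy, sensitivity, and specificity are linked by a simple identity: accuracy equals sensitivity times the case fraction plus specificity times the control fraction. The sketch below checks the table against that identity. It assumes, as a labeled simplification, that the test data set has the same case fraction as the full matched sample (3400 cases and 11 780 controls); the source does not state the test-set composition.

```python
def expected_accuracy(sensitivity, specificity, case_fraction):
    """Accuracy implied by sensitivity, specificity, and the fraction of cases."""
    return sensitivity * case_fraction + specificity * (1.0 - case_fraction)


# Assumption: the test set mirrors the overall case fraction of the matched sample.
n_cases, n_controls = 3400, 11780
case_fraction = n_cases / (n_cases + n_controls)

# (sensitivity, specificity, reported accuracy) copied from the table.
reported = {
    "Logistic regression": (0.409, 0.947, 0.827),
    "LASSO": (0.407, 0.948, 0.827),
    "XGBoost": (0.414, 0.944, 0.825),
    "DRF": (0.420, 0.944, 0.827),
    "Feedforward network": (0.400, 0.947, 0.825),
}

implied = {
    model: expected_accuracy(sens, spec, case_fraction)
    for model, (sens, spec, _) in reported.items()
}

for model, (_, _, acc) in reported.items():
    print(f"{model}: reported {acc:.3f}, implied {implied[model]:.3f}")
```

Under that assumption the implied values agree with the reported accuracies to within rounding, which also shows why accuracy is high despite sensitivity near 0.4: controls make up most of the sample, so specificity dominates.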
Top 10 features by each study model
| Rank | Logistic regression (conventional ML) | LASSO (conventional ML) | XGBoost (flexible ML) | DRF (flexible ML) | Feedforward network (flexible ML) |
|---|---|---|---|---|---|
| 1 | Insurance type – uninsured | HbA1c | HbA1c | HbA1c | Race – Asian |
| 2 | HbA1c | Non‐DKA hospitalization | Non‐DKA hospitalization | White blood cell count | Insurance type – uninsured |
| 3 | Non‐DKA hospitalization | Insurance type – uninsured | White blood cell count | Non‐DKA hospitalization | Geographic region – Northeast |
| 4 | BMI | BMI | Hemoglobin | Hemoglobin | Race – African American |
| 5 | Pulse rate | Pulse rate | Pulse rate | Pulse rate | Geographic region – West |
| 6 | Psychiatric disorder | Psychiatric disorder | BMI | Random glucose level | Platelet count |
| 7 | Age | Age | Oxygen saturation | Respiratory rate | Gender – female |
| 8 | Calendar year of diabetes cohort entry | Calendar year of diabetes cohort entry | Random glucose level | Platelet count | HbA1c |
| 9 | White blood cell count | White blood cell count | Platelet count | eGFR | Non‐DKA hospitalization |
| 10 | Acute infection | Acute infection | eGFR | BMI | White blood cell count |
Abbreviations: BMI, body mass index; DKA, diabetic ketoacidosis; DRF, distributed random forest; eGFR, estimated glomerular filtration rate; HbA1c, hemoglobin A1c; LASSO, least absolute shrinkage and selection operator.
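To compare the top-10 lists across models, one simple option is to treat each list as a set and look at the overlap. The sketch below transcribes the table (feature labels are lightly normalized, which is an editorial choice) and computes the features shared by all five models plus a pairwise Jaccard index. The Jaccard index is used here only as an illustrative similarity measure; the source does not report it.

```python
# Top-10 features per model, transcribed from the table (rank order dropped).
top10 = {
    "Logistic regression": {
        "Insurance type: uninsured", "HbA1c", "Non-DKA hospitalization", "BMI",
        "Pulse rate", "Psychiatric disorder", "Age",
        "Calendar year of diabetes cohort entry", "White blood cell count",
        "Acute infection",
    },
    "LASSO": {
        "HbA1c", "Non-DKA hospitalization", "Insurance type: uninsured", "BMI",
        "Pulse rate", "Psychiatric disorder", "Age",
        "Calendar year of diabetes cohort entry", "White blood cell count",
        "Acute infection",
    },
    "XGBoost": {
        "HbA1c", "Non-DKA hospitalization", "White blood cell count", "Hemoglobin",
        "Pulse rate", "BMI", "Oxygen saturation", "Random glucose level",
        "Platelet count", "eGFR",
    },
    "DRF": {
        "HbA1c", "White blood cell count", "Non-DKA hospitalization", "Hemoglobin",
        "Pulse rate", "Random glucose level", "Respiratory rate", "Platelet count",
        "eGFR", "BMI",
    },
    "Feedforward network": {
        "Race: Asian", "Insurance type: uninsured", "Geographic region: Northeast",
        "Race: African American", "Geographic region: West", "Platelet count",
        "Gender: female", "HbA1c", "Non-DKA hospitalization",
        "White blood cell count",
    },
}


def jaccard(a, b):
    """Share of features two lists have in common (intersection over union)."""
    return len(a & b) / len(a | b)


common_to_all = set.intersection(*top10.values())
lr_vs_lasso = jaccard(top10["Logistic regression"], top10["LASSO"])
lr_vs_xgboost = jaccard(top10["Logistic regression"], top10["XGBoost"])

print(sorted(common_to_all))
```

HbA1c, non-DKA hospitalization, and white blood cell count appear in every model's top 10, and logistic regression and LASSO select the same ten features in a slightly different order.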