Paul Thottakkara, Tezcan Ozrazgat-Baslanti, Bradley B Hupf, Parisa Rashidi, Panos Pardalos, Petar Momcilovic, Azra Bihorac.
Abstract
OBJECTIVE: To compare the performance of risk prediction models for forecasting postoperative sepsis and acute kidney injury.
Year: 2016 PMID: 27232332 PMCID: PMC4883761 DOI: 10.1371/journal.pone.0155705
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Development flow from raw data to model building.
Sequence of steps from aggregation of raw data through data preparation to model building. The R functions used at each stage are shown in bold italics. The R functions are available in the GitHub repository: https://github.com/PRISMUF/ML_Algorithm_Postoperative.git. GAM, generalized additive model; SVM, support vector machine; LASSO, least absolute shrinkage and selection operator; PCA, principal component analysis.
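Fig 1 lists PCA among the data-reduction steps, which the authors implemented in R. As a rough illustration only, the Python sketch below standardizes the predictors and extracts the leading principal components by power iteration with deflation. The function names are hypothetical, the choice of power iteration is an assumption, and this is not the repository's code.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        sds.append(math.sqrt(var) if var > 0 else 1.0)  # avoid dividing by zero
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centered data (divides by n)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors and eigenvalues via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    comps, eigvals = [], []
    for _ in range(k):
        # Start from a normalized positive random vector.
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        eigvals.append(lam)
        # Deflate: remove this component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps, eigvals


def project(z, comps):
    """Scores of each standardized row on the retained components."""
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in z]


# Usage on a small made-up matrix (illustrative data, not from the study).
rows = [[x, 2 * x, (x * 7) % 5] for x in range(1, 11)]
z = standardize(rows)
comps, eigvals = top_components(covariance(z), k=2)
scores = project(z, comps)
```

Retaining only the first few component scores is what turns many correlated inputs into a smaller model matrix; a production pipeline would normally use a linear-algebra library rather than hand-rolled loops.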
Characteristics of input variables.
| Variable | Type of Variable | Data Source | Number of Categories | Type of Preprocessing |
|---|---|---|---|---|
| Age (years) | Continuous | Derived | | Imputation of outliers |
| Gender | Binary | Raw | 2 | |
| Race | Nominal | Raw | 5 | Optimization of categorical features |
| Primary Insurance | Nominal | Raw | 4 | Optimization of categorical features |
| Residency area characteristics | | | | |
| Zip code | Nominal | Raw | 10,000 | Transformation through link to Census data |
| County | Nominal | Raw | 71 | Optimization of categorical features |
| Rural area | Binary | Derived | 2 | |
| Total Population | Continuous | Derived | | Obtained using residency zip code with linkage to US Census data |
| Median Income | Continuous | Derived | | Obtained using residency zip code with linkage to US Census data |
| Total Proportion of African-Americans | Continuous | Derived | | Obtained using residency zip code with linkage to US Census data |
| Total Proportion of Hispanic | Continuous | Derived | | Obtained using residency zip code with linkage to US Census data |
| Population Proportion Below Poverty | Continuous | Derived | | Obtained using residency zip code with linkage to US Census data |
| Distance from Residency to Hospital (km) | Continuous | Derived | | Calculated using residency zip code; imputation of outliers |
| Day of admission | Nominal | Derived | 12 | Optimization of categorical features |
| Month of admission | Nominal | Derived | 12 | Optimization of categorical features |
| Weekend admission | Binary | Derived | 2 | |
| Attending Surgeon | Nominal | Raw | 520 | Optimization of categorical features |
| Admission Source | Nominal | Raw | 3 | Optimization of categorical features |
| Admission Type | Binary | Raw | 2 | |
| Admitting service type | Binary | Derived | 2 | |
| Time of surgery from admission (days) | Continuous | Derived | | Imputation of outliers |
| Surgery Type | Nominal | Derived | 12 | Optimization of categorical features |
| Primary surgical procedure | Nominal | Derived | 1555 | Forest tree analysis of ICD-9-CM codes |
| Charlson's comorbidity index | Nominal | Derived | 18 | Optimization of categorical features |
| Major Diagnosis Category | Nominal | Raw | 28 | Optimization of categorical features |
| Myocardial Infarction | Binary | Derived | 2 | |
| Congestive Heart Failure | Binary | Derived | 2 | |
| Peripheral Vascular Disease | Binary | Derived | 2 | |
| Cerebrovascular Disease | Binary | Derived | 2 | |
| Chronic Pulmonary Disease | Binary | Derived | 2 | |
| Connective Tissue Disease-Rheumatic Disease | Binary | Derived | 2 | |
| Diabetes | Binary | Derived | 2 | |
| Cancer | Binary | Derived | 2 | |
| Liver Disease | Binary | Derived | 2 | |
| Chronic kidney disease stage | Binary | Derived | 2 | |
| Indicator of receiving Aminoglycosides on admission day | Binary | Derived | 2 | |
| Indicator of receiving Bicarbonate on admission day | Binary | Derived | 2 | |
| Indicator of receiving Diuretics on admission day | Binary | Derived | 2 | |
| Indicator of receiving Steroid on admission day | Binary | Derived | 2 | |
| Indicator of receiving Vancomycin on admission day | Binary | Derived | 2 | |
| Indicator of receiving ACE Inhibitors on admission day | Binary | Derived | 2 | |
| Indicator of receiving NSAIDs on admission day | Binary | Derived | 2 | |
| Indicator of receiving Aspirin on admission day | Binary | Derived | 2 | |
| Indicator of receiving Antiemetic on admission day | Binary | Derived | 2 | |
| Indicator of receiving Beta-blockers on admission day | Binary | Derived | 2 | |
| Indicator of receiving statin on admission day | Binary | Derived | 2 | |
| Indicator of receiving naloxone on admission day | Binary | Derived | 2 | |
| Indicator of receiving pressors on admission day | Binary | Derived | 2 | |
| Indicator of receiving inotropes on admission day | Binary | Derived | 2 | |
| Reference serum creatinine | Continuous | Derived | | Imputation of outliers |
| Reference estimated glomerular filtration rate | Continuous | Calculated from baseline creatinine | | Imputation of outliers |
| MDRD creatinine | Continuous | Derived | | Imputation of outliers |
| Ratio of reference creatinine to MDRD creatinine | Continuous | Derived | | Imputation of outliers |
| Hematocrit | Continuous | Raw | | Imputation of outliers |
| Hemoglobin, g/dL | Continuous | Raw | | Imputation of outliers |
| Urine protein, mg/dL | Nominal | Raw | 4 | Optimization of categorical features |
| Urine hemoglobin, mg/dL | Nominal | Raw | 5 | Optimization of categorical features |
| Urine glucose, mg/dL | Nominal | Raw | 5 | Optimization of categorical features |
| Number of complete blood count tests | Nominal | Raw | 4 | Optimization of categorical features |
| Number of urine tests | Nominal | Derived | 4 | Optimization of categorical features |
a For continuous variables, observations in the top and bottom 1% of the distribution were considered outliers; they were removed and imputed.
b A nonlinear risk function was calculated for continuous variables entered into the model.
c For categorical variables with more than two levels, levels were transformed to a numeric value as detailed in the Methods section.
d Using the residency zip code, we linked to US Census data to calculate residential neighborhood characteristics and distance from the hospital.
e Surgical procedure codes were optimized using forest tree analysis of ICD-9-CM codes as detailed in the Methods section.
f Variable entered into the model as "Indicator of receiving pressors or inotropes on admission day".
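Footnotes a and c describe two preprocessing steps whose exact details are in the Methods section, which is not reproduced here. The Python sketch below is a hedged illustration under two assumptions: outliers beyond the 1st and 99th percentiles are replaced by the median of the retained values (the imputation method is not stated above), and categorical levels are mapped to a smoothed observed outcome rate, a common target-encoding scheme that may differ from the authors' transformation. All function names and the `smoothing` parameter are hypothetical.

```python
import statistics
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def impute_outliers(values, lower_q=1.0, upper_q=99.0):
    """Replace values outside the [lower_q, upper_q] percentiles.

    Median imputation is an assumption made for this sketch only.
    """
    ordered = sorted(values)
    lo, hi = percentile(ordered, lower_q), percentile(ordered, upper_q)
    fill = statistics.median([v for v in values if lo <= v <= hi])
    return [v if lo <= v <= hi else fill for v in values]


def encode_by_outcome_rate(levels, outcomes, smoothing=10.0):
    """Map each category level to a smoothed observed outcome rate.

    levels: category labels; outcomes: 0/1 labels of the same length.
    Rare levels are shrunk toward the overall outcome rate.
    """
    prior = sum(outcomes) / len(outcomes)
    counts, events = defaultdict(int), defaultdict(int)
    for level, y in zip(levels, outcomes):
        counts[level] += 1
        events[level] += y
    mapping = {
        level: (events[level] + smoothing * prior) / (counts[level] + smoothing)
        for level in counts
    }
    return [mapping[level] for level in levels], mapping


# Usage with made-up values (not study data).
clean_ages = impute_outliers([18, 25, 40, 56, 61, 70, 120])
encoded, mapping = encode_by_outcome_rate(["Medicare", "Private", "Medicare"], [1, 0, 0])
```

Because the encoding uses the outcome, the mapping should be learned on the training split only and then applied to validation data; otherwise the encoded feature leaks label information.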
Summary of overall cohort.
Abbreviations. GFR, glomerular filtration rate; CBC, complete blood count.
| Characteristic | Overall (N = 50318) |
|---|---|
| Age, median (25th-75th) | 56 (43, 68) |
| Female Gender, n (%) | 24670 (49.0) |
| Race, n (%) | |
| White | 40515 (82.2) |
| African-American | 6183 (12.5) |
| Hispanic | 1534 (3.1) |
| Other | 1064 (2.2) |
| Primary Insurance Group, n (%) | |
| Medicare | 19469 (38.7) |
| Medicaid | 6518 (13.0) |
| Private | 20592 (40.9) |
| Uninsured | 3736 (7.4) |
| Neighborhood characteristics | |
| Rural area, n (%) | 16098 (32.1) |
| Total Population, median (25th-75th) | 17085 (10002, 27782) |
| Median Income, median (25th-75th) | 33293 (28451, 40309) |
| Total Proportion of African-Americans, median (25th-75th) | 0.10 (0.04, 0.20) |
| Total Proportion of Hispanic, median (25th-75th) | 0.04 (0.02, 0.06) |
| Population Proportion Below Poverty, median (25th-75th) | 0.13 (0.09, 0.19) |
| Distance from Residency to Hospital (km), median (25th-75th) | 53 (26, 118) |
| County (top 3 categories), n (%) | |
| Alachua | 8667 (17.2) |
| Marion | 4807 (9.6) |
| Lake | 2155 (4.3) |
| Charlson's comorbidity index (CCI), median (25th-75th) | 1 (0, 2) |
| Cancer, n (%) | 10121 (20.1) |
| Diabetes, n (%) | 8332 (16.6) |
| Chronic Pulmonary Disease, n (%) | 8179 (16.3) |
| Peripheral Vascular Disease, n (%) | 5953 (11.8) |
| Cerebrovascular Disease, n (%) | 4175 (8.3) |
| Congestive Heart Failure, n (%) | 3946 (7.8) |
| Myocardial Infarction, n (%) | 3290 (6.5) |
| Liver Disease, n (%) | 2482 (4.9) |
| Number of diagnoses, median (25th-75th) | 8 (5, 13) |
| Major Diagnosis Category (top 3 categories), n (%) | |
| Musculoskeletal System and Connective Tissue | 9924 (19.7) |
| Circulatory System | 7507 (14.9) |
| Nervous System | 6675 (13.3) |
| Weekend admission, n (%) | 6895 (13.7) |
| Admission day (top 3 categories), n (%) | |
| Tuesday | 10174 (19.8) |
| Wednesday | 9065 (17.6) |
| Monday | 8843 (17.2) |
| Admission month (top 3 categories), n (%) | |
| March | 4549 (9.0) |
| October | 4471 (8.9) |
| January | 4449 (8.8) |
| Number of operating surgeons, n | 520 |
| Number of procedures per operating surgeon, n (%) | |
| First rank | 1905 (3.8) |
| Second rank | 1602 (3.2) |
| Third rank | 1532 (3.0) |
| Admission Source, n(%) | |
| Emergency room | 13066 (26.4) |
| Outpatient setting | 29826 (60.1) |
| Transfer | 6699 (13.5) |
| Emergent surgery status, n (%) | 22820 (45.4) |
| Admission to Surgical service, n (%) | 44652 (88.7) |
| Time of surgery from admission (days), n (%) | |
| 0 | 28613 (56.9) |
| 1–2 | 11397 (22.6) |
| ≥3 | 10308 (20.5) |
| Surgery Type, n (%) | |
| Neurologic Surgery | 8385 (16.7) |
| Orthopedic Surgery | 7472 (14.9) |
| Cardiothoracic Surgery | 6755 (13.5) |
| Trauma/Burn Surgery | 5650 (11.2) |
| General Gastrointestinal Surgery | 4120 (8.2) |
| Transplant Surgery | 2765 (5.5) |
| Urological Surgery | 2640 (5.3) |
| Vascular Surgery | 2601 (5.2) |
| Gynecologic Surgery | 2437 (4.8) |
| General Oncology Surgery | 2188 (4.4) |
| General Colorectal Surgery | 1833 (3.6) |
| Other Surgeries | 3472 (6.9) |
| Surgery procedure type | |
| Primary Procedure codes, n | 1555 |
| Primary Procedure (top 3 categories), n (%) | |
| 01.59 Other excision or destruction of lesion or tissue of brain | 1330 (2.6) |
| 81.54 Total knee replacement | 1223 (2.4) |
| 39.51 Clipping of aneurysm | 1099 (2.2) |
| Reference creatinine (mg/dl), median (25th-75th) | 0.8 (0.7, 1.03) |
| Estimated reference GFR (mL/min/1.73 m2), median (25th-75th) | 92.3 (71.1, 107.9) |
| Hemoglobin, g/dL median (25th-75th) | 11.7 (10.2, 13.2) |
| Hematocrit, median (25th-75th) | 34.3 (30.1, 38.6) |
| Dipstick urine protein, mg/dL, n (%) | |
| Missing | 41948 (83.4) |
| Negative | 5502 (10.9) |
| 30 | 1756 (3.5) |
| 100 | 753 (1.5) |
| ≥300 | 359 (0.7) |
| Number of CBC tests, n (%) | |
| 0 | 14620 (29.1) |
| 1 | 27554 (54.8) |
| 2 | 5912 (11.8) |
| 3 or more | 2232 (4.4) |
| Admission medication types, n | 40 |
| Admission day medications, n (%) | |
| Antiemetic drugs | 28783 (57.2) |
| Beta blockers | 11750 (23.4) |
| Diuretics | 5886 (11.7) |
| Statin | 5790 (11.5) |
| Angiotensin-Converting-Enzyme Inhibitors | 5066 (10.1) |
| Aspirin | 3428 (6.8) |
| Pressors/Inotropes | 2864 (5.7) |
| Bicarbonate | 2070 (4.1) |
| Naloxone | 575 (1.1) |
| KDIGO-AKI, n (%) | 18246 (36) |
| Severe sepsis, n (%) | 2589 (5) |
a Other surgeries include ear-nose-throat, ophthalmology, and plastic surgeries.
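The tables above list a reference serum creatinine, an MDRD creatinine, an estimated reference GFR, and KDIGO-defined AKI, but this section does not state the formulas. The Python sketch below assumes the IDMS-traceable four-variable MDRD Study equation, back-calculation of an "MDRD creatinine" at an assumed eGFR of 75 mL/min/1.73 m² (a convention sometimes used when baseline creatinine is unknown), and a simplified creatinine-only KDIGO check that ignores the 48-hour and 7-day windows and the urine-output criterion. All of these choices, and the function names, are assumptions for illustration.

```python
def mdrd_egfr(scr_mg_dl, age_years, female, black):
    """IDMS-traceable 4-variable MDRD Study eGFR in mL/min/1.73 m^2."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def mdrd_creatinine(age_years, female, black, assumed_egfr=75.0):
    """Solve the same MDRD equation for creatinine at an assumed eGFR."""
    factor = 175.0 * age_years ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    # assumed_egfr = factor * scr ** -1.154  =>  scr = (assumed_egfr / factor) ** (-1 / 1.154)
    return (assumed_egfr / factor) ** (-1.0 / 1.154)


def kdigo_creatinine_aki(reference_scr, postop_scr_values):
    """Simplified creatinine-only KDIGO check (time windows ignored).

    Flags AKI if the peak postoperative creatinine is at least 1.5 times
    the reference value or at least 0.3 mg/dL above it.
    """
    peak = max(postop_scr_values)
    return peak >= 1.5 * reference_scr or peak - reference_scr >= 0.3


# Usage with made-up values (not study data).
ref = 0.8
egfr = mdrd_egfr(ref, age_years=56, female=False, black=False)
ratio_to_mdrd = ref / mdrd_creatinine(56, female=False, black=False)
has_aki = kdigo_creatinine_aki(ref, [0.9, 1.3, 1.1])
```

Writing `mdrd_creatinine` as the exact inverse of `mdrd_egfr` keeps the two derived variables consistent, so the ratio of reference creatinine to MDRD creatinine compares a patient's measured value with what the equation would predict at the assumed eGFR.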
Comparison of time required for model building in seconds.
| Model | Acute Kidney Injury: before data preprocessing or optimization | Acute Kidney Injury: after data preprocessing or optimization | Severe Sepsis: before data preprocessing or optimization | Severe Sepsis: after data preprocessing or optimization |
|---|---|---|---|---|
| Logistic Regression Model | 4640 s | 7 s | 5530 s | 8 s |
| Generalized Additive Models | 6520 s | 48 s | 6980 s | 73 s |
| Naïve Bayes Model | 26 s | 22 s | 19 s | 24 s |
Comparison of model performances.
Abbreviations. AUC, area under the receiver operating characteristic curve; CI, confidence interval; GAM, generalized additive model; SVM, support vector machine; LASSO, least absolute shrinkage and selection operator; PPV, positive predictive value. Bootstrap sampling was used to obtain 95% confidence intervals, and comparisons were made using nonparametric methods.
| Model | Acute Kidney Injury: Accuracy (95% CI) | Acute Kidney Injury: AUC (95% CI) | Acute Kidney Injury: PPV (95% CI) | Severe Sepsis: Accuracy (95% CI) | Severe Sepsis: AUC (95% CI) | Severe Sepsis: PPV (95% CI) |
|---|---|---|---|---|---|---|
| | 0.752 (0.746,0.758) | 0.824 (0.818,0.828) | 0.725 (0.714,0.737) | 0.773 (0.762,0.781) | 0.851 (0.840,0.856) | 0.811 (0.785,0.833) |
| | 0.756 (0.751,0.761) | 0.827 (0.821,0.832) | 0.719 (0.706,0.729) | 0.775 (0.766,0.783) | 0.852 (0.840,0.863) | 0.806 (0.779,0.832) |
| | 0.744 (0.738,0.749) | 0.797 (0.791,0.803) | 0.545 (0.534,0.558) | 0.805 (0.798,0.811) | 0.83 (0.819,0.841) | 0.689 (0.659,0.716) |
| | 0.767 (0.757,0.774) | 0.819 (0.811,0.828) | 0.662 (0.648,0.676) | 0.71 (0.689,0.731) | 0.762 (0.733,0.782) | 0.677 (0.619,0.722) |
| | 0.753 (0.747,0.757) | 0.824 (0.818,0.830) | 0.726 (0.714,0.738) | 0.772 (0.760,0.780) | 0.85 (0.838,0.863) | 0.812 (0.781,0.838) |
| | 0.757 (0.752,0.762) | 0.828 (0.822,0.833) | 0.72 (0.706,0.732) | 0.774 (0.766,0.780) | 0.851 (0.842,0.862) | 0.806 (0.783,0.831) |
| | 0.744 (0.737,0.750) | 0.797 (0.789,0.804) | 0.545 (0.533,0.556) | 0.806 (0.800,0.813) | 0.831 (0.817,0.841) | 0.69 (0.659,0.711) |
| | 0.767 (0.759,0.774) | 0.82 (0.812,0.829) | 0.665 (0.646,0.685) | 0.697 (0.684,0.713) | 0.757 (0.736,0.779) | 0.689 (0.652,0.732) |
| | 0.774 (0.769,0.781) | 0.853 (0.849,0.859) | 0.758 (0.746,0.767) | 0.818 (0.809,0.824) | 0.904 (0.895,0.913) | 0.854 (0.841,0.880) |
| | 0.773 (0.768,0.777) | 0.858 (0.853,0.862) | 0.784 (0.771,0.793) | 0.826 (0.819,0.833) | 0.909 (0.902,0.917) | 0.86 (0.843,0.878) |
| | 0.741 (0.735,0.747) | 0.819 (0.814,0.826) | 0.666 (0.651,0.677) | 0.805 (0.797,0.815) | 0.882 (0.874,0.890) | 0.839 (0.822,0.866) |
| | 0.777 (0.767,0.782) | 0.857 (0.850,0.862) | 0.735 (0.725,0.750) | 0.85 (0.737,0.897) | 0.877 (0.828,0.904) | 0.751 (0.667,0.850) |
a p<0.05 for AUC comparison with the logistic regression model without any data reduction.
b p<0.05 for AUC comparison with the GAM without any data reduction.
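The table reports accuracy, AUC, and PPV with bootstrap 95% confidence intervals, but the number of resamples, the classification threshold, and the interval method are not stated here. The Python sketch below assumes a 0.5 threshold, percentile intervals, and AUC computed as the Mann-Whitney probability that a random positive case scores higher than a random negative one. The function names and defaults are hypothetical, not the authors' R code.

```python
import random


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def accuracy_and_ppv(scores, labels, threshold=0.5):
    """Accuracy and positive predictive value at a fixed threshold."""
    tp = fp = correct = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        correct += pred == y
        if pred == 1:
            if y == 1:
                tp += 1
            else:
                fp += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return correct / len(labels), ppv


def bootstrap_ci(scores, labels, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(scores, labels)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(metric([scores[i] for i in idx], ys))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    return stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Usage with made-up predictions (not study data).
scores = [0.91, 0.35, 0.62, 0.18, 0.77, 0.44]
labels = [1, 0, 1, 0, 1, 1]
point_auc = auc(scores, labels)
auc_ci = bootstrap_ci(scores, labels, auc, n_boot=500)
```

Resamples with a single class are skipped because AUC is undefined for them. The quadratic pairwise loop is fine for a sketch; a rank-based formula scales better to a cohort of 50,000 patients.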
Fig 2. Predicted risk functions for the association between continuous variables and (A) acute kidney injury and (B) severe sepsis.
Risk functions were generated from multivariable generalized additive models and logistic regression models. GAM, generalized additive model; DoF, degrees of freedom; GFR, glomerular filtration rate.