Jenna Wiens, Wayne N Campbell, Ella S Franklin, John V Guttag, Eric Horvitz.
Abstract
BACKGROUND: Although many risk factors are well known, Clostridium difficile infection (CDI) continues to be a significant problem throughout the world. The purpose of this study was to develop and validate a data-driven, hospital-specific risk stratification procedure for estimating the probability that an inpatient will test positive for C difficile.
Keywords: Clostridium difficile; data-driven methods; electronic medical records; machine learning; risk stratification
Year: 2014 PMID: 25734117 PMCID: PMC4281796 DOI: 10.1093/ofid/ofu045
Source DB: PubMed Journal: Open Forum Infect Dis ISSN: 2328-8957 Impact factor: 3.835
Figure 1. Study population flow diagram.
Descriptive Characteristics of Study Population
| Variable | Statistic |
|---|---|
| Female gender (%) | 56.72 |
| Age (%) | |
| 18–25 | 6.36 |
| 25–45 | 20.87 |
| 45–60 | 25.23 |
| 60–70 | 18.74 |
| 70–80 | 15.37 |
| 80–100 | 10.37 |
| ≥100 | 2.97 |
| Hospital admission type (%) | |
| Emergency | 58.53 |
| Routine elective | 19.36 |
| Urgent | 12.43 |
| Term pregnancy | 9.41 |
| Hospital admission source (%) | |
| Admitted from home | 79.34 |
| Transferred from another health institution | 12.02 |
| Outpatient | 6.20 |
| Other* | 2.42 |
| Hospital service (%) | |
| Medicine | 45.54 |
| Cardiology | 12.41 |
| Surgery | 11.41 |
| Obstetrics | 10.72 |
| Psychiatry | 4.21 |
| Other† | 15.71 |
| Hemodialysis performed (%) | 5.02 |
| Diabetic (%) | 31.46 |
| Medications (%) | |
| Immunosuppressants (solid-organ transplant) | 1.84 |
| Corticosteroids | 11.31 |
| Antimicrobials assoc** | 36.67 |
| Antimicrobials rarely assoc | 18.30 |
| Proton pump inhibitors | 34.92 |
| CDI (%) | 1.05 |
| Median LOS in days (IQR) | 4.01 (2.40–7.12) |
| Previous visit in last 90 days (%) | 21.85 |
| History of CDI, 1 year (%) | 1.45 |
Abbreviations: assoc, associated; CDI, Clostridium difficile infection; IQR, interquartile range; LOS, length of stay.
* Other includes routine admission (unscheduled), transferred from a nursing home, referred and admitted by family physician.
† Other includes burn, gynecology, neurosurgery, open heart surgery, oncology, orthopedics, trauma, vascular.
** assoc refers to known associations between antimicrobials and CDI.
Variable Descriptions*
| Variable Name | Description |
|---|---|
| Curated Variables Based on Well Known Risk Factors (All Variables Collected During First 24 H of Admission) | |
| age_70 | (Time of Admission - Birthday) ≥70 years |
| admission_source:TE | Transfer from nursing home |
| day90_hospit | Recent hospitalization in the previous 90 days |
| hist_cdi | Previous CDI within the last year |
| hemodialysis | Procedure code for dialysis |
| gastro_tube | Procedure code associated with nasogastric or esophagostomy tube |
| ccsteroids | POE for corticosteroids |
| immunosuppressants | POE for solid-organ transplant immunosuppressants |
| chemo_cdi | POE for chemotherapeutic agents associated with CDI |
| chemo_entero | POE for chemotherapeutic agents associated with enteropathy |
| antimicrobials_assoc | POE for antimicrobials frequently associated with CDI |
| antimicrobials_rarely | POE for antimicrobials rarely associated with CDI |
| ppi | POE for proton pump inhibitors |
| abdominal_surgery | Procedure codes for abdominal surgery associated with CDI |
| Variable Category | Description |
| Categories of Additional Variables Extracted From the EMR | |
| previous visits | Statistics on LOS of previous visits within 90 days (total, max, avg) |
| dxcodes | Highest level of ICD9 codes coded during most recent visit |
| labresults | Any laboratory test that was observed within 24 h with flag (high, low, critical) |
| vitals | All vitals with flags (high, low) collected during first 24 h |
| procedures | All procedure codes collected during first 24 h |
| medications | All POE for previous visit and during first 24 h of current visit |
| admission_type | Admission type |
| admission_source | Admission source |
| hospital_service | Hospital service |
| age | Discretized [15, 25, 45, 60, 70, 80, 100] |
| city | City where the patient resides |
| colonization_pressure | Unit and hospital-wide colonization pressure on day of admission |
Abbreviations: avg, average; CDI, Clostridium difficile infection; EMR, electronic medical record; LOS, length of stay; max, maximum; POE, physician order entry.
* We describe each patient admission using 2 sets of variables. We refer to the first set of variables as Curated. The second set of variables consists of all additional data procured from the structured fields of patients' electronic health records.
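As a rough illustration of how two of the age-related variables above could be derived, the sketch below computes the binary `age_70` flag and a one-hot encoding over the listed bin edges [15, 25, 45, 60, 70, 80, 100]. The function names and the half-open binning convention (each bin includes its lower edge) are assumptions made for this example; the source does not specify them.

```python
from bisect import bisect_right
from datetime import date

# Bin edges copied from the "age" row of the table; the half-open
# convention [edge_i, edge_{i+1}) is an assumption of this sketch.
AGE_EDGES = [15, 25, 45, 60, 70, 80, 100]


def age_in_years(birthday: date, admission: date) -> int:
    """Whole years elapsed between birthday and admission date."""
    years = admission.year - birthday.year
    # Subtract one if the birthday has not yet occurred in the admission year.
    if (admission.month, admission.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def age_70(birthday: date, admission: date) -> int:
    """Curated binary flag: 1 if the patient is at least 70 at admission."""
    return int(age_in_years(birthday, admission) >= 70)


def age_bin(age: int) -> int:
    """Index of the bin containing age, or -1 if age is below the first edge."""
    return bisect_right(AGE_EDGES, age) - 1


def one_hot_age(age: int) -> list:
    """One-hot vector over the bins; all zeros if age is below the first edge."""
    vec = [0] * len(AGE_EDGES)
    idx = age_bin(age)
    if idx >= 0:
        vec[idx] = 1
    return vec
```

Under these assumptions, a 72-year-old patient sets `age_70` to 1 and falls in the bin that starts at 70.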
Performance of 3 Models Varying in Complexity on the Test Data (n = 34 722)*
| Model | Dimensionality | AUC RP > 24 (95% CI) | AUC RP > 48 (95% CI) |
|---|---|---|---|
| EMR | 1017 | 0.8129 (.79–.83) | 0.7886 (.76–.82) |
| Curated | 14 | 0.7163 (.69–.75) | 0.6900 (.66–.72) |
| EMRall | 10 859 | 0.8140 (.80–.83) | 0.7896 (.76–.81) |
Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; EMR, electronic medical record; RP, risk period.
* We measure performance in terms of AUC of predictions applied to all of the patients present in the hospital 24 h after admission (who have not yet tested positive for Clostridium difficile) and also a subset of patients with an RP >48 h.
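The AUC values and intervals in the table can be approximated with a short sketch like the one below. It computes the AUC as the fraction of positive/negative pairs that are ranked correctly, restricts the evaluation to patients whose risk period exceeds a cutoff, and estimates a confidence interval by bootstrap resampling. The Figure 3 caption mentions 100 bootstrap samples; the percentile interval, the function names, and the `rp_hours` input are assumptions of this example rather than details confirmed by the source.

```python
import random


def auc(labels, scores):
    """AUC as P(random positive outranks random negative); ties count 1/2.

    Quadratic in the number of patients, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_for_risk_period(labels, scores, rp_hours, min_hours):
    """AUC restricted to patients whose risk period exceeds min_hours."""
    keep = [i for i, h in enumerate(rp_hours) if h > min_hours]
    return auc([labels[i] for i in keep], [scores[i] for i in keep])


def bootstrap_ci(labels, scores, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

With a CDI prevalence near 1%, resamples can contain very few positives, so intervals from a small number of bootstrap draws should be read as rough estimates.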
Figure 2. The area under the receiver operating characteristic curve (AUROC) achieved when both the electronic medical record (EMR) and the Curated models were applied to patients in the validation set. Each comparison considers a different subset of patients based on the length of their risk periods. For example, in the third comparison from the left, all patients have a risk period of at least 72 hours.
Figure 3. (A) Receiver operating characteristic (ROC) curves for the first 2 models listed in Table 3. The thin dotted lines represent the 95% confidence bounds generated using 100 bootstrap samples from the test data. A(i) shows the ROC curve generated on all admissions in the test data, whereas A(ii) considers only those patients with a risk period of at least 48 hours. A(iii) focuses on only a portion of the ROC curve presented in 3A(ii). (B) B(i) shows the calibration for the EMR model, and B(ii) shows the calibration for the Curated model. The black dashed lines represent perfect calibration, ie, where the predicted probability aligns with the likelihoods seen when the classifier is applied to the test patients (45-degree line).
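The calibration comparison in panel B can be sketched as follows: predictions are grouped into bins, and within each bin the mean predicted probability is compared with the observed fraction of positive cases. Points lying on the 45-degree line have equal values for the two. The equal-width binning and the function name are assumptions of this example; the caption does not say how the bins were formed.

```python
def calibration_bins(labels, probs, n_bins=10):
    """Return (mean predicted, observed fraction, count) per non-empty bin.

    Uses equal-width bins over [0, 1], an assumption of this sketch.
    """
    # Per bin: [sum of predicted probabilities, number of positives, count]
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    return [(s / c, k / c, c) for s, k, c in acc if c > 0]
```

A well-calibrated model yields pairs whose first two entries are close in every bin.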
Figure 4. (A) Net reclassification improvement (NRI) of using the electronic medical record (EMR) model to classify patients as high risk or low risk versus the Curated model. (B) Confusion matrices for both the EMR model and the Curated model, using a decision threshold based on the 95th percentile. (C) Histogram of when patients in the validation set tested positive for pathogenic C difficile and the number of patients correctly identified as high risk in each group, using the EMR model with the same decision threshold as in 4(B).
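The quantities in this figure can be illustrated with a minimal sketch: a decision threshold set at the 95th percentile of predicted scores, a confusion matrix at that threshold, and the two-category NRI, which adds the net proportion of cases moved up to high risk to the net proportion of non-cases moved down to low risk. The nearest-rank percentile, the strict `>` comparison, and all function names are assumptions of this example, not details taken from the source.

```python
def percentile_threshold(scores, pct=95):
    """Score at the given integer percentile (nearest-rank, an assumption)."""
    ranked = sorted(scores)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    k = max(0, -(-pct * len(ranked) // 100) - 1)
    return ranked[k]


def confusion(labels, scores, thr):
    """Counts (tp, fp, tn, fn) when scores strictly above thr are high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        high = s > thr
        if high and y == 1:
            tp += 1
        elif high:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def nri(labels, high_new, high_old):
    """Two-category NRI of a new classifier relative to an old one."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for y, new, old in zip(labels, high_new, high_old):
        up = int(new and not old)    # moved from low risk to high risk
        down = int(old and not new)  # moved from high risk to low risk
        if y == 1:
            n_e += 1
            up_e += up
            down_e += down
        else:
            n_n += 1
            up_n += up
            down_n += down
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

In this setting `high_new` would hold the EMR model's high-risk flags and `high_old` the Curated model's, each obtained by comparing that model's scores with its own 95th-percentile threshold.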