Ishan Taneja1,2,3, Bobby Reddy1,2,3, Gregory Damhorst1,2, Sihai Dave Zhao4, Umer Hassan1,2, Zachary Price1,2, Tor Jensen1,2, Tanmay Ghonge1,2, Manish Patel1,2, Samuel Wachspress1,2,3, Jackson Winter1,2,3, Michael Rappleye1,2, Gillian Smith1,2, Ryan Healey1,2, Muhammad Ajmal2, Muhammad Khan2, Jay Patel2, Harsh Rawal2, Raiya Sarwar2, Sumeet Soni2, Syed Anwaruddin2, Benjamin Davis2, James Kumar2, Karen White2, Rashid Bashir5,6, Ruoqing Zhu7.
Abstract
Sepsis is a leading cause of death and is the most expensive condition to treat in U.S. hospitals. Despite targeted efforts to automate earlier detection of sepsis, current techniques rely exclusively on either standard clinical data or novel biomarker measurements. In this study, we apply machine learning techniques to assess the predictive power of combining multiple biomarker measurements from a single blood sample with electronic medical record (EMR) data for the identification of patients in the early to peak phase of sepsis in a large community hospital setting. Combining biomarkers and EMR data achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.81, while EMR data alone achieved an AUC of 0.75. Furthermore, a single measurement of six biomarkers (IL-6, nCD64, IL-1ra, PCT, MCP1, and G-CSF) yielded the same predictive power as collecting an additional 16 hours of EMR data (AUC of 0.80), suggesting that the biomarkers may be useful for identifying these patients earlier. Ultimately, supervised learning using a subset of biomarker and EMR data as features may be capable of identifying patients in the early to peak phase of sepsis in a diverse population and may provide a tool for more timely identification and intervention.
Year: 2017 PMID: 28883645 PMCID: PMC5589821 DOI: 10.1038/s41598-017-09766-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Early to Peak Phase Sepsis Illustration. The red curve represents a hypothetical patient’s severity of sepsis as a function of time, where the apex of the curve corresponds to their worst-case state. A patient is considered to be in the early to peak phase of sepsis if they lie on an unshaded point of the curve. The boundaries of peak phase sepsis are marked to illustrate that a patient is in the peak phase if they are within some tolerance of their worst-case state.
Patient characteristics where septic population is determined by clinical adjudication labels. Blood culture results and SIRS criteria measurements reported were taken at the time of the biomarker measurement.
| Characteristic | Early to Peak Phase Sepsis Population (N = 76) | Non-septic and Recovery Phase Sepsis Population (N = 368) |
|---|---|---|
| | 61, 19 | 61, 20 |
| | 47% male, 53% female | 56% male, 44% female |
| Cancer | 20% | 17% |
| Diabetes | 29% | 19% |
| Chronic Kidney Disease | 9% | 9% |
| COPD | 12% | 10% |
| Asthma | 1% | 1% |
| | 89% | 67% |
| Pneumonia | 28% | 23% |
| Cellulitis | 14% | 19% |
| UTI | 41% | 19% |
| GI infections | 11% | 5% |
| | 38% | 22% |
| | 93% | 64% |
| SIRS (exactly 2/4 criteria) | 25% | 33% |
| SIRS (exactly 3/4 criteria) | 39% | 25% |
| SIRS (exactly 4/4 criteria) | 29% | 8% |
| | 79% | 42% |
| Infection + SIRS (exactly 2/4 criteria) | 22% | 22% |
| Infection + SIRS (exactly 3/4 criteria) | 36% | 16% |
| Infection + SIRS (exactly 4/4 criteria) | 21% | 4% |
| | 3.77, 3.39 | 2.17, 2.45 |
| | 1.46, 0.92 | 1.07, 0.94 |
AUCs using individual biomarkers as the sole feature. SVMs were used with the clinical adjudication label set to calculate the AUC for each biomarker.
| Individual Biomarker | AUC |
|---|---|
| TNF-α | 0.60 |
| IL-1β | 0.49 |
| GCSF | 0.51 |
| IL6 | 0.77 |
| PCT | 0.71 |
| sTREM1 | 0.43 |
| IL18 | 0.44 |
| MMP9 | 0.42 |
| TNFR1 | 0.63 |
| TNFR2 | 0.65 |
| IP10 | 0.52 |
| MCP1 | 0.60 |
| IL1ra | 0.68 |
| NGAL | 0.58 |
| CD64 | 0.65 |
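As a reading aid for the AUC column, the following is a minimal sketch of how an AUC can be computed from any set of scores using the rank (Mann-Whitney) formulation. It is not the SVM pipeline used in the study; the function name `roc_auc` and the toy values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Ties between a positive and a negative count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = early to peak phase sepsis, 0 = otherwise.
labels = [1, 1, 1, 0, 0, 0, 0]
marker = [210.0, 95.0, 40.0, 60.0, 12.0, 8.0, 30.0]  # made-up biomarker levels

auc = roc_auc(labels, marker)
auc_flipped = roc_auc(labels, [-m for m in marker])
```

Negating the scores turns an AUC of a into 1 - a, so an AUC below 0.5 indicates that the scores rank negative samples above positive ones more often than not.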
AUCs using individual EMR parameters as the sole feature. SVMs were used with the clinical adjudication label set to calculate the AUC for EMR parameters commonly used in clinical suspicion of sepsis.
| Individual EMR Parameters | AUC |
|---|---|
| Leukocyte Count | 0.67 |
| Lactic Acid | 0.43 |
| Systolic Blood Pressure | 0.43 |
| Pulse | 0.54 |
| Temperature | 0.45 |
| Respirations | 0.40 |
AUC of each algorithm as a function of feature set for clinical adjudication labels. The AUC is calculated for each algorithm with and without its respective feature selection method. The EMR data used spans from 48 hours before the biomarker measurement to 1 hour after it.
| Algorithm | All features | EMR | Biomarkers |
|---|---|---|---|
| Logistic Regression | 0.78 | 0.72 | 0.78 |
| Logistic Regression w/ feature selection | 0.79 | 0.73 | 0.79 |
| SVM | 0.79 | 0.73 | 0.78 |
| SVM w/ feature selection | 0.81 | 0.75 | 0.80 |
| Random Forest | 0.77 | 0.70 | 0.74 |
| Random Forest w/ feature selection | 0.77 | 0.72 | 0.77 |
| Adaboost | 0.81 | 0.75 | 0.79 |
| Adaboost w/ feature selection | 0.81 | 0.76 | 0.80 |
| Naive Bayes | 0.76 | 0.69 | 0.77 |
| Naive Bayes w/ feature selection | 0.80 | 0.73 | 0.79 |
Figure 2. Normalized feature coefficients output by SVM for clinical adjudication label set. The absolute value of each feature coefficient in SVM corresponds to its relative importance.
Figure 3. SVM w/ feature selection performance as a function of time for clinical adjudication label set. (A) ROC curves are displayed for various feature sets. The EMR data used spans from 48 hours before the biomarker measurement to 1 hour after it. (B) A plot of the AUC as a function of the number of hours of EMR data used after the biomarker measurement.
Spearman coefficient of each regression method as a function of feature set. The Spearman coefficient is calculated for each regression technique with and without its respective feature selection method. The EMR data used spans from 48 hours before the biomarker measurement to 1 hour after it.
| Regression Model | All features (μ,σ) | EMR (μ,σ) | Biomarkers (μ,σ) |
|---|---|---|---|
| Logistic Regression | 0.62, 0.10 | 0.38, 0.13 | 0.52, 0.11 |
| Logistic Regression w/ feature selection | 0.60, 0.10 | 0.36, 0.14 | 0.52, 0.11 |
| SVM | 0.63, 0.09 | 0.45, 0.12 | 0.53, 0.11 |
| SVM w/ feature selection | 0.59, 0.11 | 0.47, 0.11 | 0.46, 0.12 |
| Random Forest | 0.69, 0.08 | 0.50, 0.11 | 0.56, 0.10 |
| Random Forest w/ feature selection | 0.69, 0.07 | 0.51, 0.11 | 0.57, 0.10 |
| Adaboost | 0.67, 0.08 | 0.47, 0.12 | 0.57, 0.10 |
| Adaboost w/ feature selection | 0.62, 0.09 | 0.47, 0.12 | 0.56, 0.10 |
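For reference, the Spearman coefficient is the Pearson correlation of the ranks of two variables. The sketch below computes a single coefficient between hypothetical severity scores and hypothetical model outputs; the helper names and the toy values are assumptions, and the sketch does not reproduce how the means and standard deviations in the table were obtained.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: observed severity scores and model outputs.
observed = [2, 5, 1, 8, 5]
predicted = [1.8, 4.1, 2.2, 7.5, 5.0]
rho = spearman(observed, predicted)
```

Because only ranks enter the calculation, any monotonically increasing rescaling of the model output leaves the coefficient unchanged.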
Figure 4. Normalized feature coefficients output by Random Forest for SOFA score label set.
Figure 5. Heatmap for clinical adjudication label set. The x-axis corresponds to the category the patient was adjudicated to be in (see Materials and Methods), and the y-axis corresponds to the rank of the patient according to the probability output by SVM. For each patient, a line is plotted whose x coordinate corresponds to the patient’s labelled category and whose y coordinate corresponds to the patient’s rank. The color of this line is based on the probability that the patient is in the early or peak phase according to SVM. The mapping from probability to color is displayed at the right of the figure. The vertical dotted white line separates the septic patients (categories 2–5) from the non-septic patients (categories 1, 6–11). The horizontal dotted line represents the patient whose probability of having sepsis was 0.50 according to SVM. The upper left quadrant represents the false positives, the upper right quadrant represents the true positives, the lower left quadrant represents the true negatives, and the lower right quadrant represents the false negatives. The black background corresponds to empty entries in the heatmap.
SVM performance as a function of category. We report the sensitivity/specificity and average probability output by SVM for each category. Each number in the second column refers to sensitivity if the row corresponds to categories 2–5 (patients considered positive) and to specificity if the row corresponds to categories 1, 6–11 (patients considered negative).
| Category | Sensitivity/Specificity | SVM probability (µ, σ) | Number of samples |
|---|---|---|---|
| 1, 9–11 | 0.82 | (0.32, 0.16) | 269 |
| 2 | 0.62 | (0.52, 0.15) | 26 |
| 3 | 0.79 | (0.55, 0.14) | 24 |
| 4 | 0.85 | (0.62, 0.13) | 20 |
| 5 | 1.0 | (0.62, 0.07) | 6 |
| 6 | 0.67 | (0.39, 0.17) | 58 |
| 7 | 0.42 | (0.50, 0.17) | 58 |
| 8 | 0.57 | (0.40, 0.18) | 7 |
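The per-category figures above can be read as the fraction of patients in each category that the classifier labels correctly: within a positive category (2–5) this fraction is the sensitivity, and within a negative category it is the specificity. The sketch below illustrates that calculation under two assumptions: the 0.50 probability cutoff is taken from the Figure 5 caption, and a probability of exactly 0.50 is treated as positive. The function name and toy values are hypothetical.

```python
from collections import defaultdict

POSITIVE_CATEGORIES = {2, 3, 4, 5}  # categories considered septic (early to peak phase)
THRESHOLD = 0.50                    # assumed cutoff, taken from the Figure 5 caption


def per_category_rates(categories, probabilities, threshold=THRESHOLD):
    """Fraction of correctly classified patients in each category.

    For a positive category this is the sensitivity; for a negative
    category it is the specificity."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for cat, p in zip(categories, probabilities):
        predicted_positive = p >= threshold  # assumption: ties count as positive
        is_positive = cat in POSITIVE_CATEGORIES
        total[cat] += 1
        if predicted_positive == is_positive:
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Hypothetical toy data: adjudicated category and classifier probability per patient.
categories = [1, 1, 2, 2, 2, 6, 6]
probabilities = [0.20, 0.60, 0.70, 0.40, 0.55, 0.30, 0.45]
rates = per_category_rates(categories, probabilities)
```

In this toy example, category 2 yields a sensitivity of 2/3, while categories 1 and 6 yield specificities of 0.5 and 1.0, respectively.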
Figure 6. Feature coefficients output by SVM as a function of category.