Kun Wang, Vineet Bhandari, Sofya Chepustanova, Greg Huber, Stephen O'Hara, Corey S O'Hern, Mark D Shattuck, Michael Kirby.
Abstract
We address the identification of optimal biomarkers for the rapid diagnosis of neonatal sepsis. We employ both canonical correlation analysis (CCA) and sparse support vector machine (SSVM) classifiers to select the best subset of biomarkers from a large hematological data set collected from infants with suspected sepsis at Yale-New Haven Hospital's Neonatal Intensive Care Unit (NICU). CCA is used to select sets of biomarkers of increasing size that are most highly correlated with infection. The effectiveness of these biomarkers is then validated by constructing a sparse support vector machine diagnostic classifier. We find that the following set of five biomarkers captures the essential diagnostic information (in order of importance): Bands, Platelets, neutrophil CD64, White Blood Cells, and Segs. Further, the diagnostic performance of the optimal set of biomarkers is significantly higher than that of isolated individual biomarkers. These results suggest an enhanced sepsis scoring system for neonatal sepsis that includes these five biomarkers. We demonstrate the robustness of our analysis by comparing CCA with the Forward Selection method and SSVM with LASSO Logistic Regression.
Year: 2013 PMID: 24367543 PMCID: PMC3867385 DOI: 10.1371/journal.pone.0082700
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of the Canonical Correlation Analysis and Forward Selection of the biomarkers.
| k | Correlation | Enter | Leave | Forward Selection |
| 1 | 0.563 | Bands | | Bands |
| 2 | 0.615 | Plt | | Plt |
| 3 | 0.633 | Hgb | | Hgb |
| 4 | 0.643 | CD64 | | CD64 |
| 5 | 0.653 | Segs, WBC | Hgb | Segs |
| 6 | 0.660 | Hgb | | WBC |
| 7 | 0.663 | Age | | Age |
| 8 | 0.664 | Lymph | | Hct |
| 9 | 0.666 | Mono | | Lymph |
| 10 | 0.668 | Hct | | Mono |
By applying CCA to all possible k-combinations of biomarkers, the subset with the highest correlation with the sepsis score is determined for each k. The ‘Enter’ column indicates which biomarker is added to achieve the highest correlation at each k. The ‘Leave’ column indicates which biomarker is eliminated from the combination at that k. A biomarker stays in the combination until it appears in the ‘Leave’ column. For instance, for the 5-combination, the most correlated biomarkers are Bands, Plt, CD64, Segs, and WBC. Hgb, which was present in the 4-combination, is replaced by Segs and WBC at k = 5. The ‘Forward Selection’ column lists the biomarker selected by the forward selection method when applied one biomarker at a time.
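The contrast between exhaustive k-combination search and forward selection can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a single response variable, in which case the first canonical correlation equals the multiple correlation coefficient of a least-squares fit. The function names and the synthetic data are hypothetical stand-ins for the study's measurements.

```python
from itertools import combinations

import numpy as np


def multiple_correlation(X, y):
    """Canonical correlation between the columns of X and a single variable y.

    With one response variable, the first canonical correlation equals the
    multiple correlation coefficient R of the least-squares fit of y on X.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    fitted = Xc @ coef
    return float(np.sqrt((fitted @ fitted) / (yc @ yc)))


def best_subset(X, y, k):
    """Score every k-combination of columns and return (correlation, columns)."""
    scored = ((multiple_correlation(X[:, list(c)], y), c)
              for c in combinations(range(X.shape[1]), k))
    return max(scored)


def forward_selection(X, y, k):
    """Greedy alternative: repeatedly add the column that most raises R."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(rest, key=lambda j: multiple_correlation(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen


# Synthetic stand-in data (hypothetical; not the study's measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

for k in range(1, 4):
    r, subset = best_subset(X, y, k)
    print(k, round(r, 3), subset, forward_selection(X, y, k))
```

Exhaustive search can drop a previously chosen variable when a different combination scores higher, which is what the ‘Leave’ column records. Forward selection never removes a variable once it has been added.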
Parameters for the classifier at k = 5.
| m | Biomarker | Mean | Standard Deviation | Weight | Standard Error of Weight |
| 1 | WBC | 14.04 | 8.70 | 0.373 | 0.009 |
| 2 | Plt | 231.37 | 103.38 | −0.876 | 0.012 |
| 3 | Segs | 39.64 | 17.25 | −0.699 | 0.008 |
| 4 | Bands | 7.92 | 9.61 | 2.691 | 0.018 |
| 5 | CD64 | 2.96 | 2.42 | 0.446 | 0.012 |
The parameters of the classifier for the sepsis score given in Equation (1), including the standard error of each biomarker weight.
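To show how the tabulated parameters might combine, here is a rough Python sketch. Equation (1) is not reproduced in this record, so the form used here is an assumption: a weighted sum of z-scored biomarkers with no bias term or decision threshold. The name `linear_score` is hypothetical. The means, standard deviations, and weights are copied from the table above.

```python
# Means, standard deviations, and weights copied from the k = 5 table.
PARAMS = {
    "WBC":   (14.04, 8.70, 0.373),
    "Plt":   (231.37, 103.38, -0.876),
    "Segs":  (39.64, 17.25, -0.699),
    "Bands": (7.92, 9.61, 2.691),
    "CD64":  (2.96, 2.42, 0.446),
}


def linear_score(values):
    """Weighted sum of z-scored biomarkers (assumed form; no bias term)."""
    return sum(w * (values[name] - mean) / sd
               for name, (mean, sd, w) in PARAMS.items())


# A hypothetical profile sitting exactly at every tabulated mean.
at_mean = {name: mean for name, (mean, _, _) in PARAMS.items()}
# The same profile with Bands raised by one standard deviation.
high_bands = dict(at_mean, Bands=7.92 + 9.61)

print(linear_score(at_mean))
print(linear_score(high_bands))
```

Under this assumed form, each weight is the change in score per one-standard-deviation change in its biomarker. This is consistent with Bands carrying the largest weight in the table.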
Performance of the classifier at k = 5 for SSVM and LLR.
| Method | TPR | TNR | PPV | NPV | ACC |
| SSVM | 0.838 | 0.905 | 0.893 | 0.856 | 0.875 |
| LLR | 0.740 | 0.960 | 0.945 | 0.797 | 0.853 |
Prediction measures for the classifier at k = 5 built by SSVM and LLR: true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC).
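The five measures follow from the entries of a binary confusion matrix. A minimal Python sketch of the standard definitions is given below. The counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix.

```python
def prediction_measures(tp, fn, tn, fp):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                   # sensitivity
        "TNR": tn / (tn + fp),                   # specificity
        "PPV": tp / (tp + fp),                   # precision
        "NPV": tn / (tn + fn),
        "ACC": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
measures = prediction_measures(tp=40, fn=10, tn=45, fp=5)
for name, value in measures.items():
    print(name, round(value, 3))
```

TPR and TNR depend only on the classifier's behavior within each class, whereas PPV, NPV, and ACC also depend on the proportion of infected cases in the sample.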
Figure 1. Prediction measures obtained from the (A) Sparse Support Vector Machine and (B) LASSO Logistic Regression methods.
True positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) are shown for each k-combination of biomarkers selected.
Figure 2. Receiver operating characteristic (ROC) curves.
ROC curves of TPR versus false positive rate (FPR) for the optimal sets of biomarkers, averaged over SSVM models. The shaded region in the inset shows the standard deviation.
Figure 3. Exhaustive evaluation of statistical measures.
The highest TPR, TNR, PPV, NPV, and ACC values obtained when SSVM was applied to all possible combinations of biomarkers (blue circles). The solid red circles are the values for models built using the best biomarkers selected by CCA.