Nicholas Bodkin, Melissa Ross, Micah T McClain, Emily R Ko, Christopher W Woods, Geoffrey S Ginsburg, Ricardo Henao, Ephraim L Tsalik.
Abstract
BACKGROUND: Measuring host gene expression is a promising diagnostic strategy to discriminate bacterial and viral infections. Multiple signatures of varying size, complexity, and target populations have been described. However, there is little information to indicate how the performance of the various published signatures compares.
Keywords: Biomarkers; Diagnostics; Gene expression; Infectious disease; Machine learning
Year: 2022 PMID: 35184750 PMCID: PMC8858657 DOI: 10.1186/s13073-022-01025-x
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Study flow diagram. The performance of 28 published gene expression signatures and one composite signature was evaluated using leave-one-out cross-validation (LOOCV) in 51 publicly available datasets. LOOCV was performed for both bacterial vs. non-bacterial classification and viral vs. non-viral classification. LOOCV was also performed to measure the performance of signatures in 13 publicly available COVID-19 datasets. Performance was then measured by area under the receiver operating characteristic curve (AUC) values and individual subject predictions. Relative gene importance was characterized by the relative gene weights in each generated model.
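The LOOCV-then-AUC evaluation described in this caption can be sketched as follows. This is a minimal illustration and not the study's pipeline: the nearest-centroid scorer `centroid_score`, the toy expression matrix `X`, and the labels `y` are hypothetical stand-ins, because the caption does not specify the classifier that was fitted.

```python
from statistics import mean

def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_score(train_x, train_y, x):
    """Hypothetical stand-in classifier: higher score means closer to the positive class."""
    n_genes = len(x)

    def centroid(cls):
        rows = [r for r, y in zip(train_x, train_y) if y == cls]
        return [mean(r[g] for r in rows) for g in range(n_genes)]

    c_pos, c_neg = centroid(1), centroid(0)
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
    return d_neg - d_pos

def loocv_scores(X, y):
    """Score each subject with a model trained on all other subjects."""
    scores = []
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(centroid_score(train_x, train_y, X[i]))
    return scores

# Toy data (assumption): 6 subjects x 2 genes, 1 = bacterial, 0 = non-bacterial.
X = [[5.1, 1.0], [4.8, 1.2], [5.3, 0.9], [2.0, 3.1], [2.2, 2.9], [1.9, 3.3]]
y = [1, 1, 1, 0, 0, 0]

scores = loocv_scores(X, y)
print(auc(scores, y))
```

Each held-out score comes from a model that never saw that subject, so the resulting AUC is an out-of-sample estimate for the dataset.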
Characterization of the 28 identified host gene expression signatures
| Signature code | Publication first author | Publication last author | Number of genes | Discovery age group | Discovery phenotypes |
|---|---|---|---|---|---|
| TS1 | Tang | Schughart | 1 | Adults | Viral (influenza), healthy |
| HL2^a | Herberg | Levin | 2 | Pediatrics | Bacterial, viral |
| LC2 | Lei | Chen | 2 | All | Bacterial, viral, healthy, SIRS |
| XW2 | Xu | Wang | 2 | All | Bacterial, viral, healthy, SIRS |
| GS3 | Gomez-Carballo | Salas | 3 | All | Bacterial, viral |
| LS3 | Li | Sriskandan | 3 | Adults | Viral (w/ COVID), bacterial, SIRS |
| SB4^a | Sampson | Brandon | 4 | All | Viral, SIRS |
| SK7^a | Sweeney | Khatri | 7 | All | Bacterial, viral, SIRS |
| SB8 | Sampson | Brandon | 8 | All | Viral, SIRS |
| RC10 | Ravichandran | Chandra | 10 | All | Bacterial, viral, healthy |
| SN10 | Sampson | Noursadeghi | 10 | All | Bacterial, viral, healthy, SIRS |
| SR10 | Suarez | Ramilo | 10 | Adults | Bacterial, viral, co-infection, healthy |
| AK11 | Andres-Terre | Khatri | 11 | All | Bacterial, viral, healthy, SIRS |
| BF11 | Bhattacharya | Falsey | 11 | Adults | Bacterial, viral |
| NC19 | Ng | Chiu | 19 | Adults | Viral (w/ COVID), bacterial, healthy |
| SL20 | Song | Lei | 20 | All | Bacterial, viral, healthy, SIRS |
| MW23 | McClain | Woods | 23 | All | Viral (w/ COVID), bacterial, healthy |
| ZG25^a | Zaas | Ginsburg | 25 | Adults | Viral, healthy |
| MS29 | Mayhew | Sweeney | 29 | All | Bacterial, viral, healthy, SIRS |
| PT29 | Parnell | Tang | 29 | Adults | Bacterial, viral, healthy, SIRS |
| RC31 | Ramilo | Chaussabel | 31 | Pediatrics | Bacterial, viral |
| HS33 | Hu | Storch | 33 | Infants | Bacterial, viral, healthy |
| HL34 | Herberg | Levin | 34 | Pediatrics | Bacterial, viral |
| ZG48 | Zaas | Ginsburg | 48 | Adults | Viral, healthy |
| MR59 | Mahajan | Ramilo | 59 | Neonates | Bacterial, viral, healthy |
| TW96 | Tsalik | Woods | 96 | Adults | Bacterial, viral, SIRS |
| MW139 | McClain | Woods | 139 | All | Viral (w/ COVID), bacterial, healthy |
| AK398 | Andres-Terre | Khatri | 398 | All | Bacterial, viral, healthy, SIRS |
| All | - | - | 864 | - | - |
Published host gene expression signatures varied in size and discovery cohort characteristics. Signatures were named using the first and last authors' initials, followed by the number of unique genes in the signature. Neonates include subjects <3 months of age; infants include subjects <3 years of age; pediatrics includes subjects <18 years of age. ^a Signature is a subset of another published signature.
Fig. 2 Signature classification performance. A Box plots were generated for each signature’s AUCs as measured across the validation datasets for bacterial vs. non-bacterial and viral vs. non-viral classification. B Signature AUC distributions were compared against each other with the Wilcoxon rank-sum test, and p-values were plotted in a heatmap for bacterial classification (top) and viral classification (bottom). P-values were corrected for multiple comparisons with the Benjamini-Hochberg method. * indicates p-value ≤ 0.05. C Linear regression was applied to the relationship between the number of genes in a signature (log-transformed) and the signature’s median AUC across the validation datasets for bacterial classification (left) and viral classification (right).
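Two of the steps named in this caption, the Benjamini-Hochberg correction (panel B) and the regression of AUC on log-transformed gene count (panel C), can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the raw p-values are hypothetical, the helper names `benjamini_hochberg` and `fit_line` are invented here, the logarithm base (10) is an assumption, and the regression uses six signatures from the tables in this record with the bacterial weighted mean AUC standing in for the median AUC used in the figure.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical raw p-values from pairwise signature comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]  # the "*" threshold from the caption

# Gene counts and bacterial weighted mean AUCs for TS1, HL2, SK7, RC10, TW96,
# and AK398 (a subset chosen for illustration, not the full analysis).
n_genes = [1, 2, 7, 10, 96, 398]
bacterial_auc = [0.547, 0.778, 0.863, 0.849, 0.844, 0.886]
slope, intercept = fit_line([math.log10(n) for n in n_genes], bacterial_auc)

print(adj_p, significant)
print(slope, intercept)
```

A positive slope on this subset is consistent with larger signatures tending to score higher, but six hand-picked points say nothing about the significance of that trend.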
Summarized performance of gene signatures in bacterial and viral classification
| Signature | Bacterial vs. non-bacterial: weighted mean AUC (95% CI) | Bacterial vs. non-bacterial: sensitivity (95% CI) | Bacterial vs. non-bacterial: specificity (95% CI) | Bacterial vs. non-bacterial: DOR (95% CI) | Viral vs. non-viral: weighted mean AUC (95% CI) | Viral vs. non-viral: sensitivity (95% CI) | Viral vs. non-viral: specificity (95% CI) | Viral vs. non-viral: DOR (95% CI) |
|---|---|---|---|---|---|---|---|---|
| TS1 | 0.547 [0.445–0.658] | 0.825 [0.74–0.887] | 0.592 [0.473–0.702] | 7 [5–10] | 0.826 [0.729–0.892] | 0.872 [0.813–0.914] | 0.867 [0.806–0.911] | 44 [25–79] |
| HL2 | 0.778 [0.667–0.857] | 0.812 [0.723–0.877] | 0.862 [0.787–0.913] | 27 [14–52] | 0.861 [0.803–0.903] | 0.81 [0.74–0.865] | 0.905 [0.84–0.945] | 41 [21–80] |
| LC2 | 0.764 [0.652–0.843] | 0.873 [0.786–0.928] | 0.799 [0.706–0.868] | 27 [14–54] | 0.641 [0.55–0.737] | 0.808 [0.709–0.88] | 0.7 [0.558–0.812] | 10 [6–17] |
| XW2 | 0.690 [0.608–0.763] | 0.741 [0.63–0.828] | 0.803 [0.711–0.871] | 12 [7–19] | 0.851 [0.775–0.894] | 0.815 [0.747–0.867] | 0.874 [0.824–0.911] | 30 [17–53] |
| GS3 | 0.796 [0.655–0.910] | 0.836 [0.75–0.897] | 0.92 [0.852–0.958] | 58 [23–147] | 0.843 [0.767–0.901] | 0.812 [0.735–0.871] | 0.876 [0.813–0.92] | 31 [15–62] |
| LS3 | 0.764 [0.692–0.838] | 0.778 [0.718–0.829] | 0.792 [0.705–0.858] | 13 [8–21] | 0.831 [0.782–0.876] | 0.804 [0.752–0.848] | 0.849 [0.774–0.903] | 23 [13–41] |
| SB4 | 0.776 [0.662–0.876] | 0.83 [0.743–0.892] | 0.792 [0.707–0.858] | 19 [10–36] | 0.868 [0.817–0.914] | 0.807 [0.76–0.846] | 0.896 [0.851–0.929] | 36 [22–60] |
| SK7 | 0.863 [0.812–0.922] | 0.861 [0.813–0.898] | 0.88 [0.821–0.921] | 45 [24–87] | 0.901 [0.849–0.936] | 0.866 [0.822–0.9] | 0.883 [0.829–0.922] | 49 [26–91] |
| SB8 | 0.835 [0.734–0.914] | 0.86 [0.759–0.923] | 0.859 [0.778–0.913] | 37 [17–83] | 0.884 [0.836–0.924] | 0.832 [0.788–0.869] | 0.884 [0.842–0.915] | 38 [23–63] |
| RC10 | 0.849 [0.708–0.931] | 0.886 [0.817–0.931] | 0.909 [0.817–0.958] | 78 [27–227] | 0.922 [0.871–0.955] | 0.892 [0.85–0.923] | 0.901 [0.841–0.939] | 75 [35–161] |
| SN10 | 0.854 [0.758–0.918] | 0.868 [0.825–0.901] | 0.867 [0.786–0.921] | 43 [20–90] | 0.897 [0.846–0.937] | 0.837 [0.787–0.878] | 0.915 [0.872–0.944] | 55 [30–103] |
| SR10 | 0.844 [0.759–0.909] | 0.843 [0.779–0.891] | 0.89 [0.839–0.927] | 43 [23–82] | 0.915 [0.865–0.950] | 0.902 [0.86–0.932] | 0.903 [0.859–0.934] | 85 [42–171] |
| AK11 | 0.794 [0.708–0.879] | 0.794 [0.703–0.862] | 0.875 [0.81–0.92] | 27 [14–51] | 0.887 [0.826–0.923] | 0.849 [0.784–0.897] | 0.862 [0.813–0.9] | 35 [19–66] |
| BF11 | 0.812 [0.742–0.866] | 0.85 [0.794–0.892] | 0.815 [0.746–0.869] | 25 [13–48] | 0.801 [0.748–0.843] | 0.816 [0.768–0.856] | 0.768 [0.706–0.82] | 15 [10–22] |
| NC19 | 0.832 [0.753–0.897] | 0.864 [0.792–0.914] | 0.839 [0.784–0.882] | 33 [18–61] | 0.885 [0.825–0.921] | 0.86 [0.819–0.892] | 0.858 [0.805–0.899] | 37 [21–66] |
| SL20 | 0.850 [0.748–0.907] | 0.84 [0.78–0.886] | 0.88 [0.83–0.917] | 38 [18–81] | 0.915 [0.868–0.948] | 0.899 [0.858–0.929] | 0.889 [0.842–0.924] | 71 [37–136] |
| MW23 | 0.826 [0.732–0.889] | 0.874 [0.82–0.914] | 0.829 [0.738–0.893] | 34 [17–66] | 0.894 [0.838–0.933] | 0.875 [0.826–0.911] | 0.885 [0.839–0.919] | 54 [28–105] |
| ZG25 | 0.817 [0.716–0.889] | 0.843 [0.767–0.898] | 0.849 [0.785–0.896] | 30 [15–60] | 0.882 [0.815–0.926] | 0.86 [0.792–0.909] | 0.892 [0.849–0.924] | 51 [27–95] |
| MS29 | 0.873 [0.766–0.938] | 0.912 [0.855–0.948] | 0.883 [0.796–0.936] | 78 [30–206] | 0.894 [0.826–0.937] | 0.826 [0.754–0.881] | 0.885 [0.827–0.925] | 37 [19–71] |
| PT29 | 0.810 [0.716–0.889] | 0.846 [0.769–0.901] | 0.837 [0.777–0.884] | 28 [15–55] | 0.873 [0.821–0.911] | 0.827 [0.773–0.87] | 0.845 [0.797–0.883] | 26 [15–44] |
| RC31 | 0.842 [0.755–0.903] | 0.868 [0.804–0.913] | 0.849 [0.764–0.907] | 37 [17–79] | 0.891 [0.836–0.927] | 0.857 [0.816–0.89] | 0.871 [0.815–0.912] | 40 [24–67] |
| HS33 | 0.854 [0.771–0.913] | 0.878 [0.812–0.923] | 0.864 [0.796–0.912] | 46 [21–98] | 0.891 [0.820–0.934] | 0.861 [0.796–0.908] | 0.893 [0.838–0.931] | 52 [26–103] |
| HL34 | 0.814 [0.690–0.895] | 0.833 [0.752–0.892] | 0.871 [0.809–0.915] | 34 [17–67] | 0.898 [0.831–0.942] | 0.871 [0.811–0.915] | 0.906 [0.858–0.939] | 65 [33–128] |
| ZG48 | 0.847 [0.760–0.914] | 0.912 [0.841–0.953] | 0.883 [0.8–0.935] | 78 [32–194] | 0.876 [0.799–0.928] | 0.846 [0.79–0.889] | 0.886 [0.84–0.919] | 43 [23–77] |
| MR59 | 0.829 [0.724–0.904] | 0.909 [0.843–0.949] | 0.846 [0.76–0.905] | 55 [24–127] | 0.864 [0.797–0.918] | 0.816 [0.746–0.87] | 0.881 [0.833–0.916] | 33 [18–61] |
| TW96 | 0.844 [0.757–0.921] | 0.908 [0.83–0.952] | 0.91 [0.843–0.95] | 99 [36–271] | 0.871 [0.808–0.935] | 0.923 [0.867–0.957] | 0.898 [0.836–0.938] | 106 [39–284] |
| MW139 | 0.834 [0.750–0.908] | 0.906 [0.835–0.949] | 0.869 [0.788–0.922] | 64 [28–146] | 0.871 [0.808–0.923] | 0.887 [0.841–0.921] | 0.884 [0.808–0.932] | 60 [26–138] |
| AK398 | 0.886 [0.820–0.951] | 0.939 [0.887–0.968] | 0.923 [0.868–0.956] | 184 [61–553] | 0.896 [0.850–0.946] | 0.912 [0.867–0.942] | 0.932 [0.866–0.966] | 141 [51–392] |
| All | 0.905 [0.842–0.957] | 0.927 [0.881–0.956] | 0.927 [0.875–0.959] | 162 [65–401] | 0.933 [0.898–0.965] | 0.934 [0.9–0.957] | 0.92 [0.872–0.951] | 164 [73–369] |
Weighted mean AUC, sensitivity, specificity, and diagnostic odds ratio (DOR) for each host gene expression signature are presented. Values were weighted based on the number of subjects in the validation dataset. Sensitivity, specificity, and DOR values and their confidence intervals were calculated using hierarchical summary ROC modeling.
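The two summary quantities named here can be illustrated with a short sketch. The subject-weighted mean follows the weighting described in the caption; the per-dataset AUCs and sizes are hypothetical. The diagnostic odds ratio uses the textbook point-estimate formula, which is an assumption: the study derived its DORs from hierarchical summary ROC modeling, so this formula approximates the tabulated values rather than reproducing the method.

```python
def weighted_mean_auc(aucs, n_subjects):
    """Mean of per-dataset AUCs, weighted by the number of subjects in each dataset."""
    total = sum(n_subjects)
    return sum(a * n for a, n in zip(aucs, n_subjects)) / total

def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test in cases divided by the odds of a positive test in non-cases."""
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))

# Hypothetical per-dataset results for one signature.
dataset_aucs = [0.80, 0.90]
dataset_sizes = [100, 300]
print(weighted_mean_auc(dataset_aucs, dataset_sizes))

# Sensitivity and specificity taken from the bacterial columns of the table above.
ts1_dor = diagnostic_odds_ratio(0.825, 0.592)
hl2_dor = diagnostic_odds_ratio(0.812, 0.862)
print(round(ts1_dor), round(hl2_dor))
```

For TS1 and HL2 the rounded point estimates agree with the tabulated bacterial DORs (7 and 27). Other rows can differ by a unit or so, as expected when the published values come from a fitted model instead of this closed-form ratio.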
Overall signature classification performance stratified by dataset characteristics
| Parameter | Bacterial vs. non-bacterial: median AUC | Bacterial vs. non-bacterial: IQR | Bacterial vs. non-bacterial: p-value | Bacterial vs. non-bacterial: N | Viral vs. non-viral: median AUC | Viral vs. non-viral: IQR | Viral vs. non-viral: p-value | Viral vs. non-viral: N |
|---|---|---|---|---|---|---|---|---|
| Overall | 0.832 | [0.796–0.849] | - | 31 | 0.884 | [0.864–0.896] | - | 37 |
| - | - | - | - | - | - | - | - | - |
| Adult only | 0.846 | [0.826–0.870] | - | 14 | 0.916 | [0.906–0.935] | - | 12 |
| Pediatric only | 0.798 | [0.756–0.818] | | 15 | 0.860 | [0.841–0.870] | | 24 |
| - | - | - | - | - | - | - | - | - |
| 2 phenotypes | 0.871 | [0.846–0.894] | - | 12 | 0.911 | [0.880–0.923] | - | 22 |
| >2 phenotypes | 0.819 | [0.788–0.835] | | 19 | 0.855 | [0.838–0.874] | | 15 |
| - | - | - | - | - | - | - | - | - |
| Whole blood | 0.838 | [0.802–0.859] | - | 28 | 0.884 | [0.865–0.900] | - | 32 |
| PBMC | 0.705 | [0.668–0.753] | | 3 | 0.831 | [0.791–0.882] | | 4 |
AUCs were calculated for each of the 29 evaluated signatures and then stratified by different dataset characteristics. Mean AUCs were first generated for each signature across the datasets in the parameter group, weighted by the number of subjects in each validation dataset. The median of the weighted AUC values and IQR were then calculated and presented here. N represents the number of datasets for the specified cohort composition.
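The median-and-IQR step described here can be checked against numbers already in this record. The sketch below takes the 29 bacterial weighted mean AUCs from the signature performance table and computes their quartiles; the variable names are illustrative, and the choice of the inclusive (linear interpolation) quartile method is an assumption about how the IQR was computed.

```python
import statistics

# Bacterial vs. non-bacterial weighted mean AUC per signature, copied from the
# signature performance table in this record (28 signatures plus "All").
bacterial_weighted_auc = {
    "TS1": 0.547, "HL2": 0.778, "LC2": 0.764, "XW2": 0.690, "GS3": 0.796,
    "LS3": 0.764, "SB4": 0.776, "SK7": 0.863, "SB8": 0.835, "RC10": 0.849,
    "SN10": 0.854, "SR10": 0.844, "AK11": 0.794, "BF11": 0.812, "NC19": 0.832,
    "SL20": 0.850, "MW23": 0.826, "ZG25": 0.817, "MS29": 0.873, "PT29": 0.810,
    "RC31": 0.842, "HS33": 0.854, "HL34": 0.814, "ZG48": 0.847, "MR59": 0.829,
    "TW96": 0.844, "MW139": 0.834, "AK398": 0.886, "All": 0.905,
}

values = list(bacterial_weighted_auc.values())

# Quartiles across signatures; "inclusive" treats the data as the full population.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
print(q1, median, q3)
```

Under that assumption the result is 0.796, 0.832, and 0.849, which matches the first bacterial row of the table above (median 0.832, IQR 0.796–0.849). The stratified rows cannot be reproduced the same way because the per-dataset AUCs are not included in this record.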
Fig. 3 Signature classification performance by age. A Weighted mean AUCs were generated for each signature’s classification of bacterial patients and viral patients across pediatric-only (red) and adult-only (blue) datasets. Values were weighted based on the number of subjects in a validation dataset. The distributions of these weighted mean AUCs were plotted, and significance was determined by the Wilcoxon rank-sum test. B After pooling samples across datasets, each signature’s accuracy was calculated and plotted for five age groups (<3 months, 3 months–1 year, 2–11 years, 12–18 years, and adult). This plot shows the median and IQR of each signature’s accuracy in each age group for bacterial and viral classification. * indicates p < 0.05 as compared to the adult population.
Overall accuracy of gene signatures in distinct patient populations
| Parameter | Bacterial vs. non-bacterial: accuracy, % (95% CI) | Bacterial vs. non-bacterial: p-value | Bacterial vs. non-bacterial: N (subjects/datasets) | Viral vs. non-viral: accuracy, % (95% CI) | Viral vs. non-viral: p-value | Viral vs. non-viral: N (subjects/datasets) |
|---|---|---|---|---|---|---|
| Overall | 79 (78–80) | - | 2887/31 | 84 (83–85) | - | 3584/37 |
| - | - | - | - | - | - | - |
| All Bacterial^a | 81 (79–83) | - | 951/31 | - | - | - |
| | 90 (85–93) | | 45/2 | - | - | - |
| | 84 (79–89) | 0.972 | 64/7 | - | - | - |
| | 83 (79–87) | 0.972 | 118/8 | - | - | - |
| | 91 (88–94) | | 58/4 | - | - | - |
| | 83 (76–89) | 0.972 | 39/6 | - | - | - |
| | 82 (75–88) | 0.972 | 43/8 | - | - | - |
| Intracellular bacteria | 83 (79–88) | 0.178 | 100/4 | - | - | - |
| Extracellular bacteria | 84 (81–86) | 0.173 | 415/17 | - | - | - |
| All Viral^a | - | - | - | 82 (80–83) | - | 1679/37 |
| Adenovirus | - | - | - | 74 (62–84) | | 30/2 |
| Enterovirus | - | - | - | 84 (76–89) | 0.937 | 58/3 |
| Influenza | - | - | - | 89 (87–90) | | 431/19 |
| Rhinovirus | - | - | - | 74 (70–78) | | 209/9 |
| RSV | - | - | - | 81 (79–84) | 0.083 | 406/13 |
| - | - | - | - | - | - | - |
| Adult^a | 82 (80–83) | - | 1183/18 | 88 (86–89) | - | 1268/14 |
| 12–18 years | 82 (78–85) | 0.299 | 132/6 | 88 (84–92) | 0.631 | 95/6 |
| 2–11 years | 70 (67–73) | | 373/7 | 79 (76–82) | | 352/10 |
| 3 months–1 year | 73 (69–77) | | 183/8 | 80 (78–82) | | 576/17 |
| <3 months | 85 (82–88) | | 320/8 | 81 (79–84) | | 547/16 |
| - | - | - | - | - | - | - |
| All Subjects^a | 77 (76–79) | - | 1389/12 | 80 (78–81) | - | 1157/12 |
| Asian | 84 (79–89) | | 87/9 | 84 (76–91) | 0.277 | 33/7 |
| Black | 79 (75–82) | 0.059 | 311/11 | 77 (73–81) | 0.277 | 254/12 |
| White | 76 (73–78) | | 684/11 | 80 (78–82) | 0.784 | 686/12 |
| Other | 71 (64–78) | | 72/5 | 74 (67–82) | 0.277 | 79/6 |
| - | - | - | - | - | - | - |
| Not Hispanic or Latino^a | 75 (73–77) | - | 407/4 | 79 (77–81) | | 474/5 |
| Hispanic or Latino | 80 (77–84) | | 302/9 | 85 (81–88) | | 220/11 |
| - | - | - | - | - | - | - |
| Non-ICU^a | 73 (66–79) | - | 43/2 | 83 (79–86) | - | 117/3 |
| ICU | 69 (66–73) | 0.279 | 182/8 | 86 (82–90) | 0.105 | 107/7 |
Average accuracies and 95% confidence intervals of bacterial and viral classification, stratified by different clinical parameters. Only groups with at least fifteen subjects across at least two datasets were evaluated. P-values represent statistical significance, comparing the group to its reference population. N is given as the number of subjects/the number of datasets used for validation. The “Intracellular Bacteria” group includes subjects with B. pseudomallei, S. typhi, and Mycoplasma infection. The “Extracellular Bacteria” group includes all other subjects with bacterial infection for which an identified pathogen was available. For comparisons related to “Race,” the “All Subjects” group represents all subjects for which racial information was available. ^a Indicates the reference population used for determination of significance.
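As a rough illustration of how an accuracy and its 95% confidence interval can be derived from pooled predictions, the sketch below uses the Wilson score interval for a binomial proportion. This is an assumption made for illustration: the record does not state which interval method the study used, the tabulated accuracies are averages across signatures, and the counts passed to the hypothetical helper `wilson_interval` are invented.

```python
import math
from statistics import NormalDist

def wilson_interval(correct, total, confidence=0.95):
    """Wilson score interval for the proportion of correct classifications."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width

# Hypothetical subgroup: 90 of 100 subjects classified correctly.
correct, total = 90, 100
low, high = wilson_interval(correct, total)
print(f"accuracy {100 * correct / total:.0f}% (95% CI {100 * low:.1f}-{100 * high:.1f})")
```

The Wilson interval stays within 0 to 100% and behaves reasonably for small subgroups, which matters here because some groups in the table contain only a few dozen subjects.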