Ljubomir Buturovic1, Hong Zheng2,3, Benjamin Tang4,5,6,7, Kevin Lai8, Win Sen Kuan9, Mark Gillett10, Rahul Santram11, Maryam Shojaei4,5, Raquel Almansa12,13, Jose Ángel Nieto14, Sonsoles Muñoz14, Carmen Herrero14, Nikolaos Antonakos15, Panayiotis Koufargyris15, Marina Kontogiorgi15, Georgia Damoraki15, Oliver Liesenfeld1, James Wacker1, Uros Midic1, Roland Luethy1, David Rawling1, Melissa Remmel1, Sabrina Coyle1, Yiran E Liu2,16, Aditya M Rao2,17, Denis Dermadi2,3, Jiaying Toh2,17, Lara Murphy Jones2,18, Michele Donato2,3, Purvesh Khatri2,3, Evangelos J Giamarellos-Bourboulis15, Timothy E Sweeney19.
Abstract
Predicting the severity of COVID-19 remains an unmet medical need. Our objective was to develop a blood-based host-gene-expression classifier for the severity of viral infections and validate it in independent data, including COVID-19. We developed a logistic regression-based classifier for the severity of viral infections and validated it in multiple viral infection settings including COVID-19. We used training data (N = 705) from 21 retrospective transcriptomic clinical studies of influenza and other viral illnesses, looking at a preselected panel of host immune response messenger RNAs. We selected 6 host RNAs and trained a logistic regression classifier with a cross-validation area under curve of 0.90 for predicting 30-day mortality in viral illnesses. Next, in 1417 samples across 21 independent retrospective cohorts, the locked 6-RNA classifier had an area under curve of 0.94 for discriminating patients with severe vs. non-severe infection. Next, in independent cohorts of prospectively (N = 97) and retrospectively (N = 100) enrolled patients with confirmed COVID-19, the classifier had an area under curve of 0.89 and 0.87, respectively, for identifying patients with severe respiratory failure or 30-day mortality. Finally, we developed a loop-mediated isothermal gene expression assay for the 6-messenger-RNA panel to facilitate implementation as a rapid assay. With further study, the classifier could assist in the risk assessment of patients with COVID-19 and other acute viral infections to determine severity and level of care, thereby improving patient management and reducing healthcare burden.
Year: 2022 PMID: 35042868 PMCID: PMC8766462 DOI: 10.1038/s41598-021-04509-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Study flow. (a) Clinical data flows for training and testing. (b) Machine learning workflow used to develop and validate the 6-mRNA viral severity classifier. LOSO Leave-One-Study-Out, CV cross-validation, AUROC Area Under ROC curve.
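The leave-one-study-out (LOSO) scheme named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that each sample carries a study identifier and that every fold holds out all samples of one study. The function name `leave_one_study_out` and the toy sample-to-study assignment are hypothetical and are not taken from the authors' pipeline.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study.

    study_ids: one study identifier per sample, in sample order.
    """
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


# Toy assignment of five samples to three studies (illustrative only).
study_ids = ["GSE21802", "GSE21802", "GSE27131", "GSE40012", "GSE40012"]
folds = list(leave_one_study_out(study_ids))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Splitting by study rather than by sample keeps study-specific effects, such as platform differences, out of the training data of the fold in which that study is tested.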
Characteristics of viral infection studies used for training. COPD chronic obstructive pulmonary disease, ICU intensive care unit, TB tuberculosis, CAP community-acquired pneumonia.
| Study identifier | First author or PI | Study description | Timing of sample collection | N (survivors/non-survivors) | Age (median, IQR) | Male (n (%)) | Country | Platform |
|---|---|---|---|---|---|---|---|---|
| E-MEXP-3589 | Almansa | Patients hospitalized with COPD exacerbation | Hospital/ICU admission | 5 (5/0) | Unk | 5(100) | Spain | Agilent |
| E-MTAB-1548 | Almansa | Surgical patients with sepsis (EXPRESS) | Average post-operation day 4 | 3 (3/0) | 78.0 (71.5–79.5) | 3(100) | Spain | Agilent |
| E-MEXP-3162 | Van de Weg | Uncomplicated dengue | Within 48 h of onset | 21 (21/0) | Unk | Unk | Indonesia | Affymetrix |
| GSE13015 (GPL6102) | Pankla | Sepsis, many cases from Burkholderia | Within 48 h of diagnosis | 3 (2/1) | 54.0 (46.0–55.5) | 1(33) | Thailand | Illumina |
| GSE13015 (GPL6947) | Pankla | Sepsis, many cases from Burkholderia | Within 48 h of diagnosis | 2 (2/0) | 64.5 (56.2–72.8) | 1(50) | Thailand | Illumina |
| GSE21802 | Bermejo-Martin | Pandemic H1N1 in ICU | Within 48 h of ICU admission | 6 (5/1) | Unk | Unk | Canada | Illumina |
| GSE22098 | Berry | Patients with active TB and other inflammatory and infectious diseases | At admission | 39 (39/0) | 31.0 (19.0–47.0) | 6(15) | UK, South Africa | Illumina |
| GSE27131 | Berdal | Severe H1N1 influenza | Admission to ICU | 3 (2/1) | 38.0 (31.5–46.0) | 3(100) | Norway | Affymetrix |
| GSE28991 | Naim | Acute dengue fever | Within 72 h of onset | 11 (11/0) | Unk | Unk | Singapore | Illumina |
| GSE32707 | Dolinay | Critically ill patients in ICU (Sepsis, SIRS and/or ARDS) | Admission to ICU | 7 (5/2) | 45.0 (39.0–50.5) | 4(57) | USA | Illumina |
| GSE40012 | Parnell | Bacterial or influenza A pneumonia or SIRS | Admission to ICU | 11 (11/0) | Unk | 4(36) | Australia | Illumina |
| GSE54514 | Parnell | Sepsis patients in ICU | Admission to ICU | 2 (2/0) | 62.5 (60.2–64.8) | 1(50) | Australia | Illumina |
| GSE51808 | Kwissa | Acute dengue fever | 1–8 days after onset | 28 (28/0) | Unk | | Thailand | Affymetrix |
| GSE60244 | Suarez | Lower respiratory tract infections | Within 24 h of admission | 62 (62/0) | 59.0 (50.0–74.5) | 24(39) | USA | Illumina |
| GSE65682 | Scicluna | Suspected but negative for CAP | Within 24 h of ICU admission | 9 (7/2) | 67.0 (63.0–73.0) | 7(78) | Netherlands | Affymetrix |
| GSE68310 | Zhai | Outpatients with acute respiratory viral infections | Within 48 h of onset | 75 (75/0) | 21.0 (20.4–22.3) | 34(45) | USA | Illumina |
| GSE82050 | Tang | Moderate and severe influenza infection | Within 24 h of admission | 17 (17/0) | 55.0 (45.0–72.0) | Unk | Germany | Agilent |
| GSE95233 | Venet | Septic shock patients in ICU | Admission to ICU | 7 (5/2) | 47.0 (42.0–65.0) | 5(71) | France | Affymetrix |
| Australia/WIMR | Tang | Community or hospital clinics with influenza-like illness | At presentation | 332 (321/11) | 48.0 (32.0–63.5) | 129(39) | Australia | Nanostring |
| Stanford ICU databank | Rogers | Suspected sepsis with ARDS risk factors | Admission to ICU | 8 (6/2) | 62.0 (55.5–67.2) | 4(50) | USA | Nanostring |
| PROMPT | Giamarellos-Bourboulis | Suspected infection with 2 + SIRS | Admission to ED | 1 (1/0) | 78.0 | 0(0) | Greece | Nanostring |
| PREVISE | Herrero | Outpatient urgent care with suspected CAP | At presentation | 53 (52/1) | 78.0 (66.0–87.0) | 33(62) | Spain | Nanostring |
Prognostic power of the 6-mRNA signature classifier and comparator scores and markers in the independent prospective COVID-19 cohort. Shown are AUROCs for non-missing data, plus 95% CI. The fourth column is a ‘fair’ assessment of the 6-mRNA signature classifier, i.e. the performance on the subset of patients that was available to the comparator.
| Comparator marker | Number available | Comparator AUROC | 6-mRNA classifier AUROC | P value |
|---|---|---|---|---|
| 6-mRNA classifier | 97 | 0.89 (0.82–0.95) | | |
| SOFA | 96 | 0.89 | | 0.31 |
| APACHE II | 93 | 0.83 | | 0.19 |
| Age | 96 | 0.78 | | 0.07 |
| PCT | 76 | 0.80 | | 0.07 |
| CRP | 97 | 0.86 | | 0.58 |
| Lactate | 45 | 0.75 | | 0.52 |
| IL-6 | 97 | 0.73 | | < 1e−3 |
| suPAR | 97 | 0.79 | | 0.04 |
| 6-mRNA classifier | 97 | 0.78 (0.64–0.92) | | |
| SOFA | 96 | 0.72 | | 0.43 |
| APACHE II | 93 | 0.76 | | 0.84 |
| Age | 96 | 0.74 | | 0.68 |
| PCT | 76 | 0.73 | | 0.63 |
| CRP | 97 | 0.74 | | 0.62 |
| Lactate | 45 | 0.78 | | 0.84 |
| IL-6 | 97 | 0.57 | | 0.003 |
| suPAR | 97 | 0.74 | | 0.71 |
Figure 2. Training data for the 6-mRNA classifier. (a) Visualization of 705 samples across 21 studies in low dimension using t-SNE. (b) Logistic regression model selection. Each dot corresponds to a model defined by a combination of logistic regression hyperparameters and a decision threshold. The entire search space (100 hyperparameter configurations) is shown. (c) ROC plot for the best model. The plot is constructed using pooled probabilities from cross-validation folds. (d) Expression of the 6 genes used in the logistic regression model according to mortality outcomes.
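Pooling cross-validation probabilities before computing a single AUROC, as described for panel (c), can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the AUROC is computed with the standard pairwise (Mann–Whitney) definition, the helper names `auroc` and `pooled_auroc` are hypothetical, and the fold labels and probabilities are made-up toy values.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pooled_auroc(folds):
    """Concatenate held-out labels and probabilities across folds, then score."""
    labels = [y for fold_labels, _ in folds for y in fold_labels]
    probs = [p for _, fold_probs in folds for p in fold_probs]
    return auroc(labels, probs)


# Toy held-out predictions from two folds: (labels, predicted probabilities).
toy_folds = [
    ([0, 0, 1], [0.1, 0.4, 0.8]),
    ([0, 1, 1], [0.3, 0.2, 0.9]),
]
pooled_auc = pooled_auroc(toy_folds)
print(f"pooled AUROC = {pooled_auc:.3f}")
```

Pooling yields one ROC curve for the whole training set, which is useful when individual folds contain too few events (or none) to support a per-fold AUROC.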
Figure 3. Validation of the 6-mRNA classifier in the independent retrospective non-COVID-19 cohorts. (a) Visualization of the samples using t-SNE. (b) Expression of the 6 genes used in the logistic regression model in clinically relevant patient subgroups. (c) 6-mRNA classifier accurately distinguishes non-severe and severe patients with COVID-19 as well as those who died. (d) ROC plot for the subgroups. Excluding healthy samples, about 65% of the blood samples were estimated to have been collected “early”, and 35% “late”. See “Standardized severity assignment for retrospective non-COVID-19 patient samples” for details.
Demographics, severity scores, and severity markers for the prospective COVID-19 cohort, overall and split by mortality. P-values correspond to Mann–Whitney tests for difference of means and chi-square tests for difference of proportions between the survival and mortality groups. Unless indicated otherwise, numbers shown are median [IQR].
| Variable | Overall | Death | Survival | P value |
|---|---|---|---|---|
| N | 97 | 16 | 81 | |
| Age (years) | 62 [52, 72.25] | 68.50 [62.75, 84.25] | 60.00 [50.75, 70.25] | 0.003 |
| Gender = male (%) | 68 (70.1) | 12 (75.0) | 56 (69.1) | 0.865 |
| White blood cells (/mm3) | 6770 [5145, 10,227.50] | 8540.00 [5542.50, 12,510.00] | 6480.00 [5145.00, 9622.50] | 0.275 |
| Neutrophils (%) | 78.10 [68.35, 86.60] | 88.95 [86.40, 93.03] | 77.09 [65.22, 83.75] | < 0.001 |
| Lymphocytes (%) | 12.70 [7.20, 21.15] | 6.70 [3.65, 9.65] | 14.03 [9.00, 22.42] | < 0.001 |
| Platelets (/mm3) | 215,000 [172,900, 266,000] | 249,050 [180,750, 298,000] | 214,000 [172,600, 260,800] | 0.176 |
| D-dimer (ng/ml) | 977.90 [476.25, 2560.00] | 4480.00 [2440.00, 13,161.50] | 850.00 [437.50, 1947.50] | < 0.001 |
| CRP (mg/l) | 107.00 [31.60, 222.50] | 224.75 [142.89, 260.75] | 79.10 [28.80, 202.00] | 0.002 |
| SOFA score | 3.00 [1.00, 6.00] | 5.50 [4.00, 6.25] | 2 [1, 6] | 0.006 |
| APACHE II | 7.00 [5.00, 11.00] | 11.00 [8.00, 13.50] | 7 [4, 9] | 0.001 |
| Length of hospital stay | 13.00 [11.00, 20.00] | 13 [8.75, 17.25] | 13 [11, 20] | 0.410 |
| Severe respiratory failure (%) | 50 (51.5) | 16 (100.0) | 34 (42.0) | < 0.001 |
Figure 4. Validation of the 6-mRNA classifier in the COVID-19 cohorts. (a) Visualization of prospective (Greece, N = 97) and retrospective (Albany, US, N = 100) samples in the independent validation cohorts using t-SNE. (b) Expression of the 6 genes used in the logistic regression model in patients with severe/fatal and non-severe SARS-CoV-2 viral infection. (c) 6-mRNA classifier accurately distinguishes non-severe and severe patients with COVID-19 as well as those who died. (d) ROC plot for non-severe COVID-19 vs. severe or death (samples from healthy controls not included).
Figure 5. Validation of the 6-mRNA classifier in the COVID-19 and non-COVID-19 cohorts (pooled results of data in Figs. 3, 4, excluding healthy subjects). The total number of samples was 951. The number of cases (severe, including fatal, disease) was 187. The remaining samples (764) had non-severe disease. AUROC = 0.90 (95% CI 0.87–0.92).
Test characteristics of the 6-mRNA score in non-COVID-19 and COVID-19 patients using the three-band test report. “Severe in band” is the number of patients with severe viral infection assigned to the corresponding band. “Non-severe in band” is the number of patients with non-severe viral infection assigned to the corresponding band. The “Percent severe in band” is the percentage of patients in the band who had severe outcome. The “In-band” column is the percentage of patients assigned by the classifier to the corresponding band.
| Band | Severe in band | Non-severe in band | Percent severe in band (%) | Sensitivity (%) | Specificity (%) | Likelihood ratio | In-band (%) |
|---|---|---|---|---|---|---|---|
| Low risk | 12 | 570 | 2.0 | 85 | 85 | 0.18 | 77 |
| Intermediate risk | 43 | 78 | 36 | 54 | 88 | 4.6 | 16 |
| High risk | 25 | 26 | 49 | 31 | 96 | 8.1 | 6.7 |
| Low risk | 2 | 420 | 0.47 | 98 | 62 | 0.04 | 56 |
| Intermediate risk | 68 | 246 | 22 | 85 | 64 | 2.3 | 42 |
| High risk | 10 | 8 | 56 | 12 | 99 | 11 | 2.4 |
| Low risk | 2 | 18 | 10 | 96 | 38 | 0.10 | 21 |
| Intermediate risk | 20 | 28 | 42 | 40 | 40 | 0.67 | 50 |
| High risk | 28 | 1 | 97 | 56 | 98 | 26 | 30 |
| Low risk | 0 | 12 | 0 | 100 | 26 | 0.0 | 12 |
| Intermediate risk | 17 | 32 | 35 | 34 | 32 | 0.50 | 51 |
| High risk | 33 | 3 | 92 | 66 | 94 | 10 | 37 |
| Low risk | 14 | 37 | 27 | 75 | 86 | 0.28 | 51 |
| Intermediate risk | 33 | 6 | 85 | 58 | 86 | 4.1 | 39 |
| High risk | 10 | 0 | 100 | 18 | 100 | Inf | 10 |
| Low risk | 1 | 16 | 6 | 98 | 37 | 0.047 | 17 |
| Intermediate risk | 14 | 24 | 37 | 25 | 44 | 0.44 | 38 |
| High risk | 42 | 3 | 93 | 74 | 93 | 11 | 45 |
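The per-band columns can be recomputed from the two count columns. The sketch below is an illustration, not the authors' code. It assumes the band likelihood ratio is P(band | severe) / P(band | non-severe), a definition inferred from the tabulated numbers rather than stated in the caption; the names `Band` and `band_stats` are hypothetical. With the counts from the first three rows of the table it gives likelihood ratios of about 0.18, 4.6 and 8.1 and in-band shares of about 77%, 16% and 6.8%.

```python
from dataclasses import dataclass


@dataclass
class Band:
    name: str
    severe: int      # "Severe in band"
    non_severe: int  # "Non-severe in band"


def band_stats(bands):
    """Return per-band percent severe, likelihood ratio and in-band share."""
    total_severe = sum(b.severe for b in bands)
    total_non_severe = sum(b.non_severe for b in bands)
    total = total_severe + total_non_severe
    rows = []
    for b in bands:
        in_band = b.severe + b.non_severe
        p_band_given_severe = b.severe / total_severe
        p_band_given_non_severe = b.non_severe / total_non_severe
        # Assumed definition: ratio of the two conditional band probabilities.
        if p_band_given_non_severe > 0:
            lr = p_band_given_severe / p_band_given_non_severe
        else:
            lr = float("inf")
        rows.append({
            "band": b.name,
            "percent_severe": 100.0 * b.severe / in_band if in_band else float("nan"),
            "likelihood_ratio": lr,
            "in_band_percent": 100.0 * in_band / total,
        })
    return rows


# Counts copied from the first three rows of the table above.
bands = [
    Band("Low risk", 12, 570),
    Band("Intermediate risk", 43, 78),
    Band("High risk", 25, 26),
]
rows = band_stats(bands)
for r in rows:
    print(r["band"], round(r["likelihood_ratio"], 2), round(r["in_band_percent"], 1))
```

A band with no non-severe patients yields an infinite likelihood ratio, which matches the "Inf" entry in the table.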