Yiran E Liu, Sirle Saul, Shirit Einav, Purvesh Khatri, Aditya Manohar Rao, Makeda Lucretia Robinson, Olga Lucia Agudelo Rojas, Ana Maria Sanz, Michelle Verghese, Daniel Solis, Mamdouh Sibai, Chun Hong Huang, Malaya Kumar Sahoo, Rosa Margarita Gelvez, Nathalia Bueno, Maria Isabel Estupiñan Cardenas, Luis Angel Villar Centeno, Elsa Marina Rojas Garrido, Fernando Rosso, Michele Donato, Benjamin A Pinsky.
Abstract
BACKGROUND: Each year, 3–6 million people develop life-threatening severe dengue (SD). Clinical warning signs for SD manifest late in the disease course and are nonspecific, leading to missed cases and excess hospital burden. Better SD prognostics are urgently needed.
Keywords: Biomarkers; Dengue; Gene signature; Host response; Machine learning; Prognostic; Severe dengue
Year: 2022 PMID: 35346346 PMCID: PMC8959795 DOI: 10.1186/s13073-022-01034-w
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Publicly available datasets used for discovery of the 8-gene set and training of the 8-gene XGBoost model. Healthy controls, convalescent patients, and patients with other febrile illnesses were removed. Longitudinal samples were excluded for gene set discovery and model training but included for temporal gene expression analysis (included in “Total samples used”). WB, whole blood; PBMC, peripheral blood mononuclear cells
| Dataset | Platform | Year | Reference | Country | Age | Tissue | Samples used in discovery | Total samples used |
|---|---|---|---|---|---|---|---|---|
| GSE40628 | GPL16021 (Lymphochip) | 2007 | Simmons CP | Vietnam | Adults | WB | 14 | 14 |
| GSE18090 | GPL570 (Affymetrix) | 2009 | Nascimento EJ | Brazil | Adults | PBMC | 18 | 18 |
| GSE13052 | GPL2700 (Illumina) | 2009 | Long HT | Vietnam | Children | WB | 18 | 18 |
| GSE25001 | GPL6104 (Illumina) | 2010 | Hoang LT | Vietnam | Children/adults | WB | 96 | 168 |
| GSE17924 | GPL4133 (Agilent) | 2010 | Devignot S | Cambodia | Children | WB | 48 | 48 |
| GSE38246 | GPL15615 (Illumina) | 2012 | Popper SJ | Nicaragua | Children | PBMC | 41 | 102 |
| GSE43777 | GPL201 (Affymetrix) | 2013 | Sun P | Venezuela | Children/adults | PBMC | 26 | 112 |
| GSE43777 | GPL570 (Affymetrix) | 2013 | Sun P | Venezuela | Children/adults | PBMC | 20 | 74 |
| GSE51808 | GPL13158 (Affymetrix) | 2014 | Kwissa M | Thailand | Adults | WB | 28 | 28 |
| GSE94892 | GPL16791 (Illumina) | 2017 | Banerjee A | India | Children/adults | PBMC | 31 | 31 |
| GSE100299 | GPL17586 (Affymetrix) | 2017 | Simon-Lorière E | Cambodia | Children | PBMC | 25 | 25 |
| Total | | | | | | | 365 | 638 |
Summary of demographic information and clinical parameters of the independent prospective Colombia cohort. For days from sample to severe dengue (SD) onset, “0” indicates patients whose sample was collected on the day of (at least several hours prior to) the appearance of SD manifestations. WS, warning signs; NS1 Ag, nonstructural protein 1 antigen; DENV, dengue virus
| Characteristic | Dengue | Dengue with WS | Severe dengue |
|---|---|---|---|
| Adult | 39 | 86 | 13 |
| Child (<17 years) | 54 | 176 | 9 |
| Male | 49 | 137 | 6 |
| Female | 44 | 125 | 16 |
| Total | 93 | 262 | 22 |
| Mean (range) | 5.0 (1–10) | 5.2 (0–10) | 4.8 (0–7) |
| Days from sample to SD onset, median (range) | - | - | −1 (−3, 0) |
| Positive NS1 Ag | 60 | 215 | 17 |
| Positive DENV IgM | 53 | 149 | 9 |
| Primary | 29 | 59 | 5 |
| Secondary | 59 | 186 | 15 |
| Undetermined | 5 | 17 | 2 |
| DENV-1 | 40 | 135 | 10 |
| DENV-2 | - | 3 | - |
| DENV-3 | 1 | 3 | 3 |
| DENV-4 | 2 | 3 | 1 |
| DENV co-infected | - | 1 | - |
| Unknown | 50 | 117 | 8 |
Fig. 1. Multi-cohort analysis identifies eight genes robustly associated with progression to SD. (A) Schematic of the multi-cohort analysis method with Monte Carlo sampling at the dataset level. In each of 100 cross-validation (CV) iterations, we randomly selected seven datasets for “training” (gray), identified differentially expressed genes (DEGs) using MetaIntegrator, and examined them in the remaining four “validation” (blue) datasets. DEGs that passed significance thresholds (as denoted by asterisks) in both training and validation were considered significant for that iteration. We then performed a greedy forward search on DEGs significant in more than 50% of all iterations and identified the eight most predictive DEGs. (B) Representative plots of the distribution of effect size (log2) in training (gray) and validation (blue) across the 100 iterations for over-expressed (LTF) and under-expressed (TGFBR3) genes that passed significance thresholds in >50% of iterations. Regardless of the combination of datasets in training or validation, the distribution of effect sizes for all 25 genes did not contain 0. (C) Forest plot of the effect size of the eight genes in each discovery dataset. Two genes (RASSF5 and GDPD5) were not measured in every dataset. The black lines indicate the 95% confidence interval (CI) of the effect size for a given gene in a given dataset, and the size of the black box is proportional to the sample size of each dataset. The summary effect size of each gene across all datasets is indicated by the red diamond; the width of the diamond indicates the 95% CI. (D) Standardized expression of each of the eight genes over the disease course (days post-symptom onset) in patients who remained non-severe (blue) or progressed to SD (purple). The seven discovery datasets that reported day of sample collection were included in the longitudinal analysis. Lines represent the local regression (LOESS) curve fit for non-severe patients and SD progressors. Gray bands represent the 95% CI.
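The dataset-level Monte Carlo scheme in panel A can be sketched as follows. This is a minimal illustration, not the authors' code: the `significant_genes` callback is a hypothetical stand-in for the MetaIntegrator analysis and its significance thresholds, the function names are invented for this example, and the two GSE43777 platforms are treated as separate datasets, as in the discovery table.

```python
import random
from collections import Counter

# Dataset identifiers from the discovery table; the platform suffixes on
# GSE43777 are an assumption used here to keep the two entries distinct.
DATASETS = [
    "GSE40628", "GSE18090", "GSE13052", "GSE25001", "GSE17924", "GSE38246",
    "GSE43777-GPL201", "GSE43777-GPL570", "GSE51808", "GSE94892", "GSE100299",
]


def monte_carlo_splits(datasets, n_train=7, n_iter=100, seed=0):
    """Yield (training, validation) dataset lists for each iteration."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        train = rng.sample(datasets, n_train)
        valid = [d for d in datasets if d not in train]
        yield train, valid


def stable_genes(datasets, significant_genes, n_iter=100, min_fraction=0.5, seed=0):
    """Return genes significant in more than `min_fraction` of iterations.

    `significant_genes(train, valid)` is a hypothetical callback returning the
    set of genes that pass thresholds in both training and validation.
    """
    counts = Counter()
    for train, valid in monte_carlo_splits(datasets, n_iter=n_iter, seed=seed):
        counts.update(significant_genes(train, valid))
    return sorted(g for g, c in counts.items() if c / n_iter > min_fraction)


def toy_caller(train, valid):
    # Placeholder result: these two genes "pass" in every iteration.
    return {"LTF", "TGFBR3"}


selected = stable_genes(DATASETS, toy_caller)
```

With 11 datasets and seven drawn for training, each iteration leaves four for validation, matching the split described in the caption. The greedy forward search that follows is not shown.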
Fig. 2. The 8-gene XGBoost-based model predicts progression to SD in public datasets. (A) Relative contribution of each of the eight genes to the XGBoost model. (B) Violin plot of predicted probabilities of progression to SD for samples across all public datasets. The dotted horizontal line indicates the Youden optimal threshold for the public datasets. (C) ROC curves of the 8-gene model predictions for distinguishing non-severe patients from SD progressors in datasets profiling children (red), adults (blue), or both children and adults (orange). The DeLong test p-value for children vs. adults is 0.205.
Fig. 3. The locked 8-gene XGBoost model predicts progression to SD in an independent prospective dengue cohort. (A) Description of the independent Colombia cohort. Blood samples were collected upon presentation from dengue patients presenting with or without warning signs. (B) Confusion matrix depicting the number of patients with an initial diagnosis of D or DWS upon presentation and a final diagnosis of D, DWS, or SD. (C) ROC curve of the locked 8-gene XGBoost model in predicting progression to SD in the independent cohort. The black point indicates the sensitivity and specificity of the 8-gene model at the Youden threshold in the independent cohort. The red point indicates the sensitivity and specificity of clinical warning signs in predicting progression to SD in the independent cohort. (D) 8-gene model predictions on samples collected throughout the disease course, on days 0–3, 4–6, or 7–10 post-fever onset. (E) Violin plot of the predicted probabilities of progression to SD for SD progressors in the independent cohort who initially presented with or without warning signs. (F) Predicted probabilities using the 8-gene model for the 22 patients in the independent Colombia cohort who progressed to SD, by days from sample collection to the appearance of severe manifestations (“Days to SD Onset”). “0” indicates patients whose sample was collected on the day of (but at least several hours prior to) the appearance of SD manifestations. The dotted horizontal line indicates the Youden threshold in the Colombia cohort.
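The Youden threshold referenced in these captions is the cutoff that maximizes J = sensitivity + specificity − 1. A minimal sketch of how such a threshold could be found from predicted probabilities is shown below; the function name, the toy scores, and the convention of calling a sample positive when its score is at or above the threshold are assumptions for illustration, not details taken from the paper.

```python
def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    labels: 1 = progressed to SD, 0 = remained non-severe.
    A sample is predicted positive when score >= threshold (an assumption).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        # Keep the first (lowest) threshold that achieves the maximum J.
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


# Toy data, not from the study.
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1]
threshold, sensitivity, specificity = youden_threshold(toy_scores, toy_labels)
```

On the toy data the best cutoff is 0.6, which captures both positives and misclassifies one of the four negatives.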
Performance of the 8-gene XGBoost model and clinical warning signs in the independent cohort. 95% confidence intervals (CIs) from bootstrapping are shown in parentheses for each metric. For the NNP of warning signs, the lower CI bound is omitted as the 95% CI contained negative values due to the sum of PPV and NPV being less than 1 (indicating no gain in certainty according to the Predictive Summary Index). LR+, positive likelihood ratio; LR−, negative likelihood ratio; PPV, positive predictive value; NPV, negative predictive value; NNP, number needed to predict
| Predictor | Sensitivity % | Specificity % | LR+ | LR− | PPV % | NPV % | NNP |
|---|---|---|---|---|---|---|---|
| 8-gene XGBoost model | 86.4 (68.2–100.0) | 79.7 (75.5–83.9) | 4.3 (3.2–5.5) | 0.2 (0.01–0.4) | 20.9 (16.7–25.6) | 99.0 (97.7–100.0) | 5.0 (4.0–6.8) |
| Clinical warning signs | 77.3 (58.3–94.1) | 39.7 (34.7–44.9) | 1.3 (0.9–1.6) | 0.6 (0.2–1.1) | 7.4 (4.3–10.9) | 96.6 (93.3–99.3) | 25.4 (NA–185.6) |
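The derived columns in this table follow from the standard definitions: LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and NNP = 1 / PSI, where the Predictive Summary Index is PSI = PPV + NPV − 1. The sketch below recomputes them from the rounded point estimates in the table. It assumes these standard formulas; the authors' exact computation is not stated here, and small differences (for example 25.0 vs. the reported 25.4) are expected because the inputs are rounded.

```python
def derived_metrics(sensitivity, specificity, ppv, npv):
    """Compute LR+, LR-, PSI, and NNP from proportions in [0, 1]."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    psi = ppv + npv - 1
    # NNP is undefined when PSI <= 0 (no gain in certainty); use infinity here.
    nnp = 1 / psi if psi > 0 else float("inf")
    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "psi": psi, "nnp": nnp}


# Rounded point estimates copied from the table above.
model = derived_metrics(0.864, 0.797, 0.209, 0.990)
warning_signs = derived_metrics(0.773, 0.397, 0.074, 0.966)
```

When PPV + NPV falls below 1, PSI is negative and NNP has no meaningful value, which is why the lower confidence bound for warning signs is reported as NA.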