Katherine A Overmyer, Evgenia Shishkova, Ian J Miller, Joseph Balnis, Matthew N Bernstein, Trenton M Peters-Clarke, Jesse G Meyer, Qiuwen Quan, Laura K Muehlbauer, Edna A Trujillo, Yuchen He, Amit Chopra, Hau C Chieng, Anupama Tiwari, Marc A Judson, Brett Paulson, Dain R Brademan, Yunyun Zhu, Lia R Serrano, Vanessa Linke, Lisa A Drake, Alejandro P Adam, Bradford S Schwartz, Harold A Singer, Scott Swanson, Deane F Mosher, Ron Stewart, Joshua J Coon, Ariel Jaitovich.
Abstract
We performed RNA-Seq and high-resolution mass spectrometry on 128 blood samples from COVID-19-positive and COVID-19-negative patients with diverse disease severities. Over 17,000 transcripts, proteins, metabolites, and lipids were quantified and associated with clinical outcomes in a curated relational database, uniquely enabling systems analysis and cross-ome correlations to molecules and patient prognoses. We mapped 219 molecular features with high significance to COVID-19 status and severity, many of which are involved in complement activation, dysregulated lipid transport, and neutrophil activation. We identified sets of covarying molecules, e.g., the protein gelsolin and the metabolite citrate, or plasmalogens and apolipoproteins, offering pathophysiological insights and therapeutic suggestions. The observed dysregulation of platelet function, blood coagulation, acute phase response, and endotheliopathy further illuminated the unique COVID-19 phenotype. We present a web-based tool (covid-omics.app) enabling interactive exploration of our compendium and illustrate its utility through a comparative analysis with published data and a machine learning approach for prediction of COVID-19 severity.
Year: 2020 PMID: 32743614 PMCID: PMC7388490 DOI: 10.1101/2020.07.17.20156513
Source DB: PubMed Journal: medRxiv
Demographics and baseline characteristics of COVID-19 and non-COVID-19 patients in ICU and non-ICU settings
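| Variables | COVID-19 total (n = 102) | COVID-19 non-ICU (n = 51) | COVID-19 ICU (n = 51) | Non-COVID-19 total (n = 26) | Non-COVID-19 non-ICU (n = 10) | Non-COVID-19 ICU (n = 16) |
|---|---|---|---|---|---|---|
|  | 3.37 (1–5) | 2.78 (1–3) | 3.96 (1–6) | 0.97 (1–1) | 0.9 (0.8–1) | 0.94 (1–1) |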
| Male | 64 (62.7%) | 30 (58.8%) | 34 (66.7%) | 13 (50%) | 4 (40%) | 9 (56%) |
| Female | 38 (37.3%) | 21 (41.2%) | 17 (33.3%) | 13 (50%) | 6 (60%) | 7 (44%) |
| Age (years), mean (IQR) | 61.3 (50.0–74.3) | 59.7 (49.0–80.0) | 62.9 (55.0–73.0) | 63.8 (52.3–76.8) | 60.4 (47.3–74.0) | 66 (55.3–80.3) |
| White | 46 (45.1%) | 28 (54.9%) | 18 (35.3%) | 21 (80.8%) | 8 (80%) | 13 (81.2%) |
| Black | 11 (10.8%) | 5 (9.8%) | 6 (11.8%) | 4 (15.4%) | 2 (20%) | 2 (12.5%) |
| Asian | 2 (1.9%) | 0 (0%) | 2 (3.9%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Hispanic | 21 (20.6%) | 7 (13.7%) | 14 (27.5%) | 1 (3.8%) | 0 (0%) | 1 (6.3%) |
| Other | 22 (21.6%) | 11 (21.6%) | 11 (21.6%) | 0 (0%) | 0 (0%) | 0 (0%) |
|  | 30.39 (25.30–32.24) | 29.84 (26.09–32.37) | 30.92 (24.50–32.05) | 30.36 (26.53–33.10) | 27.20 (23.68–30.38) | 32.34 (26.98–37.67) |
| Charlson comorbidity index | 3.3 (1–5) | 3.16 (1–5) | 3.49 (2–5) | 4.35 (2–6) | 3.3 (1–5) | 5 (3–7) |
| APACHE II | N/A | N/A | 21.6 (15–27) | N/A | N/A | 20.6 (12–26) |
| SOFA | N/A | N/A | 8.2 (6–11) | N/A | N/A | 8.6 (3–11) |
| SAPS II | N/A | N/A | 51.8 (45–62) | N/A | N/A | 47.6 (33–65) |
| Ferritin (ng/mL) | 938.9 (301.8–1203.8) | 782.6 (206.0–934.5) | 1076.9 (378.0–1294.0) | 250.5 (80.5–382.5) | 205.3 (58.0–411.0) | 285.7 (92.0–438.5) |
| C-reactive protein (mg/L) | 140.9 (52.0–204.3) | 120.6 (44.7–155.0) | 158.9 (61.7–248.3) | 73.8 (20.0–110.2) | 34.7 (8.9–56.8) | 99.8 (37.8–175.2) |
| D-dimer (mg/L FEU) | 11.7 (1.0–12.8) | 2.3 (0.6–1.73) | 18.6 (1.7–21.6) | 5.3 (0.5–4.6) | 5.2 (0.4–1.9) | 5.5 (0.6–10.2) |
| Procalcitonin (ng/mL) | 3.2 (0.2–1.8) | 1.7 (0.2–1.0) | 4.4 (0.3–2.3) | 2.1 (0.2–0.7) | 2.2 (0.1–3.4) | 2.1 (0.3–1.21) |
| Lactate (mmol/L) | 1.2 (0.9–1.5) | 1.2 (0.9–1.4) | 1.3 (0.9–1.5) | 2.1 (0.9–2.5) | 1.2 (0.8–1.5) | 2.53 (0.9–3.4) |
| Fibrinogen (mg/dL) | 543.6 (413.0–667.0) | 559.3 (420.0–703.0) | 531.7 (391.5–663.0) | 362.3 (257.3–550.0) | 348.0 (256.75–441.5) | 373 (257.3–572.0) |
| Albumin (g/dL) | 2.9 (2.6–3.3) | 3.2 (2.9–3.5) | 2.7 (2.4–2.9) | 3.4 (2.9–3.8) | 3.8 (3.4–4.1) | 3.19 (2.6–3.8) |
| White blood cells (K/uL) | 10.8 (6.1–12.5) | 7.1 (4.9–8.5) | 14.4 (8.4–15.4) | 12.7 (7.2–17.3) | 8.3 (6.7–9.7) | 15.4 (8.2–20.9) |
| Hemoglobin (g/dL) | 11.2 (9.7–12.6) | 11.6 (10.2–13.0) | 10.7 (9.4–12.1) | 12.4 (9.9–14.7) | 12.8 (10.45–14.85) | 12.3 (9.6–14.5) |
| Mean corpuscular volume (fL) | 87.1 (84.5–93.7) | 88.0 (85.6–94.2) | 86.2 (82.5–93.0) | 92.3 (88.6–95.4) | 91.2 (87.2–94.6) | 93.0 (89.4–97.8) |
| Platelets (K/uL) | 266.0 (192.5–320.5) | 269.2 (209.0–334) | 262.8 (187.0–317.0) | 203.5 (151.8–247.8) | 228.1 (163.5–278.0) | 188.2 (127.5–229.5) |
| Neutrophils (%) | 76.2 (68.5–86.0) | 69.7 (61.0–82.0) | 82.8 (80.0–90.0) | 77.7 (74.0–87.0) | 73.1 (58.8–82.5) | 80.5 (79.25–89.25) |
| Lymphocytes (%) | 13.8 (5.0–18.5) | 19.4 (9.0–26.0) | 8.3 (4.0–11.0) | 12.7 (6.0–18.0) | 16.9 (7.0–26.0) | 10.1 (4.3–10.8) |
| Monocytes (%) | 7.1 (4.0–9.0) | 8.8 (6.0–11.0) | 5.5 (3.0–8.0) | 8.0 (4.0–9.3) | 7.7 (4.0–10.3) | 8.2 (4.0–9.0) |
| Eosinophils (%) | 0.8 (0.0–1.0) | 1.1 (0.0–1.0) | 0.5 (0.0–1.0) | 1.0 (0.0–1.25) | 1.8 (0.0–3.3) | 0.44 (0.0–1.0) |
| PaO2/FiO2 ratio | N/A | N/A | 161.6 (98–211) | N/A | N/A | 149.4 (73–184) |
| Positive end-expiratory pressure (cmH2O) | N/A | N/A | 10.8 (10–12) | N/A | N/A | 6.6 (73–184) |
| Inspiratory plateau pressure (cmH2O) | N/A | N/A | 22.8 (19.7–25.3) | N/A | N/A | 23.9 (19.8–28.8) |
| Renal replacement therapy | 12 (11.8%) | 3 (5.9%) | 9 (17.6%) | 3 (11.5%) | 0 (0%) | 3 (18.8%) |
| Hydroxychloroquine | 87 (85.3%) | 43 (84.3%) | 44 (86.3%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Antibiotics | 98 (96.1%) | 47 (92.2%) | 51 (100%) | 16 (61.5%) | 3 (30.0%) | 13 (81.3%) |
| Antiviral | 1 (0.98%) | 0 (0%) | 1 (1.9%) | 0 (0%) | 0 (0%) | 0 (0%) |
| IL-6 antagonist | 4 (3.9%) | 1 (1.9%) | 2 (3.9%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Convalescent plasma | 26 (25.5%) | 8 (15.7%) | 18 (35.3%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Steroid | 46 (45.1%) | 12 (23.5%) | 34 (66.7%) | 4 (15.4%) | 1 (10.0%) | 3 (18.8%) |
| Therapeutic anticoagulation | 37 (36.3%) | 2 (3.9%) | 35 (68.6%) | 8 (30.8%) | 1 (10.0%) | 7 (43.8%) |
Figure 1. Overview of sample cohort and experimental design.
a Age and sex distributions of the COVID-19 (n = 102) and non-COVID-19 (n = 26) groups. b Distributions of hospital-free days over a continuous 45-day period aggregated with survival (HFD-45; see outcome selection in the Methods section) among the COVID-19 and non-COVID-19 groups. c The proportion (%) of female and male patients who were admitted to the intensive care unit (ICU) and who required mechanical ventilation. d Overview of the study design, experimental approaches, and primary outcomes. Note that leukocytes were separated by filtering (see Methods for details).
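The HFD-45 outcome can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not the study's definition (which lives in the Methods section and is not reproduced here): the 45-day window is assumed to start at enrollment, a patient who dies within the window is assumed to score 0, and inpatient days are assumed to be a single count inside the window. The function name `hfd45` and its parameters are hypothetical.

```python
def hfd45(days_hospitalized: int, died: bool, window: int = 45) -> int:
    """Hypothetical hospital-free-days score over a fixed observation window.

    Assumptions (not taken from the study): the window starts at enrollment,
    death within the window scores 0, and `days_hospitalized` counts inpatient
    days inside the window.
    """
    if days_hospitalized < 0:
        raise ValueError("days_hospitalized must be non-negative")
    if died:
        return 0  # assumed: non-survivors receive the worst possible score
    # Days alive and out of hospital; a stay covering the whole window scores 0.
    return max(window - days_hospitalized, 0)


# Toy patients: (inpatient days within the window, died within the window)
patients = {"A": (5, False), "B": (30, False), "C": (10, True), "D": (45, False)}
scores = {pid: hfd45(days, died) for pid, (days, died) in patients.items()}
print(scores)  # {'A': 40, 'B': 15, 'C': 0, 'D': 0}
```

Folding death into the score yields a single ordinal outcome in which lower values mean more severe disease, so a negative association with HFD-45 reads as a positive association with severity.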
Figure 2. Multi-omics analysis reveals strong molecular signatures associated with COVID-19 status and severity.
a Principal component analysis (PCA) using quantitative values from all omics data (leukocyte transcripts and plasma proteins, lipids, and small molecules) shows that principal components 1 and 2 capture 16% and 10% of the variance between patient samples. Plotted along these two components, the samples show a linear trend with hospital-free days at 45 days (HFD-45). b Associations of biomolecules with COVID-19 status were determined using differential expression analysis (EBSeq) for transcripts and linear regression log-likelihood tests for plasma biomolecules. The adjusted p-values (1 − posterior probability or Benjamini–Hochberg-adjusted p-values, respectively) are plotted against the log2 fold change of mean values between COVID-19 and non-COVID-19 samples. In total, 2,537 leukocyte transcripts, 146 plasma proteins, 168 plasma lipids, and 13 plasma metabolites had adjusted p-values < 0.05. c Associations between biomolecules and HFD-45 were estimated using univariate linear regression (HFD-45 ~ biomolecule abundance + age + sex), resulting in 7,408 biomolecules. A multivariate linear regression with an elastic net penalty was applied to each omics dataset separately to further refine features of interest, resulting in 946 features. In total, 219 features were determined to be most important for distinguishing COVID-19 status and severity. d The abundances of the 219 features were visualized as a heat map and grouped by hierarchical clustering. Features that were elevated (e) or reduced (f) with COVID-19 status and severity were used for GO-term and molecular class enrichment analysis.
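For the plasma biomolecules in panel b, the Benjamini–Hochberg step-up adjustment can be sketched in plain Python. This is the textbook procedure, not the authors' code, and it does not cover the transcript results, which use 1 − posterior probability from EBSeq instead. The function name and the toy p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the raw p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for five hypothetical plasma features (not study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.27]
adj = benjamini_hochberg(raw)
# Features passing the caption's threshold of adjusted p < 0.05.
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

Here the third feature (raw p = 0.039) inherits the adjusted value of the fourth (0.05125): the running minimum keeps adjusted p-values in the same order as the raw ones, so only the first two features pass the 0.05 cutoff.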
Figure 3. Leveraging the value of multi-omic data through cross-ome correlation analysis.
a Hierarchical clustering of Kendall tau coefficients calculated for pairwise correlations between abundances of proteins (rows) and small molecules (lipids and metabolites; columns). The significance of the biomolecules' association with HFD-45 and COVID-19 status is indicated above the clusters. b Re-clustering of biomolecules found in the clusters highlighted in panel a, with molecule annotations. c Enrichment analyses of protein GO terms (purple) and small molecule classes (green) present in the cluster in panel b. d Schematic of a high-density lipoprotein (HDL) particle containing APOA1 and APOA2 proteins surrounded by various lipids, specifically plasmalogens. SAA2, also detected in the cluster in panel b, can replace APOA1 within the particle. e Relative abundance measurements of plasma gelsolin (pGSN), cellular gelsolin (cGSN), and total gelsolin obtained using parallel reaction monitoring (PRM) on representative peptide sequences. * and ** indicate p-values < 0.05 and 0.001, respectively. f Regression analysis of plasma gelsolin levels and SOFA scores (R² = 0.267, p = 4.53 × 10⁻⁵).
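The protein-by-small-molecule matrix in panel a is built from one Kendall coefficient per pair. The caption does not say which Kendall variant was used; the sketch below assumes tau-b, which corrects for ties (SciPy's `scipy.stats.kendalltau` also defaults to tau-b). It is an O(n²) illustration rather than the authors' implementation, and the feature names and abundances are hypothetical toy values.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (simple O(n^2) version)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n_pairs = concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        n_pairs += 1
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1  # pairs tied in x (including pairs tied in both)
        if dy == 0:
            tied_y += 1  # pairs tied in y (including pairs tied in both)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        return math.nan  # one variable is constant, so tau is undefined
    return (concordant - discordant) / denom


def cross_ome_tau(proteins, small_molecules):
    """Tau-b for every (protein, small molecule) pair measured on the same samples."""
    return {
        (p, m): kendall_tau_b(p_vals, m_vals)
        for p, p_vals in proteins.items()
        for m, m_vals in small_molecules.items()
    }


# Toy abundances across five samples (not study data).
proteins = {"protein_1": [1, 2, 3, 4, 5], "protein_2": [5, 4, 3, 2, 1]}
small_molecules = {"lipid_1": [2, 1, 4, 3, 5]}
taus = cross_ome_tau(proteins, small_molecules)
```

The resulting dictionary can be pivoted into a rows-by-columns matrix and passed to any hierarchical clustering routine, which is the step panel a visualizes.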
Figure 4. Biological processes dysregulated in COVID-19.
a Volcano plots highlighting proteins (pink) and transcripts (purple) assigned the GO term GO:0043312 "Neutrophil Degranulation." Larger points indicate biomolecules included in the list of 219 features most significantly associated with COVID-19 status and severity (Figure 2e). b Linear regressions of protein abundance vs. HFD-45 for the indicated proteins, measured in COVID-19 (left) and non-COVID-19 patients (right). The resulting R² values and the sign (+/−) of each slope indicate the goodness of fit and the direction of change in abundance of a given protein with severity (HFD-45). Proteins that decrease in severe cases appear blue, while proteins that increase in severe cases appear red. A dot denotes a significant protein vs. HFD-45 correlation (p-value < 0.01). c Relative abundance measurements of peptides attributed to plasma fibronectin (pFN) and cellular fibronectin (cFN). d Relative abundance measurements of VWF multimer and VWF Antigen-2 (VWF Ag2), estimated from the relative abundances of their unique peptides. Peptide- and protein-level data are log2-transformed and grouped into four categories according to patient status: COVID-19 ICU (red), COVID-19 non-ICU (orange), non-COVID-19 ICU (blue), and non-COVID-19 non-ICU (green). * and ** indicate p-values < 0.05 and 0.001, respectively.
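Reading panel b depends on a sign convention: because fewer hospital-free days means more severe disease, a protein whose abundance falls as HFD-45 rises is one that increases in severe cases. The sketch below fits an ordinary least-squares line of log2 abundance on HFD-45, reports R², and maps the slope sign to the caption's red/blue scheme. The function names, the `"neutral"` label for a zero slope, and the toy values are assumptions for illustration, not the figure's plotting code.

```python
def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def severity_color(slope_vs_hfd):
    """Map the sign of the abundance-vs-HFD-45 slope to the caption's color scheme."""
    # Fewer hospital-free days means more severe disease, so a negative slope
    # means the protein rises with severity (red) and a positive slope means it falls (blue).
    if slope_vs_hfd < 0:
        return "red"
    if slope_vs_hfd > 0:
        return "blue"
    return "neutral"


# Toy data: HFD-45 per patient and log2 protein abundance (not study data).
hfd = [0, 10, 20, 30, 40]
log2_abundance = [12.0, 11.5, 11.0, 10.5, 10.0]
slope, intercept, r2 = fit_line(hfd, log2_abundance)
color = severity_color(slope)
```

Because R² and the sign of the slope are symmetric in x and y for a single predictor, the classification is the same whether HFD-45 is treated as the predictor or the response.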
Figure 5. Overview of the COVID-19 Multi-omics Web Tool.
a The home page provides principal component analysis (PCA) scores and loadings plots. Selected biomolecules are presented in a bar plot and a box plot. Each page provides buttons to navigate to the other web tools. b The differential expression page displays a multi-omics volcano plot with the y-axis representing −log10(p-values), where the p-values derive from the analysis in Figure 2. c The linear regression page allows users to select any combination of biomolecule and clinical measurement to analyze via univariate linear regression. R² and p-values for the F-statistic are displayed on the plot. d The Clustergrammer page offers an interactive clustered heatmap.
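The regression page reports R² together with a p-value for the F-statistic. For a univariate regression with an intercept these are tied together: F = R²(n − 2)/(1 − R²) on (1, n − 2) degrees of freedom, which equals the square of the slope's t-statistic. The sketch below checks that identity on toy data; the function names are hypothetical and this is not the web tool's code. Turning F into a p-value needs the upper tail of the F distribution, for example `scipy.stats.f.sf(F, 1, n - 2)` if SciPy is available.

```python
import math


def coefficient_of_determination(x, y):
    """R^2 of the least-squares line of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def f_from_r_squared(r_squared, n):
    """F statistic (df = 1, n - 2) for a one-predictor regression with intercept."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    if r_squared == 1.0:
        return math.inf  # perfect fit: residual variance is zero
    return r_squared * (n - 2) / (1.0 - r_squared)


def slope_t_statistic(x, y):
    """t statistic for the slope of y on x, using the usual OLS standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_err


# Toy measurements (not study data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
f_stat = f_from_r_squared(coefficient_of_determination(x, y), len(x))
t_stat = slope_t_statistic(x, y)
```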
Figure 6. Results from analyses demonstrating use cases of this multi-omic resource.
a The top ten enriched gene sets ranked by adjusted p-value. For the gene set "TNFA signaling via NFKB", we show a heatmap (right) of the z-score-normalized expression data (in units of log transcripts per million), partitioned by whether the data came from the COVID-19 ARDS patients (right) or the non-COVID-19 ARDS (i.e., sepsis ARDS) patients from Englert et al. (left). The first row of each heatmap depicts the hospital-free days of each patient; hospital-free days are not available for the Englert et al. dataset. The gene names labeling each row are colored according to whether EBSeq deemed the gene more highly expressed in COVID-19 ARDS (orange) or non-COVID-19 ARDS (blue). b As in (a), but for DE genes expressed at lower levels in COVID-19 ICU patients. c Data splitting scheme for training and test sets from the 100 COVID-19 patients with all four omic datasets. A random 20% was held out for model evaluation, and the remaining 80% was used to determine the best hyperparameters with 5-fold cross-validation. d Extra trees classifier performance metrics on the test set after hyperparameter optimization, training on each of the four omic datasets separately or on all omic data combined. e Macro-averaged receiver operating characteristic curves for the models trained with multi-omic data, Charlson score, or both multi-omic data and Charlson score. f Test set predictions of the extra trees model trained on the combined multi-omic dataset, showing correct predictions as a function of disease severity defined by hospital-free days. g Top 5 most important predictive features for each of the models trained on the four omic subsets. Feature importance for each set was normalized to the most important feature.
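Panel c's scheme, a single random 20% hold-out plus 5-fold cross-validation on the remaining 80%, can be sketched with the standard library alone. This is an illustration rather than the authors' pipeline: the caption does not say whether the split was stratified by severity class, so the sketch uses a plain random split, and the patient IDs, seed, and function names are hypothetical.

```python
import random


def holdout_and_folds(sample_ids, test_fraction=0.2, n_folds=5, seed=0):
    """Shuffle once, hold out a test set, and split the rest into CV folds."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    # Deal the training samples round-robin so fold sizes differ by at most one.
    folds = [train[k::n_folds] for k in range(n_folds)]
    return train, test, folds


def cv_splits(folds):
    """Yield (fit, validation) pairs in which each fold is the validation set once."""
    for k, validation in enumerate(folds):
        fit = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield fit, validation


patients = [f"patient_{i:03d}" for i in range(100)]  # hypothetical IDs
train, test, folds = holdout_and_folds(patients)
splits = list(cv_splits(folds))
```

With 100 patients this gives 20 held-out and 80 training patients in five folds of 16. Hyperparameters would be chosen by averaging validation scores across the five (fit, validation) pairs, and the held-out patients would be scored only once, with the final model.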