Chad Pickering¹, Bo Zhou¹, Gege Xu¹, Rachel Rice¹, Prasanna Ramachandran¹, Hector Huang¹, Tho D Pham², Jeffrey M Schapiro³, Xin Cong¹, Saborni Chakraborty⁴, Karlie Edwards⁴, Srinivasa T Reddy⁵, Faheem Guirgis⁶, Taia T Wang⁴, Daniel Serie¹, Klaus Lindpaintner¹.
Abstract
Glycosylation is the most common form of post-translational modification of proteins, critically affecting their structure and function. Using liquid chromatography and mass spectrometry for high-resolution, site-specific quantification of glycopeptides, coupled with high-throughput, artificial intelligence-powered data processing, we analyzed differential protein glycoisoform distributions of 597 abundant serum glycopeptides and nonglycosylated peptides in 50 individuals who had been seriously ill with COVID-19 and in 22 individuals who had recovered after an asymptomatic course of COVID-19. As additional reference phenotypes for comparison, we included 12 individuals with a history of infection with a common cold coronavirus, 16 patients with bacterial sepsis, and 15 healthy subjects with no history of coronavirus exposure. Between symptomatic and asymptomatic COVID-19 patients, the normalized abundances of 374 of the 597 peptides and glycopeptides interrogated differed significantly at FDR < 0.05. Similar statistically significant differences were seen when comparing symptomatic COVID-19 patients with healthy controls (350 differentially abundant peptides and glycopeptides) and with common cold coronavirus seropositive subjects (353 differentially abundant peptides and glycopeptides). Between healthy controls and sepsis patients, 326 peptides and glycopeptides were differentially abundant, of which 277 overlapped with the biomarkers that were differentially expressed between symptomatic COVID-19 cases and healthy controls. Between symptomatic COVID-19 cases and sepsis patients, 101 glycopeptide and peptide biomarkers were statistically significantly differentially abundant. Using both supervised and unsupervised machine learning techniques, we found specific glycoprotein profiles to be strongly predictive of symptomatic COVID-19 infection. LASSO-regularized multivariable logistic regression yielded an accuracy of 100% in an independent test set, and K-means clustering yielded an overall accuracy of 96%. Our findings are consistent with the interpretation that the majority of the observed glycoprotein modifications shared between symptomatic COVID-19 and sepsis patients likely represent a generic consequence of a severe systemic immune and inflammatory state. However, some glycoisoform changes are specific to severe COVID-19 infection. These may reflect either COVID-19-specific consequences or a susceptibility or predisposition to a severe course of the disease. Our findings support the potential value of glycoproteomic biomarkers for the biomedical understanding and, potentially, the clinical management of serious acute infectious conditions.
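The abstract reports how many peptides and glycopeptides pass FDR < 0.05 but does not name the multiple-testing procedure. The following is a minimal sketch, assuming the Benjamini-Hochberg step-up adjustment (a common choice, not confirmed by the source) and hypothetical raw p-values, of how per-peptide p-values become FDR-adjusted values that are then thresholded:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five peptides (not data from the study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.27]
q = benjamini_hochberg(raw_p)
significant = [i for i, qi in enumerate(q) if qi < 0.05]
print(significant)  # -> [0, 1]
```

Note that peptides 2 and 3 have raw p-values below 0.05 but fail after adjustment, which is the point of controlling the false discovery rate across hundreds of simultaneous tests.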
Keywords: COVID-19; SARS-CoV-2; biomarkers; glycoproteomics; glycosylation
Year: 2022 PMID: 35336960 PMCID: PMC8951729 DOI: 10.3390/v14030553
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Cohort summary, to the extent annotations were available.
| Phenotype | Source | n | Serum | Plasma | Train Set | Test Set | Male | Female | Med. Age (IQR) |
|---|---|---|---|---|---|---|---|---|---|
| Symptomatic COVID-19+ | Kaiser Permanente | 50 | 39 | 11 | 38 | 12 | 25 | 15 ** | 55.5 (50.5, 67.3) |
| Bacterial sepsis | U of Florida, Jacksonville | 16 | 0 | 16 | 12 | 4 | 11 | 5 | 60.5 (57, 73.3) |
| Common cold coronavirus | Stanford Blood Bank | 12 | 0 | 12 | 9 | 3 | n/a | n/a | n/a |
| Asymptomatic COVID-19+ | Stanford Blood Bank | 22 | 22 | 0 | 16 | 6 | 10 | 12 | 49 (40.3, 61) |
| Healthy control | Stanford Blood Bank | 15 | 15 | 0 | 11 | 4 | n/a | n/a | n/a |
* Excluding 5 outliers; ** 10 symptomatic COVID-19 patients do not have reported sex.
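Within each phenotype, roughly three quarters of subjects fall in the training set. The source does not describe how the split was made. The following is a minimal sketch, assuming a per-phenotype (stratified) random split with a hypothetical training fraction of 0.75 and hypothetical subject IDs. With Python's round-half-to-even, this happens to reproduce the training counts in the table, but that agreement does not show it was the authors' procedure:

```python
import random
from collections import Counter

# Group sizes from the cohort table; subject IDs below are hypothetical.
cohort_sizes = {
    "Symptomatic COVID-19+": 50,
    "Bacterial sepsis": 16,
    "Common cold coronavirus": 12,
    "Asymptomatic COVID-19+": 22,
    "Healthy control": 15,
}


def stratified_split(sizes, train_fraction=0.75, seed=0):
    """Split subjects into train/test sets separately within each phenotype."""
    rng = random.Random(seed)
    train, test = [], []
    for phenotype, n in sizes.items():
        subjects = [(phenotype, i) for i in range(n)]
        rng.shuffle(subjects)
        n_train = round(n * train_fraction)  # Python rounds halves to even
        train.extend(subjects[:n_train])
        test.extend(subjects[n_train:])
    return train, test


train, test = stratified_split(cohort_sizes)
train_counts = Counter(phenotype for phenotype, _ in train)
print(dict(train_counts), len(train), len(test))
```

Splitting within each phenotype keeps the rare groups (sepsis, common cold coronavirus) represented in both sets, which a single pooled random split would not guarantee.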
Figure 1. Visualization of the top two principal components from a principal component analysis (PCA) of all 115 subjects included in the analysis (subjects are colored by phenotype).
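A PCA plot places each subject at its coordinates (scores) along the directions of greatest variance in the biomarker matrix. The following is a minimal pure-Python sketch using power iteration with deflation on the sample covariance matrix, applied to a hypothetical 5-subject, 2-biomarker matrix. The authors' software and any preprocessing (e.g., scaling) are not stated in this record:

```python
import math


def principal_components(data, n_components=2, iterations=500):
    """Return (components, scores) for data whose rows are subjects.

    Power iteration on the sample covariance matrix, with deflation after
    each component. Assumes the all-ones start vector is not orthogonal to
    any target component, which holds for typical inputs.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    components = []
    for _ in range(n_components):
        v = [1.0] * d
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along this component.
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]

    # Scores: each subject's coordinates along the components (what is plotted).
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return components, scores


# Hypothetical data: 5 subjects x 2 biomarkers lying close to the line y = 2x.
data = [[-2.0, -4.1], [-1.0, -1.9], [0.0, 0.1], [1.0, 2.0], [2.0, 3.9]]
components, scores = principal_components(data)
print(components[0])
```

For the 597-biomarker matrix in the study, an SVD-based routine would be the practical choice; power iteration is used here only to keep the sketch dependency-free.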
Figure 2. Venn diagram showing the number of biomarkers differentially expressed at FDR < 0.05 between healthy controls and each given phenotype group or groups.
Figure 3. Volcano plots showing log-transformed multiplicative fold changes and the corresponding log-transformed false discovery rates (FDRs) for each biomarker in the differential expression analysis, with healthy controls as the reference for each phenotype. Biomarkers in red are statistically significantly differentially expressed at FDR < 0.05. Circles denote glycopeptides; Xs denote nonglycosylated peptides.
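Each point in a volcano plot pairs an effect size (x-axis) with a significance level (y-axis). The following is a minimal sketch, assuming the fold change is the ratio of mean normalized abundances, log base 2 for the fold change, and log base 10 for the FDR. The caption names neither the summary statistic nor the log bases, and the abundances below are hypothetical:

```python
import math
from statistics import mean


def volcano_point(case_values, reference_values, fdr):
    """Return (log2 fold change, -log10 FDR) for one biomarker.

    The fold change here is mean abundance in the case group divided by
    mean abundance in the reference group (healthy controls).
    """
    fold_change = mean(case_values) / mean(reference_values)
    return math.log2(fold_change), -math.log10(fdr)


# Hypothetical normalized abundances and FDR for a single glycopeptide.
x, y = volcano_point([4.0, 4.4, 3.6], [1.0, 1.2, 0.8], fdr=0.001)
highlight = y > -math.log10(0.05)  # would be drawn in red (FDR < 0.05)
print(x, y, highlight)
```

A fourfold increase maps to x = 2 and a decrease of the same size to x = -2, so up- and down-regulated biomarkers sit symmetrically around zero.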
Figure 4. Heatmap representing all 115 patients and 597 biomarkers. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell.
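Row-wise Z-scoring rescales each biomarker (row) to mean 0 and standard deviation 1 across subjects. As a result, heatmap colors compare subjects within a biomarker rather than raw abundances across biomarkers. The following is a minimal sketch on a hypothetical matrix. Whether the sample or population standard deviation was used is not stated, and the sample version is assumed here:

```python
from statistics import mean, stdev


def row_zscores(matrix):
    """Z-score each row (biomarker) across its columns (subjects)."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)  # sample SD (n - 1); an assumption
        # A constant row has no spread; map it to zeros instead of dividing by 0.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return scaled


# Hypothetical abundances: 2 biomarkers (rows) x 4 subjects (columns).
abundances = [
    [10.0, 12.0, 14.0, 16.0],
    [0.5, 0.5, 1.5, 1.5],
]
z = row_zscores(abundances)
print([[round(v, 3) for v in row] for row in z])
```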
Figure 5. Twenty-two biomarkers that achieve FDR < 0.01 in each of the separate differential expression analyses between healthy controls and the four other phenotype groups. Seventy-seven biomarkers achieve FDR < 0.05; the more conservative threshold was chosen for clarity of the heatmap. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell.
Figure 6. Thirty-eight biomarkers that achieve FDR < 0.01 in the differential expression analysis between bacterial sepsis and symptomatic COVID-19 patients. One hundred one biomarkers achieve FDR < 0.05; the more conservative threshold was chosen for clarity of the heatmap. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell. CC: common cold coronavirus.
Figure 7. Thirty-four biomarkers that achieve FDR < 0.05 in each of the separate differential expression analyses between symptomatic COVID-19 patients and the four other phenotype groups. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell. CC: common cold coronavirus.
Figure 8. Forty-six biomarkers that achieve FDR < 0.05 in each of the separate differential expression analyses between sepsis patients and the four other phenotype groups. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell. CC: common cold coronavirus.
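Figures 5, 7 and 8 keep only the biomarkers that pass the FDR threshold in every one of several pairwise comparisons run separately. That selection is a set intersection. The following is a minimal sketch with hypothetical biomarker identifiers and FDR values:

```python
def significant_in_all(fdr_by_comparison, threshold=0.05):
    """Return the biomarkers whose FDR is below threshold in every comparison."""
    sets = [
        {marker for marker, fdr in fdrs.items() if fdr < threshold}
        for fdrs in fdr_by_comparison.values()
    ]
    return set.intersection(*sets) if sets else set()


# Hypothetical FDRs for symptomatic COVID-19 vs. each of the other four groups.
fdrs = {
    "vs_asymptomatic": {"GP1": 0.001, "GP2": 0.03, "P3": 0.20},
    "vs_healthy": {"GP1": 0.004, "GP2": 0.01, "P3": 0.02},
    "vs_common_cold": {"GP1": 0.002, "GP2": 0.07, "P3": 0.04},
    "vs_sepsis": {"GP1": 0.030, "GP2": 0.02, "P3": 0.01},
}
print(sorted(significant_in_all(fdrs)))  # -> ['GP1']
```

Lowering the threshold (as in Figures 5 and 6, which use FDR < 0.01) can only shrink each set, and therefore the intersection.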
Figure 9. Dot plot showing log-transformed fold changes for each nonglycosylated peptide in the sepsis and symptomatic COVID-19 phenotype groups, each indicated by its own symbol, with healthy controls as the reference; peptides are sorted by their symptomatic COVID-19 fold changes. Red symbols denote fold changes that are statistically significant at FDR < 0.05.
Figure 10. Results of K-means clustering using only the 34 biomarkers that statistically significantly differentiate symptomatic COVID-19 patients from all of the other phenotype groups, visualized via principal component analysis.
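K-means assigns each subject to the nearest of k centroids and then moves each centroid to the mean of its members, repeating until the assignments stop changing. The following is a minimal pure-Python sketch of Lloyd's algorithm, assuming k = 3 (matching the three predicted clusters in the allocation table). It uses hypothetical two-feature profiles in place of the 34 biomarkers and a simple deterministic farthest-first initialization. The authors' implementation, initialization and scaling are not described here, and the PCA visualization step is omitted:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    # Initialization: start from the first point, then repeatedly add the
    # point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-feature profiles for three well-separated groups of three.
points = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # group A
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],   # group B
    [0.0, 5.0], [0.2, 5.1], [0.1, 4.8],   # group C
]
labels, centroids = kmeans(points, k=3)
print(labels)  # -> [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

K-means never sees the phenotype labels, so its cluster indices are arbitrary; comparing clusters with clinical phenotypes requires a separate mapping step.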
Allocation to predicted clusters based on K-means clustering.
| True Phenotype * | Predicted Cluster 1 | Predicted Cluster 2 | Predicted Cluster 3 |
|---|---|---|---|
| Symptomatic COVID-19 | 47 | 0 | 3 |
| Bacterial sepsis | 2 | 0 | 14 |
| Other phenotype | 0 | 49 | 0 |
* True phenotype denotes clinically determined phenotype.
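The 96% overall K-means accuracy reported in the abstract can be recovered from the allocation table. Each predicted cluster is mapped to the phenotype that dominates it, and the subjects whose phenotype matches their cluster's majority are counted. The following is a short check, assuming that majority-vote mapping (cluster indices themselves are arbitrary). The counts are taken from the table, with columns in the order they appear:

```python
# Rows: true phenotype; columns: the three predicted clusters, in table order.
allocation = {
    "Symptomatic COVID-19": [47, 0, 3],
    "Bacterial sepsis": [2, 0, 14],
    "Other phenotype": [0, 49, 0],
}

n_clusters = 3
total = sum(sum(counts) for counts in allocation.values())
# Credit each cluster with the count of its majority phenotype.
correct = sum(
    max(counts[c] for counts in allocation.values()) for c in range(n_clusters)
)
accuracy = correct / total
print(correct, total, round(accuracy, 3))  # -> 110 115 0.957
```

The five misallocated subjects are the three symptomatic COVID-19 patients grouped with sepsis and the two sepsis patients grouped with symptomatic COVID-19.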
Figure 11. Predicted probabilities of symptomatic COVID-19 generated by the LASSO-regularized logistic regression model, stratified by true phenotype group and colored by training or test set assignment.
Figure 12. Heatmap showing the biomarkers retained by the LASSO-regularized classifier for all patients in both the training and test sets. Subjects are arranged into phenotype groups column-wise, and biomarkers are hierarchically clustered row-wise. Row-wise Z-scores determine the color of each cell.
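A biomarker is "retained" by a LASSO-regularized classifier when its coefficient survives the L1 penalty, which drives uninformative coefficients exactly to zero. The following is a minimal pure-Python sketch that fits L1-penalized logistic regression by proximal gradient descent (ISTA) on hypothetical standardized data. The penalty strength `lam`, the step size and the iteration count are arbitrary assumptions. The authors' solver and their choice of penalty (e.g., by cross-validation) are not described in this record:

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, iterations=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth log-loss, then the L1 proximal step.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical standardized data: feature 0 separates the classes,
# feature 1 is noise balanced within every level of feature 0.
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_lasso_logistic(X, y)
retained = [j for j, wj in enumerate(w) if wj != 0.0]
print(retained)  # -> [0]
```

The penalty also keeps the informative coefficient finite even though these toy classes are perfectly separable, a situation in which unpenalized logistic regression would diverge.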
Figure 13. Statistical significance levels of differential activation of canonical pathways between healthy controls and symptomatic COVID-19 patients.
Top 12 upstream regulators (acute-phase response signaling).
| Upstream Regulator | Molecule Type | p-Value | Target Molecules in Dataset |
|---|---|---|---|
| HNF1A | transcription regulator | 1.05 × 10⁻¹⁴ | AGT, AHSG, APOH, ATP, C1S, C4BPA, F2, HPX, ITIH4, SERPINA1, SERPING1, TTR |
| IL6 | cytokine | 1.24 × 10⁻¹³ | A2M, AGT, APOA1, ATP, CP, FN1, HP, HPX, ORM1, SERPINA1, SERPINA3, TF, TTR |
| HNF4A | transcription regulator | 1.1 × 10⁻¹⁰ | AGT, AHSG, APOA1, APOH, ATP, C1S, CP, HPX, ITIH4, ORM1, ORM2, SERPINA1, SERPINA3, TF, TTR |
| Tcf 1/3/4 | group | 1.66 × 10⁻⁸ | AHSG, APOH, TTR |
| Hmgn3 | other | 3.26 × 10⁻⁸ | AHSG, APOA1, SERPINA1, TTR |
| OSM | cytokine | 4.43 × 10⁻⁸ | A2M, C1S, C4BPA, FN1, HP, SERPINA1, SERPINA3, SERPING1 |
| CEBPB | transcription regulator | 8.56 × 10⁻⁸ | AGT, CP, FN1, HP, HPX, ORM1, SERPINA1, TF |
| STAT1 | transcription regulator | 1.21 × 10⁻⁷ | A2M, AGT, APOA1, C1S, FN1, SERPINA3, SERPING1 |
| STAT3 | transcription regulator | 1.66 × 10⁻⁷ | A2M, AGT, AHSG, ATP, FN1, HP, SERPINA1, SERPINA3 |
| IL6ST | transmembrane receptor | 6.29 × 10⁻⁷ | A2M, HP, HPX, ORM1 |
| TNF | cytokine | 1.13 × 10⁻⁶ | A2M, AGT, APOA1, ATP, CP, FN1, HP, ORM1, SERPINA3, SERPIND1, TF |
| IL6R | transmembrane receptor | 1.58 × 10⁻⁶ | A2M, FN1, HP, SERPINA3 |
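The p-values in the upstream-regulator table measure how unlikely it is that a regulator's known targets would overlap the dataset's altered molecules this much by chance. The record does not name the software or the test used. A one-sided Fisher's exact test, equivalent to the upper tail of the hypergeometric distribution, is a common choice for this kind of overlap. The following is a minimal sketch of that test with hypothetical counts:

```python
from math import comb


def overlap_pvalue(population, n_targets, n_dataset, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    population: size of the reference molecule universe
    n_targets:  molecules known to be downstream of the regulator
    n_dataset:  molecules altered in the dataset
    n_overlap:  altered molecules that are also known targets
    """
    max_k = min(n_targets, n_dataset)
    total = comb(population, n_dataset)
    # Exact integer arithmetic for each term; one division at the end.
    tail = sum(
        comb(n_targets, k) * comb(population - n_targets, n_dataset - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Hypothetical counts: 12 of 40 altered molecules among a regulator's
# 200 known targets, in a universe of 20,000 molecules.
print(overlap_pvalue(20000, 200, 40, 12))
```

Only about 0.4 overlapping molecules would be expected by chance here (40 × 200 / 20,000), so an overlap of 12 yields a very small p-value.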
Figure 14. Network of the 12 most statistically significantly altered upstream regulators (acute-phase response signaling).
Figure 15. Statistical significance levels of differential activation of canonical pathways between asymptomatic and symptomatic COVID-19 patients.
Figure 16. Acute-phase proteins identified by this study and by the study by Shen et al. Adapted from [21].
Figure 17. Statistical significance levels of differential activation of canonical pathways between nonsevere and severe COVID-19 cases.