A J Venkatakrishnan, Nikhil Kayal, Praveen Anand, Andrew D Badley, George M Church, Venky Soundararajan.
Abstract
The hand of molecular mimicry in shaping SARS-CoV-2 evolution and immune evasion remains to be deciphered. Here, we report 33 distinct 8-mer/9-mer peptides that are identical between SARS-CoV-2 and the human reference proteome. We benchmark this observation against other viral-human 8-mer/9-mer peptide identity, which suggests generally similar extents of molecular mimicry for SARS-CoV-2 and many other human viruses. Interestingly, 20 novel human peptides mimicked by SARS-CoV-2 have not been observed in any previous coronavirus strains (HCoV, SARS-CoV, and MERS). Furthermore, four of the human 8-mer/9-mer peptides mimicked by SARS-CoV-2 map onto HLA-B*40:01, HLA-B*40:02, and HLA-B*35:01 binding peptides from human PAM, ANXA7, PGD, and ALOX5AP proteins. This mimicry of multiple human proteins by SARS-CoV-2 is made salient by single-cell RNA-seq (scRNA-seq) analysis that shows the targeted genes significantly expressed in human lungs and arteries, tissues implicated in COVID-19 pathogenesis. Finally, HLA-A*03 restricted 8-mer peptides are found to be shared broadly by human and coronaviridae helicases in functional hotspots, with potential implications for nucleic acid unwinding upon initial infection. This study presents the first scan of human peptide mimicry by SARS-CoV-2, and via its benchmarking against human-viral mimicry more broadly, presents a computational framework for follow-up studies to assay how evolutionary tinkering may relate to zoonosis and herd immunity.
Keywords: Environmental microbiology; Proteomics
Year: 2020 PMID: 33024578 PMCID: PMC7529588 DOI: 10.1038/s41420-020-00321-y
Source DB: PubMed Journal: Cell Death Discov ISSN: 2058-7716
Fig. 1 Molecular mimicry and immunomodulatory potential.
a n-mer peptide generation. b Mimicked peptides between SARS-CoV-2 and human proteomes. c Comparison of human–protein mimicking SARS-CoV-2 peptides with peptides from other human coronaviruses. d Immunomodulatory potential of mimicked peptides from SARS-CoV-2.
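The n-mer generation and viral-human comparison summarized in panels a and b can be illustrated with a minimal Python sketch. It assumes a plain sliding window and exact string identity, and it is not the authors' pipeline. The function names (`kmers`, `proteome_kmers`, `shared_peptides`) and the toy sequences are hypothetical: the human fragment is the PAM epitope KEPGSGVPVVL from the table that follows, while the flanking residues of the viral toy sequence are invented for illustration.

```python
from typing import Dict, Iterator, Set


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every contiguous peptide of length k from a protein sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def proteome_kmers(proteome: Dict[str, str], k: int) -> Set[str]:
    """Collect the distinct k-mers across all proteins of a proteome."""
    peptides: Set[str] = set()
    for protein_sequence in proteome.values():
        peptides.update(kmers(protein_sequence, k))
    return peptides


def shared_peptides(viral: Dict[str, str], human: Dict[str, str], k: int) -> Set[str]:
    """Return k-mers that occur identically in both proteomes."""
    return proteome_kmers(viral, k) & proteome_kmers(human, k)


# Toy inputs (assumptions, not real proteome records).
viral = {"viral_toy": "AAPGSGVPVVCC"}
human = {"human_toy": "KEPGSGVPVVL"}

shared_8 = shared_peptides(viral, human, 8)
shared_9 = shared_peptides(viral, human, 9)
print(shared_8, shared_9)
```

For real proteomes the human k-mer set is large, so storing it once as a set and streaming the viral k-mers against it keeps each lookup constant-time.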
SARS-CoV-2 peptides mimicking human proteins, with experimental evidence of positive MHC binding from the immune epitope database.
| Viral peptide (coronavirus) | Viral protein (NCBI) | Human epitope | Human protein | MHC restriction (positive response) | Epitope ID (IEDB) | Pubmed ID (PMID) |
|---|---|---|---|---|---|---|
| PGSGVPVV (SARS-CoV-2) | ORF1ab polyprotein (RNA-dependent RNA polymerase) (YP_009725307.1: 227–234) | KEPGSGVPVVL | PAM (P19021: 860–867) | HLA-B*40:02: acidic residue (D or E) at peptide position 2 (P2) and M, F, or aliphatic residues at the C terminus (PMID: 24366607) | 609309 | 27920218 |
| ESGLKTIL (SARS-CoV-2) | ORF1ab polyprotein (NSP2) (YP_009725298.1: 210–217) | VESGLKTIL | ANXA7 (P20073: 342–350) | HLA-B*40:01, HLA-B*40:02: acidic residue (D or E) at peptide position 2 (P2) and M, F, or aliphatic residues at the C terminus (PMID: 24366607) | 579215 | 31844290, 31530632, 27841757 |
| VTLIGEAV (SARS-CoV-2) | ORF1ab polyprotein (EndoRNAse) (YP_009725310.1: 165–172) | VPVTLIGEAVF | PGD (P52209: 278–285) | HLA-B*35:01: P at position 2 (P2) and Y at the last position (PΩ) (and to a lesser extent F, M, L, or I) (PMID: 26758079) | 638710 | 31844290, 29615400, 28228285 |
| SLKELLQN (SARS-CoV-2) | ORF1ab polyprotein (3C-like proteinase) (YP_009725301.1: 267–274) | QSLKELLQNW | CENPI (Q92674: 496–503) | HLA-B*57:01, HLA-B*58:01, HLA-B*57:03: [A,T,S] at P2; [L,F,W] at P9 (PMID: 30410026) | 600524 | 31844290, 30315122, 29437277, 30410026 |
| PEANMDQE (SARS-CoV-2; SARS-CoV) | ORF1ab polyprotein (NSP10) (YP_009742617.1) | PEANMDQESF (antigen source: SARS) | ALOX5AP (splicing variant) (ENSP00000479870.1: 53–60) | HLA-B*40:01: acidic residue (D or E) at peptide position 2 (P2) and M, F, or aliphatic residues at the C terminus (PMID: 24366607) | 47238 | 1000425 (RefID) |
| YNYEPLTQ (SARS-CoV-2; SARS-CoV) | ORF1ab polyprotein (3C-like proteinase) (YP_009725301.1: 237–244) | RVYNYEPLTQLK | MCM8 (inferred) (Q9UJA3: 199–206) | HLA-A*03:01: common hydrophobic amino acids at P2 and K or R anchor residues at the C terminus (PMID: 7504010) | 624802 | 31844290, 30315122, 28228285, 26992070 |
| NVAITRAK (SARS-CoV-2; SARS-CoV; Seasonal HCoV) | ORF1ab polyprotein (Helicase) (YP_009725308.1: 561–568) | RFNVAITRAK (antigen source: SARS) | DNA2 (inferred) (P51530: 1000–1007) | HLA-A*03:01: common hydrophobic amino acids at P2 and K or R anchor residues at the C terminus (PMID: 7504010); HLA-A*11:01: T at P2, K at P9 (PMID: 31723204); HLA-A*68:01: V, I, T, L, Y or F at P2 and K at P9 (PMID: 10449296); HLA-A*31:01: R at P9 (PMID: 31618895) | 53748 | 1000425 (RefID) |
| RFNVAITR (SARS-CoV-2; SARS-CoV; MERS; Seasonal HCoV) | ORF1ab polyprotein (Helicase) (YP_009725308.1: 559–566) | RFNVAITRAK (antigen source: SARS) | MOV10L1 (inferred) (Q9BXT6: 1130–1137) | HLA-A*03:01; HLA-A*11:01: T at P2, K at P9 (PMID: 31723204); HLA-A*68:01: V, I, T, L, Y or F at P2 and K at P9 (PMID: 10449296); HLA-A*31:01: R at P9 (PMID: 31618895) | 53748 | 1000425 (RefID) |
| QGPPGTGK (SARS-CoV-2; SARS-CoV; MERS; Seasonal HCoV) | ORF1ab polyprotein (Helicase) (YP_009725308.1: 280–287) | LQGPPGTGK (antigen source: SARS) | ZNFX1 (inferred) (Q9P2E3: 617–624) | HLA-A*11:01: T at P2, K at P9 (PMID: 31723204); HLA-A*03:01: common hydrophobic amino acids at P2 and K or R anchor residues at the C terminus (PMID: 7504010); HLA-A*31:01: R at P9 (PMID: 31618895) | 38844 | 1000425 (RefID) |
The viral-human mimicked 8-mer/9-mer peptides are highlighted in green text.
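The anchor-residue rules quoted in the MHC restriction column can be expressed as simple position checks. The sketch below is only illustrative and is not a binding predictor: it encodes the HLA-B*40:01/02 and HLA-B*35:01 rules as stated in the table, and it assumes that "aliphatic" means A, V, L, or I, which the table does not spell out. The names `MOTIFS`, `matches_motif`, and `mimic_offset` are hypothetical.

```python
# Assumption: "aliphatic residues" is taken to mean A, V, L, I.
ALIPHATIC = set("AVLI")

# Anchor rules as stated in the table: P2 residue and C-terminal residue.
MOTIFS = {
    "HLA-B*40:01/02": {"p2": set("DE"), "c_term": set("MF") | ALIPHATIC},
    "HLA-B*35:01": {"p2": set("P"), "c_term": set("YFMLI")},
}


def matches_motif(epitope: str, allele: str) -> bool:
    """Check whether P2 and the C-terminal residue fit the allele's stated anchors."""
    motif = MOTIFS[allele]
    return (
        len(epitope) >= 2
        and epitope[1] in motif["p2"]
        and epitope[-1] in motif["c_term"]
    )


def mimic_offset(epitope: str, viral_peptide: str) -> int:
    """Return the 0-based start of the viral 8-mer inside the human epitope, or -1."""
    return epitope.find(viral_peptide)


# Human epitopes and viral 8-mers taken from the table rows.
rows = [
    ("KEPGSGVPVVL", "PGSGVPVV", "HLA-B*40:01/02"),
    ("VESGLKTIL", "ESGLKTIL", "HLA-B*40:01/02"),
    ("VPVTLIGEAVF", "VTLIGEAV", "HLA-B*35:01"),
]
for epitope, viral_peptide, allele in rows:
    print(epitope, allele, matches_motif(epitope, allele), mimic_offset(epitope, viral_peptide))
```

Note that in each of these rows the P2 anchor lies in the human epitope's flanking residues rather than inside the shared 8-mer, so the check has to run on the full epitope and not on the viral peptide alone.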
Amino acid sequence conservation of the SARS-CoV-2 peptides mimicking human proteins.
| SARS-CoV-2 mimicked epitopes | SARS-CoV-2 (GISAID) | SARS | MERS | HCoV-229E | HCoV-NL63 | HCoV-OC43 | HCoV-HKU1 |
|---|---|---|---|---|---|---|---|
| PGSGVPVV | 46079/46513 [ORF1ab/NS12; 99.06%] | 0/659 | 0/572 | 0/293 | 0/478 | 0/319 | 0/236 |
| ESGLKTIL | 44750/46513 [ORF1ab/NS2; 96.21%] | 0/659 | 0/572 | 0/293 | 0/478 | 0/319 | 0/236 |
| VTLIGEAV | 43710/46513 [ORF1ab/NS15; 93.97%] | 0/659 | 0/572 | 0/293 | 0/478 | 0/319 | 0/236 |
| SLKELLQN | 45888/46513 [ORF1ab/NS5; 98.66%] | 0/659 | 0/572 | 0/293 | 0/478 | 0/319 | 0/236 |
| YNYEPLTQ | 45927/46513 [ORF1ab/NS5; 98.74%] | 0/659 | 0/572 | 0/293 | 0/478 | 0/319 | 0/236 |
| NVAITRAK | 45834/46513 [ORF1ab/NS13; 98.54%] | 196/659 [nsp13-pp1ab; 29.74%] | 0/572 | 16/293 [ORF1ab/NSP13; 5.46%] | 37/478 [ORF1ab/NSP13; 7.74%] | 66/319 [NTPase/HEL; 20.68%] | 39/236 [NSP13; 16.52%] |
| RFNVAITR | 45842/46513 [ORF1ab/NS13; 98.55%] | 196/659 [nsp13-pp1ab; 29.74%] | 329/572 [nsp13-pp1ab; 57.51%] | 16/293 [ORF1ab/NSP13; 5.46%] | 28/478 [ORF1ab/NSP13; 5.85%] | 69/319 [NTPase/HEL; 21.63%] | 39/236 [NSP13; 16.52%] |
| QGPPGTGK | 46150/46513 [ORF1ab/NS13; 99.22%] | 177/659 [nsp13-pp1ab; 26.85%] | 335/572 [nsp13-pp1ab; 58.56%] | 0/293 | 0/478 | 69/319 [NTPase/HEL; 21.63%] | 39/236 [NSP13; 16.52%] |
The PGSGVPVV peptide from the NSP12 protein is present in 46,079 out of 46,513 SARS-CoV-2 sequences (99.1% conserved; mimics human PAM protein), the ESGLKTIL peptide from the NSP2 protein is present in 44,750 out of 46,513 SARS-CoV-2 sequences (96.2% conserved; mimics human ANXA7), the VTLIGEAV peptide from the endoRNAase protein is present in 43,710 of 46,513 SARS-CoV-2 sequences (94% conserved; mimics human PGD), and the SLKELLQN peptide from the 3C-like proteinase is present in 45,888 of 46,513 SARS-CoV-2 sequences (98.7% conserved; mimics human CENPI). Furthermore, the PGSGVPVV (NSP12 peptide mimicking PAM), ESGLKTIL (NSP2 peptide mimicking ANXA7), VTLIGEAV (endoRNAase peptide mimicking PGD), and SLKELLQN (3C-like proteinase peptide mimicking CENPI) peptides were not found in any of the proteins from seasonal coronavirus strains downloaded from ViPRdb as of 06/15/2020: HCoV-229E (756 protein sequences), HCoV-HKU1 (1310 protein sequences), HCoV-NL63 (1462 protein sequences), and HCoV-OC43 (1921 protein sequences).

The YNYEPLTQ peptide from the 3C-like proteinase is present in 45,927 out of 46,513 SARS-CoV-2 sequences (98.7% conserved; mimics human helicase MCM8 protein), the NVAITRAK peptide from the viral helicase is present in 45,834 out of 46,513 SARS-CoV-2 sequences (98.5% conserved; mimics human helicase DNA2), the RFNVAITR peptide from the viral helicase is present in 45,842 of 46,513 SARS-CoV-2 sequences (98.6% conserved; mimics human helicase MOV10L1), and the QGPPGTGK peptide from the viral helicase is present in 46,150 of 46,513 SARS-CoV-2 sequences (99.2% conserved; mimics human ZNFX1). Moreover, NVAITRAK, RFNVAITR, and QGPPGTGK peptides were found in 158/319 (49.5%), 161/319 (50.5%), and 69/319 (21.6%) strains of HCoV-OC43 in the NSP10 (NTPase/HEL) protein. The QGPPGTGK peptide was also found in 39/236 seasonal HCoV-HKU1 strains in the NSP13 protein. The YNYEPLTQ peptide was not found in any of the seasonal human coronavirus strains.
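The conservation figures above are hit counts divided by the number of sequences examined. A minimal sketch of that tally follows, assuming each strain is represented by a single protein-sequence string and that presence means an exact substring match. The function name `conservation` and the toy sequences are hypothetical and are not GISAID or ViPRdb records.

```python
from typing import Iterable, Tuple


def conservation(peptide: str, sequences: Iterable[str]) -> Tuple[int, int, float]:
    """Return (hits, total, percent) for exact occurrences of a peptide.

    A sequence counts as a hit if the peptide appears anywhere within it.
    """
    hits = 0
    total = 0
    for sequence in sequences:
        total += 1
        if peptide in sequence:
            hits += 1
    percent = 100.0 * hits / total if total else 0.0
    return hits, total, percent


# Toy sequences (invented for illustration); the third carries a V->I substitution.
toy_sequences = [
    "MMPGSGVPVVAA",
    "MMPGSGVPVVAA",
    "MMPGSGVPIVAA",
    "QQPGSGVPVVKK",
]

hits, total, percent = conservation("PGSGVPVV", toy_sequences)
print(f"{hits}/{total} ({percent:.2f}%)")
```

Because matching is exact, a single substitution inside the 8-mer removes a sequence from the hit count, which is why the percentage can be read as a conservation measure.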
Fig. 2 Multi-omics analysis of human PAM.
a (Left) Universal bulk RNA-seq analysis of all available human data shows that pancreatic islets, heart, artery, aorta, and embryonic stem cells express PAM at significant levels. (Right) Single-cell RNA-seq (scRNA-seq) confirms high PAM-expressing cells, including multiple pancreatic cells, cardiomyocytes, goblet cells of the lung, bronchus and intestines, stromal cells of the digestive system, and fibroblasts of multiple organs including the lung, trachea, bronchus, intestines, and heart. b Analysis of the tissue-specific expression pattern of PAM from bulk RNA-seq (GTEx) and triangulation with IHC antibody staining data (HPA) suggests artery, aorta, and myocytes of the heart muscle as significant PAM-expressing tissues. c Lung bronchoalveolar lavage fluid from severe COVID-19 patients shows high PAM expression in club cells, which also express the SARS-CoV-2 receptor ACE2 (nferX scRNA-seq app, lung bronchoalveolar lavage fluid).
Fig. 3 Evidence of ANXA7 expression from human single-cell RNA-sequencing data.
a List of high ANXA7-expressing cells across human tissues. High ANXA7-expressing cells in the lungs are shown on the right. These include macrophages, proliferating cells, mast cells, stromal cells, type-2 pneumocytes and endothelial cells. b Lung bronchoalveolar lavage fluid scRNA-seq shows multiple high ANXA7-expressing cells, including macrophages, lung epithelial cells, T-cells, club cells, proliferating cells, and plasma cells. In the lungs, activated dendritic cells and lymphatic vessel cells are also seen to express ANXA7. a, b The size and transparency of the bubbles are proportional to the strength of literature-based associations (https://academia.nferx.com/).
Fig. 4 Evidence of PGD expression from human single-cell RNA-sequencing data.
Single-cell RNA-seq-based expression of PGD in cell types of the a lungs, b lung pleura, c airway epithelia, and d artery. The size and transparency of the bubbles are proportional to the strength of literature-based associations (https://academia.nferx.com/).
Fig. 5 Evidence for ALOX5AP from biomedical knowledge synthesis and single-cell RNA-seq.
a Knowledge synthesis suggests involvement of ALOX5AP in ischemic stroke, myocardial infarction, atherosclerosis, cerebral infarction, and coronary artery disease. b scRNA-seq shows significant expression of ALOX5AP in proliferating cells, macrophages, T-cells, and epithelial cells from the lungs and macrophages of the brain.
Fig. 6 Expression of mimicked human helicases.
scRNA-seq-based expression analysis of human helicases containing peptides mimicked by helicases in SARS-CoV-2 and other human coronaviruses (https://academia.nferx.com/).