Alba Grifoni, John Sidney, Yun Zhang, Richard H Scheuermann, Bjoern Peters, Alessandro Sette.
Abstract
Effective countermeasures against the recent emergence and rapid expansion of the 2019 novel coronavirus (SARS-CoV-2) require the development of data and tools to understand and monitor its spread and immune responses to it. However, little information is available about the targets of immune responses to SARS-CoV-2. We used the Immune Epitope Database and Analysis Resource (IEDB) to catalog available data related to other coronaviruses. This includes SARS-CoV, which has high sequence similarity to SARS-CoV-2 and is the best-characterized coronavirus in terms of epitope responses. We identified multiple specific regions in SARS-CoV-2 that have high homology to the SARS-CoV virus. Parallel bioinformatic predictions identified a priori potential B and T cell epitopes for SARS-CoV-2. The independent identification of the same regions using two approaches reflects the high probability that these regions are promising targets for immune recognition of SARS-CoV-2. These predictions can facilitate effective vaccine design against this virus of high priority.
Keywords: B cell epitope; COVID-19; SARS-CoV; SARS-CoV-2; T cell epitope; coronavirus; infectious disease; sequence conservation
Year: 2020 PMID: 32183941 PMCID: PMC7142693 DOI: 10.1016/j.chom.2020.03.002
Source DB: PubMed Journal: Cell Host Microbe ISSN: 1931-3128 Impact factor: 21.023
Figure 1. Comparison of SARS-CoV-2 (Wuhan-Hu-1) Genome Structure with Its Closest Bat Relative (bat-SL-CoVZXC21), Tor2 SARS-CoV, and HCoV-EMC MERS-CoV
Above: Coding sequence (CDS) regions corresponding to homologous proteins between the four viruses are filled with the same color in the genome schematic to indicate homology; regions with no homology to the predicted SARS-CoV-2 proteins are colored white. Below: Table of pairwise protein similarities (expressed as % identity) between SARS-CoV-2 and the other three viruses.
IEDB Inventory of Coronavirus B and T Cell Epitopes
| Epitope set | Type | Alpha | Beta (SARS-CoV) | Beta (MERS-CoV) | Beta (Other) | Gamma | Total |
|---|---|---|---|---|---|---|---|
| B cell | Conformational | 18 | 27 | 23 | 2 | 11 | 81 |
| B cell | Linear | 81 | 405 | 5 | 60 | 30 | 581 |
| T cell | | 61 | 164 | 25 | 54 | 16 | 320 |
IEDB Inventory of Coronavirus B and T Cell Epitopes
| Epitope set | Host | Alpha | Beta (SARS-CoV) | Beta (MERS-CoV) | Beta (Other) | Gamma | Total |
|---|---|---|---|---|---|---|---|
| B cell | Humans | 0 | 306 | 16 | 0 | 0 | 322 |
| B cell | Mice | 62 | 154 | 9 | 58 | 20 | 303 |
| B cell | Other | 42 | 142 | 5 | 6 | 23 | 218 |
| B cell | Tg mice | 0 | 0 | 0 | 0 | 0 | 0 |
| T cell | Humans | 2 | 92 | 0 | 1 | 0 | 95 |
| T cell | Mice | 16 | 99 | 25 | 53 | 1 | 194 |
| T cell | Other | 46 | 1 | 0 | 0 | 15 | 62 |
| T cell | Tg mice | 0 | 29 | 0 | 0 | 0 | 29 |
B cell includes both conformational and linear epitopes.
Totals between Tables 1 and 2 may not be equal as several epitopes are recognized in multiple species.
IEDB Inventory of Coronavirus B and T Cell Epitopes
| SARS-CoV Proteins | B Cell | T Cell |
|---|---|---|
| Spike glycoprotein | 279 | 48 |
| Nucleoprotein | 113 | 33 |
| Membrane protein | 20 | 4 |
| Replicase polyprotein 1ab | 8 | 9 |
| Protein 3a | 2 | 7 |
| Envelope small membrane protein | 2 | 0 |
| Non-structural protein 3b | 2 | 0 |
| Protein 7a | 2 | 0 |
| Protein 9b | 2 | 0 |
| Non-structural protein 6 | 1 | 0 |
| Protein non-structural 8a | 1 | 0 |
T cell epitope total includes epitopes recognized in humans and/or transgenic mice.
Figure 2. B Cell Immunodominant Regions Based on SARS-Specific Epitope Mapping
RF score for each amino acid position was calculated (see STAR Methods) and plotted over the SARS-CoV consensus sequence of spike glycoprotein (A), membrane protein (B), and nucleoprotein (C).
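The caption defers the RF calculation to the STAR Methods, which are not reproduced in this record. As a rough illustration only, the sketch below assumes a square-root-penalized response frequency, RF = (r − √r)/t, where r is the number of positive responses and t the number tested at a given position. That formula, the function names, the tuple layout, and the example numbers are all assumptions made for this sketch, not values or code from the paper.

```python
import math


def rf_score(responded: int, tested: int) -> float:
    """Assumed form of the response frequency: (r - sqrt(r)) / t.

    Returns 0.0 when nothing was tested, and never goes below zero.
    """
    if tested <= 0:
        return 0.0
    return max(0.0, (responded - math.sqrt(responded)) / tested)


def per_position_rf(length, assays):
    """Accumulate responses per residue and score each position.

    `assays` is a hypothetical list of (start, end, responded, tested)
    tuples, with 1-based inclusive coordinates on the protein.
    """
    responded_at = [0] * length
    tested_at = [0] * length
    for start, end, responded, tested in assays:
        for pos in range(start - 1, end):
            responded_at[pos] += responded
            tested_at[pos] += tested
    return [rf_score(r, t) for r, t in zip(responded_at, tested_at)]


# Made-up data: two overlapping epitopes on a 5-residue toy protein.
toy_assays = [(1, 3, 16, 20), (3, 5, 0, 5)]
scores = per_position_rf(5, toy_assays)
print([round(s, 2) for s in scores])
```

Subtracting the square root of the positive count penalizes positions supported by only a few responses, so a region tested in many subjects outranks one with the same raw ratio but a small sample.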
Dominant SARS-CoV B Cell Epitope Regions
| SARS-CoV Sequence | SARS-CoV Max RF | SARS-CoV-2 Sequence | SARS-CoV-2 Protein | Mapped Start–End | Identity (%) |
|---|---|---|---|---|---|
| DAVDCSQNPLAELKCSVKSFEIDKGIYQTSNF | 0.504 | DAVDCALDPLSETKCTLKSFTVEKGIYQTSN | S | 287–317 | 69 |
| VCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVIT | 0.745 | VCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVI | S | 524–598 | 80 |
| GTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNN | 0.709 | GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGS | S | 601–640 | 78 |
| FSQILPDPLKPTKRSFIED | 0.365 | FSQILPDPSKPSKRSFIE | S | 802–819 | 89 |
| FGAGAALQIPFAMQMAYRFNGIG | 0.367 | FGAGAALQIPFAMQMAYRFNGI | S | 888–909 | 100 |
| MADNGTITVEELKQLLEQWNLVIG | 0.460 | MADSNGTITVEELKKLLEQWNLVI | M | 1–24 | 92 |
| PLMESELVIGAVIIRGHLRMA | 0.457 | PLLESELVIGAVILRGHLRI | M | 132–151 | 90 |
| PQGLPNNTASWFTALTQHGKEE | 0.537 | RPQGLPNNTASWFTALTQHGK | N | 42–62 | 95 |
| NNAATVLQLPQGTTLPKGFYA | 0.543 | NNNAATVLQLPQGTTLPKGF | N | 153–172 | 95 |
| KHIDAYKTFPPTEPKKDKKKKTDEAQPLPQRQKKQPTVTLLPAADMDD | 0.82 | NKHIDAYKTFPPTEPKKDKKKKTDEAQPLPQRQKKQPTVTLLPAADM | N | 355–401 | 90 |
S, surface glycoprotein; M, membrane protein; N, nucleocapsid phosphoprotein
Dominant SARS-CoV T Cell Epitopes
| SARS-CoV Sequence | RF Score | HLA Restriction | SARS-CoV-2 Sequence | SARS-CoV-2 Protein | Mapped Start–End | Identity (%) |
|---|---|---|---|---|---|---|
| VRGWVFGSTMNNKSQSVI | 0.15 | DRB1∗04:01 | IRGWIFGTTLDSKTQSLL | S | 101–118 | 50 |
| CTFEYISDAFSLD | 0.21 | DRB1∗04:01 | CTFEYVSQPFLMD | S | 166–178 | 62 |
| DAFSLDVSEKSGN | 0.62 | DRB1∗04:01 | QPFLMDLEGKQGN | S | 173–185 | 38 |
| TNFRAILTAFSPAQDIW | 0.32 | DRB1∗04:01 | TRFQTLLALHRSYLTPGDSSSGW | S | 236–258 | 17 |
| KSFEIDKGIYQTSNFRVV | 0.40 | DRB1∗04:01, DRB1∗07:01 | KSFTVEKGIYQTSNFRVQ | S | 304–321 | 78 |
| STFFSTFKCYGVSATKL | 0.50 | DRB1∗07:01, DR8 | SASFSTFKCYGVSPTKL | S | 371–387 | 82 |
| KLPDDFMGCV | 0.55 | A∗02:01 | KLPDDFTGCV | S | 424–433 | 90 |
| NIDATSTGNYNYKYRYLR | 0.29 | Class II | NLDSKVGGNYNYLYRLFR | S | 440–457 | 56 |
| YLRHGKLRPFERDISNVP | 0.16 | DRB1∗04:01 | YLYRLFRKSNLKPFERDI | S | 451–468 | 58 |
| RPFERDISNVPFS | 0.36 | DRB1∗04:01 | KPFERDISTEIYQ | S | 462–474 | 54 |
| KSIVAYTMSLGADSSIAY | 0.15 | DRB1∗04:01, DRB1∗07:01 | QSIIAYTMSLGAENSVAY | S | 690–707 | 72 |
| SIVAYTMSL | 0.29 | A∗02:01 | SIIAYTMSL | S | 691–699 | 89 |
| TECANLLLQYGSFCTQL | 0.50 | DR8 | TECSNLLLQYGSFCTQL | S | 747–763 | 94 |
| VKQMYKTPTLKYFGGFNF | 0.20 | DRB1∗04:01 | VKQIYKTPPIKDFGGFNF | S | 785–802 | 78 |
| ESLTTTSTALGKLQDVV | 0.42 | DRB1∗04:01 | DSLSSTASALGKLQDVV | S | 936–952 | 71 |
| ALNTLVKQL | 0.29 | A∗02:01 | ALNTLVKQL | S | 958–966 | 100 |
| VLNDILSRL | 0.29 | A∗02:01 | VLNDILSRL | S | 976–984 | 100 |
| LITGRLQSL | 0.42 | A∗02:01 | LITGRLQSL | S | 996–1004 | 100 |
| QLIRAAEIRASANLAATK | 0.20 | DRB1∗04:01 | QLIRAAEIRASANLAATK | S | 1011–1028 | 100 |
| SWFITQRNFFSPQII | 0.60 | DRB1∗04:01 | HWFVTQRNFYEPQII | S | 1101–1115 | 73 |
| RLNEVAKNL | 0.42 | A∗02:01 | RLNEVAKNL | S | 1185–1193 | 100 |
| NLNESLIDL | 0.29 | A∗02:01 | NLNESLIDL | S | 1192–1200 | 100 |
| FIAGLIAIV | 0.80 | A∗02:01 | FIAGLIAIV | S | 1220–1228 | 100 |
| RFFTLGSITAQPVKI | 0.18 | B∗58:01 | RIFTIGTVTLKQGEI | Orf 3a | 6–20 | 40 |
| SITAQPVKI | 0.29 | B∗58:01 | TVTLKQGEI | Orf 3a | 12–20 | 22 |
| TLACFVLAAV | 0.59 | A∗02:01 | TLACFVLAAV | M | 61–70 | 100 |
| GLMWLSYFV | 0.59 | A∗02:01 | GLMWLSYFI | M | 89–97 | 89 |
| HLRMAGHSL | 0.40 | Class I | HLRIAGHHL | M | 148–156 | 78 |
| ALNTPKDHI | 0.29 | A∗02:01 | ALNTPKDHI | N | 138–146 | 100 |
| LQLPQGTTL | 0.29 | A∗02:01 | LQLPQGTTL | N | 159–167 | 100 |
| GETALALLLL | 0.38 | B∗40:01 | GDAALALLLL | N | 215–224 | 80 |
| LALLLLDRL | 0.29 | A∗02:01 | LALLLLDRL | N | 219–227 | 100 |
| LLLDRLNQL | 0.42 | A∗02:01 | LLLDRLNQL | N | 222–230 | 100 |
| RLNQLESKV | 0.42 | A∗02:01 | RLNQLESKM | N | 226–234 | 89 |
| TKQYNVTQAF | 0.29 | Class I | TKAYNVTQAF | N | 265–274 | 90 |
| GMSRIGMEV | 0.42 | A∗02:01 | GMSRIGMEV | N | 316–324 | 100 |
| MEVTPSGTWL | 0.42 | B∗40:01 | MEVTPSGTWL | N | 322–331 | 100 |
| QFKDNVILL | 0.50 | A∗24:02 | NFKDQVILL | N | 345–353 | 78 |
| CLDAGINYV | | | CLEASFNYL | Orf 1ab | 2139–2147 | 56 |
| WLMWFIISI | | | WLMWLIINL | Orf 1ab | 2292–2300 | 67 |
| ILLLDQVLV | | | ILLLDQALV | Orf 1ab | 2498–2506 | 89 |
| LLCVLAALV | | | SACVLAAEC | Orf 1ab | 2840–2848 | 56 |
| ALSGVFCGV | | | SLPGVFCGV | Orf 1ab | 2942–2950 | 78 |
| TLMNVITLV | | | TLMNVLTLV | Orf 1ab | 3639–3647 | 89 |
| SMWALVISV | | | SMWALIISV | Orf 1ab | 3661–3669 | 89 |
S, surface glycoprotein; M, membrane protein; N, nucleocapsid phosphoprotein.
Restrictions defined only in HLA-transgenic mice are indicated by the italicized font.
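For equal-length peptide pairs, the Identity (%) column can be reproduced by counting matching positions. The following is a minimal sketch of that ungapped comparison; the function name is hypothetical, and it does not handle pairs of different lengths (such as the 236–258 row), which would require an alignment step not shown here.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length for an ungapped comparison")
    if not a:
        raise ValueError("peptides must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Sequence pairs taken from the table rows above (SARS-CoV, SARS-CoV-2).
pairs = [
    ("KLPDDFMGCV", "KLPDDFTGCV"),  # S 424-433, table lists 90
    ("QFKDNVILL", "NFKDQVILL"),    # N 345-353, table lists 78
    ("ALNTLVKQL", "ALNTLVKQL"),    # S 958-966, table lists 100
]
for sars, sars2 in pairs:
    print(sars, sars2, round(percent_identity(sars, sars2)))
```

Rounding to the nearest integer gives 90, 78, and 100 for these three pairs, matching the tabulated values.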
Figure 3. SARS-CoV-2 Spike Glycoprotein (PDB: 6VSB)
The calculated surface of the top 13 amino acid residues predicted to be B cell epitopes, based on ranking performed with Discotope 2.0, is shown in red. The monomer is shown in the upper left; the upper right and lower center show the trimer in two different orientations. 3D rendering was performed using YASARA (Krieger and Vriend, 2014).
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| SARS-CoV-2 spike glycoprotein 3D-structure | PDB ID: | |
| Wuhan-Hu-1 RNA isolate | NCBI nuccore database | GenBank:MN908947 |
| ORF10 protein | NCBI protein database | NCBI: |
| Nucleocapsid phosphoprotein | NCBI protein database | NCBI: |
| ORF8 protein | NCBI protein database | NCBI: |
| ORF7a protein | NCBI protein database | NCBI: |
| ORF6 protein | NCBI protein database | NCBI: |
| membrane glycoprotein | NCBI protein database | NCBI: |
| envelope protein | NCBI protein database | NCBI: |
| ORF3a protein | NCBI protein database | NCBI: |
| surface glycoprotein | NCBI protein database | NCBI: |
| orf1ab polyprotein | NCBI protein database | NCBI: |
| YASARA | | |
| IEDB | | |
| BepiPred 2.0 | | |
| Discotope 2.0 | | |
| NetMHCpan EL 4.0 | | |
| Tepitool | | |