Kazuma Kiyotani, Yujiro Toyoshima, Kensaku Nemoto, Yusuke Nakamura.
Abstract
To control and prevent the current COVID-19 pandemic, the development of novel vaccines is an urgent issue. In addition, we need tools that can measure and monitor T-cell and B-cell responses to understand how our immune system is responding to this deleterious virus. However, little information is currently available about the epitopes of the novel coronavirus (SARS-CoV-2) that induce host immune responses. Through a comprehensive bioinformatic screening of potential epitopes derived from the SARS-CoV-2 sequences for HLAs commonly present in the Japanese population, we identified 2013 and 1399 possible peptide epitopes that are predicted to bind with high affinity (<0.5% and 2% rank, respectively) to HLA class I and II molecules, respectively, and that may induce CD8+ and CD4+ T-cell responses. These epitopes were distributed across the structural proteins (spike, envelope, membrane, and nucleocapsid) and the nonstructural proteins (proteins corresponding to six open reading frames); however, we found several regions where high-affinity epitopes were significantly enriched. By comparing the sequences of these predicted T-cell epitopes with those of other coronaviruses, we identified 781 HLA-class I and 418 HLA-class II epitopes that are highly homologous to SARS-CoV. To further select epitopes applicable to larger populations, we calculated population coverage based on the allele frequencies of HLA molecules and found 2 HLA-class I epitopes covering 83.8% of the Japanese population. The findings of the current study provide valuable information for designing widely applicable vaccine epitopes against SARS-CoV-2 and for monitoring T-cell responses.
Year: 2020 PMID: 32372051 PMCID: PMC7200206 DOI: 10.1038/s10038-020-0771-5
Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.172
Fig. 1 Summary of SARS-CoV-2-derived T cell epitopes. a Distribution of HLA-class I and II epitopes with high binding affinity derived from the SARS-CoV-2 protein sequence (SARS-CoV-2_Wuhan-Hu-1) [1]. Red bars represent strong-binding epitopes with <0.5% rank and 2% rank for HLA class I and class II, respectively, for each HLA molecule. b Genomic organization of SARS-CoV-2. ORF open reading frame, S spike, E envelope, M membrane, N nucleocapsid proteins. c Similarity plot based on the full-length genome sequence of SARS-CoV-2. Genome sequences of SARS-CoV-2_WIV02 (accession number MN996527), SARS-CoV_GZ02 (AY390556), and Bat-CoV_RaTG13 (MN996532) were compared with SARS-CoV-2_Wuhan-Hu-1 (MN908947)
SARS-CoV-2-derived T cell epitopes predicted with high affinity to HLA molecules
Values are numbers of epitopes. HLA-class I epitopes have rank ≤ 0.5%; HLA-class II epitopes have rank ≤ 2%.

| Protein | Length of protein (amino acids) | Class I: all | Class I: HLA-A | Class I: HLA-B | Class I: HLA-C | Class II: all | Class II: HLA-DP | Class II: HLA-DQ | Class II: HLA-DR |
|---|---|---|---|---|---|---|---|---|---|
| ORF1ab | 7096 | 1478 | 665 | 937 | 588 | 1002 | 268 | 377 | 572 |
| S | 1273 | 248 | 105 | 159 | 110 | 154 | 48 | 54 | 88 |
| ORF3a | 275 | 69 | 30 | 48 | 28 | 74 | 33 | 25 | 46 |
| E | 75 | 18 | 11 | 11 | 5 | 11 | 3 | 0 | 11 |
| M | 222 | 72 | 36 | 35 | 25 | 57 | 16 | 21 | 35 |
| ORF6 | 61 | 8 | 4 | 4 | 2 | 28 | 10 | 12 | 13 |
| ORF7a | 121 | 26 | 11 | 18 | 9 | 16 | 10 | 0 | 6 |
| ORF8 | 121 | 22 | 11 | 14 | 10 | 18 | 4 | 10 | 8 |
| N | 419 | 60 | 21 | 35 | 21 | 32 | 4 | 17 | 12 |
| ORF10 | 38 | 12 | 4 | 10 | 7 | 7 | 7 | 0 | 0 |
| Total | | 2013 | 898 | 1271 | 805 | 1399 | 403 | 516 | 791 |
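
Because the proteins differ widely in length, raw counts are hard to compare across rows. The following is a minimal sketch, not the authors' enrichment analysis, that normalizes the "all" counts from the table to epitopes per 100 amino acids. The function names are illustrative, and the density is only a crude measure because the epitopes overlap one another and a single peptide can be counted for several HLA molecules.

```python
# Rows copied from the table above:
# (protein, length in amino acids, class I "all" count, class II "all" count)
TABLE = [
    ("ORF1ab", 7096, 1478, 1002),
    ("S", 1273, 248, 154),
    ("ORF3a", 275, 69, 74),
    ("E", 75, 18, 11),
    ("M", 222, 72, 57),
    ("ORF6", 61, 8, 28),
    ("ORF7a", 121, 26, 16),
    ("ORF8", 121, 22, 18),
    ("N", 419, 60, 32),
    ("ORF10", 38, 12, 7),
]


def density_per_100aa(count, length):
    """Number of predicted epitopes per 100 amino acids of protein."""
    return 100.0 * count / length


def epitope_densities(rows=TABLE):
    """Map protein -> (class I density, class II density)."""
    return {
        protein: (density_per_100aa(c1, length), density_per_100aa(c2, length))
        for protein, length, c1, c2 in rows
    }


if __name__ == "__main__":
    for protein, (d1, d2) in epitope_densities().items():
        print(f"{protein:7s} class I: {d1:5.1f}  class II: {d2:5.1f}")
```

The per-protein counts in `TABLE` sum to the totals reported in the last row of the table (2013 and 1399), which serves as a quick check that the rows were transcribed correctly.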
SARS-CoV-2-derived T cell epitopes common to SARS-CoV or MERS-CoV
Values are numbers of SARS-CoV-2 epitopes shared with SARS-CoV, with MERS-CoV, or with both.

| Protein | Length of protein (amino acids) | Class I: SARS-CoV | Class I: MERS-CoV | Class I: SARS- and MERS-CoVs | Class II: SARS-CoV | Class II: MERS-CoV | Class II: SARS- and MERS-CoVs |
|---|---|---|---|---|---|---|---|
| ORF1ab | 7096 | 633 | 33 | 30 | 362 | 10 | 9 |
| S | 1273 | 58 | 3 | 0 | 40 | 0 | 0 |
| ORF3a | 275 | 4 | 0 | 0 | 0 | 0 | 0 |
| E | 75 | 15 | 0 | 0 | 4 | 0 | 0 |
| M | 222 | 28 | 0 | 0 | 4 | 0 | 0 |
| ORF6 | 61 | 0 | 0 | 0 | 0 | 0 | 0 |
| ORF7a | 121 | 3 | 0 | 0 | 1 | 0 | 0 |
| ORF8 | 121 | 7 | 0 | 0 | 0 | 0 | 0 |
| N | 419 | 33 | 0 | 0 | 7 | 0 | 0 |
| ORF10 | 38 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | | 781 | 36 | 30 | 418 | 10 | 9 |
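
To put the shared-epitope counts in proportion, they can be divided by the total number of predicted epitopes per protein from the first table. The sketch below does this for the SARS-CoV columns only; it is simple arithmetic on the published counts, and the names `ROWS` and `shared_fraction` are illustrative rather than taken from the study.

```python
# (protein, class I all, class I shared with SARS-CoV,
#           class II all, class II shared with SARS-CoV)
# "all" counts come from the first table, "shared" counts from the table above.
ROWS = [
    ("ORF1ab", 1478, 633, 1002, 362),
    ("S", 248, 58, 154, 40),
    ("ORF3a", 69, 4, 74, 0),
    ("E", 18, 15, 11, 4),
    ("M", 72, 28, 57, 4),
    ("ORF6", 8, 0, 28, 0),
    ("ORF7a", 26, 3, 16, 1),
    ("ORF8", 22, 7, 18, 0),
    ("N", 60, 33, 32, 7),
    ("ORF10", 12, 0, 7, 0),
]


def shared_fraction(shared, total):
    """Fraction of predicted epitopes that are also found in SARS-CoV."""
    if total == 0:
        raise ValueError("no predicted epitopes for this protein")
    return shared / total


def overall_fractions(rows=ROWS):
    """Return (class I fraction, class II fraction) over all proteins."""
    c1_all = sum(r[1] for r in rows)
    c1_shared = sum(r[2] for r in rows)
    c2_all = sum(r[3] for r in rows)
    c2_shared = sum(r[4] for r in rows)
    return shared_fraction(c1_shared, c1_all), shared_fraction(c2_shared, c2_all)


if __name__ == "__main__":
    for protein, c1_all, c1_sh, c2_all, c2_sh in ROWS:
        print(f"{protein:7s} class I: {shared_fraction(c1_sh, c1_all):.2f}  "
              f"class II: {shared_fraction(c2_sh, c2_all):.2f}")
    print("overall", overall_fractions())
```

Across all proteins this gives 781/2013 for class I and 418/1399 for class II, that is, roughly 39% and 30% of the predicted epitopes.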
SARS-CoV-2-derived HLA-class I epitopes with high coverage of Japanese population based on HLA-A frequency
| Protein | Position in protein | Peptide length | Peptide sequence | HLA-A coverageᵃ | HLA-A alleles | HLA-B coverageᵃ | HLA-B alleles | HLA-C coverageᵃ | HLA-C alleles |
|---|---|---|---|---|---|---|---|---|---|
| ORF1ab | 2168 | 9 | YMPYFFTLL | 83.8% | A*02:01, A*02:06, A*24:02 | 0.0% | – | 76.5% | C*01:02, C*08:01, C*12:02, C*14:02 |
| ORF1ab | 4089 | 10 | FTYASALWEI | 83.8% | A*02:01, A*02:06, A*24:02 | 22.0% | B*52:01 | 22.2% | C*12:02 |
| ORF1ab | 3653 | 10 | VYMPASWVMR | 75.7% | A*24:02, A*31:01, A*33:03 | 0.0% | – | 0.0% | – |
| S | 448 | 10 | NYNYLYRLFR | 75.7% | A*24:02, A*31:01, A*33:03 | 0.0% | – | 0.0% | – |
| ORF1ab | 3654 | 10 | YMPASWVMRI | 75.1% | A*02:01, A*24:02 | 14.1% | B*51:01 | 0.0% | – |
| ORF1ab | 1674 | 10 | CYLATALLTL | 75.1% | A*02:01, A*24:02 | 0.0% | – | 0.0% | – |
| ORF1ab | 6418 | 10 | LYLDAYNMMI | 75.1% | A*02:01, A*24:02 | 0.0% | – | 0.0% | – |
| S | 268 | 10 | GYLQPRTFLL | 75.1% | A*02:01, A*24:02 | 0.0% | – | 0.0% | – |
| ORF3a | 106 | 10 | LYLYALVYFL | 75.1% | A*02:01, A*24:02 | 0.0% | – | 0.0% | – |
| ORF1ab | 3605 | 10 | FLYENAFLPF | 72.3% | A*02:06, A*24:02 | 24.3% | B*15:01, B*46:01 | 74.7% | C*03:04, C*07:02, C*08:01, C*12:02, C*14:03 |
| ORF1ab | 3126 | 10 | IQWMVMFTPL | 72.3% | A*02:06, A*24:02 | 33.4% | – | 0.0% | – |

ᵃ Population coverage, calculated based on the allele frequencies of HLA (Supplementary Table 1)
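
The record does not spell out the coverage formula. A common convention, assumed here and not confirmed by the source, is that under Hardy-Weinberg equilibrium the fraction of individuals carrying at least one of a set of alleles at a locus is 1 − (1 − Σf)², where Σf is the summed frequency of those alleles. The sketch below implements that assumption; the allele frequencies in the example are placeholders, not the values from Supplementary Table 1.

```python
import math


def population_coverage(allele_freqs):
    """Fraction of individuals carrying at least one of the given alleles.

    Assumes Hardy-Weinberg equilibrium at a single locus, so the chance that
    neither of a person's two chromosomes carries a binding allele is
    (1 - sum of frequencies) squared. This formula is an assumption here.
    """
    total = sum(allele_freqs)
    if total < 0.0 or total > 1.0 + 1e-9:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - min(total, 1.0)) ** 2


def implied_total_frequency(coverage):
    """Invert the formula: summed allele frequency implied by a coverage."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - coverage)


if __name__ == "__main__":
    # Placeholder frequencies for illustration only (hypothetical values).
    example = {"A*02:01": 0.10, "A*02:06": 0.10, "A*24:02": 0.35}
    print(f"coverage: {population_coverage(example.values()):.3f}")
    # Summed frequency that a 22.0% coverage would imply under this formula.
    print(f"implied frequency: {implied_total_frequency(0.220):.3f}")
```

Under this assumed formula, a single allele with 22.0% coverage would need a frequency of about 0.117. The actual frequencies used by the authors are in Supplementary Table 1, which is not reproduced in this record.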
Fig. 2 Distribution of mutation rates of SARS-CoV-2. A total of 6421 SARS-CoV-2 sequences isolated from four different regions (587 viruses from an Asian region, 1918 from a North American region, 3190 from European countries, and 726 from an Oceanian region) were compared with the reference protein sequence of SARS-CoV-2_Wuhan-Hu-1 [1]. The 156 amino acid mutations observed at frequencies of more than 0.5% in at least one region are plotted
Mutations frequently (≥10%) observed in SARS-CoV-2 isolated from four different regions
Mutation frequencies are given as fractions of the sequences from each region; the sample sizes (n) are those reported in the Fig. 2 legend.

| Protein | Position in protein | Reference amino acidᵃ | Mutant amino acid | Asia (n = 587) | North America (n = 1918) | Europe (n = 3190) | Oceania (n = 726) |
|---|---|---|---|---|---|---|---|
| ORF1ab | 265 | T | I | 0.0346 | 0.424 | 0.104 | 0.171 |
| ORF1ab | 3606 | L | F | 0.248 | 0.0438 | 0.178 | 0.17 |
| ORF1ab | 4715 | P | L | 0.154 | 0.587 | 0.737 | 0.526 |
| ORF1ab | 5828 | P | L | 0 | 0.306 | 0.00599 | 0.0783 |
| ORF1ab | 5865 | Y | C | 0 | 0.314 | 0.00530 | 0.0805 |
| S | 614 | D | G | 0.155 | 0.586 | 0.736 | 0.526 |
| ORF3a | 57 | Q | H | 0.0426 | 0.489 | 0.142 | 0.204 |
| ORF3a | 251 | G | V | 0.0991 | 0.0256 | 0.172 | 0.121 |
| ORF8 | 84 | L | S | 0.222 | 0.351 | 0.0295 | 0.206 |
| N | 13 | P | L | 0.0171 | 0.00313 | 0.00126 | 0.105 |
| N | 203 | R | K | 0.0532 | 0.0365 | 0.276 | 0.148 |
| N | 204 | G | R | 0.0532 | 0.0355 | 0.276 | 0.147 |

ᵃ Protein sequence based on the reference SARS-CoV-2_Wuhan-Hu-1 sequence (GenBank accession number MN908947)
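
One practical use of the mutation table is to check whether any frequent mutation falls inside a high-coverage epitope from the HLA-A coverage table. The sketch below is an illustration rather than the authors' procedure. It assumes that "Position in protein" is the 1-based position of the peptide's first residue, so a peptide of length n starting at position p spans p to p + n − 1; the function name is hypothetical.

```python
# (protein, start position, peptide) from the high-coverage epitope table.
EPITOPES = [
    ("ORF1ab", 2168, "YMPYFFTLL"),
    ("ORF1ab", 4089, "FTYASALWEI"),
    ("ORF1ab", 3653, "VYMPASWVMR"),
    ("S", 448, "NYNYLYRLFR"),
    ("ORF1ab", 3654, "YMPASWVMRI"),
    ("ORF1ab", 1674, "CYLATALLTL"),
    ("ORF1ab", 6418, "LYLDAYNMMI"),
    ("S", 268, "GYLQPRTFLL"),
    ("ORF3a", 106, "LYLYALVYFL"),
    ("ORF1ab", 3605, "FLYENAFLPF"),
    ("ORF1ab", 3126, "IQWMVMFTPL"),
]

# (protein, position, reference residue, mutant residue) from the table above.
MUTATIONS = [
    ("ORF1ab", 265, "T", "I"), ("ORF1ab", 3606, "L", "F"),
    ("ORF1ab", 4715, "P", "L"), ("ORF1ab", 5828, "P", "L"),
    ("ORF1ab", 5865, "Y", "C"), ("S", 614, "D", "G"),
    ("ORF3a", 57, "Q", "H"), ("ORF3a", 251, "G", "V"),
    ("ORF8", 84, "L", "S"), ("N", 13, "P", "L"),
    ("N", 203, "R", "K"), ("N", 204, "G", "R"),
]


def overlapping_mutations(epitopes=EPITOPES, mutations=MUTATIONS):
    """List mutations whose position lies inside an epitope.

    Each hit is (protein, epitope start, peptide, mutation position,
    reference residue, mutant residue, reference_matches), where
    reference_matches says whether the peptide residue at that position
    equals the reference residue (a check on the 1-based assumption).
    """
    hits = []
    for protein, start, peptide in epitopes:
        end = start + len(peptide) - 1  # inclusive end, assuming 1-based start
        for m_protein, pos, ref, alt in mutations:
            if m_protein == protein and start <= pos <= end:
                matches = peptide[pos - start] == ref
                hits.append((protein, start, peptide, pos, ref, alt, matches))
    return hits


if __name__ == "__main__":
    for hit in overlapping_mutations():
        print(hit)
```

With the data transcribed above, the only overlap is the ORF1ab L3606F mutation, which lies at the second residue of FLYENAFLPF (positions 3605 to 3614). The peptide carries the reference leucine at that position, which is consistent with the 1-based start assumption.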