David Requena, Aldhair Médico, Ruy D Chacón, Manuel Ramírez, Obert Marín-Sánchez
Abstract
Coronavirus disease (COVID-19), caused by the virus SARS-CoV-2, is already responsible for more than 4.3 million confirmed cases and 295,000 deaths worldwide as of May 15, 2020. Ongoing efforts to control the pandemic include the development of peptide-based vaccines and diagnostic tests. In these approaches, HLA allelic diversity plays a crucial role. Despite its importance, current knowledge of HLA allele frequencies in South America is very limited. In this study, we performed a literature review of datasets reporting HLA frequencies of South American populations, available in the scientific literature and/or in the Allele Frequency Net Database. This allowed us to enrich the current scenario with more than 12.8 million data points. As a result, we present updated HLA allele frequencies by country, including 91 alleles that were previously thought to have frequencies either under 5% or of unknown value. Using alleles with an updated frequency of at least 5% in any South American country, we predicted epitopes in SARS-CoV-2 proteins using NetMHCpan (I and II) and MHCflurry. Then, the best predicted epitopes (class-I and class-II) were selected based on their binding to South American alleles (Coverage Score). Class-II predicted epitopes were also filtered based on their three-dimensional exposure. We obtained 14 class-I and four class-II candidate epitopes with experimental evidence (reported in the Immune Epitope Database and Analysis Resource) and good coverage scores for South America. Additionally, we present 13 HLA-I and 30 HLA-II novel candidate epitopes without experimental evidence, including 16 class-II candidates in highly exposed, conserved areas of the NTD and RBD regions of the Spike protein. These novel candidates have even better coverage scores for South America than those with experimental evidence.
Finally, we show that recent similar studies presenting candidate epitopes also predicted some of our candidates but discarded them during the selection process, resulting in candidates with suboptimal coverage for South America. In conclusion, the candidate epitopes presented here provide valuable information for the development of epitope-based strategies against SARS-CoV-2, such as peptide vaccines and diagnostic tests. Additionally, the updated HLA allele frequencies provide a better representation of South America and may inform other immunogenetic studies.
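The abstract pools several datasets per country into updated frequencies and keeps alleles that reach at least 5% in any South American country. The pooling method is not described in this section; the Python sketch below assumes a sample-size-weighted mean across datasets (in the spirit of the Weighted Allele Frequency, WAF, used in Figure 2) and assumes that all datasets for a country genotyped the same locus, so an allele absent from a dataset counts as frequency 0 there. The function names and example numbers are hypothetical.

```python
from collections import defaultdict

def weighted_allele_frequencies(datasets):
    """Pool one country's datasets into sample-size-weighted allele frequencies.

    `datasets` is a list of (sample_size, {allele: frequency}) pairs. Assumes all
    datasets typed the same locus, so a missing allele counts as frequency 0.
    """
    total_n = sum(n for n, _ in datasets)
    weighted_sum = defaultdict(float)
    for n, freqs in datasets:
        for allele, freq in freqs.items():
            weighted_sum[allele] += n * freq
    return {allele: s / total_n for allele, s in weighted_sum.items()}

def select_alleles(country_datasets, threshold=0.05):
    """Return alleles whose pooled frequency reaches `threshold` in any country."""
    selected = set()
    for datasets in country_datasets.values():
        for allele, waf in weighted_allele_frequencies(datasets).items():
            if waf >= threshold:
                selected.add(allele)
    return sorted(selected)

# Made-up example: two Peruvian datasets and one Brazilian dataset (HLA-A only).
example = {
    "PER": [(100, {"A*02:01": 0.20, "A*24:02": 0.04}),
            (300, {"A*02:01": 0.10, "A*24:02": 0.06, "A*01:01": 0.03})],
    "BRA": [(200, {"A*01:01": 0.08})],
}
print(select_alleles(example))  # ['A*01:01', 'A*02:01', 'A*24:02']
```

Here A*01:01 is selected only because of Brazil (8%), even though its pooled Peruvian frequency is 2.25%, which is what "at least 5% in any country" implies.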
Keywords: COVID-19; HLA; SARS-CoV-2; South America; allele frequency; epitope; immunoinformatics; literature review
Year: 2020 PMID: 33013857 PMCID: PMC7494848 DOI: 10.3389/fimmu.2020.02008
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. PRISMA flowchart of the literature review. Flow diagram of the processed datasets. (A) Current Scenario, using only the datasets available in the Allele Frequency Net Database. (B) Updated Scenario, adding datasets from the scientific literature published up to April 10, 2020. Countries are represented by the first three letters of their names. The PRISMA checklist is included in the Supplementary Material (77).
Figure 2. Allele frequencies of HLA-I A, B, C, and HLA-II DRB1 in South America. The pie charts represent the distribution of allele frequencies in the HLA-I genes A, B, and C and the HLA-II gene DRB1. Alleles with a Weighted Allele Frequency (WAF) ≥ 5% in four or more countries are shown in shades of red (A), blue (B), brown (C), and purple (DRB1), respectively. Alleles with WAF ≥ 5% in three or fewer countries are shown in white. Alleles with WAF < 5% are shown in gray. The green gradient represents the number of individuals genotyped for HLA alleles. Detailed data are available in Tables S5 and S6.
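The three color categories in Figure 2 follow a simple rule. The Python sketch below assumes one pie chart per country and takes per-country WAF values as already computed; the function name and example values are hypothetical, and country codes follow the three-letter convention described for Figure 1.

```python
from collections import Counter

def classify_slices(waf_by_country, threshold=0.05, min_countries=4):
    """Color category of each allele slice in each country's pie chart.

    `waf_by_country` maps a country code to {allele: WAF}. A slice is 'gray' if the
    allele's WAF in that country is below `threshold`; otherwise it is 'colored'
    when the allele reaches `threshold` in at least `min_countries` countries,
    and 'white' when it does so in fewer.
    """
    # Number of countries in which each allele reaches the threshold.
    n_countries = Counter(
        allele
        for wafs in waf_by_country.values()
        for allele, waf in wafs.items()
        if waf >= threshold
    )
    categories = {}
    for country, wafs in waf_by_country.items():
        categories[country] = {}
        for allele, waf in wafs.items():
            if waf < threshold:
                categories[country][allele] = "gray"
            elif n_countries[allele] >= min_countries:
                categories[country][allele] = "colored"
            else:
                categories[country][allele] = "white"
    return categories

# Made-up WAF values (fractions, so 0.05 = 5%).
example = {
    "ARG": {"A*02:01": 0.20, "A*68:01": 0.06},
    "BRA": {"A*02:01": 0.18, "A*68:01": 0.02},
    "CHI": {"A*02:01": 0.25},
    "PER": {"A*02:01": 0.15},
}
result = classify_slices(example)
print(result["ARG"])  # {'A*02:01': 'colored', 'A*68:01': 'white'}
print(result["BRA"])  # {'A*02:01': 'colored', 'A*68:01': 'gray'}
```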
Best HLA-I candidate epitopes for South America in SARS-CoV-2 proteins.
| S | 1060 | 1068 | VVFLHVTYV | LTM, ML | 71663 | 3 | - | 5 | 3 | 6 | 1 | 2 | 3 | 9 | 2.146 | Ahmed et al. ( |
| 894 | 902 | LQIPFAMQM | ML | 38855 | 1 | - | 4 | 2 | 5 | 0 | 6 | 1 | 11 | 2.043 | ||
| ORF6 | 3 | 11 | HLVDFQVTI | ML | 24313 | 3 | - | 6 | 4 | 6 | 1 | 3 | 3 | 12 | 2.520 | New |
| NSP3 | 950 | 958 | VMYMGTLSY | 5477, 70040 | 6 | - | 7 | 5 | 7 | 1 | 5 | 2 | 8 | 2.801 | ||
| NSP5 | 219 | 227 | F | 16786 | 6 | - | 9 | 6 | 8 | 1 | 3 | 4 | 13 | 3.224 | ||
| NSP6 | 86 | 94 | MPASWVMRI | 42260, 42261 | 4 | - | 6 | 5 | 6 | 0 | 3 | 3 | 9 | 2.292 | ||
| NSP8 | 47 | 55 | SEFDRDAAM | 57419 | 3 | - | 2 | 3 | 4 | 1 | 5 | 0 | 7 | 1.832 | ||
| NSP12 | 877 | 885 | YADVFHLYL | ML | 14969 | 6 | - | 11 | 7 | 8 | 1 | 3 | 5 | 10 | 3.346 | New |
| 123 | 131 | TMADLVYAL | 65176, 65177 | 3 | - | 5 | 4 | 6 | 1 | 4 | 4 | 13 | 2.756 | |||
| 898 | 906 | HMLDMYSVM | 24342 | 3 | - | 5 | 3 | 5 | 1 | 4 | 4 | 12 | 2.591 | |||
| NSP13 | 355 | 363 | YVFCTVNAL | ML | 76266 | 6 | - | 9 | 6 | 8 | 1 | 3 | 5 | 13 | 3.335 | New |
| 291 | 299 | FAIGLA | 23758 | 6 | - | 9 | 7 | 8 | 1 | 3 | 3 | 6 | 2.838 | |||
| NSP14 | 494 | 502 | YLDAYNMMI | ML | 74593 | 5 | - | 9 | 6 | 8 | 1 | 1 | 4 | 11 | 2.823 | New |
| 500 | 508 | MMISAGFSL | 42128 | 2 | - | 5 | 4 | 6 | 1 | 4 | 3 | 13 | 2.590 | |||
| S | 869 | 877 | MIAQYTSAL | 8 | - | 10 | 8 | 9 | 1 | 5 | 6 | 15 | 4.127 | New | ||
| 691 | 699 | SIIAYTMSL | - | - | 6 | - | 8 | 6 | 9 | 1 | 5 | 5 | 16 | 3.739 | ||
| 269 | 277 | YLQPRTFLL | 6 | - | 8 | 6 | 8 | 2 | 3 | 6 | 14 | 3.646 | ||||
| NSP2 | 420 | 428 | YITGGVVQL | - | - | 5 | - | 8 | 6 | 9 | 1 | 5 | 4 | 12 | 3.382 | New |
| 265 | 273 | GLNDNLLEI | 2 | - | 4 | 2 | 5 | 1 | 1 | 3 | 7 | 1.705 | ||||
| NSP3 | 1776 | 1784 | YVNTFSSTF | 6 | - | 8 | 7 | 9 | 1 | 4 | 4 | 10 | 3.276 | New | ||
| 1452 | 1460 | YLNSTNVTI | - | - | 5 | - | 7 | 6 | 9 | 1 | 3 | 4 | 14 | 3.180 | ||
| 816 | 824 | YYHTTD | 6 | - | 9 | 8 | 9 | 1 | 2 | 5 | 8 | 3.148 | ||||
| NSP4 | 25 | 33 | YLITPVHVM | - | - | 6 | - | 7 | 6 | 9 | 1 | 6 | 5 | 14 | 3.721 | New |
| 309 | 317 | 3 | - | 3 | 3 | 5 | 1 | 6 | 1 | 9 | 2.270 | |||||
| NSP12 | 442 | 450 | FAQDGNA | - | - | 8 | - | 12 | 8 | 10 | 1 | 4 | 6 | 12 | 4.013 | New |
| 281 | 289 | KLFDRYFKY | 4 | - | 4 | 4 | 4 | 1 | 5 | 2 | 5 | 2.169 | ||||
| NSP16 | 103 | 111 | FVSDADSTL | - | - | 6 | - | 11 | 7 | 9 | 1 | 3 | 4 | 13 | 3.437 | New |
These candidate epitopes were selected from our predictions based on their scores (rank for NetMHC and IC50 for MHCflurry) and Coverage Score (CS). Experimental evidence, including the IEDB ID and experiment type (LTM: linearT_MHC; ML: MHC_ligand), is provided when available. Scientific articles that previously proposed these candidates are cited. Residues in bold represent positive selection pressure sites.
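The footnote states that candidates were scored by NetMHCpan rank and MHCflurry IC50 but gives no cut-offs here. The Python sketch below assumes commonly used values, a 2% percentile rank (NetMHCpan's usual weak-binder threshold) and 500 nM IC50, and counts a peptide-allele pair as a binder if either tool calls it one. The resulting allele set per peptide is the kind of input a coverage score would be computed from, but the study's actual Coverage Score formula is not reproduced. Thresholds, function names, and scores are hypothetical.

```python
from collections import defaultdict

# Assumed cut-offs, not taken from the study.
RANK_CUTOFF = 2.0       # NetMHCpan percentile rank (lower = stronger binding)
IC50_CUTOFF_NM = 500.0  # MHCflurry predicted affinity in nM (lower = stronger)

def is_binder(tool, score):
    """Apply the tool-specific cut-off to one prediction score."""
    if tool == "netmhcpan":
        return score <= RANK_CUTOFF
    if tool == "mhcflurry":
        return score <= IC50_CUTOFF_NM
    raise ValueError(f"unknown tool: {tool}")

def binding_alleles(predictions):
    """Map each peptide to the alleles that either tool predicts it binds."""
    alleles = defaultdict(set)
    for peptide, allele, tool, score in predictions:
        if is_binder(tool, score):
            alleles[peptide].add(allele)
    return dict(alleles)

# Made-up scores for two peptides listed in the HLA-I table.
predictions = [
    ("VVFLHVTYV", "HLA-A*02:01", "netmhcpan", 0.3),
    ("VVFLHVTYV", "HLA-B*35:01", "mhcflurry", 3200.0),
    ("LQIPFAMQM", "HLA-B*15:01", "mhcflurry", 120.0),
]
print(binding_alleles(predictions))
# {'VVFLHVTYV': {'HLA-A*02:01'}, 'LQIPFAMQM': {'HLA-B*15:01'}}
```

Taking the union of both tools is one design choice among several; requiring agreement between tools would yield fewer, more conservative binder calls.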
Best HLA-II candidate epitopes for South America in SARS-CoV-2 proteins.
| S | 1013 | 1027 | IRAAEIRASANLAAT | CH | LTM | 100428 | 5 | 5 | 5 | 4 | 5 | 7 | 4 | 7 | 4 | 8.148 | New |
| 1014 | 1028 | RAAEIRASANLAATK | 100428 | 6 | 5 | 7 | 4 | 5 | 7 | 5 | 7 | 5 | 9.000 | ||||
| 1152 | 1166 | LDKYFK | HR2 | ML | 35205 | 4 | 4 | 4 | 3 | 4 | 5 | 5 | 6 | 3 | 6.760 | Ahmed et al. ( | |
| 1153 | 1167 | DKYFK | 9006 | 4 | 4 | 4 | 3 | 4 | 5 | 5 | 6 | 3 | 6.760 | ||||
| S | 61 | 75 | NTD | - | - | 4 | 4 | 4 | 3 | 4 | 5 | 4 | 4 | 4 | 6.474 | New | |
| 114 | 128 | TQSLLIVNNEAT | 4 | 4 | 5 | 3 | 4 | 6 | 5 | 6 | 3 | 7.045 | |||||
| 115 | 129 | QSLLIVN | 5 | 4 | 6 | 4 | 5 | 5 | 5 | 5 | 4 | 7.719 | |||||
| 116 | 130 | SLLIVN | 4 | 3 | 5 | 3 | 4 | 4 | 5 | 4 | 4 | 6.474 | |||||
| 206 | 220 | KHTPINLVRDLPQGF | 4 | 3 | 4 | 3 | 4 | 5 | 3 | 5 | 3 | 6.017 | |||||
| 207 | 221 | HTPINLVRDLPQGFS | 5 | 4 | 5 | 3 | 4 | 6 | 5 | 6 | 4 | 7.412 | |||||
| 208 | 222 | TPINLVRDLPQGFSA | 4 | 4 | 3 | 3 | 4 | 5 | 2 | 6 | 3 | 6.017 | |||||
| 216 | 230 | LPQGFSALEPLVDLP | 3 | 4 | 3 | 3 | 3 | 5 | 5 | 6 | 4 | 6.450 | |||||
| 217 | 231 | PQGFSALEPLVDLPI | 3 | 4 | 3 | 3 | 3 | 5 | 5 | 6 | 4 | 6.450 | |||||
| 308 | 322 | VEKGIYQTSNFRVQP | RBD | - | - | 4 | 5 | 5 | 3 | 4 | 7 | 5 | 7 | 3 | 7.531 | ||
| 309 | 323 | EKGIYQTSNFRVQPT | 5 | 5 | 6 | 4 | 5 | 7 | 5 | 7 | 4 | 8.490 | |||||
| 313 | 327 | YQTSNFRVQPTESIV | 3 | 5 | 3 | 3 | 3 | 6 | 5 | 6 | 3 | 6.593 | |||||
| 314 | 328 | QTSNFRVQPTESIVR | 6 | 5 | 7 | 4 | 5 | 6 | 5 | 6 | 5 | 8.714 | |||||
| 315 | 329 | TSNFRVQPTESIVRF | 6 | 5 | 7 | 4 | 5 | 7 | 5 | 7 | 5 | 9.000 | |||||
| 316 | 330 | SNFRVQPTESIVRFP | 3 | 4 | 5 | 3 | 3 | 5 | 5 | 5 | 4 | 6.593 | |||||
| 430 | 444 | TGCVIAWNSNNLDSK | 4 | 4 | 6 | 3 | 4 | 6 | 5 | 6 | 4 | 7.388 | |||||
| 689 | 703 | SQSIIAYTMSLGAEN | - | - | - | 3 | 5 | 3 | 3 | 4 | 7 | 4 | 7 | 3 | 6.879 | ||
| 690 | 704 | QSIIAYTMSLGAENS | 4 | 5 | 4 | 3 | 4 | 7 | 5 | 7 | 3 | 7.388 | |||||
| 785 | 799 | VKQIYKTPPIKDFGG | 3 | 5 | 2 | 3 | 3 | 5 | 4 | 6 | 3 | 6.107 | |||||
| 801 | 815 | FP | - | - | 5 | 4 | 5 | 3 | 4 | 6 | 4 | 6 | 4 | 7.212 | |||
| 802 | 816 | FSQILPDPSKPSKRS | 5 | 4 | 5 | 3 | 4 | 6 | 5 | 6 | 4 | 7.412 | |||||
| 1059 | 1073 | GVVFLHVTYVPAQEK | BH | - | - | 3 | 5 | 3 | 3 | 3 | 7 | 5 | 7 | 4 | 7.079 | ||
| 1060 | 1074 | VVFLHVTYVPAQEK | 3 | 4 | 2 | 3 | 3 | 5 | 4 | 6 | 4 | 6.107 | |||||
| 1098 | 1112 | SD3 | - | - | 4 | 4 | 4 | 3 | 3 | 5 | 5 | 5 | 4 | 6.617 | |||
| 1099 | 1113 | GTHWFVTQRNFYEPQ | 4 | 4 | 4 | 3 | 3 | 6 | 5 | 6 | 4 | 6.902 | |||||
| 1110 | 1124 | YEPQIITTDNTFVS | 4 | 4 | 4 | 3 | 4 | 5 | 5 | 5 | 4 | 6.817 | |||||
| 1111 | 1125 | EPQIITTD | 4 | 4 | 6 | 3 | 4 | 6 | 5 | 6 | 4 | 7.388 | |||||
| 1126 | 1140 | CDVVIGIVNNTVYDP | - | - | - | 4 | 3 | 5 | 3 | 4 | 4 | 5 | 4 | 4 | 6.474 | ||
| M | 7 | 21 | TITVEELKKLLEQWN | Virion Surface | - | - | 2 | 2 | 2 | 1 | 1 | 3 | 3 | 3 | 2 | 3.326 | New |
| E | 1 | 15 | MYSFVSEETGTLIVN | Virion Surface | - | - | 0 | 2 | 1 | 1 | 1 | 2 | 0 | 2 | 1 | 1.764 | New |
These candidate epitopes were selected from our predictions based on their scores (rank for NetMHC and IC50 for MHCflurry), Coverage Score (CS), and exposure (class-II only). Experimental evidence, including the IEDB ID and experiment type (LTM: linearT_MHC; ML: MHC_ligand), is provided when available. Scientific articles that previously proposed these candidates are cited. Underlined residues indicate sites with predicted N-linked glycosylation. Residues in bold represent positive selection pressure sites.
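The footnote flags residues with predicted N-linked glycosylation, but the predictor is not named in this section. A simple first-pass check is the N-X-S/T sequon (X being any residue except proline), which N-linked glycosylation generally requires. The Python sketch below scans a peptide for it; it is only a motif scan, not a glycosylation predictor, and it cannot detect a sequon that continues past the peptide's C-terminus. The function name is hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(peptide, start=1):
    """Protein positions of Asn residues that begin an N-X-S/T sequon.

    `start` is the protein position of the peptide's first residue.
    """
    return [start + m.start() for m in SEQUON.finditer(peptide)]

# Spike 1126-1140, as listed in the HLA-II table.
print(sequon_positions("CDVVIGIVNNTVYDP", start=1126))  # [1134]
```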
Figure 3. Candidate epitopes in the sequences of the Spike (S), Membrane (M), and Envelope (E) proteins of SARS-CoV-2. (A,E,G) show the entropy per amino acid for the S, M, and E proteins, respectively, calculated by aligning 2123 SARS-CoV-2 genomes. In (B,F,H), post-translational modifications are represented as sticks with colored circles: beige (N-linked glycosylations), pink (O-GalNAc glycosylations), and lemon (palmitoylations). (B) Regions of the S protein, indicating subunits 1 (S1) and 2 (S2) and the cleavage points (scissors). Positive selection pressure is represented with (+). SP, Signal peptide; NTD, N-terminal domain; RBD, Receptor Binding Domain; RBM, Receptor Binding Motif; SD1, Sub-Domain 1; SD2, Sub-Domain 2; FP, Fusion Peptide; CR, Connecting Region; HR1, Heptad Repeat 1; CH, Central Helix; BH, B-Hairpin; SD3, Sub-Domain 3; HR2, Heptad Repeat 2; TM, Transmembrane domain; CT, Cytoplasmic tail. (C) HLA-I epitopes predicted for South American alleles with WAF ≥ 5%. The green gradient represents the coverage scores. The rectangles below represent the predicted epitopes with experimental evidence in the IEDB (blue) and those without experimental evidence with CS ≥ 2 (light blue). Overlapping predicted epitopes are represented by a single rectangle, with the number of epitopes it contains shown underneath. The inverted triangle on top highlights our best candidates. (D) The same for HLA-II: predicted epitopes with experimental evidence are shown in red, and those without experimental evidence and with CS ≥ 6 in yellow, with those in the RBD highlighted in orange. (F,H) represent the topology of the M and E proteins, respectively. Exposed (E), transmembrane (TM), and intra-virion (I) regions were extracted from annotated proteins (UniProtKB IDs: P0DTC4 and P0DTC5). The best HLA-II epitopes predicted in their exposed regions (and their CS) are shown in yellow.
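Panels A, E, and G report a per-position entropy computed from the genome alignment. Assuming this is Shannon entropy, H = -Σ p log2 p over the residues observed in each column of a protein-level alignment (the log base and gap handling are assumptions, not stated in the caption), it can be sketched in Python as follows. The function name and toy alignment are hypothetical. Fully conserved columns score 0, and higher values indicate more variable positions.

```python
import math
from collections import Counter

def column_entropy(alignment, ignore_gaps=True):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if ignore_gaps:
            column = [aa for aa in column if aa != "-"]
        total = len(column)
        probs = [count / total for count in Counter(column).values()]
        # Fully conserved (or all-gap) columns give 0.0.
        entropies.append(float(sum(-p * math.log2(p) for p in probs)))
    return entropies

# Toy alignment of four sequences.
print(column_entropy(["MFV", "MFI", "MLV", "MLI"]))  # [0.0, 1.0, 1.0]
```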
Figure 4. Candidate epitopes in the 3D structure of the Spike (S) protein of SARS-CoV-2. This figure shows our HLA-II candidate epitopes with experimental evidence (red), those without experimental evidence and with CS ≥ 6 (orange: located in the RBD; yellow: located in other regions), and the binding site of antibody CR3022 (violet). Candidate epitopes were mapped to the Spike monomer (A) and the Spike trimer (B,C). The monomer is shown in beige and the other two subunits in light purple. (C) Rotation of the trimer showing the RBD region. A pyramid representing the trimer is shown as a visual aid to indicate the rotation.