Kyle Saylor, Ben Donnan, Chenming Zhang.
Abstract
The human leukocyte antigen (HLA) gene complex, one of the most diverse gene complexes found in the human genome, largely dictates how our immune systems recognize pathogens. Specifically, HLA genetic variability has been linked to vaccine effectiveness in humans and it has likely played some role in the shortcomings of the numerous human vaccines that have failed clinical trials. This variability is largely impossible to evaluate in animal models, however, as their immune systems generally 1) lack the diversity of the HLA complex and/or 2) express major histocompatibility complex (MHC) receptors that differ in specificity when compared to human MHC. In order to effectively engage the majority of human MHC receptors during vaccine design, here, we describe the use of HLA population frequency data from the USA and MHC epitope prediction software to facilitate the in silico mining of universal helper T cell epitopes and the subsequent design of a universal human immunogen using these predictions. This research highlights a novel approach to using in silico prediction software and data processing to direct vaccine development efforts.
Year: 2022 PMID: 35349604 PMCID: PMC8963548 DOI: 10.1371/journal.pone.0265644
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Common immunogen information.
| Protein | Abbr. | Residues | UniProt Source |
|---|---|---|---|
| Cholera toxin subunit B | CTB | 104 | Q55DA8 |
| Heat labile enterotoxin B | LTB | 124 | A05XG5 |
| MS2 capsid protein | MS2 | 130 | P03612 |
| Q-beta capsid protein | Qb | 133 | P03615 |
| Hepatitis C core antigen | HBcAg | 150 | Q68842 |
| Hepatitis B core antigen | HBcAg | 185 | P03148 |
| Influenza A hemagglutinin | HA | 328 | P04664 |
| Outer membrane protein C | OMPC | 367 | C6K7N1 |
| Human papillomavirus 16 L1 | HPV16L1 | 505 | A0A161GUX4 |
| Diphtheria toxin | DT | 560 | Q6NK15 |
| Bovine serum albumin | BSA | 607 | P02769 |
| Murine serum albumin | MSA | 608 | P07724 |
| Human serum albumin | HSA | 609 | P02768 |
| Pseudomonas aeruginosa exoprotein A | EPA | 638 | P11439 |
| Tetanus toxin | TT | 1315 | P04958 |
| Keyhole limpet hemocyanin 1 | KLH1 | 3125 | Q6KC56 |
| Keyhole limpet hemocyanin 2 | KLH2 | 3421 | Q6KC55 |
Fig 1. Schematic overview of methodology.
(A) The overall study methodology and (B) the epitope scoring methodology. In the epitope scoring methodology, excerpts of NetMHCIIPan prediction output for the HLA-DQ/DT isotype/immunogen pairing are provided for reference. Binning of residue scores (calculated using epitope predictions) was achieved via iteration through epitope residues and epitopes within an isoform. This was the first step in both the UNC and WNC analysis and was performed for each isoform. Non-normalized residue scores (UnNC and WnNC) were then calculated by summing the score components of each residue bin. Normalized scores were calculated by dividing the non-normalized values by the average non-normalized residue scores for an endogenous immunogen (either HSA or MSA, depending upon the prediction isotype). UNC MMA scores were calculated by taking the mean of normalized residue scores from all isoforms within an isotype, subtracting three standard deviations, and applying a moving average (n = 13).
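The scoring steps in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `(start, end, weight)` epitope representation, the function names, the use of the population standard deviation, and the zero-padded edges of the moving average are all assumptions, since the caption does not define the per-epitope score component or the edge handling.

```python
import numpy as np

def residue_scores(length, epitopes):
    """Bin epitope predictions onto residues and sum them per residue.

    `epitopes` holds (start, end, weight) tuples with a 0-based start and an
    exclusive end. `weight` is a stand-in for the per-epitope score
    component, which the caption does not define (assumption).
    """
    scores = np.zeros(length)
    for start, end, weight in epitopes:
        scores[start:end] += weight
    return scores

def normalize(scores, endogenous_scores):
    """Divide by the mean non-normalized score of an endogenous immunogen."""
    return scores / np.mean(endogenous_scores)

def mma(normalized_by_isoform, n_sd=3, window=13):
    """Mean over isoforms minus n_sd standard deviations, then smoothed.

    Input shape is (isoforms, residues). np.std uses the population formula
    by default, and mode="same" treats positions beyond the sequence ends as
    zero; both choices are assumptions. The protein is assumed to be at
    least `window` residues long.
    """
    arr = np.asarray(normalized_by_isoform, dtype=float)
    lower = arr.mean(axis=0) - n_sd * arr.std(axis=0)
    kernel = np.ones(window) / window
    return np.convolve(lower, kernel, mode="same")
```

Subtracting a multiple of the standard deviation before smoothing favors residues that score well across nearly all isoforms rather than residues with a high mean driven by a few isoforms.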
Fig 2. HLA population frequencies.
(A) HLA-DQB1 and (B) HLA-DRB1 cumulative (line graphs with race/ethnicity data) and individual allele (bar graphs) population frequency information is displayed here. Cumulative frequency plots display summed frequencies of sequentially ordered HLA beta alleles (greatest to smallest for total population) vs. the number of alleles included in the sum. Individual allele plots display HLA alleles vs. their respective overall population frequency.
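The cumulative frequency curves described in this caption can be reproduced with a short routine like the one below. It is a sketch under assumptions: the function name is hypothetical, the input is a plain mapping from allele name to total-population frequency, and the per-group (race/ethnicity) curves would reuse the same allele ordering with group-specific frequencies.

```python
def cumulative_frequencies(allele_freqs):
    """Order alleles from most to least frequent and accumulate the sum.

    Returns a list of (allele, cumulative_frequency) pairs, so the i-th
    entry gives the coverage obtained by including the i most common alleles.
    """
    ordered = sorted(allele_freqs.items(), key=lambda kv: kv[1], reverse=True)
    running_total = 0.0
    curve = []
    for allele, freq in ordered:
        running_total += freq
        curve.append((allele, running_total))
    return curve
```

Reading the curve at a given allele count shows how much of the population is covered by that many alleles, which is the quantity plotted on the line graphs.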
Fig 3. HLA-DQ/DT epitope analysis results.
(A) DT epitope scoring (UNC) and anchor residue identification (WNC) results for the HLA-DQ isotype. Scores, in line form (black, UNC) or dot form (red, WNC), are plotted against residue number. For HLA-DQ (1st from left) and HLA-DR (2nd from left) results, the center line represents the mean score and the shaded area represents ±1 standard deviation. For IAd NetMHC (3rd from left), IAd SMM (4th from left), and IEd SMM (5th from left) results, lines represent the mean score. (B) DT UNC results for all isotypes are plotted together here for easier comparison. Scores are plotted in line form against residue number. For HLA-DQ and HLA-DR results, the shaded area represents the mean score ±1 standard deviation. For IAd NetMHC, IAd SMM, and IEd SMM results, lines represent the mean score. In both parts, shaded areas representing standard deviation could not be incorporated with the IAd and IEd results due to lack of diversity within the isotypes (these plots summarize a single immunogen/isoform prediction run). UNC, WNC, and combined results for other isotype/immunogen combinations can be found in Supporting Information.
Overview of prediction outcomes.
| Protein | Residues | DR Hits | DR Epitopes | DQ Hits | DQ Epitopes | IAd(Net) Hits | IAd(Net) Epitopes | IAd(SMM) Hits | IAd(SMM) Epitopes | IEd(SMM) Hits | IEd(SMM) Epitopes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cholera toxin subunit B | 104 | 21000 | 2 | 191100 | 3 | 525 | 2 | 525 | 3 | 525 | 2 |
| Heat labile enterotoxin B | 124 | 25800 | 3 | 234780 | 3 | 645 | 3 | 645 | 3 | 645 | 2 |
| MS2 capsid protein | 130 | 27240 | 3 | 247884 | 2 | 681 | 2 | 681 | 2 | 681 | 3 |
| Q-beta capsid protein | 133 | 27960 | 3 | 254436 | 3 | 699 | 2 | 699 | 2 | 699 | 2 |
| Hepatitis C core antigen | 150 | 32040 | 3 | 291564 | 4 | 801 | 3 | 801 | 3 | 801 | 2 |
| Hepatitis B core antigen | 185 | 40440 | 4 | 368004 | 3 | 1011 | 3 | 1011 | 4 | 1011 | 5 |
| Influenza A hemagglutinin | 328 | 74760 | 8 | 680316 | 7 | 1869 | 7 | 1869 | 7 | 1869 | 8 |
| Outer membrane protein C | 367 | 84120 | 9 | 765492 | 8 | 2103 | 8 | 2103 | 9 | 2103 | 9 |
| Human papillomavirus 16 L1 | 505 | 117240 | 12 | 1066884 | 11 | 2931 | 13 | 2931 | 12 | 2931 | 9 |
| Diphtheria toxin | 560 | 130440 | 15 | 1187004 | 16 | 3261 | 13 | 3261 | 15 | 3261 | 12 |
| Bovine serum albumin | 607 | 141720 | 13 | 1289652 | 16 | 3543 | 14 | 3543 | 11 | 3543 | 14 |
| Murine serum albumin | 608 | 141960 | 13 | 1291836 | 18 | 3549 | 14 | 3549 | 13 | 3549 | 13 |
| Human serum albumin | 609 | 142200 | 13 | 1294020 | 15 | 3555 | 14 | 3555 | 11 | 3555 | 15 |
| Pseudomonas aeruginosa exoprotein A | 638 | 149160 | 18 | 1357356 | 17 | 3729 | 17 | 3729 | 16 | 3729 | 15 |
| Tetanus toxin | 1315 | 311640 | 31 | 2835924 | 33 | 7791 | 29 | 7791 | 28 | 7791 | 27 |
| Keyhole limpet hemocyanin 1 | 3125 | 746040 | 77 | 6788964 | 78 | 18651 | 73 | 18651 | 67 | 18651 | 67 |
| Keyhole limpet hemocyanin 2 | 3421 | 817080 | 82 | 7435428 | 96 | 20427 | 85 | 20427 | 73 | 20427 | 79 |
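The hit counts in this table appear consistent with enumerating every peptide of length 15 to 20 in each protein and multiplying by the number of isoforms evaluated. The window lengths (15 to 20) and the isoform counts (40 for DR, 364 for DQ, 1 for each murine method) are inferred from the tabulated numbers and are not stated in the text, so the sketch below should be read as a consistency check rather than a description of the prediction software's settings.

```python
def peptide_count(protein_length, min_len=15, max_len=20):
    """Number of contiguous peptides with lengths min_len..max_len.

    A protein of length N contains N - k + 1 peptides of length k.
    """
    return sum(
        max(protein_length - k + 1, 0) for k in range(min_len, max_len + 1)
    )

def expected_hits(protein_length, n_isoforms):
    """Peptides per protein multiplied by the number of isoforms queried."""
    return peptide_count(protein_length) * n_isoforms

# Cholera toxin subunit B (104 residues), using the inferred isoform counts.
print(peptide_count(104))       # murine methods, one isoform each
print(expected_hits(104, 40))   # HLA-DR
print(expected_hits(104, 364))  # HLA-DQ
```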
HLA-DQ epitope ranking and excision results.
| Protein | Epicenter | Residues | Peak Score | Cumulative Score | Anchor(s) | Anchor Location(s) |
|---|---|---|---|---|---|---|
| DT | 366 | VAQSIALSSLMVAQAIPLVGELVDI | 1.244903473 | 30.00208733 | 1 | 363 |
| KLH2 | 3116 | LWLGGTETYSMSSLAFSAYDPVFMI | 1.266599245 | 29.66917927 | 2 | 3113;3128 |
| KLH1 | 2126 | LKYALSSLQADTSADGFAAIASFHG | 1.203760532 | 28.79987579 | 3 | 2116;2127;2133 |
| TT | 683 | VLLLEYIPEITLPVIAALSIAESST | 1.18984899 | 28.47710714 | 1 | 685 |
| KLH2 | 3230 | LRNQPRVFAGFVLSGIYTSANVKIY | 1.155509004 | 28.26449885 | 1 | 3228 |
| EPA | 470 | GYVFVGYHGTFLEAAQSIVFGGVRA | 1.195748733 | 28.07205997 | 1 | 464 |
| KLH2 | 3030 | IPYWDWTKSMIALPAFFADSSNSNP | 1.220294602 | 27.98888619 | 1 | 3024 |
| KLH1 | 1714 | ESMKADHSSDGFQAIASFHALPPLC | 1.18117356 | 27.94054821 | 2 | 1713;1716 |
| HA | 113 | DVPDYASLRSLVASSGTLEFITEGF | 1.178072602 | 27.89447046 | 3 | 105;112;125 |
| KLH1 | 752 | EDRIYAGFLLAGIRTSANVDIFIKT | 1.194509058 | 27.84109025 | 2 | 747;752 |
| KLH2 | 330 | RAAKERTFASFILSGFGGSANVVVY | 1.1450165 | 27.76877714 | 1 | 341 |
| KLH1 | 1995 | QEHSRVFAGFLLEGFGTSATVDFQV | 1.194575351 | 27.7396872 | 2 | 1983;1994 |
| EPA | 496 | SQDLDAIWRGFYIAGDPALAYGYAQ | 1.157977165 | 27.72034364 | 2 | 491;502 |
| KLH2 | 2823 | KEERTFAAFLLHGFGASADVSFDVC | 1.19423916 | 27.50198956 | 2 | 2813;2821 |
| MSA | 243 | GERAFKAWAVARLSQTFPNADFAEI | 1.191126486 | 27.48352048 | 1 | 232 |
| EPA | 182 | LARDATFFVRAHESNEMQPTLAISH | 1.119360222 | 27.44939067 | 2 | 175;192 |
| KLH1 | 1038 | YEIAHNYIHALVGGAQPYGMASLRY | 1.140683141 | 27.35522704 | 3 | 1028;1035;1048 |
| TT | 268 | KQEIYMQHTYPISAEELFTFGGQDA | 1.133009845 | 27.30085116 | 2 | 264;272 |
| OMPC | 285 | WANKAQNFEAVAQYQFDFGLRPSLA | 1.148137149 | 27.2273307 | 1 | 289 |
| DT | 159 | EFIKRFGDGASRVVLSLPFAEGSSS | 1.130103304 | 27.11920376 | 3 | 154;159;166 |
| DT | 69 | QKGIQKPKSGTQGNYDDDWKGFYST | 0.750778926 | 18.86486125 | 0 | - |
| KLH2 | 3285 | FKYDITEVANRLNMHHDDTFNFRLE | 0.710019791 | 18.6720013 | 0 | - |
| MSA | 275 | TKVNKECCHGDLLECADDRAELAKY | 0.675856369 | 18.60844917 | 0 | - |
| OMPC | 226 | IGGAISSSKRTDAQNTAAYIGNGDR | 0.702058163 | 18.58162638 | 0 | - |
| MSA | 127 | CTKQEPERNECFLQHKDDNPSLPPF | 0.72994394 | 18.41383759 | 0 | - |
| KLH2 | 2050 | QFDRLYKYDITKTLKDMKLRYDDTF | 0.656917303 | 18.34949986 | 0 | - |
| QB | 74 | NYKVQVKIQNPTACTANGSCDPSVT | 0.676758453 | 18.24156848 | 0 | - |
| TT | 1172 | GKLNIYYRRLYNGLKFIIKRYTPNN | 0.663503833 | 18.18220649 | 0 | - |
| BSA | 127 | CEKQEPERNECFLSHKDDSPDLPKL | 0.709074042 | 17.90439636 | 0 | - |
| BSA | 329 | EKDAIPENLPPLTADFAEDKDVCKN | 0.682086932 | 17.8524391 | 0 | - |
| TT | 331 | IDSYKQIYQQKYQFDKDSNGQYIVN | 0.69292518 | 17.80870386 | 0 | - |
| HA | 77 | TLIDALLGDPHCDVFQDETWDLFVE | 0.651700228 | 17.75576302 | 0 | - |
| TT | 487 | LTFIAEKNSFSEEPFQDEIVSYNTK | 0.682363508 | 17.73314233 | 0 | - |
| HPV | 173 | CKPPIGEHWGKGSPCTNVAVNPGDC | 0.654750628 | 17.56775931 | 0 | - |
| HCV | 44 | YLLPRRGPRLGVRATRKTSERSQPR | 0.6604623 | 17.24324764 | 0 | - |
| HSA | 280 | ECCHGDLLECADDRADLAKYICENQ | 0.637717696 | 17.00537164 | 0 | - |
| HSA | 408 | KVFDEFKPLVEEPQNLIKQNCELFE | 0.632871202 | 16.65259624 | 0 | - |
| KLH1 | 262 | DCAQELLHQKMEPFSWEDNDIPLTN | 0.631582056 | 16.4462453 | 0 | - |
| BSA | 280 | CCHGDLLECADDRADLAKYICDNQD | 0.619778655 | 16.39374163 | 0 | - |
| KLH2 | 1217 | WRYDRVYKYEITQQLHDLDLHVGDN | 0.602212809 | 15.85052422 | 0 | - |
Epicenter: location (residue number) of the epitope center in relation to the parent protein sequence.
Peak score: highest residue UNC score found within the indicated epitope.
Cumulative score: the sum of UNC scores for all residues found within the indicated epitope.
Anchor: a residue that achieved a WNC score of >4; indicates a residue of particular importance in MHC interactions.
Anchor locations: locations (residue numbers) of the anchor residues in relation to the parent protein sequence.
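The quantities defined in these notes can be computed for a single excised epitope along the following lines. This is a sketch under assumptions: the function name is hypothetical, residue numbers are 1-based, and the 25-residue window (epicenter plus 12 residues on each side) is inferred from the length of the sequences in the table rather than stated in the notes.

```python
def summarize_epitope(unc, wnc, epicenter, half_width=12, anchor_threshold=4.0):
    """Peak score, cumulative score, and anchors for one epitope window.

    `unc` and `wnc` are per-residue score sequences for the parent protein.
    `epicenter` is a 1-based residue number; the window is clipped at the
    protein termini.
    """
    first = max(epicenter - half_width, 1)
    last = min(epicenter + half_width, len(unc))
    window = range(first, last + 1)
    return {
        "epicenter": epicenter,
        "peak": max(unc[i - 1] for i in window),
        "cumulative": sum(unc[i - 1] for i in window),
        # Anchors are residues whose WNC score exceeds the threshold (>4).
        "anchors": [i for i in window if wnc[i - 1] > anchor_threshold],
    }
```

Sorting such summaries by their cumulative score in descending order would give an ordering like the one shown in the table.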
Fig 4. Design and assessment of DQ- and DR-specific UCAs and UCnAs.
The plots and heat maps shown here summarize epitope scores for (A) HLA-DQ and (B) HLA-DR isotypes. Top plots display scores for UCAs (1st from left), UCnAs (2nd from left), and a random protein of the same length (3rd from left), in line form (black, UNC) or dot form (red, WNC), plotted against residue number. Center lines represent mean scores and shaded areas represent ±1 standard deviation. Bottom heat maps display isoform vs. residue number for UCAs (1st from left), UCnAs (2nd from left), and a random protein of the same length (3rd from left). Lighter regions and darker regions within the heat map represent lower and higher immunogenicity scores, respectively.
Fig 5. Comparison between prediction methods.
(A) Direct comparison of differences in individual residue scores (when necessary, averaged over all isoforms within an isotype) between isotypes/methods in boxplot format (first indicated isotype/method minus second indicated isotype/method). (B) Comparison of the absolute value of the differences in individual residue scores (when necessary, averaged over all isoforms within an isotype) between isotypes/methods in boxplot format. (C) Comparison of residue scores of isotypes/methods in boxplot format.
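The per-residue comparisons summarized in panels (A) and (B) amount to a signed and an absolute difference between two score vectors. The sketch below assumes hypothetical function names and that both methods were scored over the same residues; isotypes with several isoforms are first averaged over isoforms, as the caption describes.

```python
import numpy as np

def isotype_mean(scores_by_isoform):
    """Average residue scores over all isoforms; input is (isoforms, residues)."""
    return np.asarray(scores_by_isoform, dtype=float).mean(axis=0)

def compare_methods(first, second):
    """Signed (first minus second) and absolute per-residue differences."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = first - second
    return diff, np.abs(diff)
```

The two returned arrays are what would be passed to a boxplot routine for panels (A) and (B), respectively.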