Ge Liu, Brandon Carter, Trenton Bricken, Siddhartha Jain, Mathias Viard, Mary Carrington, David K Gifford.
Abstract
We present a combinatorial machine learning method to evaluate and optimize peptide vaccine formulations for SARS-CoV-2. Our approach optimizes the presentation likelihood of a diverse set of vaccine peptides conditioned on a target human-population HLA haplotype distribution and expected epitope drift. Our proposed SARS-CoV-2 MHC class I vaccine formulations provide 93.21% predicted population coverage with at least five vaccine peptide-HLA average hits per person (≥ 1 peptide: 99.91%), with all vaccine peptides perfectly conserved across 4,690 geographically sampled SARS-CoV-2 genomes. Our proposed MHC class II vaccine formulations provide 97.21% predicted coverage with at least five vaccine peptide-HLA average hits per person, with all peptides having an observed mutation probability of ≤ 0.001. We provide an open-source implementation of our design method (OptiVax) and vaccine evaluation tool (EvalVax), as well as the data used in our design efforts, here: https://github.com/gifford-lab/optivax.
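The abstract's conservation criterion (all MHC class I peptides perfectly conserved across the sampled genomes; all MHC class II peptides with an observed mutation probability of ≤ 0.001) can be illustrated with a sliding-window filter. This is a minimal sketch, not OptiVax's code. It assumes the sampled proteomes are already translated and aligned to the reference with no insertions or deletions, and it defines a window's mutation probability as the fraction of sampled sequences whose window differs from the reference at any position. The function names and toy sequences are hypothetical.

```python
from typing import List, Sequence, Tuple


def mutation_probability(reference: str, sequences: Sequence[str], start: int, length: int) -> float:
    """Fraction of aligned sequences whose window [start, start + length) differs from the reference.

    Assumes every sequence is aligned to the reference without indels (hypothetical simplification).
    """
    if not sequences:
        raise ValueError("need at least one sampled sequence")
    ref_window = reference[start:start + length]
    changed = sum(1 for seq in sequences if seq[start:start + length] != ref_window)
    return changed / len(sequences)


def conserved_peptides(
    reference: str,
    sequences: Sequence[str],
    lengths: Sequence[int],
    max_rate: float = 0.001,
) -> List[Tuple[str, float]]:
    """Return (peptide, rate) for every reference window whose mutation probability is <= max_rate."""
    kept = []
    for length in lengths:
        for start in range(len(reference) - length + 1):
            rate = mutation_probability(reference, sequences, start, length)
            if rate <= max_rate:
                kept.append((reference[start:start + length], rate))
    return kept


if __name__ == "__main__":
    # Toy data: 99 copies of a short reference fragment plus one variant with a substitution at index 10.
    reference = "MFVFLVLLPLVSS"
    variant = reference[:10] + "A" + reference[11:]
    sequences = [reference] * 99 + [variant]
    # 8-mers overlapping index 10 have rate 1/100 = 0.01 > 0.001, so they are dropped.
    print(conserved_peptides(reference, sequences, lengths=[8]))
    # → [('MFVFLVLL', 0.0), ('FVFLVLLP', 0.0), ('VFLVLLPL', 0.0)]
```

Setting `max_rate=0.0` would express the stricter "perfectly conserved" requirement stated for the MHC class I designs.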
Keywords: COVID-19; SARS-CoV-2; combinatorial optimization; haplotype; machine learning; major histocompatibility complex; peptide vaccine; population coverage; vaccine augmentation; vaccine evaluation
Year: 2020 PMID: 32721383 PMCID: PMC7384425 DOI: 10.1016/j.cels.2020.06.009
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304
Figure 1. The OptiVax and EvalVax Machine Learning System for Combinatorial Vaccine Optimization and Evaluation
These methods can be used to design new peptide vaccines, evaluate existing vaccines, or augment existing vaccine designs. Peptides are scored using machine learning predictions and immunogenicity data for population coverage optimization and evaluation.
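As a rough illustration of the "augment existing vaccine designs" use case, the sketch below greedily adds candidate peptides to an existing set, each time choosing the one that most increases the weighted fraction of individuals with at least k peptide-HLA hits. It is a simplified greedy stand-in, not OptiVax's own search procedure. The weighted genotype list, the peptide-to-allele `binds` map, the peptide names, and the rule of counting each distinct allele once are all hypothetical assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Genotype = FrozenSet[str]                   # distinct HLA alleles carried by one individual
Population = List[Tuple[Genotype, float]]   # (genotype, population frequency) pairs


def hit_count(vaccine: Set[str], genotype: Genotype, binds: Dict[str, Set[str]]) -> int:
    """Number of peptide-HLA pairs in which a vaccine peptide is predicted to be displayed by an allele."""
    return sum(len(binds.get(p, set()) & genotype) for p in vaccine)


def coverage(vaccine: Set[str], population: Population, binds: Dict[str, Set[str]], k: int = 1) -> float:
    """Weighted fraction of the population with at least k peptide-HLA hits."""
    return sum(w for g, w in population if hit_count(vaccine, g, binds) >= k)


def greedy_augment(
    existing: Iterable[str],
    candidates: Iterable[str],
    population: Population,
    binds: Dict[str, Set[str]],
    budget: int,
    k: int = 1,
) -> Set[str]:
    """Add up to `budget` candidates, each time picking the one that most improves coverage."""
    vaccine = set(existing)
    pool = [p for p in candidates if p not in vaccine]
    for _ in range(budget):
        if not pool:
            break
        # max() keeps the first of tied candidates, so the result is deterministic.
        best = max(pool, key=lambda p: coverage(vaccine | {p}, population, binds, k))
        vaccine.add(best)
        pool.remove(best)
    return vaccine


if __name__ == "__main__":
    population = [
        (frozenset({"A*02:01", "B*07:02"}), 0.5),
        (frozenset({"A*01:01", "B*08:01"}), 0.3),
        (frozenset({"A*24:02", "B*07:02"}), 0.2),
    ]
    # Hypothetical predicted displays for four hypothetical peptides.
    binds = {"pepA": {"A*02:01"}, "pepB": {"B*07:02"}, "pepC": {"A*01:01", "B*08:01"}, "pepD": {"A*24:02"}}
    print(sorted(greedy_augment({"pepA"}, ["pepB", "pepC", "pepD"], population, binds, budget=1)))
    # → ['pepA', 'pepC']
```

With these toy inputs, adding `pepC` raises p(n ≥ 1) from 0.5 to 0.8, versus 0.7 for either `pepB` or `pepD`. A single greedy pass is easy to follow but can miss better sets that a wider search would find.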
Figure 2. SARS-CoV-2 OptiVax-Robust Selected Peptide Vaccine Sets for (A) MHC Class I and (B) MHC Class II
(a) EvalVax-Robust population coverage at different cutoffs on the per-individual number of peptide-HLA hits, for populations self-reporting as having White, Black, or Asian ancestry and for their average.
(b) EvalVax-Unlinked population coverage across 15 geographic regions and the averaged population coverage.
(c) Binding of vaccine peptides to each of the available alleles in MHC I and II.
(d) Peptide viral protein origins.
(e) Distribution of the number of per-individual peptide-HLA hits in populations self-reporting as having White, Black, or Asian ancestry.
(f) Vaccine peptide presence in SARS-CoV.
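Panels (a) and (e) summarize the distribution of per-individual peptide-HLA hits. One simple way to compute such a distribution from haplotype frequencies, in the spirit of the haplotype-based EvalVax-Robust evaluation, is sketched below. It is not EvalVax's implementation: it assumes an individual's two haplotypes are drawn independently from a single haplotype frequency table (Hardy-Weinberg), counts each distinct allele once, and uses hypothetical toy haplotypes and a hypothetical peptide-to-allele `binds` map.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set

Haplotype = FrozenSet[str]  # HLA alleles inherited together on one chromosome


def hit_distribution(
    haplotype_freqs: Dict[Haplotype, float],
    vaccine: Iterable[str],
    binds: Dict[str, Set[str]],
) -> Dict[int, float]:
    """Return P(n = h), the probability that an individual has exactly h peptide-HLA hits."""
    peptides = list(vaccine)
    dist: Dict[int, float] = defaultdict(float)
    # Ordered haplotype pairs: (h1, h2) and (h2, h1) each carry probability f1 * f2.
    for (h1, f1), (h2, f2) in product(haplotype_freqs.items(), repeat=2):
        alleles = h1 | h2  # distinct alleles only (a homozygous allele is counted once here)
        n = sum(len(binds.get(p, set()) & alleles) for p in peptides)
        dist[n] += f1 * f2
    return dict(dist)


def coverage_at_cutoffs(dist: Dict[int, float], cutoffs: Iterable[int]) -> Dict[int, float]:
    """p(n >= k) for each cutoff k, i.e. coverage as a function of the hit cutoff."""
    return {k: sum(p for n, p in dist.items() if n >= k) for k in cutoffs}


if __name__ == "__main__":
    haplotypes = {
        frozenset({"A*02:01", "B*07:02"}): 0.6,
        frozenset({"A*01:01", "B*08:01"}): 0.4,
    }
    binds = {"pep1": {"A*02:01"}, "pep2": {"A*01:01", "B*08:01"}}
    dist = hit_distribution(haplotypes, ["pep1", "pep2"], binds)
    print(sorted(dist.items()))
    print(coverage_at_cutoffs(dist, [1, 2, 3]))
```

For these toy inputs the distribution is approximately {1: 0.36, 2: 0.16, 3: 0.48}, so p(n ≥ 1) ≈ 1.0, p(n ≥ 2) ≈ 0.64, and p(n ≥ 3) ≈ 0.48 (up to floating-point rounding).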
Comparison of Baselines, S-Protein Peptides, and OptiVax-Designed Peptide Vaccines (Using All SARS-CoV-2 Proteins or S/M/N Proteins Only) on Population Coverage Evaluation Metrics and Vaccine Quality Metrics (Percentage of Peptides with Mutation Rate > 0.001 or with Nonzero Probability of Being Glycosylated)
| Peptide Set | Vaccine Size | EvalVax-Unlinked | EvalVax-Robust p(n ≥ 1) | EvalVax-Robust p(n ≥ 5) | EvalVax-Robust | Exp. # Peptide-HLA Hits/Vaccine Size | Exp. # Peptide-HLA Hits (White) | Exp. # Peptide-HLA Hits (Black) | Exp. # Peptide-HLA Hits (Asian) | Peptides Glycosylated | Peptides with Mutation Rate > 0.001 | Peptides on Cleavage Site | Protein Origins | In SARS-CoV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MHC class I | | | | | | | | | | | | | | |
| OptiVax Augmented Nonredundant S-Protein | 126 + 16 | 100.00% | 100.00% | 99.97% | 99.27% | 20.50% | 27.20 | 27.68 | 32.44 | 0.00% | 0.00% | 0.00% | M, N, ORF1a, ORF1b, ORF3a, S1, S2 | 30.28% |
| S-Protein | 3795 | 99.96% | 100.00% | 99.17% | 98.29% | 0.91% | 30.84 | 32.14 | 41.13 | 15.57% | 29.99% | 0.63% | S1, S2 | 29.30% |
| OptiVax-Unlinked | 19 | 99.79% | 99.99% | 89.15% | 49.59% | 40.72% | 7.34 | 6.90 | 8.97 | 0.00% | 0.00% | 0.00% | ORF1a, ORF1b, ORF3a, S1 | 42.11% |
| Nonredundant S-protein | 126 | 99.84% | 99.93% | 97.37% | 91.69% | 16.82% | 19.20 | 19.99 | 24.38 | 0.00% | 0.00% | 0.00% | S1, S2 | 27.78% |
| OptiVax-Robust | 19 | 99.39% | 99.91% | 93.21% | 67.75% | 49.26% | 9.36 | 8.52 | 10.21 | 0.00% | 0.00% | 0.00% | ORF1a, ORF1b, ORF3a, ORF9b, S1 | 52.63% |
| OptiVax-Robust – size 15 | 15 | 99.07% | 99.89% | 86.69% | 54.36% | 54.47% | 8.17 | 7.20 | 9.14 | 0.00% | 0.00% | 0.00% | ORF1a, ORF1b, ORF9b, S1 | 53.33% |
| Nonredundant S1-subunit | 68 | 99.18% | 99.76% | 86.53% | 56.36% | 12.23% | 8.31 | 8.84 | 7.80 | 0.00% | 0.00% | 0.00% | S1 | 8.82% |
| ( | 37 | 95.86% | 99.75% | 52.94% | 16.00% | 13.51% | 5.37 | 4.99 | 4.64 | 8.11% | 37.84% | 0.00% | E, M, N, ORF10, ORF1a, ORF1b, ORF3a, ORF6, ORF7a, ORF7b, ORF8, S1 | 45.95% |
| OptiVax-Robust – S/M/N only | 26 | 97.49% | 98.15% | 67.37% | 26.24% | 22.31% | 5.31 | 5.64 | 6.45 | 0.00% | 0.00% | 0.00% | M, N, S1, S2 | 57.69% |
| ( | 52 | 90.89% | 95.82% | 56.52% | 19.99% | 9.88% | 5.20 | 4.44 | 5.77 | 7.69% | 34.62% | 0.00% | N | 55.77% |
| ( | 16 | 80.41% | 93.46% | 9.47% | 0.03% | 15.73% | 2.75 | 2.60 | 2.20 | 12.50% | 12.50% | 0.00% | N | 68.75% |
| Random subset of binders | 19 | 81.04% | 90.33% | 25.02% | 4.58% | 16.74% | 3.01 | 2.83 | 3.70 | 0.00% | 29.89% | 0.00% | N/A | 40.37% |
| ( | 5 | 71.91% | 90.10% | 0.55% | 0.00% | 33.60% | 1.93 | 1.44 | 1.67 | 0.00% | 40.00% | 0.00% | S1, S2 | 40.00% |
| ( | 13 | 78.66% | 85.29% | 58.51% | 30.56% | 44.25% | 5.59 | 4.98 | 6.69 | 7.69% | 30.77% | 0.00% | E, M, N, ORF1a, S1, S2 | 23.08% |
| ( | 10 | 69.12% | 85.13% | 3.21% | 0.01% | 19.23% | 1.68 | 1.72 | 2.37 | 0.00% | 30.00% | 0.00% | ORF1a, ORF1b, ORF3a, ORF8, S1 | 20.00% |
| ( | 51 | 68.63% | 80.80% | 1.52% | 0.00% | 3.12% | 1.90 | 1.70 | 1.17 | 11.76% | 43.14% | 5.88% | S1, S2 | 5.88% |
| ( | 10 | 66.91% | 78.49% | 23.49% | 2.72% | 28.34% | 2.93 | 2.50 | 3.07 | 10.00% | 10.00% | 0.00% | E | 80.00% |
| ( | 13 | 64.96% | 75.75% | 39.82% | 37.09% | 34.15% | 4.77 | 3.69 | 4.86 | 0.00% | 7.69% | 0.00% | E, N, ORF1a, ORF1b, S2 | 53.85% |
| ( | 31 | 49.46% | 71.24% | 0.08% | 0.00% | 3.47% | 1.09 | 1.11 | 1.02 | 3.23% | 35.48% | 0.00% | E, M, N, S1 | 41.94% |
| ( | 7 | 53.91% | 66.59% | 1.38% | 0.00% | 19.87% | 1.34 | 1.30 | 1.53 | 0.00% | 28.57% | 0.00% | E, M, N, S1, S2 | 71.43% |
| ( | 13 | 44.56% | 61.09% | 0.00% | 0.00% | 5.67% | 0.79 | 0.69 | 0.73 | 23.08% | 46.15% | 7.69% | S1, S2 | 23.08% |
| ( | 16 | 45.25% | 52.30% | 35.61% | 4.15% | 15.57% | 2.56 | 2.18 | 2.73 | 12.50% | 25.00% | 0.00% | N, S2 | 100.00% |
| ( | 5 | 29.90% | 41.77% | 0.00% | 0.00% | 8.86% | 0.56 | 0.36 | 0.41 | 0.00% | 20.00% | 0.00% | S1 | 20.00% |
| ( | 7 | 30.23% | 38.91% | 21.08% | 1.41% | 23.92% | 1.32 | 0.55 | 3.15 | 0.00% | 42.86% | 0.00% | S1, S2 | 14.29% |
| ( | 3 | 27.14% | 34.98% | 0.00% | 0.00% | 17.33% | 0.76 | 0.56 | 0.24 | 0.00% | 66.67% | 0.00% | S1, S2 | 0.00% |
| ( | 9 | 13.97% | 23.86% | 0.00% | 0.00% | 2.83% | 0.15 | 0.08 | 0.54 | 22.22% | 11.11% | 0.00% | S1, S2 | 11.11% |
| MHC class II | | | | | | | | | | | | | | |
| OptiVax-Unlinked | 19 | 91.67% | 99.67% | 95.94% | 83.30% | 64.45% | 14.37 | 12.71 | 9.66 | 0.00% | 0.00% | 0.00% | M, ORF1a, ORF1b, S2 | 52.63% |
| OptiVax-Robust | 19 | 90.76% | 99.67% | 97.21% | 88.48% | 76.04% | 16.64 | 15.71 | 11.00 | 0.00% | 0.00% | 0.00% | M, ORF1a, ORF1b, S1, S2 | 42.11% |
| OptiVax Augmented Nonredundant S-protein | 102 + 26 | 91.65% | 99.67% | 98.73% | 97.27% | 26.81% | 43.79 | 36.06 | 23.12 | 0.00% | 0.00% | 0.00% | M, ORF1a, ORF1b, S1, S2 | 29.69% |
| ( | 134 | 87.28% | 98.88% | 90.20% | 83.97% | 25.18% | 45.04 | 38.25 | 17.93 | 20.15% | 44.78% | 0.00% | E, M, N, S1, S2 | 30.60% |
| S-protein | 16315 | 89.80% | 98.76% | 95.99% | 95.73% | 2.22% | 492.82 | 385.60 | 208.34 | 30.01% | 57.50% | 1.43% | S1, S2 | 16.06% |
| OptiVax-Robust – S/M/N only | 22 | 86.34% | 98.57% | 85.37% | 62.49% | 42.51% | 11.31 | 9.69 | 7.05 | 0.00% | 0.00% | 0.00% | M, N, S1, S2 | 36.36% |
| Nonredundant S-protein | 102 | 84.91% | 98.56% | 82.72% | 77.19% | 16.61% | 23.54 | 17.04 | 10.23 | 0.00% | 0.00% | 0.00% | S1, S2 | 28.43% |
| Nonredundant S1-subunit | 53 | 77.14% | 95.81% | 63.43% | 41.82% | 16.33% | 13.07 | 8.74 | 4.16 | 0.00% | 0.00% | 0.00% | S1 | 3.77% |
| Random subset of binders | 19 | 72.41% | 93.61% | 58.67% | 32.40% | 31.59% | 7.72 | 6.49 | 3.79 | 0.00% | 63.79% | 0.00% | N/A | 23.55% |
| ( | 13 | 67.29% | 86.99% | 15.24% | 3.69% | 19.69% | 3.65 | 2.26 | 1.77 | 30.77% | 38.46% | 0.00% | E, M, N, ORF1a, S1, S2 | 0.00% |
| ( | 9 | 56.73% | 83.51% | 12.49% | 0.66% | 26.65% | 3.16 | 2.35 | 1.68 | 22.22% | 44.44% | 0.00% | S1, S2 | 55.56% |
| ( | 11 | 39.44% | 72.75% | 0.27% | 0.00% | 11.62% | 1.84 | 1.46 | 0.53 | 0.00% | 72.73% | 0.00% | E, M, N, ORF10, ORF6, ORF7a, ORF8 | 36.36% |
| ( | 10 | 42.30% | 69.37% | 0.00% | 0.00% | 9.83% | 1.47 | 0.91 | 0.57 | 20.00% | 90.00% | 0.00% | ORF1a, ORF1b, ORF3a, S2 | 20.00% |
| ( | 31 | 43.90% | 60.45% | 9.22% | 1.01% | 6.08% | 2.53 | 2.54 | 0.59 | 3.23% | 48.39% | 0.00% | E, M, N, S1 | 29.03% |
| ( | 7 | 41.48% | 56.29% | 0.96% | 0.00% | 14.02% | 1.44 | 1.11 | 0.39 | 0.00% | 28.57% | 0.00% | E, M, N, S1, S2 | 71.43% |
| ( | 5 | 27.69% | 54.96% | 0.00% | 0.00% | 13.08% | 0.74 | 0.72 | 0.51 | 0.00% | 20.00% | 0.00% | N, S2 | 100.00% |
| ( | 5 | 25.46% | 47.92% | 0.04% | 0.00% | 13.14% | 0.90 | 0.58 | 0.49 | 60.00% | 20.00% | 0.00% | S1, S2 | 0.00% |
| ( | 20 | 20.78% | 35.12% | 0.04% | 0.00% | 3.36% | 0.96 | 0.62 | 0.44 | 15.00% | 35.00% | 5.00% | S1, S2 | 0.00% |
| ( | 10 | 19.15% | 28.40% | 0.96% | 0.00% | 4.79% | 0.92 | 0.27 | 0.24 | 60.00% | 70.00% | 0.00% | E | 30.00% |
| ( | 3 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00 | 0.00 | 0.00 | 66.67% | 100.00% | 0.00% | S1 | 0.00% |
S-protein includes all possible S-protein peptides of lengths 8–10 (MHC class I) and 13–25 (MHC class II). Nonredundant peptide sets are the result of OptiVax analysis of nonredundant displayed peptides. Within each MHC class, the table is sorted by EvalVax-Robust p(n ≥ 1). Random subsets were generated 200 times. The binders used to generate random subsets are defined as peptides predicted to bind with affinity ≤ 50 nM to more than 5 of the alleles.
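The random-subset baseline described in the table note can be sketched in two steps: keep the peptides predicted to bind with affinity ≤ 50 nM to more than five alleles, then draw fixed-size subsets from that pool 200 times. This is a minimal sketch under assumptions: the nested `affinity_nm` dictionary (peptide → allele → predicted affinity in nM) is a hypothetical input format, each draw samples without replacement, and summarizing the draws by their mean metric is our assumption, since the note says only that subsets were generated 200 times.

```python
import random
from typing import Callable, Dict, List, Sequence


def select_binders(
    affinity_nm: Dict[str, Dict[str, float]],
    threshold_nm: float = 50.0,
    more_than: int = 5,
) -> List[str]:
    """Peptides predicted to bind (affinity <= threshold_nm) to more than `more_than` alleles."""
    return sorted(
        peptide
        for peptide, per_allele in affinity_nm.items()
        if sum(1 for nm in per_allele.values() if nm <= threshold_nm) > more_than
    )


def random_subsets(pool: Sequence[str], size: int, n_draws: int = 200, seed: int = 0) -> List[List[str]]:
    """Draw `n_draws` subsets of `size` distinct peptides each; seeded for reproducibility."""
    rng = random.Random(seed)
    return [rng.sample(list(pool), size) for _ in range(n_draws)]


def mean_over_draws(subsets: List[List[str]], metric: Callable[[List[str]], float]) -> float:
    """Average a coverage or quality metric over all random draws (assumed summary statistic)."""
    return sum(metric(s) for s in subsets) / len(subsets)


if __name__ == "__main__":
    alleles = [f"allele_{i}" for i in range(7)]  # hypothetical allele labels
    affinity_nm = {
        "PEPTIDE_1": {a: 20.0 for a in alleles},                                        # binds all 7
        "PEPTIDE_2": {a: (30.0 if i < 5 else 500.0) for i, a in enumerate(alleles)},  # binds only 5
        "PEPTIDE_3": {a: (45.0 if i < 6 else 900.0) for i, a in enumerate(alleles)},  # binds 6
    }
    pool = select_binders(affinity_nm)
    print(pool)  # → ['PEPTIDE_1', 'PEPTIDE_3']
    draws = random_subsets(pool, size=1)
    print(len(draws), mean_over_draws(draws, len))  # → 200 1.0
```

In practice the `metric` argument would be a population coverage function; `len` is used here only to keep the example self-contained.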
Figure 3. OptiVax-Robust-Designed Peptide Vaccine Using Peptides for (A) MHC class I and (B) MHC class II from the SARS-CoV-2 S, M, and N Proteins Only
(a) EvalVax-Robust population coverage at different cutoffs on the minimum per-individual number of peptide-HLA hits, for populations self-reporting as having White, Black, or Asian ancestry and for their average.
(b) EvalVax-Unlinked population coverage across 15 geographic regions and the averaged population coverage.
(c) Binding of vaccine peptides to each of the available alleles in MHC I and II.
(d) Peptide viral protein origins.
(e) Distribution of the number of per-individual peptide-HLA hits in populations self-reporting as having White, Black, or Asian ancestry.
(f) Vaccine peptide presence in SARS-CoV.
Figure 4. OptiVax-Unlinked Selected SARS-CoV-2 Optimal Peptide Vaccine Sets for (A) MHC Class I and (B) MHC Class II
(a) EvalVax-Robust population coverage at different cutoffs on the per-individual number of peptide-HLA hits, for populations self-reporting as having White, Black, or Asian ancestry and for their average.
(b) EvalVax-Unlinked population coverage across 15 geographic regions and the averaged population coverage.
(c) Binding of vaccine peptides to each of the available alleles in MHC I and II.
(d) Peptide viral protein origins.
(e) Distribution of the number of per-individual peptide-HLA hits in populations self-reporting as having White, Black, or Asian ancestry.
(f) Vaccine peptide presence in SARS-CoV.
Figure 5. EvalVax Population Coverage Evaluation, Expectation of Per-Individual Number of Peptide-HLA Hits and Normalized Coverage for MHC Class I SARS-CoV-2 Vaccines
(A) EvalVax population coverage for the OptiVax-Unlinked and OptiVax-Robust proposed vaccines at different vaccine sizes.
(B) EvalVax-Robust population coverage with peptide-HLA hits per individual. OptiVax-Robust performance is shown by the blue curve and baseline performance by red crosses (labeled by the first author's name).
(C) EvalVax-Robust population coverage with peptide-HLA hits.
(D) EvalVax-Robust population coverage with peptide-HLA hits.
(E) Expected number of peptide-HLA hits vs. peptide vaccine size for OptiVax-Robust and OptiVax-Unlinked, and normalized coverage (hits divided by vaccine size) at different vaccine sizes.
(F) Comparison of OptiVax-Robust and baselines on expected number of peptide-HLA hits. OptiVax-Robust performance is shown by the blue curve and baseline performance is shown by red crosses.
(G) Comparison between OptiVax-Robust and baselines on normalized coverage.
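Panel (E) defines normalized coverage as hits divided by vaccine size. Checking a few MHC class I rows of the comparison table, its "Exp. # Peptide-HLA Hits/Vaccine Size" column appears to equal the mean of the White, Black, and Asian expected hit counts divided by vaccine size. That reading is our inference from the numbers, not a definition stated in the source, and the last digit can differ for some rows because the table shows rounded inputs. The function name is hypothetical.

```python
from typing import Sequence


def normalized_coverage(expected_hits: Sequence[float], vaccine_size: int) -> float:
    """Mean expected peptide-HLA hits per person across populations, divided by vaccine size."""
    if vaccine_size <= 0:
        raise ValueError("vaccine size must be positive")
    mean_hits = sum(expected_hits) / len(expected_hits)
    return mean_hits / vaccine_size


if __name__ == "__main__":
    # (White, Black, Asian) expected hits and vaccine size, copied from MHC class I rows of the table.
    rows = {
        "OptiVax-Unlinked": ((7.34, 6.90, 8.97), 19),                                   # table: 40.72%
        "Nonredundant S-protein": ((19.20, 19.99, 24.38), 126),                         # table: 16.82%
        "OptiVax Augmented Nonredundant S-Protein": ((27.20, 27.68, 32.44), 126 + 16),  # table: 20.50%
    }
    for name, (hits, size) in rows.items():
        print(f"{name}: {100 * normalized_coverage(hits, size):.2f}%")
```

Each printed value matches the corresponding table entry to two decimal places.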
Figure 6. EvalVax Population Coverage Evaluation, Expectation of Per-Individual Number of Peptide-HLA Hits and Normalized Coverage for MHC Class II SARS-CoV-2 Vaccines
(A) EvalVax population coverage for the OptiVax-Unlinked and OptiVax-Robust proposed vaccines at different vaccine sizes.
(B) EvalVax-Robust population coverage with peptide-HLA hits per individual. OptiVax-Robust performance is shown by the blue curve and baseline performance by red crosses (labeled by the first author's name).
(C) EvalVax-Robust population coverage with peptide-HLA hits.
(D) EvalVax-Robust population coverage with peptide-HLA hits.
(E) Expected number of peptide-HLA hits vs. peptide vaccine size for OptiVax-Robust and OptiVax-Unlinked, and normalized coverage (hits divided by vaccine size) at different vaccine sizes.
(F) Comparison of OptiVax-Robust and baselines on expected number of peptide-HLA hits. OptiVax-Robust performance is shown by the blue curve and baseline performance is shown by red crosses.
(G) Comparison between OptiVax-Robust and baselines on normalized coverage.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| HLA haplotype population frequency data | This paper, Mendeley Data | Mendeley Data: |
| SARS-CoV-2 vaccine designs | This paper | |
| SARS-CoV-2 proteome | GISAID ( | Sequence entry Wuhan/IPBCAMS-WH-01/2019 |
| SARS-CoV proteome | UniProt ( | UniProt: UP000000354 (Proteome ID) |
| HLA population frequency data | dbMHC, as obtained from the IEDB Population Coverage Tool download ( | |
| Human proteome | UniProt ( | UniProt: UP000005640 (Proteome ID) |
| SARS-CoV-2 experimental peptide stability data (Immunitrack) | ( | Data S1. COVID19-Intavis-Immunitrack-dataset: |
| SARS-CoV-2 cleavage regions: ORF1a and ORF1b | UniProt ( | UniProt: P0DTD1; |
| SARS-CoV-2 cleavage regions: Spike (S) | ( | Figure 1 |
| Additional SARS-CoV-2 proteomes for mutation analysis | GISAID ( | Acknowledgements and detailed GISAID accessions in |
| Experimental data of Spike N-glycosylation: Cryo-EM | ( | Table 2 |
| Experimental data of Spike N-glycosylation: tandem mass spectrometry | ( | |
| Baseline vaccine MHC I: ( | ( | Figure 2 |
| Baseline vaccine MHC I: ( | ( | Data S1; Table 6 |
| Baseline vaccine MHC I: ( | ( | Table 4 |
| Baseline vaccine MHC I: ( | ( | Table 1 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table S5 |
| Baseline vaccine MHC I: ( | ( | Table 5 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table 4 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table 2 |
| Baseline vaccine MHC I: ( | ( | Table 1 |
| Baseline vaccine MHC I: ( | ( | Table 4a |
| Baseline vaccine MHC I: ( | ( | Sub-Section 2 in Results of the Main Text |
| Baseline vaccine MHC I: ( | ( | Table 1(C) |
| Baseline vaccine MHC I: ( | ( | Tables S1 and S2 |
| Baseline vaccine MHC I: ( | ( | Table S5 (65 33-mers, to which we applied sliding windows of lengths 8–10 to obtain the peptide set considered for MHC class I) |
| Baseline vaccine MHC I: ( | ( | Table 1 (peptides created by sliding windows of length 8–10) |
| Baseline vaccine MHC I: ( | ( | Section 3.2 "MEPVC Designing" (peptides created by sliding windows of length 8–10) |
| Baseline vaccine MHC II: ( | ( | Table S2 - "Unique Mean HBA T-Cell Epitopes" for each protein Subunit |
| Baseline vaccine MHC II: ( | ( | Table 2 |
| Baseline vaccine MHC II: ( | ( | Table 3 |
| Baseline vaccine MHC II: ( | ( | Table 2 |
| Baseline vaccine MHC II: ( | ( | Table S7 |
| Baseline vaccine MHC II: ( | ( | Table 2 |
| Baseline vaccine MHC II: ( | ( | Table 2 |
| Baseline vaccine MHC II: ( | ( | Table 1 |
| Baseline vaccine MHC II: ( | ( | Table 1(B) |
| Baseline vaccine MHC II: ( | ( | Table 3 |
| Baseline vaccine MHC II: ( | ( | Table 6 |
| Baseline vaccine MHC II: ( | ( | Table 2 |
| Baseline vaccine MHC II: ( | ( | Table S5 (65 33-mers, to which we applied sliding windows of lengths 13–25 to obtain the peptide set considered for MHC class II) |
| Baseline vaccine MHC II: ( | ( | Table 1 (peptides created by sliding windows of length 13–25) |
| Megapool MHC I: ( | ( | Table S6 |
| Megapool MHC II: ( | ( | Table S3 |
| OptiVax | This paper, GitHub | |
| EvalVax | This paper, GitHub | |
| NetMHCpan-4.0 | ( | |
| NetMHCpan-4.1 | ( | |
| NetMHCIIpan-4.0 | ( | |
| NetMHCIIpan-3.2 | ( | |
| MHCflurry 1.6.0 | ( | Version 1.6.0, |
| PUFFIN | ( | GitHub commit a63f6c563b7e2f7b04eac28 |
| Hapferret | | GitHub commit e2381dc567cec97373acb |
| Nextstrain | ( | GitHub commit 639c63f25e0bf30c900f8d |
| NetNGlyc | ( | |