Edison Ong, Michael F Cooke, Anthony Huffman, Zuoshuang Xiang, Mei U Wong, Haihe Wang, Meenakshi Seetharaman, Ninotchka Valdez, Yongqun He.
Abstract
Vaccination is one of the most significant inventions in medicine. Reverse vaccinology (RV) is a state-of-the-art technique for predicting vaccine candidates from pathogen genomes. To promote vaccine development, we updated Vaxign2, the first web-based vaccine design program using reverse vaccinology with machine learning. Vaxign2 is a comprehensive web server for rational vaccine design, consisting of predictive and computational workflow components. The predictive part includes the original Vaxign filtering-based method and a new machine learning-based method, Vaxign-ML. Benchmarking on a validation dataset showed that Vaxign-ML had superior prediction performance compared to other RV tools. Besides the prediction component, Vaxign2 implements various post-prediction analyses that help users refine the prediction results according to different vaccine design rationales and reduce the time needed to analyze the Vaxign/Vaxign-ML prediction results. Users provide proteome sequences as input data, select candidates based on Vaxign outputs and Vaxign-ML scores, and perform post-prediction analysis. Vaxign2 also includes precomputed results from approximately 1 million proteins in 398 proteomes of 36 pathogens. As a demonstration, Vaxign2 was used to analyse SARS-CoV-2, the coronavirus causing COVID-19. The comprehensive framework of Vaxign2 can support better and more rational vaccine design. Vaxign2 is publicly accessible at http://www.violinet.org/vaxign2.
Year: 2021 PMID: 34009334 PMCID: PMC8218197 DOI: 10.1093/nar/gkab279
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The overall workflow of Vaxign2. Users provide the input data in the form of pathogen protein or proteome (blue box). Then the users can select Vaxign2 options in the web interface and submit the prediction query (yellow boxes). A Vaxign2 summary page will display the Vaxign-ML scores, and users can perform post-prediction analysis on the selected protein (green boxes).
Benchmarking performance of Vaxign and Vaxign-ML compared with other open-source reverse vaccinology tools
| Tools | Recall | Precision | WF1 | MCC |
|---|---|---|---|---|
| | 0.81 | 0.75 | 0.76 | 0.51 |
| | 0.32 | 0.79 | 0.56 | 0.27 |
| | 0.78 | 0.71 | 0.71 | 0.42 |
| | 0.5 | 0.52 | 0.49 | -0.02 |
Abbreviations: WF1 = weighted F1 score; MCC = Matthews correlation coefficient.
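The four metrics reported in the benchmarking table can be computed from binary labels roughly as follows. This is a minimal sketch, not the authors' evaluation code: the function name `benchmark_metrics` and the toy labels are hypothetical, and it assumes recall and precision refer to the positive (protective antigen) class, which the table does not state explicitly.

```python
from math import sqrt


def benchmark_metrics(y_true, y_pred):
    """Return recall, precision, weighted F1 and MCC for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Positive-class recall and precision (assumption, see lead-in).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    def f1(hit, false_pos, false_neg):
        denom = 2 * hit + false_pos + false_neg
        return 2 * hit / denom if denom else 0.0

    # Weighted F1: per-class F1 averaged with class support as weights.
    n_pos, n_neg = tp + fn, tn + fp
    f1_pos = f1(tp, fp, fn)
    f1_neg = f1(tn, fn, fp)  # roles of fp/fn swap for the negative class
    wf1 = (f1_pos * n_pos + f1_neg * n_neg) / (n_pos + n_neg)

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"recall": recall, "precision": precision, "wf1": wf1, "mcc": mcc}


# Toy example (hypothetical labels, not data from the paper).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_result = benchmark_metrics(toy_true, toy_pred)
```

An MCC near zero, as in the last row of the table, indicates predictions that are close to uncorrelated with the true labels, which recall and precision alone do not reveal.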
Vaxign2 precomputed queries for pathogens with at least 10 proteomes. The full list can be found in Supplementary Table S1.
| Pathogen name | # of proteomes | # of proteins |
|---|---|---|
| | 53 | 105 632 |
| | 52 | 5104 |
| | 35 | 131 070 |
| | 33 | 86 662 |
| | 31 | 98 888 |
| | 23 | 104 009 |
| | 22 | 50 267 |
| | 15 | 64 073 |
| | 14 | 33 665 |
| | 13 | 48 849 |
| | 11 | 53 932 |
| | 10 | 17 445 |
| | 10 | 35 130 |
Figure 2. Dynamic analysis of SARS-CoV-2 S protein in Vaxign2. (A) The protein accession number of S protein was used as the input, together with the selection of specified parameters. (B) The basic analysis results were provided for the S protein. (C) Vaxitop predicted human MHC-I & -II epitopes and users could select the result based on different MHC classes, MHC alleles and epitope length. (D) Population coverage of S protein's predicted epitopes was computed using the MHC-I & -II reference alleles for the general population of each country. Note that some countries with low predicted population coverage might not reflect the actual population coverage due to the lack of reported allele frequencies in the Allele Frequency Net Database (36). (E) Vaxign2 searched the IEDB Epitope database to provide a list of experimentally verified epitopes for both B and T cells. (F, G) EGGNOG was used as a database to identify matching functions, Gene Ontology terms, and known orthologs to facilitate rational vaccine antigen selection.
Figure 3. Comparison of multiple coronavirus strains for uniquely conserved strains. (A) Query for SARS-CoV-2 proteins that share orthologs in SARS-CoV and MERS-CoV but not in four other human coronaviruses and one murine coronavirus strain. (B) The results of seven proteins including nsp8 predicted as a protective antigen and three proteins (nsp8–10) as adhesin proteins. (C) Selection of nsp8 for further analysis. (D) The result of nsp8’s genome group ortholog phylogeny.