Johannes Söllner, Andreas Heinzel, Georg Summer, Raul Fechete, Laszlo Stipkovits, Susan Szathmary, Bernd Mayer.
Abstract
BACKGROUND: Recent years have seen a renaissance of the vaccine area, driven by clinical needs in infectious diseases but also in chronic diseases such as cancer and autoimmune disorders. Equally important are technological improvements involving nano-scale delivery platforms as well as third-generation adjuvants. In parallel, immunoinformatics routines have reached the maturity needed to support central aspects of vaccinology beyond the prediction of antigenic determinants. On this basis, computational vaccinology has emerged as a discipline aimed at ab initio rational vaccine design. Here we present a computational workflow for implementing computational vaccinology, covering aspects from vaccine target identification to functional characterization and epitope selection, supported by a Systems Biology assessment of central aspects of host-pathogen interaction. We exemplify the procedures for Epstein-Barr Virus (EBV), a clinically relevant pathogen causing chronic infection and suspected of triggering malignancies and autoimmune disorders.
Year: 2010 PMID: 21067549 PMCID: PMC2981879 DOI: 10.1186/1745-7580-6-S2-S7
Source DB: PubMed Journal: Immunome Res ISSN: 1745-7580
Figure 1. Prototypic representation of a (computational) vaccine design workflow. The scheme spans the entire pre-clinical project life cycle from the concept phase and determination of vaccine targets to detailed epitope analysis, formulation, and experimental validation. The entire process is optimally embedded in an integrated data and knowledge management framework.
Association between autoimmune diseases, neoplasms and pathogens as found by NCBI MeSH.
| pathogen | autoimmune disease | # co-occurrence | pathogen | neoplasm | # co-occurrence |
|---|---|---|---|---|---|
| Measles virus | Multiple Sclerosis | 384 | Papillomaviridae | Uterine Cervical Neoplasms | 5854 |
| | Arthritis, Rheumatoid | 279 | Helicobacter pylori | Stomach Neoplasms | 3231 |
| Campylobacter jejuni | Guillain-Barre Syndrome | 167 | Papillomaviridae | Carcinoma, Squamous Cell | 2252 |
| Enterovirus B, Human | Diabetes Mellitus, Type 1 | 164 | Papillomaviridae | Cervical Intraepithelial Neoplasia | 1922 |
| Campylobacter jejuni | Polyradiculoneuropathy | 150 | | Cell Transformation, Viral | 1899 |
| | Multiple Sclerosis | 136 | | Burkitt Lymphoma | 1853 |
| | Lupus Erythematosus, Systemic | 126 | Hepatitis B virus | Liver Neoplasms | 1673 |
| Helicobacter pylori | Purpura, Thrombocytopenic, Idiopathic | 124 | Hepatitis B virus | Carcinoma, Hepatocellular | 1615 |
| Theilovirus | Multiple Sclerosis | 118 | | Nasopharyngeal Neoplasms | 1518 |
| Herpesvirus 6, Human | Multiple Sclerosis | 113 | Herpesvirus 8, Human | Sarcoma, Kaposi | 1391 |
| Mycobacterium tuberculosis | Arthritis, Rheumatoid | 110 | Hepacivirus | Carcinoma, Hepatocellular | 1005 |
| Escherichia coli | Arthritis, Rheumatoid | 101 | Hepacivirus | Liver Neoplasms | 996 |
| Chlamydophila pneumoniae | Multiple Sclerosis | 74 | | Hodgkin Disease | 975 |
| Streptococcus pyogenes | Arthritis, Rheumatoid | 61 | Simian virus 40 | Cell Transformation, Viral | 860 |
| Human T-lymphotropic virus 1 | Multiple Sclerosis | 61 | | Lymphoma | 753 |
| Lymphocytic choriomeningitis virus | Diabetes Mellitus, Type 1 | 61 | Oncogenic Viruses | Neoplasms | 720 |
| Rubella virus | Multiple Sclerosis | 61 | Helicobacter pylori | Lymphoma, B-Cell, Marginal Zone | 704 |
Table 1 lists the most frequent co-occurrences found between disease (autoimmune disorders and malignancies) and pathogen terms according to MeSH term mining of scientific literature given in Medline. HHV4 (EBV) entries are given in bold.
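The counts in Table 1 can be read as the number of Medline records annotated with both a pathogen term and a disease term. The following is a minimal sketch of such a count, assuming each record is available as a set of MeSH descriptors. The function name `count_cooccurrences` and the toy records are hypothetical and do not reproduce the authors' mining procedure.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(records, pathogens, diseases):
    """Count records annotated with both a pathogen and a disease term."""
    counts = Counter()
    for terms in records:
        found_pathogens = terms & pathogens
        found_diseases = terms & diseases
        # Every pathogen/disease pair found in one record counts once.
        for pair in product(sorted(found_pathogens), sorted(found_diseases)):
            counts[pair] += 1
    return counts


# Hypothetical toy records; real input would be MeSH annotations from Medline.
records = [
    {"Measles virus", "Multiple Sclerosis", "Humans"},
    {"Measles virus", "Multiple Sclerosis"},
    {"Helicobacter pylori", "Stomach Neoplasms"},
    {"Helicobacter pylori", "Humans"},
]
pathogens = {"Measles virus", "Helicobacter pylori"}
diseases = {"Multiple Sclerosis", "Stomach Neoplasms"}

cooc = count_cooccurrences(records, pathogens, diseases)
for (pathogen, disease), n in cooc.most_common():
    print(pathogen, disease, n)
```

Sorting the resulting counter by count gives a ranking of the kind shown in the table.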
Figure 2. Schematic overview of the relation between the Taverna Workbench, Taverna Remote Execution Server, pBone and pView. Workflow templates are generated in the Taverna Workbench and imported into the pBone system via pView. pView then acts as the control element for project creation/extension, data import and processing in pBone. Individual components of pBone are set into relation, including the dependency on the Taverna Remote Execution Server for processing of Taverna workflows.
Figure 3. A typical workflow situation utilizing pView. A multiple sequence alignment is shown at the top. Various single-sequence profiles for these sequences are given in separate windows, together with a 3D model of the protein of interest.
Figure 4. Overview of the epitope mapping process. The upper part depicts a subgraph comprising shortest paths between known epitopes and EBV gp110. The lower part of the figure shows the first 300 positions of a multiple secondary structure alignment of homologous envelope glycoproteins of EBV, HHV-5, HHV-1 and EHV-2. To improve readability, secondary structures are color coded (helical areas in red, beta sheets in green, coils in blue, signal peptides in yellow and gaps in grey). The black strands above the multiple alignment mark possible mapping positions with respect to their position on the gp110 protein of EBV, which are connected to their predecessor in the shortest path.
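The shortest paths mentioned in this caption connect known epitopes to EBV gp110 through a graph of related proteins. A minimal sketch of such a search, using breadth-first search on an unweighted graph, is given here. The node names and edges are hypothetical, and the authors' graph construction and path algorithm may differ.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    predecessor = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = predecessor[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in predecessor:
                predecessor[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical graph: an epitope linked to its source protein, which is
# linked to homologous proteins (adjacency lists, undirected).
graph = {
    "epitope_1": ["HHV-1 homolog"],
    "HHV-1 homolog": ["epitope_1", "HHV-5 homolog", "EBV gp110"],
    "HHV-5 homolog": ["HHV-1 homolog", "EBV gp110"],
    "EBV gp110": ["HHV-1 homolog", "HHV-5 homolog"],
}
path = shortest_path(graph, "epitope_1", "EBV gp110")
print(path)
```

Each node on the returned path has its predecessor recorded, which corresponds to the predecessor links drawn in the figure.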
Figure 5. Representation of heterogeneous data in a network context. Provided is a subgraph encoding information available for EBV homology data enriched by IEDB object types, relationships and content. Red nodes represent EBV proteins from completely sequenced proteomes which are linked to IEDB data. Turquoise nodes represent proteins listed in the IEDB, orange triangles represent scientific publications, blue diamonds represent peptide epitopes, and green diamonds encode experimental assays.
Figure 6. Selection of vertices and edges from the EBV-human interaction graph centered around differentially regulated CD9. EBV proteins are shown in red, human proteins in green. Solid lines indicate physical interactions, dashed lines omicsNET connections. Red, blue and green edges indicate EBV-EBV, EBV-human and human-human interactions, respectively. Human genes significantly differentially regulated upon infection/reactivation are shown as hexagons.
Figure 7. Structural alignment of gp350 (PDB entry 2H6O, turquoise) and the EBV type 1 gp350 N-terminal domain model (green). Mutations differentiating the two proteins are highlighted by red spheres. Glycosylations (which are part of the 2H6O structure) are drawn in blue and indicate which residues may not be directly accessible to antibodies. Mutations are located outside the CD21 interacting region, which is non-glycosylated and has been implicated in neutralizing immunity. The arrow indicates the CD21/gp350 interface.
Figure 8. The optimized model for the EBV type 1 gp110 protein. Monomers are drawn in green, red and blue. The arrow indicates one of the large coils added by homology modeling. The lower part is close to the viral membrane, while the stem and head extend into the solvent and are free for molecular interaction. Potential glycosylations were not further considered.
Figure 9. Representation of the gp110 putative trimer surface and secondary structure cartoons in lateral and top view (left to right). Monomers are drawn in green, violet and cyan. Regions covered by predicted, potentially neutralizing epitopes are shown in blue; residues predicted to be glycosylated are given in brown. Areas coded in red were experimentally shown to be neutralizing in homologous proteins of other herpesviruses, while areas coded in orange were additionally predicted as epitopes. The orange spot at the stem of the molecule indicates the terminus of a neutralizing epitope close to the N-terminus of the protein (unfortunately only partially resolved in the structure model).
Repeats identified by RADAR.
| start | stop | peptide | variant |
|---|---|---|---|
| | | | 1 |
| | | | 2 |
| | | | 2 |
| | | | 2 |
| | | | 3 |
Table 2 lists residues dissimilar to peptide sequence variant (1) in bold; identical repeats are considered as one variant.
Metrics for estimating the quality of neutralising epitope prediction.
| protein | baseline | TP | FP | TN | FN | sensitivity | specificity | accuracy | precision | MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| gp110 | 5.72% | 26 | 141 | 667 | 23 | 0.53 | 0.83 | 0.81 | 0.16 | 0.21 |
| gp350 (no repeat) | 8.05% | 47 | 224 | 615 | 26 | 0.64 | 0.73 | 0.73 | 0.17 | 0.22 |
| gp350 (single repeat) | 8.05% | 59 | 224 | 615 | 14 | 0.81 | 0.73 | 0.74 | 0.21 | 0.32 |
| gp350 | 12.35% | 95 | 224 | 576 | 17 | 0.85 | 0.72 | 0.74 | 0.3 | 0.39 |
Table 3 provides TP (True Positives), FP (False Positives), TN (True Negatives), FN (False Negatives) and MCC (Matthews correlation coefficient) for selected sequences. The baseline value corresponds to the percentage of residues belonging to neutralizing epitopes. For gp350 three entries have been added, depending on whether copies of ‘VTTPTPNATSPT’ are interpreted as False Negatives (FN), namely i) no repeats are considered as predicted, ii) a single repeat is accepted as predicted, or iii) all four copies are accepted as predicted (default).
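The derived columns of Table 3 follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, which is an assumption about how the values were computed, although they reproduce the gp110 row to two decimals. The function name `classification_metrics` is hypothetical.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard per-residue metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Fraction of residues that belong to neutralizing epitopes.
        "baseline": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Counts taken from the gp110 row of Table 3.
gp110 = classification_metrics(tp=26, fp=141, tn=667, fn=23)
for name, value in gp110.items():
    print(f"{name}: {value:.4f}")
```

Because the baseline is low, precision stays low even at reasonable sensitivity, which is why MCC is reported alongside accuracy.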
Peptide coverage of neutralizing epitopes.
| protein | baseline | number of peptides selected | percentage (%) of protein covered | number of potentially neutralizing peptides selected | coverage of known neutralizing epitopes |
|---|---|---|---|---|---|
| gp110 | 5.72% | 12 | 19.49% | 2 | 2/3 = 66% |
| gp350 | 12.35% | 20 | 35.17% | 5 | 4/4 = 100% |
Table 4 gives the number of selected peptides (and percentage of entire protein covered by these) in combination with the percentage of residues belonging to neutralizing epitopes and coverage of known neutralizing epitopes as an indicator of selection effectiveness for gp110 and gp350. A definition of the baseline is given in Table 3.
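The two coverage figures in Table 4 can be illustrated with simple interval arithmetic. This is a minimal sketch that assumes peptides and epitopes are given as 1-based inclusive residue ranges, and that an epitope counts as covered when any selected peptide overlaps it. That overlap criterion, the function names and the example ranges are all hypothetical and are not taken from the paper.

```python
def covered_fraction(peptides, protein_length):
    """Fraction of residues covered by at least one peptide."""
    covered = set()
    for start, stop in peptides:
        covered.update(range(start, stop + 1))
    return len(covered) / protein_length


def covered_epitopes(peptides, epitopes):
    """Number of epitopes overlapped by at least one selected peptide."""
    return sum(
        any(p_start <= e_stop and e_start <= p_stop
            for p_start, p_stop in peptides)
        for e_start, e_stop in epitopes
    )


# Hypothetical ranges on a 100-residue protein.
peptides = [(1, 20), (15, 30), (50, 60)]
epitopes = [(25, 35), (70, 80), (55, 58)]

fraction = covered_fraction(peptides, protein_length=100)
hits = covered_epitopes(peptides, epitopes)
print(f"{fraction:.2%} of protein covered, {hits}/{len(epitopes)} epitopes")
```

Overlapping peptides are counted once per residue, so the covered fraction cannot exceed 100%.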
HLA alleles particularly indicated for coverage of ethnicities in selected equatorial African countries.
| HLA allele | HLA supertype | # of countries | prevalence | score | Cumulative population coverage |
|---|---|---|---|---|---|
| B5802 | B58 | 2 | 10.3 | 20.6 | 12.34 |
| B1503 | B27 | 2 | 7.4 | 14.8 | 22.56 |
| B5301 | B07 | 3 | 4.9 | 14.7 | 31.93 |
| B4901 | Unclassified | 3 | 4.6 | 13.8 | 35.77 |
| B4201 | B07 | 2 | 6 | 12 | 42.23 |
| B4501 | B44 | 3 | 3.8 | 11.4 | 47.41 |
| B1510 | B27 | 2 | 4.25 | 8.5 | 50.57 |
| B1402 | B27 | 2 | 3.95 | 7.9 | 52.84 |
| B3501 | B07 | 3 | 2.2 | 6.6 | 55.8 |
| B8101 | B07 | 2 | 2.6 | 5.2 | 58.49 |
| B5703 | B58 | 2 | 2.5 | 5 | 59.5 |
| B5801 | B58 | 2 | 2.3 | 4.6 | 64.57 |
| A7401 | A03 | 2 | 2.1 | 4.2 | 66.35 |
| B0702 | B07 | 2 | 1.9 | 3.8 | 68.49 |
| B4101 | B44 | 1 | 2.7 | 2.7 | 69.35 |
| A2301 | A24 | 1 | 2.5 | 2.5 | 70.97 |
Table 5 lists allele data for equatorial Africa as obtained from a publicly accessible allele frequency database. Overall frequencies are biased towards HLA-B alleles. ‘Countries’ indicates the number of countries with a substantial sub-population, ‘Prevalence’ gives the median of allele prevalence in available datasets, ‘Score’ provides the product of Countries and Prevalence, and ‘Cumulative Population Coverage’ gives the cumulative average coverage of populations in Kenya, Uganda and Rwanda obtained by adding individual alleles in order of their rank.
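The ranking in Table 5 follows from the score defined in this caption, the product of the number of countries and the median prevalence. A minimal sketch of that ranking is shown here for three rows of the table. The function name `rank_alleles` and the dictionary layout are hypothetical. Cumulative population coverage is not computed, since it depends on population data not listed here.

```python
def rank_alleles(alleles):
    """Attach score = countries * prevalence and sort by descending score."""
    scored = [
        # Rounding avoids floating-point noise such as 14.700000000000001.
        {**a, "score": round(a["countries"] * a["prevalence"], 2)}
        for a in alleles
    ]
    return sorted(scored, key=lambda a: a["score"], reverse=True)


# Three rows taken from Table 5, given here in arbitrary order.
alleles = [
    {"allele": "B5301", "countries": 3, "prevalence": 4.9},
    {"allele": "B5802", "countries": 2, "prevalence": 10.3},
    {"allele": "B1503", "countries": 2, "prevalence": 7.4},
]
ranked = rank_alleles(alleles)
for a in ranked:
    print(a["allele"], a["score"])
```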
Figure 10. A snapshot visualizing T-cell antigenicity of the N-terminal region of LMP2A. Data from numerous prediction methods were integrated and visualized in the form of an HTML table using a Perl framework. Rows contain aligned EBV sequences; colors indicate the degree of antigenicity for a particular allele. The snapshot was selected for three spots of potential HLA-B3501 antigenicity. Bars below the alignments indicate (in this order) allele, start position of the ligand, as well as minimum and maximum IC50 in nM of nonamer peptides. Red indicates high affinity ligands (IC50 in nM around 1), blue indicates low affinity ligands (IC50 in nM around 500).
Listing of top-ranked peptides for LMP2A Conservancy Constrained T-cell Epitope Cluster (CCTEC) analysis.
| start | stop | HLA count | alleles |
|---|---|---|---|
| 177 | 196 | 5 | A2301(1), B3501(2), B7(1), B27(1), A24(1) |
| 178 | 197 | 5 | A2301(1), B3501(2), B7(1), B27(1), A24(1) |
| 179 | 198 | 5 | A2301(1), B3501(2), B7(1), B27(1), A24(1) |
| 230 | 249 | 5 | A2301(1), B3501(1), B7(1), B27(2), A24(1) |
| 236 | 255 | 5 | A2301(1), B3501(1), B7(1), B27(4), A24(1) |
| 237 | 256 | 5 | A2301(1), B3501(1), B7(1), B27(4), A24(1) |
| 243 | 262 | 5 | A2301(1), B3501(1), B27(1), A24(1), B1402(1) |
| 349 | 368 | 5 | A2301(1), B1402(1), B7(1), B27(2), A24(1) |
| 350 | 369 | 5 | A2301(1), B1402(2), B7(1), B27(2), A24(1) |
| 351 | 370 | 5 | A2301(2), B1402(2), B7(1), B27(2), A24(2) |
| 352 | 371 | 5 | A2301(2), B1402(2), B7(1), B27(2), A24(2) |
| 353 | 372 | 5 | A2301(2), B1402(2), B7(1), B27(2), A24(2) |
| 354 | 373 | 5 | A2301(2), B1402(2), B7(1), B27(2), A24(2) |
| 355 | 374 | 5 | A2301(2), B1402(2), B7(1), B27(2), A24(2) |
| 120 | 139 | 4 | A2301(1), B3501(1), B7(1), A24(1) |
| 121 | 140 | 4 | A2301(1), B3501(1), B7(1), A24(1) |
| 122 | 141 | 4 | A2301(1), B3501(1), B7(1), A24(1) |
| 123 | 142 | 4 | A2301(1), B3501(1), B7(1), A24(1) |
| 124 | 143 | 4 | A2301(1), B3501(1), B7(2), A24(1) |
| 125 | 144 | 4 | A2301(2), B3501(1), B7(2), A24(2) |
Table 6 provides the top 20 peptides from the conservancy constrained class I T-cell epitope cluster scan, ranked by the number of covered HLA alleles. ‘Start’ and ‘Stop’ indicate start and stop residues of the proposed region on the alignment and therefore need not be congruent with positions on the reference sequence. ‘HLA count’ indicates how many different class I HLA alleles and supertypes are covered, including the alleles HLA-A*2301, HLA-B*3501, HLA-B*1402 and the supertypes B7, B27, A24. Numbers in parentheses indicate how many ligands were identified within the screening window (20 amino acids) per allele/supertype.
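The window scan behind Table 6 can be sketched as a sliding 20-residue window that counts, per allele or supertype, the predicted ligands falling inside it. The sketch below assumes ligands are given as (allele, start, stop) with 1-based inclusive alignment positions and must lie fully inside the window. It does not model the conservancy constraint, and the function name `scan_windows` and the example ligands are hypothetical.

```python
from collections import Counter


def scan_windows(ligands, alignment_length, window=20):
    """Rank windows by the number of distinct alleles with ligands inside.

    Returns (start, stop, hla_count, per_allele_counts) tuples sorted by
    descending hla_count, then by ascending start position.
    """
    results = []
    for start in range(1, alignment_length - window + 2):
        stop = start + window - 1
        per_allele = Counter(
            allele
            for allele, lig_start, lig_stop in ligands
            if lig_start >= start and lig_stop <= stop
        )
        if per_allele:
            results.append((start, stop, len(per_allele), dict(per_allele)))
    return sorted(results, key=lambda r: (-r[2], r[0]))


# Hypothetical predicted nonamer ligands on a 40-column alignment.
ligands = [
    ("A2301", 5, 13),
    ("B3501", 8, 16),
    ("B3501", 10, 18),
    ("B7", 30, 38),
]
top = scan_windows(ligands, alignment_length=40)[0]
print(top)
```

Adjacent windows often contain the same ligands, which explains the runs of near-identical rows with consecutive start positions in the table.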
Figure 11. Color-coded display of variability of antigenicity comparing virus isolates in a region associated with an experimentally determined epitope. NetCTL (left block) and NetMHC (right block) predictions for supertype A24 and specifically HLA-A2301 are shown, respectively. The area represented is centered around the known A2301/supertype A24 ligand peptide ‘PYLFWLAA’ starting at alignment position 137. Sequences in the alignments are in the same order as in Figure 10, where the first four and last two sequences are not of EBV origin.