
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer.

Sudheer Gupta, Kumardeep Chaudhary, Sandeep Kumar Dhanda, Rahul Kumar, Shailesh Kumar, Manika Sehgal, Gandharva Nagpal, Gajendra P S Raghava.

Abstract

Owing to advances in sequencing technology, the genomes of thousands of cancer tissues and cell lines have been sequenced. Identifying cancer-specific epitopes, or neoepitopes, from cancer genomes is one of the major challenges in immunotherapy and vaccine development. This paper describes Cancertope, a platform developed for designing genome-based immunotherapy or vaccines against a cancer cell. The resources integrated on this platform are organized into three sections. The first section is a cancer-specific database of neoepitopes generated from the genomes of 905 cancer cell lines. This database holds a wide range of epitopes (e.g., B-cell, CD8+ T-cell, HLA class I, HLA class II) against 60 cancer-specific vaccine antigens. The second section is a partially personalized module developed for predicting potential neoepitopes against a user-specific cancer genome. The third section is a fully personalized module developed for identifying neoepitopes from the genomes of cancerous and healthy cells of a cancer patient. To assist the scientific community, a wide range of tools is incorporated in this platform, including screening of epitopes against the human reference proteome (http://www.imtech.res.in/raghava/cancertope/).

Year:  2016        PMID: 27832200      PMCID: PMC5104390          DOI: 10.1371/journal.pone.0166372

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Worldwide, cancer is one of the most prominent causes of premature death every year [1]. In addition to millions of deaths each year, countries spend billions of dollars on the treatment of cancer patients. In the past, effective vaccines have been developed against a number of dreaded diseases (e.g., smallpox, polio), saving millions of lives. It is therefore extremely important to develop effective vaccines against cancer. In this direction, researchers have had limited success in designing vaccines against cancers, particularly against cancer-inducing viruses [2,3]. There are a number of hurdles in developing cancer vaccines, including cross-reactivity, tolerance and insufficient immune response [4]. Similarly, identifying mutations shared across a wide range of cancer patients is also a challenge [5,6]. However, with the advent of high-throughput sequencing and assay techniques, different authors have attempted to investigate important shared mutations in various types of cancer [7,8]. Furthermore, to design a successful vaccine, it is important to identify cancer-specific antigens or antigenic regions that can direct the immune system specifically against cancerous cells. These antigens and antigenic regions are called neoantigens and neoepitopes, respectively. In the past, a number of experimental techniques have been developed to identify vaccine candidates (e.g., neoantigens, neoepitopes) for designing cancer vaccines [9,10]. Although vaccine candidates have been identified at genome scale, the task is demanding because experimental techniques are costly and time-consuming for large numbers of samples. To overcome these limitations, numerous computational tools have been developed for designing vaccines or immunotherapy against cancer.
Broadly, these computational tools can be divided into two categories: i) methods for predicting epitopes, and ii) methods for predicting potential vaccine candidates for cancer. In the past, numerous direct and indirect epitope prediction methods have been developed for predicting antigenic regions that can activate B cells, T-helper cells and cytotoxic T cells [11,12]. To predict cancer vaccine targets, cancer-specific regions are first identified and their immunogenic properties are then predicted. Warren et al. (2010) identified mutated regions in antigens/proteins generated by somatic mutations (missense, frame shift, insertion, and deletion) in human tumors [11]. They predicted HLA class I binders in these mutated regions and identified 159 potential vaccine candidates. Similarly, Khalili et al. (2012) predicted HLA-A and HLA-B binders in the mutated regions of 312 genes generated by missense mutations [13]. Brown et al. identified immunogenic mutations in the form of HLA class I binders from the sequencing data of 515 patients [14] and sought to correlate the presence of immunogenic missense mutations with patient survival. Recently, Rajasagi et al. proposed 22 HLA class I binders generated from missense mutations, using a pipeline developed for 91 chronic lymphocytic leukemias [15]. In most of the above studies, the authors predicted only HLA class I binders or cytotoxic T-cell (CTL) epitopes. Several computational tools exist for predicting HLA-binding peptides, T-cell epitopes and B-cell epitopes, and these can be used to predict immunogenic mutated regions in an antigen. However, there is a need for a streamlined computational tool that allows users to identify immunogenic mutations and the predicted cancer epitopes. One of the major limitations of existing computational tools for predicting cancer vaccine candidates is that they do not predict B-cell or T-helper epitopes.
In addition, there is no dedicated computational resource for predicting cancer epitopes in a user-specified genome. The aim of this study is to complement existing methods and to address these unresolved issues. We analyzed the mutational profiles of 905 cancer cell lines and identified neoepitopes that can activate different arms of the immune system. This information has been compiled into a database so that the user can access cancer-specific epitopes for any of these cell lines. In addition, fully and partially personalized pipelines have been integrated into this database. In brief, the study evaluates immune epitopes on the mutational landscape of a large number of cancer cell lines (https://figshare.com/articles/CANCERTOPE_MUTATION_DATASET_txt/4176558) and presents a workbench, named Cancertope, for designing neoepitope-based personalized vaccines/immunotherapies (http://crdd.osdd.net/raghava/cancertope/).

Results

Analysis of Vaccine Targets

The current study is based on 60 vaccine candidates: 26 identified from the analysis of NGS data in the CCLE database [16] and the remaining 34 from CanProVar [17], based on their association with cancer. The 26 genes (vaccine candidates) were selected from CCLE because they are frequently mutated across different cell lines (see Methods section). The distribution and types of mutations in the vaccine candidates were then analyzed, which showed that missense mutations predominate (Fig 1). Frame shift mutations in a few key genes such as PRKDC, RECQL4, PDE4DIP, and CTBP2 were found in a large number of cell lines. In-frame insertions and deletions were prominent in genes such as AKAP12, NR1H2, GPR112, and MAP3K1. All these genes are referred to in this study as cancer-sensitive genes; that is, a gene is called cancer-sensitive if mutations in that gene have a high propensity of being cancer-associated.
Fig 1

Frequency and type of mutations reported for each vaccine candidate.

Each numerical value represents the number of mutations across different cell lines in a vaccine candidate; for instance, the vaccine target PRKDC has been mutated 842 times (frame shift insertions) in the different cell lines.

Furthermore, Table 1 presents the 34 vaccine targets selected from CanProVar, whose mutations have a higher probability of transforming a normal cell into a cancerous cell. Among these vaccine candidates, mutations in targets such as PTEN [18], TP53 [18], BRAF [19], EGFR [20] and c-KIT [21,22] have already been reported in earlier studies to be highly carcinogenic and have been proposed as targets for immunotherapies. These analyses support our criteria for selecting generalized vaccine candidates. To broaden the functional analysis, the cancer-sensitive genes were compared with all other genes on the basis of their gene ontologies. The analysis suggested that cancer-sensitive proteins are more involved in apoptotic processes, biological regulation, and catalytic and binding activities than other proteins (Fig 2 and S1 Fig).
Table 1

Number of deleterious mutations (fD), polymorphism/neutral variants (fP) and cancer association (fD/fP) in each vaccine target.

Target | fD | fP | fD/fP | Family/subfamily of target/protein
PTEN | 389 | 1 | 389 | NA
TP53 | 1353 | 7 | 193.3 | P53 family
CTNNB1 | 132 | 1 | 132 | Beta-catenin family
BRAF | 99 | 1 | 99 | Protein kinase superfamily, TKL Ser/Thr protein kinase family, RAF subfamily
NF2 | 74 | 1 | 74 | NA
EGFR | 188 | 3 | 62.7 | Protein kinase superfamily, Tyr protein kinase family, EGF receptor subfamily
SMAD4 | 107 | 2 | 53.5 | Dwarfin/SMAD family
VHL | 272 | 6 | 45.3 | NA
KIT | 131 | 3 | 43.7 | Protein kinase superfamily, Tyr protein kinase family, CSF-1/PDGF receptor subfamily
PIK3CA | 174 | 4 | 43.5 | PI3/PI4-kinase family
NRAS | 36 | 1 | 36 | Small GTPase superfamily, Ras family
MSH2 | 103 | 5 | 20.6 | DNA mismatch repair MutS family
GATA1 | 20 | 1 | 20 | NA
MLH1 | 118 | 6 | 19.7 | DNA mismatch repair MutL/HexB family
FBXW7 | 67 | 4 | 16.8 | NA
MEN1 | 49 | 3 | 16.3 | NA
FGFR3 | 31 | 2 | 15.5 | Protein kinase superfamily, Tyr protein kinase family, Fibroblast growth factor receptor subfamily
TSHR | 46 | 3 | 15.3 | G-protein coupled receptor 1 family, FSH/LSH/TSH subfamily
JAK2 | 40 | 3 | 13.3 | Protein kinase superfamily, Tyr protein kinase family, JAK subfamily
RB1 | 102 | 8 | 12.8 | Retinoblastoma protein (RB) family
PDGFRA | 35 | 3 | 11.7 | Protein kinase superfamily, Tyr protein kinase family, CSF-1/PDGF receptor subfamily
NF1 | 65 | 6 | 10.8 | NA
FGFR2 | 43 | 4 | 10.8 | Protein kinase superfamily, Tyr protein kinase family, Fibroblast growth factor receptor subfamily
FLT3 | 35 | 4 | 8.8 | Protein kinase superfamily, Tyr protein kinase family, CSF-1/PDGF receptor subfamily
CDH1 | 68 | 8 | 8.5 | NA
TNFAIP3 | 31 | 4 | 7.8 | Peptidase C64 family
CBL | 30 | 4 | 7.5 | NA
RET | 58 | 8 | 7.3 | Protein kinase superfamily, Tyr protein kinase family
MSH6 | 40 | 8 | 5 | DNA mismatch repair MutS family
ERBB2 | 29 | 6 | 4.8 | Protein kinase superfamily, Tyr protein kinase family, EGF receptor subfamily
MET | 23 | 5 | 4.6 | Protein kinase superfamily, Tyr protein kinase family
ABL1 | 23 | 7 | 3.3 | Protein kinase superfamily, Tyr protein kinase family, ABL subfamily
ALK | 27 | 9 | 3 | Protein kinase superfamily, Tyr protein kinase family, Insulin receptor subfamily
ATM | 134 | 52 | 2.6 | PI3/PI4-kinase family, ATM subfamily
Fig 2

The functional characterization of cancer-sensitive and other proteins based on their gene ontologies.

Expression Analysis of Cancer Vaccine Candidates

As stated earlier, cancer vaccine candidates were selected on the basis of their mutation frequency in cancer cell lines and their level of association with cancer. Next, the expression profile of these genes was examined in all available cancer cell lines. As displayed in Table 2, most of the vaccine candidates were highly expressed in a large number of cell lines. Since the expression values ranged from 2 to 15, they were divided into four bins using arbitrarily chosen cutoffs, and genes with expression values >= 9 were regarded as highly expressed. Under this assumption, the candidate genes HSP90B1, MLH1, MSH6, PRKDC, MSH2, and AKAP9 are highly expressed in more than 700 cell lines.
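The per-cutoff counts reported in Table 2 could be computed along the following lines. This is only a minimal sketch under stated assumptions: the function name `count_cell_lines_above` and the expression values are hypothetical, and only the cutoffs (3, 7, 9, 12) are taken from Table 2.

```python
def count_cell_lines_above(expression_by_cell_line, cutoffs=(3, 7, 9, 12)):
    """Count, for each cutoff, the cell lines whose expression is >= cutoff."""
    return {
        cutoff: sum(1 for value in expression_by_cell_line.values() if value >= cutoff)
        for cutoff in cutoffs
    }


# Hypothetical expression values for one gene in four cell lines.
example_expression = {"CL1": 4.2, "CL2": 9.5, "CL3": 12.1, "CL4": 7.0}
counts = count_cell_lines_above(example_expression)
print(counts)
```

Because each count uses a ">=" threshold, the counts for one gene can only stay equal or decrease as the cutoff rises.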
Table 2

Expression analysis depicting number of cell lines with expression more than a given cutoff (e.g., 3, 7, 9) for each antigen.

Target>= 3>= 7>= 9>= 12
AAK1901000
ABL19019005950
AKAP129015092948
AKAP99019007232
ALK90123121
ALPK2901179921
ATM9018161650
BRAF90128210
CARD109015911090
CBL90175440
CDH19013582170
CHD19019016440
CREB3L29018494241
CTBP29018026380
CTNNB1901807600
EGFR901318210
ERBB29013683914
FBXW7NANANANA
FGFR2901164341
FGFR39018860
FLT390135273
FMN2901101220
GATA190119170
GPR112901000
HSP90B1901901897282
JAK29016871
KIT9011708212
MAML29014900
MAP3K1901393280
MAP3K49019006780
MEN1NANANANA
METNANANANA
MLH19018758612
MLL30000
MSH29018877310
MSH390155510
MSH69018928207
MYLK90147328626
MYST4NANANANA
NCOA39018251560
NF190126420
NF29011900
NR1H290116100
NRAS9018916754
PDE4DIP901198120
PDGFRA901115739
PIK3C2G9011420
PIK3CA901856890
PRKDC9019018153
PTEN9018343150
RB1901674260
RECQL4901804390
RET90152150
SMAD490157550
TNFAIP39015962529
TNRC6B90180380
TP53901597750
TSHR9012230
TTBK1901000
VHL90149950

For example, HSP90B1 has 282 cell lines having expression greater or equal to 12.


Identification of Neopeptides

After selecting 60 potential vaccine candidates, the next challenge was to identify cancer-specific regions/peptides in these candidates. Overlapping 9-mer peptides were therefore generated for each vaccine candidate (Table 3), and different filters were applied to identify cancer-specific peptides generated by cancer-associated mutations. These filters refined the dataset by eliminating every peptide whose identical sequence is found in healthy individuals. Identical peptides were removed by comparison against i) the reference protein, ii) the reference proteome, iii) 1000 Genomes-based variants of the same antigen and iv) 1000 Genomes-based proteomes. Candidates such as TP53, MLL3, PDE4DIP and PRKDC were observed to have the highest numbers of unique neopeptides not present in the reference proteome or the 1000 Genomes-based proteomes.
Table 3

Total number of generated neopeptides (9-mer peptides) in each vaccine candidate and number of neopeptides after applying different filters.

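Vaccine Candidate | Total 9-mer | Reference Protein | Reference Proteome | 1000-Genome Proteomes
TP53 | 2589 | 2204 | 2204 | 2204
MLL3 | 6570 | 1671 | 1671 | 1670
PDE4DIP | 3468 | 1130 | 1121 | 1013
PRKDC | 5269 | 1149 | 1149 | 1149
TNRC6B | 2730 | 905 | 905 | 886
AKAP9 | 4873 | 974 | 974 | 938
ATM | 4016 | 968 | 968 | 968
GPR112 | 4061 | 989 | 989 | 989
FMN2 | 2322 | 850 | 805 | 797
NF1 | 3650 | 819 | 819 | 810
MYST4 | 2705 | 668 | 668 | 639
PTEN | 1148 | 753 | 753 | 753
CTBP2 | 1550 | 573 | 573 | 573
ALK | 2185 | 573 | 573 | 557
MYLK | 2512 | 615 | 615 | 596
ALPK2 | 2765 | 603 | 603 | 603
AKAP12 | 2265 | 491 | 491 | 430
MAML2 | 1536 | 443 | 443 | 440
MAP3K4 | 2078 | 482 | 482 | 480
PIK3CA | 1573 | 513 | 513 | 513
SMAD4 | 1005 | 461 | 461 | 460
RECQL4 | 1658 | 458 | 458 | 431
PDGFRA | 1563 | 482 | 473 | 461
MSH6 | 1839 | 487 | 487 | 459
CHD1 | 2209 | 507 | 507 | 490
PIK3C2G | 1881 | 444 | 426 | 418
CDH1 | 1279 | 405 | 405 | 405
EGFR | 1628 | 426 | 426 | 426
FGFR3 | 1190 | 390 | 390 | 380
MSH3 | 1489 | 363 | 363 | 334
FBXW7 | 1111 | 412 | 412 | 409
MET | 1813 | 413 | 413 | 401
TNFAIP3 | 1139 | 357 | 357 | 348
CTNNB1 | 1122 | 349 | 349 | 349
RB1 | 1241 | 321 | 321 | 321
RET | 1459 | 353 | 353 | 353
NCOA3 | 1701 | 305 | 305 | 305
KIT | 1280 | 312 | 312 | 303
MLH1 | 1063 | 315 | 315 | 306
MAP3K1 | 1835 | 331 | 330 | 322
BRAF | 1106 | 348 | 348 | 345
FLT3 | 1290 | 305 | 305 | 295
FGFR2 | 1102 | 288 | 288 | 288
ABL1 | 1400 | 259 | 259 | 259
JAK2 | 1379 | 255 | 255 | 255
MSH2 | 1198 | 272 | 272 | 254
CARD10 | 1264 | 241 | 241 | 232
TSHR | 1017 | 261 | 261 | 261
ERBB2 | 1482 | 235 | 235 | 232
CBL | 1123 | 225 | 225 | 223
NRAS | 380 | 199 | 199 | 189
MEN1 | 804 | 197 | 197 | 197
TTBK1 | 1483 | 189 | 189 | 180
NF2 | 798 | 211 | 211 | 201
GATA1 | 569 | 164 | 164 | 164
HSP90B1 | 980 | 185 | 185 | 185
CREB3L2 | 670 | 158 | 158 | 158
NR1H2 | 592 | 139 | 139 | 138
AAK1 | 1080 | 132 | 132 | 122
VHL | 297 | 92 | 92 | 92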

The filters remove neoepitopes present in reference protein, human reference proteome and 1000 Genomes-based proteomes.


Evaluating Neopeptides as Neoepitopes

The neopeptides generated in the study were further analyzed for their potential as neoepitopes, i.e., antigenic regions of nine amino acids found specifically in cancer antigens that can activate different arms of the human immune system. To identify neoepitopes, different prediction tools were used for the different epitope types [23,24,25,26,27]. Cell lines from each tissue of origin were explored for tissue-specific neoepitopes. The most frequent (top 10) neoepitopes, along with their immunological potential, are shown in S1 Table. Interestingly, the neoepitope “IRKQQQQQE”, generated de novo by a mutation in the NR1H2 protein, was frequently observed in cell lines from hematopoietic, lung, kidney, biliary tract, CNS, bone, ovary, pancreas, prostate and large intestine tissues. It also harbors a B-cell epitope and is a binder for MHC I and MHC II. Similarly, a mutation in the same gene and cell lines generated “QQQQQESQS”, which is a B-cell epitope. Furthermore, in solid tumors such as those of the large intestine, the total number of neoepitopes was highest in the MLL3 and PDE4DIP targets, whereas for hematopoietic tumors, TP53 and PDE4DIP had the highest numbers of neoepitopes (S2 Table). The analysis of the 60 vaccine candidates yielded 38 promiscuous epitopes that can induce all arms of the immune system (S3 Table). Additionally, each individual algorithm of our pipeline produced interesting outcomes, which have been compiled in the resource. For example, PRKDC has 5 or more neoepitopes predicted as positive by CTLPred and nHLAPred, which were present in more than 800 unique cell lines (S4 and S5 Tables). Also, more than 15 neopeptides from RECQL4 and PRKDC were found to be HLA class I binders (using ProPred1) and were present in more than 600 cell lines (S6 Table).
Similarly, among HLA class II binders (ProPred), PDE4DIP has 7 or more neoepitopes, which were found in 184 cell lines (S7 Table). NR1H2 has 5 or more neoepitopes predicted as positive by BCE, which were present in 868 cell lines (S8 Table).
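The idea of a promiscuous epitope, one that is predicted positive by every arm of the pipeline, can be illustrated as a set intersection. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function name `promiscuous_epitopes` and the category labels are hypothetical, and "AAAAAAAAA" is a made-up placeholder peptide. The two NR1H2-derived peptides are placed according to the description in the text.

```python
def promiscuous_epitopes(predictions):
    """Return peptides predicted positive in every category of `predictions`."""
    peptide_sets = [set(peptides) for peptides in predictions.values()]
    if not peptide_sets:
        return set()
    return set.intersection(*peptide_sets)


# Illustrative predictor outputs; "AAAAAAAAA" is a placeholder, not real data.
example_predictions = {
    "HLA class I": {"IRKQQQQQE", "AAAAAAAAA"},
    "HLA class II": {"IRKQQQQQE"},
    "B cell": {"IRKQQQQQE", "QQQQQESQS"},
}
shared = promiscuous_epitopes(example_predictions)
print(shared)
```

A peptide predicted positive in only some of the categories, such as the B-cell-only “QQQQQESQS”, drops out of the intersection.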

Web-Based In Silico Platform

Based on the extensive evaluation of cancer neoepitopes, an in silico platform, Cancertope, has been developed to guide subunit-based vaccine development, immunotherapies and other therapeutic interventions. The resource offers potential vaccine candidates and antigenic regions or epitopes suitable for designing subunit vaccines against cancer. This web-based platform has been developed on a LAMP stack (Linux, Apache, MySQL, and PHP/Perl). The web server integrates the following modules to provide insights into personalized cancer immunotherapies.

Database of Neoepitopes

The database contains the analyses carried out on 905 human cancer cell lines and reports a large number of immunogenic (neoepitopes) and non-immunogenic neopeptides. The mutation and immune epitope information for the cancer vaccine targets has been compiled in the form of a ‘Cancer-specific database’ (Fig 3). To support effective use of the database, a number of standard database tools have been integrated for easy searching, browsing and retrieval of data.
Fig 3

A general workflow exhibiting the overall concept of database section of Cancertope workbench.

Partially Personalized Module

This module allows the user to identify potential neoepitopes for designing a vaccine against a cancer cell line or tissue sample from its genomic data. The term partially personalized describes a situation where the query sequence (from the cancer tissue of a sample) is compared with the human reference proteome because the normal/healthy (non-cancerous tissue) proteome of that individual is not available. The module compares the user-specified cancer proteome with the reference proteome and identifies potential neoepitopes (Fig 4). The user can submit a single protein sequence, a whole proteome or a VCF file from whole genome sequencing. The server returns the potential neoepitopes as output.
Fig 4

The personalized module of Cancertope workbench.

Fully Personalized Module

This module is designed for identifying potential neoepitope-based vaccine candidates from the proteomics data of cancerous and healthy tissues of a patient. The user needs to provide the protein or proteome of cancerous cells (or tissues) as well as that of normal cells (healthy tissue) from the same individual (Fig 5). The module identifies neopeptides and neoepitopes present in the proteome of the cancer tissue but absent from the proteome of the healthy tissue. Like the partially personalized module, this module allows the user to submit a pair of protein sequences, a pair of whole proteomes or VCF files from whole genome sequencing.
Fig 5

The fully personalized module of Cancertope workbench.
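The comparison at the heart of the fully personalized module, peptides present in the tumor proteome but absent from the matched normal proteome, can be sketched as a set difference over 9-mers. This is a minimal illustration under assumptions, not the server's code: the function names and the toy sequences are hypothetical.

```python
def nine_mers(sequence, k=9):
    """Return the set of all overlapping k-mers of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def patient_specific_neopeptides(tumor_proteins, normal_proteins, k=9):
    """Return sorted k-mers found in the tumor proteins but not in the normal ones."""
    normal = set()
    for protein in normal_proteins:
        normal |= nine_mers(protein, k)
    tumor = set()
    for protein in tumor_proteins:
        tumor |= nine_mers(protein, k)
    return sorted(tumor - normal)


# Toy sequences: the tumor protein differs from the normal one by L -> W.
normal_example = ["ACDEFGHIKLMNPQRS"]
tumor_example = ["ACDEFGHIKWMNPQRS"]
specific = patient_specific_neopeptides(tumor_example, normal_example)
print(specific)
```

In this toy case every reported peptide spans the substituted residue, since 9-mers that do not cover it are identical in the two proteins.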

Advanced tools

This module provides two menus: i) Epitope Mapping, for mapping experimentally validated epitopes, and ii) Cross-Reactivity, for identifying cancer-specific peptides or neopeptides. The ‘Epitope Mapping’ menu allows the user to identify antigenic regions in their protein sequence. To identify antigenic regions, the query is searched against experimentally validated epitopes (e.g., B-cell, T-cell, HLA binders) from major immunological databases such as IEDB [28], MHCBN [29] and BCIPEP [30]. The ‘Cross-Reactivity’ menu is designed to remove cross-reactive peptides, retaining only those neopeptides that are present in the cancer antigen submitted by the user and not in the human proteome. It expands the utility of the platform by allowing the user to search their antigen sequence against the reference protein, the human reference proteome and the 1000 Genomes-based proteomes.

Discussion

Although personalized cancer vaccine design using a patient’s genomics data is at a very early stage, the approach adopted for developing Cancertope suggests clinical as well as diagnostic potential. Cancer immunotherapy and vaccine development have long been pursued as therapeutic interventions. In 1999, Brossart et al. demonstrated the potential of HLA-A2-restricted peptides in cancer therapies [31]. Although the understanding of cancers induced by viruses such as papilloma virus and hepatitis B virus has grown substantially, to date there has been no significant success in developing vaccines against these cancers. The difficulties in developing these vaccines include tolerance against self-antigens, risk of autoimmunity and genomic heterogeneity among cancers [32,33]. Cancertope provides well-defined filters that address cross-reactivity by eliminating epitopes found in the reference protein, the human reference proteome and the 1000 Genomes-based proteomes. These filters thus help address the concern of autoimmunity by directing the immune response specifically against cancer. The use of cancer cell lines for immunological studies may be open to criticism, since in the absence of immunological pressure the genomic profile of cancer cell lines may be ambiguous. However, this concern is addressed by the correlation analysis performed in the CCLE study, which inspected the genomic similarities by lineage between CCLE cell lines and primary tumors from the Tumorscape, expO, MILE and COSMIC data sets. Mutation frequencies in 17 lineages of CCLE and COSMIC primary tumor data were highly correlated for most lineages, such as breast (r = 0.73), colorectal (r = 0.76), esophagus (r = 0.95), kidney (r = 0.85), liver (r = 0.64) and pancreas (r = 0.96).
Since the mutational profile of cancer cell lines correlated significantly with patient tumor samples, this sequence data was selected for the immunological evaluation. The vaccine candidates proposed by Cancertope were highly expressed in most of the cell lines, which makes them suitable candidates because overexpression is considered one of the prime criteria for developing cancer vaccines [34]. Although the immune epitope prediction tools used in this study are published, highly cited and accurate, these prediction algorithms have their own limitations. Thus, the neoepitopes/antigens should be experimentally validated before being proposed for medical use. The following major parameters need to be tested to validate a neoepitope: (a) HLA binding of the peptide, (b) display of the neoepitope on the tumor surface on an MHC molecule (which can be verified either by mass spectrometry or by using a T cell raised against the neoepitope), (c) expression of the neoantigen in the tumor cells and (d) cross-reactivity, meaning that T cells against the peptide should not recognize the wild-type peptide. With these limitations in mind, the strategy applied in the study should benefit the scientific community and pharmaceutical companies. Cancer genomics, in combination with computational predictions and experimental validation of immune epitopes, can be used for designing successful cancer vaccines for patients. A few organizations (http://neontherapeutics.com/, http://www.chordomafoundation.org/, http://www.vaccinogeninc.com/, http://gapvac.eu/ and http://www.epivax.com/) are already working in this direction. The Cancertope resource delivers extensive information on cancer-specific mutations and investigates the immunogenic potential of neoepitopes by employing several prediction algorithms.
The database section of Cancertope lists all the generalized vaccine candidates, which can be experimentally validated to advance cancer research. Additionally, the modules for personalized vaccines (partially and fully personalized) operate on the genome annotation of a newly sequenced genome. The annotation and immune prediction pipeline then suggests the most effective vaccine candidates for the queried sequencing data. The resource also offers additional options for experimental epitope mapping and for removal of cross-reactive candidates, which are valuable for determining suitable vaccine candidates.

Conclusion

In summary, a web-based platform for predicting vaccine candidates effective against cancer is reported. The platform offers two options to users: a database and a user-interactive prediction server. The database holds neoepitopes examined in 905 cancer cell lines, which are key components for activating the immune system against these cell lines. Furthermore, the neoepitope database serves as a demonstration of how neoepitopes can be generated against a tumor from its whole genome. Although the genomic profiles of these cancer cell lines correlate with those of patient tumor samples, the neoepitopes presented in our resource must be validated experimentally before being considered for clinical applications. To advance the aim of personalized vaccine design against a patient- or tissue-specific tumor, a user-interactive interface has been designed by incorporating different modules. With it, the server allows the user to identify cancer-specific epitopes against a tumor from its proteome/protein. Where the user provides both healthy and tumor samples from the same patient, the server’s personalized module identifies patient-specific potential neoepitopes. These putative neoepitopes can then be targeted for designing vaccines and immunotherapies against cancer, enabling personalized therapy in real-life scenarios. Although the prediction methods implemented in the Cancertope pipeline are highly accurate and widely cited by the scientific community, experimental validation and testing of parameters such as HLA binding, expression of the neoepitope, cross-reactivity and T-cell activation are very important before moving to a clinical setting.
However, the vaccine candidates predicted by Cancertope have a higher potential to be experimentally validated because of their higher reported efficacies; the platform consequently offers a cost-effective, time-saving and streamlined pipeline for proposing personalized cancer vaccines.

Methods

Source Data

The mutation profile of cancer cell lines was retrieved from Cancer Cell Line Encyclopedia (CCLE) [16] where MAF file was downloaded from data portal (http://www.broadinstitute.org/ccle/data/browseData). The selected dataset comprised the mutational profile of 1651 genes in 905 cell lines, where the variant filtration was done by exclusion of variants with low allelic fraction, common polymorphisms and putative neutral variants. Since the mutated protein sequences were not provided in CCLE database, the mutation profiles were mapped on to the reference cDNA sequences of each gene obtained from NCBI. Thereafter, the mutated cDNA of each gene was translated into mutant protein sequences. All the four types of mutations namely missense, frame shift, in-frame insertion and in-frame deletions were included in mutation profile.
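The mapping step described above, placing a mutation on the reference cDNA and translating the result, can be sketched for the simplest case of a single-base substitution. This is a hedged illustration only: the function names `apply_substitution` and `translate` and the toy sequence are hypothetical, the standard genetic code is assumed, and frame shift and in-frame insertions/deletions are not handled here.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a coding sequence, stopping before the first stop codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_substitution(cdna, position, ref, alt):
    """Replace the base at 1-based `position`, checking it matches `ref`."""
    if cdna[position - 1].upper() != ref.upper():
        raise ValueError("reference base does not match the cDNA")
    return cdna[:position - 1] + alt + cdna[position:]


# Toy cDNA (hypothetical): ATG GTG GAG TAA encodes M-V-E.
wild_type = "ATGGTGGAGTAA"
mutant = apply_substitution(wild_type, 5, "T", "A")  # codon GTG becomes GAG
print(translate(wild_type), translate(mutant))
```

Checking the reference base before substituting guards against applying a variant to the wrong transcript or coordinate.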

Selection of Cancer Vaccine Antigens

This section describes the use of the CanProVar (Cancer Proteome Variation) [17] database for selecting cancer vaccine candidates based on their cancer sensitivity. The database consists of single amino acid alterations in the human proteome and contains cancer-specific variations (cancer-sensitive mutations) and non-cancer-specific variations in different proteins. First, the frequency of cancer-associated mutations (fD) and the frequency of non-cancer-specific variations (fP) were computed for each protein. Using the criteria fD/fP >= 2 and fD >= 20, a total of 52 proteins were selected. These criteria were applied to select highly cancer-sensitive proteins. Of the 52 proteins, only 34 were also present in the CCLE study. These 34 proteins were then used as potential vaccine antigens or candidates and subsequently analyzed with the PANTHER classification system [35] (http://www.pantherdb.org/) to understand their properties. In addition, potential vaccine candidates were identified from the CCLE database based on their frequency of mutation. The mutational analysis revealed 26 proteins that were mutated in at least 10% (90 cell lines) of the cell lines. In total, 60 potential cancer vaccine candidates were obtained (34 cancer-associated antigens from CanProVar and 26 frequently mutated antigens from CCLE).
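The selection rule stated above (fD/fP >= 2 and fD >= 20) can be written as a short filter. The sketch below is illustrative: the function name `select_cancer_sensitive` is hypothetical, the PTEN and ATM counts are taken from Table 1, and GENE_X and GENE_Y are invented entries added only to show rejection. Treating fP = 0 as an infinite ratio is an assumption, since the text does not say how that case was handled.

```python
def select_cancer_sensitive(counts, min_ratio=2.0, min_fd=20):
    """Return {protein: fD/fP} for proteins passing both thresholds.

    `counts` maps a protein name to a (fD, fP) pair.
    """
    selected = {}
    for protein, (fd, fp) in counts.items():
        # Assumption: a protein with no neutral variants gets an infinite ratio.
        ratio = fd / fp if fp > 0 else float("inf")
        if ratio >= min_ratio and fd >= min_fd:
            selected[protein] = ratio
    return selected


example_counts = {
    "PTEN": (389, 1),     # from Table 1
    "ATM": (134, 52),     # from Table 1
    "GENE_X": (15, 1),    # hypothetical: fails fD >= 20
    "GENE_Y": (30, 20),   # hypothetical: fails fD/fP >= 2
}
selected = select_cancer_sensitive(example_counts)
print(sorted(selected))
```

Requiring a minimum fD alongside the ratio prevents proteins with very few recorded variants from qualifying on the ratio alone.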

Generation of Neopeptides

In this study, the term neopeptide refers to a 9-mer sequence (a continuous stretch of nine residues) that contains at least one cancer-associated mutation. The length of a neopeptide (epitope) was fixed at nine residues because both HLA class I and class II binders have a binding core of nine residues [36,37]. To identify neopeptides in a vaccine antigen, the following steps were carried out: i) all possible overlapping peptides in an antigen were generated, ii) redundant peptides were removed and iii) all peptides mapping to the human reference proteome were removed. This strategy enabled the detection of peptides present exclusively in the proteome of cancer cell lines and absent from the proteome of a healthy individual.
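The three steps above (generate overlapping 9-mers, drop redundant peptides, drop peptides found in the reference proteome) can be sketched as follows. This is a minimal illustration under assumptions, not the published pipeline: the function names and the toy sequences are hypothetical, and the reference proteome is represented simply as a list of protein strings.

```python
def overlapping_peptides(sequence, k=9):
    """Step i: all overlapping k-mers of a protein, in sequence order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def find_neopeptides(mutant_protein, reference_proteome, k=9):
    """Steps ii and iii: unique k-mers of the mutant absent from the reference."""
    reference_kmers = set()
    for protein in reference_proteome:
        reference_kmers.update(overlapping_peptides(protein, k))
    seen = set()
    neopeptides = []
    for peptide in overlapping_peptides(mutant_protein, k):
        if peptide in seen or peptide in reference_kmers:
            continue  # redundant, or identical to a reference peptide
        seen.add(peptide)
        neopeptides.append(peptide)
    return neopeptides


# Toy example: the mutant differs from the reference by one residue (Q -> L).
reference_example = ["MKTAYIAKQRQISFVKSHFSRQ"]
mutant_example = "MKTAYIAKQRLISFVKSHFSRQ"
neopeptides = find_neopeptides(mutant_example, reference_example)
print(neopeptides)
```

In this toy case a single substitution yields nine neopeptides, one for each 9-mer window that covers the mutated position.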

Pipeline for Predicting Immunogenicity

To estimate the immunogenicity of these neopeptides, a pipeline was established for predicting different types of epitopes/binders. The pipeline integrates a number of algorithms for predicting the diverse immune epitopes required for activating different arms of the immune system (CD4+ T cells, CD8+ T cells, B cells). The algorithms used in the immune epitope prediction pipeline were preferred over other available algorithms because they are available in standalone form. Moreover, predictions from these algorithms have been verified in a few experimental as well as in silico studies, supporting the accuracy and reliability of the software [38,39,40,41]. Immune epitope prediction can broadly be divided into three categories.

CD8+ T Cell Epitopes

In the past, a number of methods have been reported for predicting HLA class I binders, including SYFPEITHI [42], NetMHC [43], ProPred1 [24], and nHLAPred [25]. In the present study, we used standalone versions of ProPred1 and nHLAPred for predicting HLA class I binders; both algorithms predict promiscuous HLA class I binders. ProPred1 is a matrix-based method that predicts HLA binding sites in an antigenic sequence for 47 HLA class I alleles, whereas nHLAPred was developed for predicting binders for 67 HLA class I alleles using machine learning techniques. In addition to HLA class I binders as potential CTL epitopes, we also used a direct method, CTLPred, for predicting CTL epitopes. Prediction via a direct method is important because it discriminates between T cell epitopes and non-epitope MHC binders, whereas HLA binding prediction only identifies MHC binders in antigenic sequences.

CD4+ T Cell Epitopes

Previously, a number of algorithms have been developed for predicting HLA class II binders, such as ProPred [26], TEPITOPE [44] and NetMHCIIpan [45]. In this study, the ProPred software was used for predicting HLA class II binders. This software allows prediction of promiscuous HLA class II binders, that is, peptides that can bind to a large number of alleles.
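The notion of promiscuity used here (a peptide predicted to bind many alleles) can be made concrete with a small Python sketch. It assumes that per-allele predictions have already been parsed into a mapping from allele name to predicted binder peptides; this input format, the allele labels, and the function names are hypothetical and do not reflect the actual output format of ProPred or any other tool.

```python
from collections import Counter


def allele_counts(binders_by_allele: dict) -> Counter:
    """Count, for each peptide, the number of alleles it is predicted to bind."""
    counts = Counter()
    for peptides in binders_by_allele.values():
        # Use a set so a peptide reported twice for one allele is counted once.
        counts.update(set(peptides))
    return counts


def promiscuous(binders_by_allele: dict, min_alleles: int) -> list:
    """Return peptides predicted to bind at least `min_alleles` alleles."""
    counts = allele_counts(binders_by_allele)
    return sorted(pep for pep, n in counts.items() if n >= min_alleles)


if __name__ == "__main__":
    # Toy predictions for illustration only.
    predictions = {
        "allele_1": ["AAAAAAAAA", "CCCCCCCCC"],
        "allele_2": ["AAAAAAAAA"],
        "allele_3": ["AAAAAAAAA", "CCCCCCCCC"],
    }
    print(promiscuous(predictions, min_alleles=3))
```

The `min_alleles` cutoff is a free parameter in this sketch; the source does not state a specific threshold for calling a binder promiscuous.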

B Cell Epitopes

Numerous methods are available for predicting B-cell epitopes, such as BCEPred [46], CBtope [47], LBtope [27], Discotope [48] and COBEpro [49]. We employed a standalone version of the LBtope software for the prediction of linear B-cell epitopes. All prediction tools were required in standalone form so that immune epitopes could be predicted at run time in a query submitted by the user. All the standalone prediction tools chosen for the study are heavily cited and were published in journals of high repute. The standalone tools were run with the default thresholds and parameters, as optimized by their original authors.

Proteome data

In this study, the reference proteome and reference gene sequences were obtained from the NCBI FTP portal (http://ftp.ncbi.nlm.nih.gov/refseq/). In addition, 1000 Genomes-based proteomes were generated by annotating the 1000 Genomes VCF files (http://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/) with the ANNOVAR package [50]. Mutated sequences were generated as described in the ‘Source data’ section above.

Expression data

The expression profiles of 905 cancer cell lines were obtained from the CCLE database (http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-36139/). To summarize the expression status of the vaccine candidates, the number of cell lines at or above each of several expression thresholds was calculated: ≥ 3 (GT3), ≥ 7 (GT7), ≥ 9 (GT9) and ≥ 12 (GT12), with expression values ranging from 2 to 15.
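The threshold counts described above amount to a simple tally per vaccine candidate. The following Python sketch assumes the expression values for one gene are available as a mapping from cell line name to expression value; the data layout and function name are illustrative assumptions, while the GT3, GT7, GT9 and GT12 labels and cutoffs come from the text.

```python
# Threshold labels and cutoffs as given in the text (value >= cutoff).
THRESHOLDS = {"GT3": 3, "GT7": 7, "GT9": 9, "GT12": 12}


def count_cell_lines_by_expression(expression_by_cell_line: dict) -> dict:
    """For one gene, count the cell lines at or above each expression cutoff."""
    return {
        label: sum(1 for value in expression_by_cell_line.values() if value >= cutoff)
        for label, cutoff in THRESHOLDS.items()
    }


if __name__ == "__main__":
    # Toy expression values for illustration only.
    gene_expression = {"line_a": 2.5, "line_b": 7.0, "line_c": 9.4, "line_d": 12.1}
    print(count_cell_lines_by_expression(gene_expression))
```

Because each cutoff is inclusive and cumulative, a cell line counted under GT12 is also counted under GT9, GT7 and GT3.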

Gene ontology information, comprising biological process, molecular function and cellular localization, for the cancer-sensitive proteins.

(TIFF)

The top ten most frequent neopeptides for each tissue.

For each tissue of origin, the most frequent neoepitopes were investigated and their immune induction potential was predicted. (XLSX)

Number of neopeptides present in every tissue type.

Each vaccine candidate is presented with the number of unique neopeptides for each tissue of origin. (XLSX)

List of promiscuous neoepitopes with immunological potential in the form of CTL epitopes, MHC binders, number of alleles, and B cell epitopes.

(XLSX)

The number of cell lines having positive CTL epitopes in different ranges; for example, PRKDC has 836 unique cell lines with a total of 5 or more unique CTL epitopes.

The yellow cells present the number of neo-epitopes (CTL). (XLSX)

The number of cell lines having positive HLA I binders (ProPred1) in different ranges; for example, RECQL4 has 672 unique cell lines with a total of 15 or more unique HLA I binders.

The yellow cells present the number of neo-epitopes (HLA I). (XLSX)

The number of cell lines having positive HLA I binders (nHLAPred) in different ranges; for example, PDE4DIP has 342 unique cell lines with a total of 7 or more unique HLA I binders.

The yellow cells present the number of neo-epitopes (HLA I). (XLSX)

The number of cell lines having positive HLA II binders in different ranges; for example, PRKDC has 37 unique cell lines with a total of 5 or more unique HLA II binders.

The yellow cells present the number of neo-epitopes (HLA II). (XLSX)

The number of cell lines having positive B cell epitopes in different ranges; for example, NR1H2 has 868 unique cell lines with a total of 5 or more unique B cell epitopes.

The yellow cells present the number of neo-epitopes (BCE). (XLSX)