Literature DB >> 31775766

Identification and ranking of recurrent neo-epitopes in cancer.

Eric Blanc1,2, Manuel Holtgrewe1,2, Arunraj Dhamodaran3, Clemens Messerschmidt1,2, Gerald Willimsky4,5,6, Thomas Blankenstein4,3,5, Dieter Beule7,8.   

Abstract

BACKGROUND: Immune escape is one of the hallmarks of cancer and several new treatment approaches attempt to modulate and restore the immune system's capability to target cancer cells. At the heart of the immune recognition process lies antigen presentation from somatic mutations. These neo-epitopes are emerging as attractive targets for cancer immunotherapy and new strategies for rapid identification of relevant candidates have become a priority.
METHODS: We carefully screen TCGA data sets for recurrent somatic amino acid exchanges and apply MHC class I binding predictions.
RESULTS: We propose a method for in silico selection and prioritization of candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. While the percentage of patients carrying a specific neo-epitope and HLA-type combination is relatively small, the sheer number of new patients leads to surprisingly high reoccurence numbers. We identify 769 epitopes which are expected to occur in 77629 patients per year.
CONCLUSION: While our candidate list will definitely contain false positives, the results provide an objective order for wet-lab testing of reusable neo-epitopes. Thus recurrent neo-epitopes may be suitable to supplement existing personalized T cell treatment approaches with precision treatment options.

Entities:  

Keywords:  Cancer; Immunotherapy; Neo-antigen; Neo-epitope; Precision treatment

Mesh:

Substances:

Year:  2019        PMID: 31775766      PMCID: PMC6882202          DOI: 10.1186/s12920-019-0611-7

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.063


Background

Increasing evidence suggests that clinical efficacy of cancer immunotherapy is driven by T cell reactivity against neo-antigens [1-5]. While not yet fully understood, immune response and recognition of tumor cells containing specific peptides depends critically on the ability of the MHC class I complexes to bind to the peptide in order to present it to a T cell. Neo-antigens can be created by a multitude of processes like aberrant expression of genes normally restricted to immuno-privileged tissues, viral etiology or by tumor specific DNA alterations that result in the formation of novel protein sequences. Furthermore there is now evidence for neo-epitopes generated from alternative splicing [6] and alterations in non-coding regions [7]. With the advent of affordable short read sequencing, comprehensive neo-antigen screening based on whole exome sequencing has become feasible and many cancer immune therapeutic approaches try to utilize detailed understanding of the neo-epitope spectrum to create additional or boost pre-existing T cell reactivity for therapeutic purposes [8, 9]. However, in practice the selection and validation of the most promising neo-epitope candidates is a difficult and time-consuming task. The typical approach is based on the private mutational catalogue of the individual patient: exome sequencing data is subjected to bioinformatics analysis and used to predict neo-epitopes and their binding affinities to the MHC class I complex. Our study aims to complement this approach by a precision medicine perspective. We search and prioritize neo-epitope candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. These neo-antigens hold the potential for development of off the shelf T cell therapies for sub groups of cancer patients. We use epidemiological data to give rough estimates for the expected number of patients in these groups. Candidate prediction always relies on somatic variant detection workflows and affinity prediction algorithms based on machine learning, see e.g. [10]. Binding prediction far from perfect [11] especially for rarer HLA types, and may also depend on mutational context [12]. Catalogues of the neo-epitope landscape across various cancer entities have been created by various authors [13-15]. While neoantigen landscape is diverse and sparse [13], here we provide an unbiased, comprehensive ranking of candidates, defined as neo-epitopes arising from recurrent mutations, predicted to be binding to a specific HLA-1 allele. The candidates are ranked according to the expected number of target patients.

Methods

Data sets

Somatic variants for different cancer entities have been determined using matched pairs of tumor and blood whole exome or whole genome sequencing in the TCGA consortium. We downloaded the open-access somatic variants from GDC data release 7.0 [16], consisting of 33 TCGA projects and 10,182 donors in total. Details of the somatic variant calling can be found in [17]. We excluded patients without corresponding entries in the clinical information tables, and 7 projects with less than 100 samples, yielding 9,641 samples covering 26 cancer studies. Figure 1 provides an overview of the complete bioinformatics process, from the GDC somatic single nucleotide variants to the identification of the candidates.
Fig. 1

Workflow overview a Overview of the recurrent neo-epitope candidates generation process: TCGA studies are selected for at least 100 donors with clinical annotations. For each of these studies, recurrent strongly supported missense Single-Nucleotide Variants are collected. Neo-epitopes binding to 11 HLA-1 types are predicted, redundancy is removed from that set (see B) and strong binders are retained. b Example of epitope redundancy: the 18 amino-acids long sequence surrounding recurrent variant GLRA3:S274L generates 7 binding neo-epitopes for the type HLA-A*02:01. Our pipeline retains only the strongest predicted binder for a given variant and HLA-1 type pair (the first, with an IC50 of 8.8 nM in the example). c Number of SNVs occuring in genes classified as Oncogenes or Tumor Suppressors by Vogelstein et al. [28], at various point of the variant selection and neo-epitope selection process

Workflow overview a Overview of the recurrent neo-epitope candidates generation process: TCGA studies are selected for at least 100 donors with clinical annotations. For each of these studies, recurrent strongly supported missense Single-Nucleotide Variants are collected. Neo-epitopes binding to 11 HLA-1 types are predicted, redundancy is removed from that set (see B) and strong binders are retained. b Example of epitope redundancy: the 18 amino-acids long sequence surrounding recurrent variant GLRA3:S274L generates 7 binding neo-epitopes for the type HLA-A*02:01. Our pipeline retains only the strongest predicted binder for a given variant and HLA-1 type pair (the first, with an IC50 of 8.8 nM in the example). c Number of SNVs occuring in genes classified as Oncogenes or Tumor Suppressors by Vogelstein et al. [28], at various point of the variant selection and neo-epitope selection process

Variant selection

For each sample we selected all single nucleotide variants obtained by the “mutect2” pipeline, that had a “Variant_Type” equal to “SNP”, a valid ENSEMBL transcript ID and a valid protein mutation in “HGVSp_Short”. From these variants, we selected those with a “Variant_Classification” equal to “Missense_Mutation”. We checked that all variants had a “Mutation_Status” equal to (up to capitalisation) “Somatic”, that the total depth “t_depth” was the sum of the reference “t_ref_count” and the alternate “t_alt_count” alleles counts, and that the genomics variant length is one nucleotide. To avoid high number of false positives we consider only variants that are supported by at least 5 reads and have a VAF of at least 10%. Furthermore we removed any variant that occurs with more than 1% in any population contained in the ExAC database version 0.31 [18], by coordinates liftover from the GRCh38 to hg19 human genome versions. This way we obtained 26 cancer entity data sets containing a total of 9,641 samples with an overall 1,384,531 variants.

Recurrent protein variant selection

We define recurrence strictly on the protein/amino acid exchange level, i.e. different nucleotide acid variants leading to the same amino acid exchange due to code redundancy will be counted together. Recurrent protein variants are defined within each TCGA study. A protein variant is deemed recurrent when it appears in at least 1% of all the patients in the cohort. As cancer types are only considered when the number of patients involved in the studies is greater than 100, this threshold ensures that every recurrent variant has been observed in at least 2 patients for a given cancer type. To be conservative, the recurrence frequency has been computed using, for the denominator, all patients with clinical information in the study, including those without high-confidence missense SNVs. Using this definition, the total number of recurrent amino acid changes is 1055. A variant recurrent in multiple cancer types is counted multiple times in the above number, the number of unique recurrent variants regardless of the cancer is 869. Additional file 1 shows the most frequent amino acid exchanges across 25 cancer entities, as no variant from project TCGA-KIRC’s donors is labeled as recurrent. Recurrent variants occurring at the same positions (for example when gene’s IDH1 codon R132 is mutated to amino acid H, C, G or S) have been merged into 819 variants suitable for comparisons with the cancer hot spots lists [14]. 122 out of the 819 merged variants belong to the set of 470 cancer hotspot variants, and 5 (PCBP1:L100, SPTLC3:R97, EEF1A1:T432, BCLAF1:E163 & TTN:S3271) to the set of presumptive false positives hotspots listed in the supplementary material of [14].

MHC class i binding prediction and epitopes selection

For all recurrent variants identified, we assess in silico their predicted propensity that the amino-acid exchange generates a binding neo-epitope. A variety of machine learning algorithms have been developed to determine the MHC binding in silico, see ref. [19] for review. Most methods are trained on Immune Epitope Database (IEDB) [20] entries and use allele specific predictors for frequent alleles, while pan-methods are applied to extrapolate to less common alleles. We predicted the MHC class I binding using NetMHCcons [21] v1.1, which predicts peptides IC50 binding, and classifies these predictions as non-binder, weak and strong binders, based on the relative ranking of binding predictions. As the range of IC50 binding values strongly depend on the HLA-1 allele [22], we have used the NetMHCcons classification to select our neo-epitope candidates. For a given recurrent variant and a given HLA-1 type, the epitope prediction pipeline can produce multiple overlapping epitopes candidates, differing by their length and/or their position (see Fig. 1B). To remove such size redundancy, only the epitope with the lowest predicted mutant sequence IC50 is retained. This procedure also removes non-overlapping epitopes, to keep only at most one epitope per recurrent protein variant and HLA-1 type. For comparison we also compute the IC50 for the respective wild type peptide. For MHC class I binding prediction we selected 11 frequent HLA-1 types: HLA-A*01:01, HLA-A*02:01, HLA-A*03:01, HLA-A*11:01, HLA-B*07:02, HLA-B*08:01, HLA-B*15:01, HLA-C*04:01, HLA-C*06:02, HLA-C*07:01, HLA-C*07:02. We limited the search for poly-peptides 9, 10 and 11 amino-acids long. For these alleles, we obtain 769 strong binding recurrent peptides and 1829 weak binders, over all considered cancer types. Their complete list is in Additional file 2, where each candidate is listed with the HLA-1 type it is preticted to bind to.

Data QC

To ensure that the proportion of variants caused by technical artifacts is small, we have computed the proportion of SNVs called in poly-A, poly-C, poly-G or poly-T repeats of length greater than 6 have been computed for each data study [23], for unique variants (that occur in only one patient across a project cohort), and for variants that are observed more than once in a cohort (Additional file 3). For comparison, we have computed the expected frequency of such events, assuming that all possible 11-mers (the mutated nucleotide at the center, flanked by 5 nucleotides on each side) are equiprobable, regardless of their sequence. Based on this equiprobable model, we have computed the probability that the number of mutations found in repeat locii is equal to or greater than the observed numbers. When considering variants appearing more than once, this probability is not significant for all studies; when unique variants are considered, those appear in repeat locii significantly more often than expected by chance in 7 out of 26 studies (TCGA-COAD, TCGA-KIRP, TCGA-LIHC, TCGA-READ, TCGA-SKCM, TCGA-TGCT & TCGA-UCAC, significance level set to 0.05 after Benjamini-Hochberg multiple testing correction).

Mice

ABabDII mice (described in detail in [24]) have been used for this study. They are transgenic for entire human TCR- α and TCR- β gene loci, as well as for HHD molecule [25] and deficient for the murine Tcr- α and - β chains, as well as for murine β2m and H2- D genes. The mice used in the study were generated and housed under SPF conditions (caged enriched with bedding material, 3-5 mice/cage, standard light/dark cycle, food and water ad libitum) at the Max-Delbrück-Center animal facility. All animal experiments were approved by the Landesamt für Arbeitsschutz, Gesundheitsschutz und technische Sicherheit, Berlin, Germany.

Generation of mutation-specific t cells in ABabDII mice

For each candidate, 3 ABabDII mice between 8 to 12 weeks old (6 in total) underwent immunisation. They were injected subcutaneously with 100 μg of mutant short peptide (9-10mers, JPT) supplemented with 50 μg CpG 1826 (TIB Molbiol), emulsified in incomplete Freund’s adjuvant (Sigma). Repetitive immunizations were performed with the same mixture at least three weeks apart. Mutation-specific CD8 + T cells in the peripheral blood of immunized animals were assessed by intracellular cytokine staining (ICS) for IFN γ 7 days after each boost. All 6 animals were peptide-reactive. The 6 mice were sacrificed for spleen preparation by cervical dislocation after isofluran anasthesia.

Patient number estimates and HLA-1 frequencies

HLA-1 frequency data f for the U.S. population was retrieved from the Allele Frequency Net Database (AFND) [26]. Frequency data were estimated by averaging the allele frequencies of multiple population datasets from the North American (NAM) geographical region. The major U.S. ethnic groups were included and sampled under the NAM category. Cancer incidence data for the U.S. population (N) was retrieved from the GLOBOCAN 2012 project of the International Agency for Research on Cancer, WHO [27]. Assuming that the fraction of a recurrent variant in the U.S. population affected by cancer entity d (r) is identical to the observed ratio of that variant in the corresponding TCGA study, the number of patients of HLA-1 type h whose tumor contain the variant is expected to be The summation runs over 18 diseases d for which both the TCGA projects and the cancer incidence data are available.

Results

Recurrent variants and candidates

From the GDC repository [16], we have collected somatic variants for 33 TGCA studies. After removing patients without clinical meta-data, and studies with less than 100 patients, we have selected 1,384,531 high-confidence missense SNPs from 9,641 patients, see methods for details. Using this data, 1,055 variants are deemed recurrent (Additional file 1), as they can be found in more than 1% of the patients in the respective study cohort. These recurrent variants correspond to 869 unique protein changes, as some appear in multiple cancer entities. 77 of the recurrent variants occur in at least 3% of their cohort (43 unique protein changes). From these 869 unique protein changes, we have generated candidates that are predicted to be strong MHC class I binders in frequent HLA-1 types that we considered for initial selection. 415 (48%) of them lead to a strong binder prediction. In total, there are 772 candidates that are recurrent in a cancer entity cohort, and predicted as binding for a considered HLA-1 type. These candidates are non-redundant among all the 9-, 10- & 11-mers containing the variant: the selection process retains only the peptide sequence with the lowest predicted IC50. Figure 1 and Table 1 provide an overview of the variant selection and neo-epitope candidates generation processes, while Additional file 2 lists all neo-epitopes (weak and strong predicted binders) after removing redundancy.
Table 1

Overview of the 33 TCGA studies used in this analysis

Project nameNumber of patientsVariants per patientsMissense variant per patientRecurrent variantsStrong binders
TotalWith clinical dataAverageMedianAverageMedian
TCGA-BLCA4124123262261571092214
TCGA-BRCA986986123625025810
TCGA-CESC289289358157143621716
TCGA-COAD399397666176288824134
TCGA-ESCA18418424618795738072
TCGA-GBM393390212709336155
TCGA-HNSC5085082011399766143
TCGA-KIRC3363367969333100
TCGA-KIRP2812818582393852
TCGA-LAML1431436915166147
TCGA-LGG50850770363316140
TCGA-LIHC36436414912070581116
TCGA-LUAD56751536724218011370
TCGA-LUSC4924923683011871532019
TCGA-OV4364351731215847107
TCGA-PAAD1781781685077192412
TCGA-PCPG17917913125485
TCGA-PRAD4954955935271537
TCGA-READ13713647514823270320186
TCGA-SARC23723711970452620
TCGA-SKCM467467841472413229266220
TCGA-STAD437437488157211741714
TCGA-TGCT14412823219896
TCGA-THCA49249222126543
TCGA-THYM123123392410462
TCGA-UCEC530530167214970854118109
TCGA-ACC9292117360000
TCGA-CHOL5145110620000
TCGA-DLBC37371731570000
TCGA-KICH666644250000
TCGA-MESO828247440000
TCGA-UCS5757183670000
TCGA-UVM808023160000
Total1018210100Total number: 3155183Total number: 13845311055769

The 7 studies displayed at the bottom have not been used for the determination of recurrent vairants, as the number of patients is less than 100. The number of strong binders includes all occurrences of neo-epitopes candidates, so a candidate may be counted multiple times when it is predicted to be binding several HLA-1 types

Overview of the 33 TCGA studies used in this analysis The 7 studies displayed at the bottom have not been used for the determination of recurrent vairants, as the number of patients is less than 100. The number of strong binders includes all occurrences of neo-epitopes candidates, so a candidate may be counted multiple times when it is predicted to be binding several HLA-1 types Despite large differences between variant selection protocols, 123 variants deemed recurrent by the above process can be found among the 470 variants identified in the cancer hotspot datasets [14] (Additional file 4). This overlap is strongly dependent on how frequent those variants are observed: there are 54 common variants out of the 61 variants observed more than 10 times over our dataset (>88%). Among the 819 variants retained for the comparison (see methods for details), only 5 appear among the variants flagged as possible false positive by Chang et al. (<1%).

Enrichment in known cancer related genes

We observe that recurrent variants occur substantially more frequently in known cancer-related genes than in other genes (Fig. 1c). Initially approximatively one percent of all observed variants are found in genes that have been described [28, 29] as oncogenes (54 genes) or tumor suppressor genes (71 genes). When recurrent unique protein changes are considered, the fraction of known oncogenes or tumor suppressor genes is substantially increased to 13% and 6.5% respectively (a χ2 test between unique protein changes and unique recurrent variants gives a P value smaller than 10−16). These fractions only marginally increase to 14% and 7% when only the unique protein changes leading to predicted strong binders for frequent HLA-1 types are considered (a χ2 test between unique recurrent variants and strong binders gives a non-significant P value). Additional file 5 shows a similar enrichment of known cancer-related genes per cohort. We observe that the enrichment is stronger for oncogenes than for tumor suppressors. This might be expected, as activating mutations in oncogenes are mainly distributed on a few protein positions, while loss of function mutations in tumor suppressors are generally distributed more broadly along the protein sequence. It is interesting to observe that several of the highly prevalent neo-epitope candidates occur in genes that are involved in known immune escape mechanisms: RAC1:P29S is recurrent in study SKCM (melanoma), is predicted to lead to strong binding neo-epitopes for HLA-A*01:01 and HLA-A*02:01, and is reported to up-regulate PD-L1 in melanoma [30]. CTNNB1:S33C is recurrent in studies LIHC (liver hepatocellular carcinoma) and UCEC (uterine corpus endometrial carcinoma), is predicted to lead to strong binding neo-epitopes for HLA-A*02:01, and has been shown to increase the expression of the Wnt-signalling pathway in hepatocellular carcinoma [31], leading to modulation of the immune response [32] and ultimately to tumor immune escape [33]. In a separate study, Cho et al. [34] show that this mutation confers acquired resistance to the drug imatinib in metastatic melanoma. Finally, FLT3:D835Y recurrent in study LAML (acute myeloid leukemia), is predicted to lead to a strong binding neo-epitope for HLA-A*01:01, HLA-A*02:01 and HLA-C*06:02, and following Reiter et al. [35], Tyrosine Kinase Inhibitors promote the surface expression of the mutated FLT3, enhancing FLT3-directed immunotherapy options, as its surface expression is negatively correlated with proliferation. While the described mechanisms are probably sufficient to explain immune escape in tumor evolution, the candidates could nevertheless be viable targets for adoptive T cell therapy or TCR gene therapy.

Recurrent neo-epitopes in patient populations

Upon assumption of statistical independence, the product of the frequency of a recurrent variant with the frequency of class I alleles in the population and the incidence rates of cancer types provides an estimate for the number of patients that carry that specific candidate. Using the number of newly diagnosed patients per year and HLA-1 frequency in the US population, we are able to compute the expected number of patients for 18 cancer entities for which both cancer census data and a TCGA study are available. The occurrence numbers for individual candidates range from 0 to 2,254 for PIK3CA:H1047R in breast cancer patients of type HLA-C*07:01; Table 2 presents a summary of expected patient numbers for the complete set of candidates. We estimate that, in the US alone, the previously discussed RAC1:P29S mutation might be present in 628 new patients carrying the HLA-A*02:01 allele each year (in 556 melanoma patients and in 72 lung small cell, head & neck or uterine carcinomas patients, see Additional file 6 for details). For the CTNNB1:S33C mutation, the total number of HLA-A*02:01 patients in the US is expected to be 364, from uterine corpus, prostate and liver cancer types. As another example, 115 myeloid leukemia patients in the US are expected to be of type HLA-A*02:01 and carry the FLT3:D835Y mutation.
Table 2

Expected number of newly diagnosed U.S. patients by HLA-1 type and cancer entity

A)
Cancer entityStudyNumber of patientsHLA-A*01:01 (7.61%)HLA-A*02:01 (20.36%)HLA-A*03:01 (6.60%)HLA-A*11:01 (4.37%)HLA-B*07:02 (6.51%)HLA-B*08:01 (4.80%)HLA-B*15:01 (4.46%)HLA-C*04:01 (16.69%)HLA-C*06:02 (5.72%)HLA-C*07:01 (9.28%)HLA-C*07:02 (15.39%)
Bladder Urothelial CarcinomaBLCA6930034063819715214211289223376596383
Invasive Breast CarcinomaBRCA2048001101090461199108375120482141123051234
Cervical Squamous Cell CarcinomaCESC140005813516171314231422766152
Colon AdenocarcinomaCOAD15484086116287707604502223752961101916201800
Esophageal AdenocarcinomaESCA475023151261620513101232683
Glioblastoma MultiformeGBM3204717370038533
Head & Neck Squamous Cell CarcinomaHNSC580004320811245811181301136220175
Renal Clear Cell CarcinomaKIRC576001370230080010026
Papilliary Renal Cell CarcinomaKIRP8064060300029000
Acute Myeloid LeukemiaLAML1350077115084900048973
Hepatocellular CarcinomaLIHC297001249613184691276869112174
Lung Squamous Cell CarcinomaLUSC660001816423281682821377593151172366
Serous Ovarian CancerOV1680026243310152143991436
Prostate AdenocarcinomaPRAD26000012085269693450701753876281202
MelanomaSKCM75000264978901817936861530203218643810002457
Stomach AdenocarcinomaSTAD2500047172566626830206114166130
Thyroid CancerTHCA46400394000014000044
Endometrial CarcinomaUCEC550009422804817501369290493222285616021779
Total11619585904169364858303325171666159897365080853910116
B)
Number of candidates in diseases5591686433242448485055

A) Expected number of patients of a given HLA-1 type who harbor at least one potentially immunogenic neo-epitope candidate for that HLA-1 type. Both the cancer incidence and the allele frequency are estimated for the U.S. population. The probability that a patient carries at least one variant from the set of neo-epitope candidates is computed under the assumption that the occurrence of variants in a cancer patient stems from statistically independent events. B) Number of neo-epitope candidates identified in the 18 studies shown in A, which are predicted to be strong binders to the corresponding HLA-1 type

Expected number of newly diagnosed U.S. patients by HLA-1 type and cancer entity A) Expected number of patients of a given HLA-1 type who harbor at least one potentially immunogenic neo-epitope candidate for that HLA-1 type. Both the cancer incidence and the allele frequency are estimated for the U.S. population. The probability that a patient carries at least one variant from the set of neo-epitope candidates is computed under the assumption that the occurrence of variants in a cancer patient stems from statistically independent events. B) Number of neo-epitope candidates identified in the 18 studies shown in A, which are predicted to be strong binders to the corresponding HLA-1 type Figure 2 shows the cumulative expected number of patients that carry a specific epitope, and with matching HLA-1 type, for the 50 candidates with the highest expected patients number. The number of patients is derived from the sum over all cancer entities, including those in which the candidate is not recurrent according to our criteria. For example, among newly diagnosed US patients of type HLA-C*04:01, 88 prostate cancer patients are expected to carry the mutation PIK3CA:R88Q, even though its observed frequency in the PRAD study is as low as 0.2%. The data shown in Fig. 2 can be found in Additional file 6.
Fig. 2

50 most frequent candidates in patients for which strong MHC I binding is predicted. For each candidate, the expected number of patients is obtained by summing over the 18 cancer entities for which the number of newly diagnosed patients in the US is available, and for which a corresponding TCGA study has been included in our analysis

50 most frequent candidates in patients for which strong MHC I binding is predicted. For each candidate, the expected number of patients is obtained by summing over the 18 cancer entities for which the number of newly diagnosed patients in the US is available, and for which a corresponding TCGA study has been included in our analysis

Accessible patient population

As our current understanding of peptide immunogenicity is still incomplete [36], not all candidates predicted by our pipeline can be expected to trigger an immunogenic response in patients. To further evaluate the usefulness of our results we consider the list of candidates (neoepitope and HLA type pairs) selected form our ranking. Assuming a T cell therapy could be generated for every candidate we can compute the number of patients that would benefit, see methods. Because of imperfections in candidate prediction, not all candidates hold the potential for an effective T cell therapy, and these ineffective candidates can be thus viewed as “false positives”. Because it is impossible to create a reliable estimated for the fraction of these false positives due to the complexity of the underlying algorithm and biological process we decided to consider a broad range of possible values from 50% to 95%, cf. Figure 3. Using a subset of 6868 patients for whom HLA types were known, we predict the number of patients for whom such positive response might be expected, as a function of the proportion of “false positives” in our candidates. To estimate the impact of such “false positives”, we have randomly flagged 1000 times 337, 539, 607 & 640 candidates as “false positives”, which is corresponding to a fraction of about 50%, 80%, 90% and 95% of the total 674 candidates. This procedure left us with 1000 sets of 337, 135, 67 & 34 candidates that were not flagged as “false positives”. Figure 3 shows that for a pessimistic 90% of false positive candidates, more than 1.5% of patients over all cancer entities (95% CI between 1.25% & 2.65%, mean 1.78%, median 1.72%, both corresponding to about 20000 new patients per year in the U.S.) are still expected to carry at least one of the 67 remaining candidates’ mutation and corresponding HLA allele. While the proportions are modest, the absolute number of patients seems relevant. The figure in Additional file 7 shows that there are considerable differences between entities: the proportion of matching patients is much higher in diseases with high mutational load such as melanomas (TCGA-SKCM, median about 9% for 90% false positives), than in diseases with lower mutational load, such as thyroid cancer (TCGA-THCA, 0.2%, 90% false positives).
Fig. 3

Expected influence of the proportion of false positive neo-epitope candidates on the patient population. Proportion of the patients that carry at least one neo-epitope candidate mutation, and whose HLA-I allele set contains the candidate HLA type, when a limited percentage of the neo-epitope candidates is considered. The patient cohort considered here consists of 6868 patients from the 18 TCGA cohorts for whom the HLA types are known. For each false positive proportion, the false positive candidates have been selected 1000 times at random

Expected influence of the proportion of false positive neo-epitope candidates on the patient population. Proportion of the patients that carry at least one neo-epitope candidate mutation, and whose HLA-I allele set contains the candidate HLA type, when a limited percentage of the neo-epitope candidates is considered. The patient cohort considered here consists of 6868 patients from the 18 TCGA cohorts for whom the HLA types are known. For each false positive proportion, the false positive candidates have been selected 1000 times at random

Confirmational evidence

A limited validation of our method was performed in two steps: first, we confirmed that our pipeline was able to identify candidates that have been previously reported as eliciting spontaneous CD8 + T-cell responses in cancer patients in whom the target epitopes were subsequently discovered [37, 38]. Both sets together (Additional file 8) contain 37 epitopes, 35 of which could be mapped to an ENSEMBL transcript (33 unique genes). For 27 of these epitopes our pipeline predicted strong binding with the specific HLA-1 type reported in the corresponding wet-lab investigations. Another 5 epitopes where predicted as weak binders, some of the latter are also predicted to be strong binders in other HLA-1 types. Our pipeline classified 70% of a set of known tumor neo-antigens as strong binders and another 14% as weak binders. 4 out of 34 unique identifiable variants studied by van Buuren et al. [38] and Fritsch [37] are found among our set of high confidence missense variants, but only one (CTNNB1:S37F) fulfills the 1% recurrence threshold (9 uterine carcinoma patients). This variant was shown to trigger immunological response against HLA-A*24:02 [39], which isn’t in the set of alleles that we have systematically tested. However, our prediction show that the same peptide might also be reactive against HLA-C07:02. Finally, the CDK4:R24C peptide (sequence ACDPHSGHFV, see Additional file 8) is not predicted to bind to HLA-A*02:01, even though it leads to confirmed T cell response [40], and has been related to cutaneous malignant melanoma and hereditary cutaneous melanoma [41], [42]. Taken together, these results show that our candidate prediction pipeline is able to recapitulate most clinically validated neo-epitopes reported in [38] and [37], and that some of these neo-epitopes occur from recurrent variants. We have also performed preliminary validation for two candidates: RAC1:P29S & TRRAP:S722F binding to HLA-A*02:01 (Fig. 4). We utilized ABabDII mice, transgenic animals that harbour the human TCR αβ gene loci, a chimeric HLA-A2 gene and are deficient for mouse TCR αβ and mouse MHC I genes. These mice have been shown to express a diverse human TCR repertoire [24, 43] and thus mimic human T cell response. They were immunized at least twice with mutant peptides and IFN γ producing CD8 + T cells were monitored in ex vivo ICS analysis 7 days after the last immunization. CD8 + T cells were purified from spleen cell cultures of reactive mice using either IFN γ-capture or tetramer-guided FACSort. Sequencing of specific TCR α and β chain amplicons that were obtained by RACE-PCR revealed that this procedure yields an almost monoclonal CD8 + T cell population (not shown). In both cases, tested neo-antigen candidates lead to T cell reactivity, confirming not only predicted MHC binding by our pipeline but also immunogenicity in vivo in human TCR transgenic mice. Therefore this workflow also allows to generate potentially therapeutic relevant TCRs to be used in the clinics for cancer immunotherapy.
Fig. 4

Recognition of predicted epitopes by CD8 + T cells. Epitopes for recurrent mutations that have been identified in silico to bind to HLA-A*02:01 using our pipeline were synthesized and used for immunization of human TCR transgenic ABabDII mice. Examples (RAC1:P29S and TRRAP:S722F) of ex vivo ICS analysis of mutant peptide immunized ABabDII mice 7 days after the last immunization are shown. Polyclonal stimulation with CD3/CD28 dynabeads was used as positive control, stimulation with an irrelevant peptide served as negative control (data not shown)

Recognition of predicted epitopes by CD8 + T cells. Epitopes for recurrent mutations that have been identified in silico to bind to HLA-A*02:01 using our pipeline were synthesized and used for immunization of human TCR transgenic ABabDII mice. Examples (RAC1:P29S and TRRAP:S722F) of ex vivo ICS analysis of mutant peptide immunized ABabDII mice 7 days after the last immunization are shown. Polyclonal stimulation with CD3/CD28 dynabeads was used as positive control, stimulation with an irrelevant peptide served as negative control (data not shown)

Discussion

By virtue of the underlying mutational processes, the genome architecture and accessibility as well as for functional reasons within the disease process, certain somatic mutations will be present in multiple patients while still being highly specific to the tumor [14]. Using existing cancer studies and neo-epitope binding predictions to MHC class I proteins, we propose a ranking of candidates which mutation occur frequently in observed cancer patient cohorts. The candidates are ranked according to the expected number of target patients. For one candidate, the target patients are defined as those who bear the candidate’s mutation, and whose HLA types contain the candidate’s. The expected number of target patients is proportional to the HLA type frequency in the population, and to the frequency of the mutation in the cancer cohorts. Taking into account the fact that MHC binding is a necessary but not sufficient condition for T cell activity, and the limitations of MHC binding prediction algorithms, our method provides an objective ranking of neo-epitopes based on recurrent variants, as a basis for the development of off-the-shelf immunotherapy treatments. Despite numerous mechanisms of immune evasion, neo-epitopes are important targets of endogenous immunity [5]. In some cases at least, it has been shown that they contribute to tumor recognition [44], achieve high objective response (in melanoma, see ref. [45, 46]), and a single of them is presumably sufficient for tumor regression [47]. Moreover, positive association has been shown between antigen load and cytolytic activity [48], activated T cells [13] and high levels of the PD-1 ligand [49]. Taken together, these results suggest that neo-epitopes occupy a central role in regulating immune response to cancer, and that this role can be exploited for cancer immunotherapy. Even though the question of negative selection for strong binding neo-epitopes and its relation to other immune evasion mechanisms like HLA loss or PD-L1, CTLA4 dis-regulation is still open [50]. A recent CRISPR screen suggest that more then 500 genes are essential for cancer immunotherapy [51]. Targeting neo-epitopes based on non-recurrent, private somatic variants requires generation of private TCRs or CARs for each individual patient, which is challenging [52]. Successful treatments based on genetically engineered lymphocytes has been shown for epitopes arising from unmutated proteins, i.e. public epitopes: MART-1 and gp100 proteins have been targeted in melanoma cases [53]. In another trial, Robbins et al. [54] have studied long-term follow-up of patients who were treated with TCR-transduced T cells against NY-ESO-1, a protein whose expression is normally restricted to testis, but which is frequently aberrantly expressed in tumor cells. They show that treatment may be effective for some patients. These results show that immune treatments based on public variants can be beneficial, suggesting that similar success may potentially be achieved using candidates based on recurrent variants. However, targeting such non somatic epitopes presents safety and efficacy concerns [2]. The administration of T cells transduced with MART-1 specific T-cell receptor have led to fatal outcomes [55]. Cross-reactivity of TCR against MAGE-A3 (a protein normally restricted to testis and placenta) caused cardiovascular toxicity [56]. Neo-epitopes based on recurrent somatic variants potentially alleviate such problems, as the target sequences are truly restricted to tumor cells. Our computation of expected targetable patient groups assumes that neither the cancer type nor the patient’s mutanome are associated with the patient’s HLA-1 alleles. In a recent study, Van den Eyden et al. [50] show that there is little (if any) antigen depletion due to the negative selection pressure from the immune response. Molecular evolution methods applied to somatic mutations show that nearly all mutations escape negative selection [57]. Taken together, these results suggest that the expected probability of a recurrent variant being present in a patient somatic mutations pool should not be affected (significantly) by the patient’s HLA-1 alleles. The neo-epitope landscape is diverse and sparse [13]. Few neo-epitopes are predicted to be both strong binders and present in multiple patients. In their analysis, Hartmaier et al. [58] estimate that neo-epitopes suitable for precision immuno-therapy might be relevant for about 0.3% of the patients, which is in agreement with our results. However, the absolute number of patients is still considerable, see Table 2. Our study shows that a relatively large number of patients (about 1% of newly diagnosed patients) might benefit from a small library of candidates proven to generate immunological response. These numbers must be compared to “conventional” personalised immunotherapy, where a immunologically active candidate must be identified for each new patient for which efficacy and safety are always unknown. Even if a substantial part of the neo-epitopes we suggest turns out to be false positives due to the limitation of prediction algorithms and understanding of immune response, there is potential to help tens of thousands of patients.

Conclusions

Off the shelf immune treatments can be faster, less costly and safer for individual patients, because each neo-epitope based treatment scheme can be reused on hundreds of patients per year. In this respect, they might open the way to supplement existing personalized cancer immune treatments approaches with precision treatment options. We believe that our ranking provides a rational order for testing for and selecting off the shelf neo-epitope based therapies. Our preliminary in vivo mouse experiments show that this in principle feasible. Additional file 1 1055 recurrent variants identified in 26 TCGA studies. For each variant, the number of cases harboring the variant (Number of occurrences), the cohort size and the fraction of cases in the cohort (Fraction) are given. When available, COSMIC entries (from ENSEMBL) are are also listed, as well as the highest allele frequency from all populations quoted in ExAC version 0.31 ([18]). Gene annotations from Vogelstein et al. ([28]) & Rubio-Perez et al. ([29]) are also provided. Additional file 2 Neo-epitope candidates from recurrent variants. Recurrent variants leading to binding (strong and weak binders) neo-epitopes for one of the 11 HLA types considered. Peptide length redundancy has been removed from the variant list, and each variant is listed only once, even if it is recurrent in multiple study cohorts. Additional file 3 Frequency of Single Nucleotides Variants (SNVs) that fall in a poly-A, poly-C, poly-G or poly-T sequence of length at least 6. The variants that appear only once in the whole study are colored in blue, while the variants that appear more than once are colored in red. The dotted line shows the expected fraction of such variants, if the sequences were all random. Except for the LIHC study, all variants that occur more than once in the cohort are found in difficult-to-sequence regions less than expected by chance. Additional file 4 Overlap between recurrent variants and hotspot variants. The overlap is based on the codon position, so that all variants occurring at the same protein sequence position are pooled together. The recurrent variants that match the alternate codon definition in Chang et al. are added to the overlap. The recurrent variants are pooled by codon and sorted by decreasing occurrence frequency in the study. The overlap between hotspots and highly recurrent variants is high, and the common variants fraction decreases when recurrent variants become less frequent. The overlap between recurrent variants and the list of suspected false positive hotspots compiled by Chang et al. ([14]) is very limited. Inset: Venn diagram of the total overlap between the recurrent variants called in this study, and the hotspot variants described in Chang et al. ([14]). Additional file 5 Oncogenes and tumor suppressors. Number of variants occurring in genes classified as tumor suppressor genes and oncongenes by ([28]), for each study. The numbers are given for the full set of variants, among recurrent variants only and among variants leading to neo-epitope candidates. As each protein change is considered only once, the total number of variants is always smaller or equal to the sum over all studies, as protein changes appearing in multiple studies are counted only once in the total. Additional file 6 Expected number of target patients for each neo-epitope candidates, for the 18 cancer entities with associated epidemiological data. The expected number of patients is the product between the number of new cases, the observed variant frequency and the HLA type frequency in the US population. The total expected number of patients for each candidate is the sum over the expected number of candidates by study. Additional file 7 Expected frequency of patients with at least one candidate not labelled as false positive. For each TCGA cohort, we have selected at random 1000 times 50%, 20%, 10% and 5% from the candidates, to conservately model a high rate of false positive within the candidates. From these selected candidates, we have computed the expected frequency of patients with a HLA-1 allele and a mutation matching at least one selected candidate. Additional file 8 Confirmation Status with Gold Standard Data Set. The protein changes described in van Buuren et al. ([38]) and Fritsch et al. ([37]) have been mapped to the ENSEMBL protein set and neo-epitopes have been computed using our standard pipeline. 26 of these epitopes are exactly recovered by the pipeline, for one of them the pipeline predicts a strong binder for a shorter peptide, and 5 of them are predicted to be weakly binding. Column 7 to 10 are copied from ([37]) and ([38]). Additional file 9 ARRIVE checklist concerning the animals used for the experimental validation of the in vivo presentation of two peptides.
  56 in total

1.  Noncoding regions are the main source of targetable tumor-specific antigens.

Authors:  Céline M Laumont; Krystel Vincent; Leslie Hesnard; Éric Audemard; Éric Bonneil; Jean-Philippe Laverdure; Patrick Gendron; Mathieu Courcelles; Marie-Pierre Hardy; Caroline Côté; Chantal Durette; Charles St-Pierre; Mohamed Benhammadi; Joël Lanoix; Suzanne Vobecky; Elie Haddad; Sébastien Lemieux; Pierre Thibault; Claude Perreault
Journal:  Sci Transl Med       Date:  2018-12-05       Impact factor: 17.956

Review 2.  Targeting cancer-specific mutations by T cell receptor gene therapy.

Authors:  Thomas Blankenstein; Matthias Leisegang; Wolfgang Uckert; Hans Schreiber
Journal:  Curr Opin Immunol       Date:  2015-02-27       Impact factor: 7.486

Review 3.  Immunogenic peptide discovery in cancer genomes.

Authors:  Alexandra Snyder; Timothy A Chan
Journal:  Curr Opin Genet Dev       Date:  2015-01-12       Impact factor: 5.578

4.  Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients.

Authors:  André Kahles; Kjong-Van Lehmann; Nora C Toussaint; Matthias Hüser; Stefan G Stark; Timo Sachsenberg; Oliver Stegle; Oliver Kohlbacher; Chris Sander; Gunnar Rätsch
Journal:  Cancer Cell       Date:  2018-08-02       Impact factor: 31.743

Review 5.  Current tools for predicting cancer-specific T cell immunity.

Authors:  David Gfeller; Michal Bassani-Sternberg; Julien Schmidt; Immanuel F Luescher
Journal:  Oncoimmunology       Date:  2016-04-25       Impact factor: 8.110

6.  Toward a Shared Vision for Cancer Genomic Data.

Authors:  Robert L Grossman; Allison P Heath; Vincent Ferretti; Harold E Varmus; Douglas R Lowy; Warren A Kibbe; Louis M Staudt
Journal:  N Engl J Med       Date:  2016-09-22       Impact factor: 91.245

7.  Cardiovascular toxicity and titin cross-reactivity of affinity-enhanced T cells in myeloma and melanoma.

Authors:  Gerald P Linette; Edward A Stadtmauer; Marcela V Maus; Aaron P Rapoport; Bruce L Levine; Lyndsey Emery; Leslie Litzky; Adam Bagg; Beatriz M Carreno; Patrick J Cimino; Gwendolyn K Binder-Scholl; Dominic P Smethurst; Andrew B Gerry; Nick J Pumphrey; Alan D Bennett; Joanna E Brewer; Joseph Dukes; Jane Harper; Helen K Tayton-Martin; Bent K Jakobsen; Namir J Hassan; Michael Kalos; Carl H June
Journal:  Blood       Date:  2013-06-14       Impact factor: 22.113

Review 8.  Neoantigens in cancer immunotherapy.

Authors:  Ton N Schumacher; Robert D Schreiber
Journal:  Science       Date:  2015-04-03       Impact factor: 47.728

Review 9.  Predicting Antigen Presentation-What Could We Learn From a Million Peptides?

Authors:  David Gfeller; Michal Bassani-Sternberg
Journal:  Front Immunol       Date:  2018-07-25       Impact factor: 7.561

Review 10.  Best practices for bioinformatic characterization of neoantigens for clinical utility.

Authors:  Megan M Richters; Huiming Xia; Katie M Campbell; William E Gillanders; Obi L Griffith; Malachi Griffith
Journal:  Genome Med       Date:  2019-08-28       Impact factor: 11.117

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.