NeoPredPipe: high-throughput neoantigen prediction and recognition potential pipeline.

Ryan O Schenck, Eszter Lakatos, Chandler Gatenbee, Trevor A Graham, Alexander R A Anderson.

Abstract

BACKGROUND: Next generation sequencing has yielded an unparalleled means of quickly determining the molecular make-up of patient tumors. In conjunction with emerging, effective immunotherapeutics for a number of cancers, this rapid data generation necessitates a paired high-throughput means of predicting and assessing neoantigens from tumor variants that may stimulate immune response.
RESULTS: Here we offer NeoPredPipe (Neoantigen Prediction Pipeline) as a contiguous means of predicting putative neoantigens and their corresponding recognition potentials for both single and multi-region tumor samples. NeoPredPipe is able to quickly provide summary information for researchers, and clinicians alike, on predicted neoantigen burdens while providing high-level insights into tumor heterogeneity given somatic mutation calls and, optionally, patient HLA haplotypes. Given an example dataset we show how NeoPredPipe is able to rapidly provide insights into neoantigen heterogeneity, burden, and immune stimulation potential.
CONCLUSIONS: Through the integration of widely adopted tools for neoantigen discovery NeoPredPipe offers a contiguous means of processing single and multi-region sequence data. NeoPredPipe is user-friendly and adaptable for high-throughput performance. NeoPredPipe is freely available at https://github.com/MathOnco/NeoPredPipe .

Keywords: Cancer; Evolution; Heterogeneity; Neoantigens; Next-generation sequencing

Substances:
Antigens, Neoplasm

Year: 2019 PMID: 31117948 PMCID: PMC6532147 DOI: 10.1186/s12859-019-2876-4

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

Background

Cancer cells are fraught with genomic variants in all regions of the genome, with high degrees of heterogeneity in a spatially complex tumor. This intra-tumor heterogeneity (ITH) realizes a fitness landscape upon which natural selection can act (reviewed by [1]). Neoantigens, epitopes derived from proteins translated from non-synonymous variants, can reach the cell surface and stimulate an immune response after a number of cellular processing steps have occurred, primarily proteasomal cleavage and binding with major histocompatibility complexes (MHC) I or II. This binding depends upon the patient-specific human leukocyte antigen (HLA) alleles. The neoantigen bound to its MHC class I complex is then transported to the cell surface, where it may bind cytotoxic T-cell receptors, thereby eliciting infiltration of cytotoxic T-cells capable of detecting and eliminating cells carrying the neoantigen in the absence of immune-evading tactics. The immune response is strongly influenced by the total number of neoantigens within a tumor, especially in hyper-mutated cancers ([2]), as well as the ITH of antigenic mutations ([3]).

Recent advances in sequencing techniques allow for multi-region sequencing approaches, whereby adjacent regions of the same tumor or tissue provide greater insight into variant clonality (i.e. truly clonal, subclonal, or shared). There is increasing evidence that the neoantigen landscape of tumours can be highly heterogeneous, containing regions of subclonal immune escape and significantly different neoantigen load that can influence a patient's response to immunotherapy [4-6].

A number of tools are available that provide mutated peptide annotation, binding affinity prediction, wild-type and mutant peptide comparison, and neoantigen ranking based on these measures [7-10]. Their input varies from raw sequencing files (e.g. fastq) [7, 8, 10] to highly annotated VCF files [9]; some provide HLA typing as part of their pipeline [7, 10], but require further dependencies for the HLA-typing software. Most rely on a version of netMHC or netMHCpan for binding prediction, but [9] offers a choice of additional software. For an in-depth comparison of available pipelines for neoantigen calling, we refer the reader to the recent review of Lancaster et al. [11].

Despite the increasing number and diversity of neoantigen-prediction tools, none of them can provide predictions on multi-region sequence data or assess the ITH of the antigenic landscape of tumours. Here, we present NeoPredPipe, a pipeline connecting commonly used bioinformatic software via custom Python scripts to allow for the processing of single and multi-region variant call format (VCF) files, variant annotations, neoantigen predictions, cross-referencing with known epitopes, and in silico TCR recognition potential predictions in a single, clear, and efficient workflow (Fig. 1).

Fig. 1

NeoPredPipe workflow differentiating between user steps (green) and execution processes (purple). NeoPredPipe provides low level details and high level summary statistics as output for downstream analysis (red)

Implementation

The first stage in neoantigen identification from a VCF file is the proper annotation of variants to identify non-synonymous variants. To this end, NeoPredPipe employs the widely used and efficient genomics tool ANNOVAR ([12]). Specifically, ANNOVAR processes samples in a way that prioritizes exonic variants; this step provides a useful means of quickly partitioning variant calls for downstream applications. The user is able to specify the genome build that they would like to use, provided it is compatible with ANNOVAR. Finally, using the coding_change function of ANNOVAR and custom code, the mutated amino acid sequence is predicted from annotated non-synonymous variant calls, and the peptide sequence surrounding the newly introduced amino acid is extracted for epitope prediction. From this step, mutations that give rise to a single amino acid change and mutations that alter a larger peptide segment (e.g. indels and stop-losses) are handled separately and reported in separate files to aid further assessment. Once the VCF files have been annotated and partitioned with ANNOVAR, the program determines whether the user has provided HLA haplotypes for HLA-A, -B, and -C. NeoPredPipe does not include HLA allele identification, as this step is highly dependent upon the source of the data (WES, WGS, targeted gene panels, transcriptome data, or experimental methods). However, the pipeline's GitHub page provides detailed advice on haplotyping from WES/WGS data using the popular tool POLYSOLVER [13], and the output of POLYSOLVER is automatically processed in NeoPredPipe. In cases where no HLA haplotype information is available, the most common alleles of each haplotype are assessed, while in cases where the HLA haplotypes are homozygous, only that HLA haplotype is used for prediction. HLA haplotypes are cross-referenced with available HLA haplotypes prior to executing netMHCpan ([14]) for the primary neoantigen predictions.
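The peptide extraction step described above can be illustrated with a minimal sketch. It assumes a single amino acid substitution at a known 0-based position; the function name `mutant_windows` and its arguments are hypothetical and are not taken from the NeoPredPipe source code.

```python
def mutant_windows(protein, pos, alt, lengths=(8, 9, 10)):
    """Return, per epitope length, every k-mer that contains the substituted residue.

    protein : wild-type amino acid sequence (str)
    pos     : 0-based index of the substituted residue (hypothetical convention)
    alt     : single-letter code of the new amino acid
    lengths : epitope lengths to enumerate
    """
    # Apply the substitution to obtain the mutated protein sequence.
    mutated = protein[:pos] + alt + protein[pos + 1:]
    windows = {}
    for k in lengths:
        # First and last start positions whose k-mer still covers `pos`
        # without running off either end of the protein.
        start_min = max(0, pos - k + 1)
        start_max = min(pos, len(mutated) - k)
        windows[k] = [mutated[s:s + k] for s in range(start_min, start_max + 1)]
    return windows


if __name__ == "__main__":
    example = mutant_windows("ACDEFGHIKLMNPQRSTVWY", 10, "A")
    for k, peptides in sorted(example.items()):
        print(k, len(peptides), peptides[0], peptides[-1])
```

Away from the protein ends, a substitution yields k candidate peptides of length k; indels and stop-losses change a longer stretch of sequence and would need a different windowing rule, which is consistent with their being handled separately.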
As with the primary tool, the user is able to specify the epitope lengths for which predictions are made (typically 8-, 9-, or 10-mers). This process yields a single file containing either filtered or unfiltered (dependent on user options) neoantigen predictions with information on the sample possessing the neoantigen and, in the case of multi-region variant calling, a presence/absence indicator for each of the sequenced regions. These predicted neoantigens are then, optionally, cross-referenced with normal peptides utilizing PeptideMatch ([15]), whereby the candidate epitopes are assessed for novelty against a reference proteome that can be supplied by the user as a fasta file (e.g. from Ensembl or UniProt). When available, users may also provide expression data as a tsv file specific to each sample (or a single reference file) to quickly assess expression levels of the gene carrying a predicted neoantigen. This information is included in the final output table. The steps outlined above deliver candidate neoantigens from the provided variant calls that may be presented to cytotoxic T-cells; however, they do not inform the likelihood of a neoantigen eliciting an immune response (i.e. being recognised by a TCR). In order to predict the recognition potential we employ the algorithms and process utilized by [16]. The recognition potential is defined as the product of A and R, where A is the amplitude of the ratio of the relative probabilities of binding for the wild-type and mutant epitopes to the MHC class I molecules, and R is a measure of similarity to pathogenic peptides, meant to represent the probability that the neoantigen in question is recognised by a TCR clone already present in the tissue/blood. To define A it is necessary to perform neoantigen predictions for both the wild-type and mutant epitopes: this is not performed by default by NeoPredPipe, but is supplied as an option so that the full analysis can be run as one contiguous pipeline.
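The product A × R can be sketched numerically as follows. This is an illustration under stated assumptions, not NeoPredPipe's implementation: A is taken to be the ratio of the predicted wild-type to mutant dissociation constants, and R is taken to be a logistic-type function of the epitope alignment scores with a shift parameter `a` and a slope parameter `k`, which is our reading of the model in [16]. All function and parameter names are hypothetical, and no parameter values are implied.

```python
import math


def amplitude(kd_wildtype, kd_mutant):
    """A: assumed here to be the ratio of wild-type to mutant dissociation constants."""
    return kd_wildtype / kd_mutant


def tcr_recognition(alignment_scores, a, k):
    """R: assumed logistic-type recognition probability.

    alignment_scores : alignment scores of the neoantigen against known epitopes
    a                : score shift (hypothetical parameter name)
    k                : slope of the binding curve (hypothetical parameter name)
    """
    # Each known epitope contributes a Boltzmann-like weight.
    weights = [math.exp(-k * (a - score)) for score in alignment_scores]
    total = sum(weights)
    # The "+ 1" represents the unbound state in the normalisation.
    return total / (1.0 + total)


def recognition_potential(kd_wildtype, kd_mutant, alignment_scores, a, k):
    """Recognition potential as the product A * R."""
    return amplitude(kd_wildtype, kd_mutant) * tcr_recognition(alignment_scores, a, k)


if __name__ == "__main__":
    # Toy numbers chosen only to make the arithmetic easy to follow.
    print(recognition_potential(500.0, 50.0, [26.0], a=26.0, k=1.0))
```

Under these assumptions, a mutant epitope that binds ten times more strongly than its wild-type counterpart and has a single alignment score equal to the shift parameter gets A = 10 and R = 0.5.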
To define R, NeoPredPipe utilizes the multistate thermodynamic model employed by [16], which requires alignment scores for each epitope to a curated Immune Epitope Database list of known epitopes (a list is provided, and can be refined and updated by the user). In order to assess ITH with regard to both effective mutations (non-synonymous variants and indels) and neoantigen burdens, NeoPredPipe is capable of handling multi-region VCF files; furthermore, these files can be multi-region in only a select number of samples and can differ in the number of regions. Similarly, NeoPredPipe can process multi-region expression data for samples where information on regions is compiled into separate columns. Thus NeoPredPipe is able to efficiently handle various, potentially multi-region, experimental designs for neoantigen prediction and assessment, providing a summary table and an optional web-based visualization tool for downstream statistical and in-depth analysis.
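A per-region presence/absence indicator can be derived from a multi-region VCF record roughly as sketched below. The sketch assumes a standard VCF layout (eight fixed columns, a FORMAT column, then one column per region) and treats a region as carrying the variant when its GT field contains any non-reference allele. The rule NeoPredPipe actually applies may differ, and the function name `region_presence` is hypothetical.

```python
def region_presence(vcf_line):
    """Return a list of 0/1 flags, one per sample (region) column of a VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; the sample columns follow it.
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    presence = []
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased ("|") and unphased ("/") genotypes alike.
        alleles = genotype.replace("|", "/").split("/")
        # Present if any allele is neither reference ("0") nor missing (".").
        presence.append(int(any(allele not in ("0", ".") for allele in alleles)))
    return presence


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:AD\t0/1:10,5\t0/0:15,0\t./.:0,0\t1|1:0,9"
    print(region_presence(record))
```

Because the flags are computed per record, samples with different numbers of regions simply produce vectors of different lengths.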

Results

The output of the pipeline depends largely on the options set by the user, but at the very least, NeoPredPipe provides two tables of putative neoantigens and their predicted binding affinities: one for single nucleotide/amino acid variants, and one for indel(-type) variants. With additional options selected it is possible to include, within a single output, whether an epitope matches a reference proteome, its expression at the RNA level, and the neoantigen's recognition potential. In addition, for rapid assessment, NeoPredPipe yields summary statistics on the neoantigen burden for each sample, a rapidly executed web-based visualization, as well as information to assess ITH by reporting neoantigen burdens for clonal, subclonal, and shared variants for multi-region samples. A detailed description of NeoPredPipe's output tables and each of their fields can be found at https://github.com/MathOnco/NeoPredPipe.

Use Case

While a small, two-sample, multi-region example dataset is provided with the source code for users, we demonstrate the usefulness of NeoPredPipe by applying it to a previously published dataset examining the evolutionary landscape of colorectal tumors [17]. We select two exemplary patient samples (Adenoma 3 and Carcinoma 7 in the original paper) from the dataset, and apply our pipeline using default parameters to evaluate neoantigens in each sample. Figure 2 illustrates the information included in the standard output of NeoPredPipe and the analyses that can be performed when NeoPredPipe is combined with the output of other standard bioinformatic methods.

Fig. 2

Analysis of neoantigens in two colorectal tumors using NeoPredPipe. a Venn diagram of all neoantigens in the five regions of Adenoma 3. b Number of neoantigens in the two samples that are clonal (present in all regions, shown in blue), shared (present in more than one but not all regions, in yellow) or subclonal (present in a single region, red). Separate counts of weak and strong MHC-binding neoantigens (WB and SB, respectively) are also shown. c Distribution of recognition potential values of neoantigens present in Adenoma 3 (green) and Carcinoma 7 (red). The boxplots represent the median and the upper and lower quartiles. Only neoantigens with recognition potential higher than zero are shown. d Phylogenetic tree reconstructed from all exonic mutations for Adenoma 3 (left) and Carcinoma 7 (right). Pie-charts and bar-charts represent the number of weak (orange) and strong (red) binder neoantigens assigned to each branch. The size of each circle is proportional to the percentage of total neoantigens on that branch

Figure 2a provides a summary of the complex interactions between different regions of Adenoma 3, and highlights both Region 4, which harbours the highest number of subclonal (only present in a single region) neoantigens, and the overall clonality of the sample, with 72 neoantigens detected in all regions. For quick analysis, NeoPredPipe directly outputs a summary of the clonality of neoantigens, also divided into categories of strong and weak binders (peptides with a netMHCpan percentile rank ≤0.5 and ≤2, respectively, as recommended in [14]). Figure 2b visualizes this summary in two bar-charts for Adenoma 3 and Carcinoma 7. We find that whilst the number of shared neoantigens (present in more than one, but not all regions) is highly similar between the two samples, Carcinoma 7 harbours both more clonal (present in all regions) and more subclonal neoantigens; in total, 26% of its neoantigens are clonal, compared to 16% in Adenoma 3.
Figure 2c shows the recognition potential value for all neoantigens in the two samples. NeoPredPipe identified 10 peptides in Adenoma 3 and 9 in Carcinoma 7 with a recognition potential value above 1. In Fig. 2d, we provide an example of integrating NeoPredPipe outputs with downstream multi-region variant analysis. By inferring phylogenetic trees of each tumor, constructed using all exonic mutations with a variant allele frequency above 0.05 (see [17] for full methods), we find that neoantigen distributions across regions can reflect the phylogenetic distance of regions and the clonal structure of samples. In Carcinoma 7 and Adenoma 3, 31% and 23.5% of total exonic mutations are clonal, respectively, similar to the clonality of neoantigens shown in Fig. 2b. This approach also highlights regions with neoantigen loads different from their closest neighbors, such as Region61 and Region62 of Carcinoma 7. The analysis can therefore inform future experimental and bioinformatic investigations of samples, allowing for new evolutionary and mechanistic insights into tumor development, evolution, and progression.
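The clonality and binder categories used in this section can be sketched as below, assuming each neoantigen carries a per-region presence/absence vector and a netMHCpan percentile rank. The thresholds (≤0.5 for strong, ≤2 for weak binders) and the category definitions come from the text; the function names and the input layout are hypothetical and do not reflect NeoPredPipe's output format.

```python
from collections import Counter


def clonality(presence):
    """Classify a neoantigen from its per-region 0/1 presence vector."""
    n_present = sum(presence)
    if n_present == 0:
        return "absent"
    if n_present == len(presence):
        return "clonal"      # present in all regions
    if n_present == 1:
        return "subclonal"   # present in a single region
    return "shared"          # present in more than one, but not all regions


def binder_class(percentile_rank):
    """Strong binder (SB) if rank <= 0.5, weak binder (WB) if rank <= 2, else None."""
    if percentile_rank <= 0.5:
        return "SB"
    if percentile_rank <= 2:
        return "WB"
    return None


def summarize(neoantigens):
    """Count neoantigens per (clonality, binder class) pair.

    neoantigens : iterable of (presence_vector, percentile_rank) tuples
    """
    counts = Counter()
    for presence, rank in neoantigens:
        label = binder_class(rank)
        if label is not None:
            counts[(clonality(presence), label)] += 1
    return counts


if __name__ == "__main__":
    toy = [([1, 1, 1], 0.3), ([0, 1, 0], 1.5), ([1, 1, 0], 1.9), ([1, 0, 0], 4.0)]
    for key, value in sorted(summarize(toy).items()):
        print(key, value)
```

Note that a single-region sample classifies every detected neoantigen as clonal under this rule, so the clonal/shared/subclonal split is only informative for multi-region samples.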

Conclusions

We present NeoPredPipe, an efficient, high-throughput, and user-friendly pipeline for neoantigen prediction and interrogation for single and multi-region tumor VCF files. By tying together commonly utilized bioinformatics toolsets and integrating recent advances in neoantigen assessment, NeoPredPipe yields concise information typically required by researchers and clinicians. Through user options, set according to the individual's own computational limitations, the pipeline is scalable for a high performance computing (HPC) cluster environment and customizable for individual research questions. Furthermore, unlike existing methods [7-10], NeoPredPipe can process a directory containing numerous samples in a single command; it therefore provides a user-friendly way for users without extensive computing experience to analyse the output of large studies or compare against reference datasets. All source code and an extensive README for each component of NeoPredPipe, with all pipeline options, are available at https://github.com/MathOnco/NeoPredPipe.

Availability and requirements

Project name: NeoPredPipe
Project home page: https://github.com/MathOnco/NeoPredPipe
Operating system: Unix-based operating system
Programming languages: Python and Bash
Other requirements: Python 2.7, ANNOVAR, netMHCpan, PeptideMatch, and (optionally) NCBI BlastX+.
License: GNU GPLv3
Any restrictions to use by non-academics: None

15 in total

1. A fast Peptide Match service for UniProt Knowledgebase.

Authors: Chuming Chen; Zhiwen Li; Hongzhan Huang; Baris E Suzek; Cathy H Wu
Journal: Bioinformatics Date: 2013-08-19 Impact factor: 6.937

Review 2. Clonal Heterogeneity and Tumor Evolution: Past, Present, and the Future.

Authors: Nicholas McGranahan; Charles Swanton
Journal: Cell Date: 2017-02-09 Impact factor: 41.582

3. MuPeXI: prediction of neo-epitopes from tumor sequencing data.

Authors: Anne-Mette Bjerregaard; Morten Nielsen; Sine Reker Hadrup; Zoltan Szallasi; Aron Charles Eklund
Journal: Cancer Immunol Immunother Date: 2017-04-20 Impact factor: 6.968

4. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Authors: Kai Wang; Mingyao Li; Hakon Hakonarson
Journal: Nucleic Acids Res Date: 2010-07-03 Impact factor: 16.971

Review 5. Neoantigens in cancer immunotherapy.

Authors: Ton N Schumacher; Robert D Schreiber
Journal: Science Date: 2015-04-03 Impact factor: 47.728

6. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes.

Authors: Sachet A Shukla; Michael S Rooney; Mohini Rajasagi; Grace Tiao; Philip M Dixon; Michael S Lawrence; Jonathan Stevens; William J Lane; Jamie L Dellagatta; Scott Steelman; Carrie Sougnez; Kristian Cibulskis; Adam Kiezun; Nir Hacohen; Vladimir Brusic; Catherine J Wu; Gad Getz
Journal: Nat Biotechnol Date: 2015-11 Impact factor: 54.908

7. TSNAD: an integrated software for cancer somatic mutation and tumour-specific neoantigen detection.

Authors: Zhan Zhou; Xingzheng Lyu; Jingcheng Wu; Xiaoyue Yang; Shanshan Wu; Jie Zhou; Xun Gu; Zhixi Su; Shuqing Chen
Journal: R Soc Open Sci Date: 2017-04-05 Impact factor: 2.963

8. CloudNeo: a cloud pipeline for identifying patient-specific tumor neoantigens.

Authors: Preeti Bais; Sandeep Namburi; Daniel M Gatti; Xinyu Zhang; Jeffrey H Chuang
Journal: Bioinformatics Date: 2017-10-01 Impact factor: 6.937

9. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade.

Authors: Nicholas McGranahan; Andrew J S Furness; Rachel Rosenthal; Sofie Ramskov; Rikke Lyngaa; Sunil Kumar Saini; Mariam Jamal-Hanjani; Gareth A Wilson; Nicolai J Birkbak; Crispin T Hiley; Thomas B K Watkins; Seema Shafi; Nirupa Murugaesu; Richard Mitter; Ayse U Akarca; Joseph Linares; Teresa Marafioti; Jake Y Henry; Eliezer M Van Allen; Diana Miao; Bastian Schilling; Dirk Schadendorf; Levi A Garraway; Vladimir Makarov; Naiyer A Rizvi; Alexandra Snyder; Matthew D Hellmann; Taha Merghoub; Jedd D Wolchok; Sachet A Shukla; Catherine J Wu; Karl S Peggs; Timothy A Chan; Sine R Hadrup; Sergio A Quezada; Charles Swanton
Journal: Science Date: 2016-03-03 Impact factor: 47.728

10. pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens.

Authors: Jasreet Hundal; Beatriz M Carreno; Allegra A Petti; Gerald P Linette; Obi L Griffith; Elaine R Mardis; Malachi Griffith
Journal: Genome Med Date: 2016-01-29 Impact factor: 11.117
