
INTEGRATE-neo: a pipeline for personalized gene fusion neoantigen discovery.

Jin Zhang, Elaine R Mardis, Christopher A Maher.

Abstract

Motivation: While high-throughput sequencing (HTS) has been used successfully to discover tumor-specific mutant peptides (neoantigens) from somatic missense mutations, the field currently lacks a method for identifying which gene fusions may generate neoantigens.
Results: We demonstrate the application of our gene fusion neoantigen discovery pipeline, called INTEGRATE-Neo, by identifying gene fusions in prostate cancers that may produce neoantigens.
Availability and Implementation: INTEGRATE-Neo is implemented in C++ and Python. Full source code and installation instructions are freely available from https://github.com/ChrisMaherLab/INTEGRATE-Neo.
Contact: christophermaher@wustl.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2017        PMID: 27797777      PMCID: PMC5408800          DOI: 10.1093/bioinformatics/btw674

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The mutational landscape of cancer genomes results in the production of tumor-specific peptides that can be recognized by immune molecules. These so-called neoantigens can be exploited for personalized cancer immunotherapy (Heemskerk et al.). To date, multiple studies have successfully used next-generation sequencing (NGS) to discover tumor-specific neoantigens (Carreno et al.; Gubin et al.; Matsushita et al.). These analyses have relied on somatic missense mutation-based neoantigen discovery workflows such as pVAC-Seq (Hundal et al.). Despite these successes, such methods do not consider gene fusions, which occur when two genes are rearranged in the genome to encode an aberrant transcript that may be translated into a novel immunogenic peptide. To address this critical gap, we developed the first open source pipeline, called INTEGRATE-Neo, for gene fusion neoantigen discovery using NGS data. INTEGRATE-Neo expands the functionality of our highly accurate gene fusion discovery tool, INTEGRATE (Zhang et al.). Here, we apply INTEGRATE-Neo to the TCGA prostate cohort data (PRAD) to demonstrate its utility for identifying gene fusion neoantigens that may serve as personalized cancer immunotherapy targets.

2 The INTEGRATE-neo pipeline

The gene fusion neoantigen discovery pipeline, INTEGRATE-Neo, comprises the following steps: (1) gene fusion peptide prediction, (2) HLA allele prediction and (3) gene fusion neoantigen discovery (Fig. 1).
Fig. 1. Overview of INTEGRATE-Neo. Green boxes denote data and blue boxes denote modules.

The first step takes (1) the human reference genome in FASTA format, (2) gene models in GenePred format and (3) gene fusions in BEDPE format predicted by INTEGRATE as input to predict gene fusion peptides. The BEDPE file follows the standards provided by the ICGC-TCGA DREAM Somatic Mutation Calling-RNA Challenge (SMC-RNA). This step annotates the gene fusion predictions with information such as gene fusion exonic boundaries, open reading frames (ORFs), and the predicted peptide at the fusion junction. Each codon within the 5′ gene partner is inferred according to the starting position of the 5′ ORF. The amino acids spanning the fusion junction are determined by the codons that result from merging the sequences of the 5′ and 3′ gene partners. The 3′ reading frame, which may have shifted, is then calculated for the remaining portion of the gene fusion transcript downstream of the fusion junction until a stop codon is encountered. These annotations are appended as user-defined columns to the BEDPE file. Gene fusions that do not produce a predicted fusion peptide are subsequently filtered out.

The second step takes (1) high-throughput sequencing reads in FASTQ format and (2) reference HLA alleles in FASTA format as input to predict HLA alleles. It performs alignment using BWA (Li and Durbin, 2009) and predicts the HLA alleles using HLAminer v 1.3 (Warren et al.). This module outputs a tab-separated value (TSV) file for the predicted HLA alleles that includes the four-digit HLA allele names, scoring metrics from HLAminer (score, e-value and confidence), and the prediction source. To increase the flexibility of INTEGRATE-Neo, users can instead supply their own HLA alleles if they have typed them by another method, such as sequence-specific oligonucleotide probe hybridization or serotyping, or already have algorithmically predicted results for their dataset.
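The fusion peptide prediction described in the first step can be illustrated with a minimal Python sketch. This is not the INTEGRATE-Neo source code: the function name `predict_fusion_peptide`, the `FusionPeptide` record and the toy sequences are hypothetical, and the sketch assumes the 5′ sequence has already been trimmed to begin at the 5′ ORF start and end at the fusion junction. It translates the merged transcript codon by codon until a stop codon and reports which residue is the first to contain 3′ partner sequence.

```python
from typing import NamedTuple

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class FusionPeptide(NamedTuple):
    """Hypothetical record for this sketch (not an INTEGRATE-Neo type)."""
    peptide: str            # translation up to, but excluding, the first stop codon
    junction_index: int     # 0-based index of the first residue containing 3' sequence
    at_codon_boundary: bool  # True if the 5' partner ends on a complete codon


def predict_fusion_peptide(five_prime_cds: str, three_prime_seq: str) -> FusionPeptide:
    """Translate a fusion transcript in the reading frame of the 5' partner.

    Assumption: five_prime_cds starts at the 5' ORF start codon and ends at the
    fusion junction; three_prime_seq is the transcript downstream of the junction.
    """
    transcript = (five_prime_cds + three_prime_seq).upper()
    residues = []
    for pos in range(0, len(transcript) - 2, 3):
        # Unknown codons (e.g. containing N) are rendered as X.
        aa = CODON_TABLE.get(transcript[pos:pos + 3], "X")
        if aa == "*":  # stop codon ends the predicted peptide
            break
        residues.append(aa)
    return FusionPeptide(
        peptide="".join(residues),
        junction_index=len(five_prime_cds) // 3,
        at_codon_boundary=len(five_prime_cds) % 3 == 0,
    )


# Toy example: the 5' partner ends one nucleotide into its third codon,
# so the third residue is encoded by bases from both partners.
result = predict_fusion_peptide("ATGGCTG", "ATCCTAAGTGA")
print(result.peptide, result.junction_index, result.at_codon_boundary)
# MADPK 2 False
```

In this toy case the codon GAT is assembled from one 5′ base and two 3′ bases, which is the situation the text describes as amino acids "determined by the codons that result from merging the sequences" of both partners.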
The third step, neoantigen discovery, takes as input (1) a TSV file of the predicted HLA alleles, (2) an annotated BEDPE file of the predicted gene fusion peptides and (3) a file listing the HLA alleles supported by NetMHC v 4.0 (Andreatta and Nielsen, 2016). The epitope lengths are 8–11 by default but can be defined by the user. For each epitope length, a FASTA file is prepared with peptides of 2n − 1 amino acids, where n is the epitope length set by ‘-l’. The single amino acid in the middle spans the fusion junction. If the 5′ junction falls at a full codon, no single amino acid spans the junction, and a peptide of 2n − 2 amino acids is used instead. If a non-coding region (UTR) is encountered, the peptide sequence can be shorter than 2n − 1 (or 2n − 2). For each neoantigen, the summarization module keeps the epitope with the strongest predicted binding affinity (lowest value in nM), provided that it passes a user-defined threshold (default: 500 nM). The final result is a BEDPE file with gene fusion neoantigen predictions. The summarization module appends the epitope sequences, binding affinities, HLA alleles and metrics of the HLA alleles, as user-defined columns, to the output BEDPE file.

To ensure user-friendliness, all of the modules within INTEGRATE-Neo are designed as standalone tools with their own optional parameters. This enables users to incorporate INTEGRATE-Neo functions within their existing pipelines. INTEGRATE-Neo also ensures that all the modules are running the same version of the software. The paths to software and databases can be set in setup.ini.
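The junction peptide lengths and the summarization rule described for the third step can be sketched as follows. This is a sketch under stated assumptions rather than the pipeline's implementation: the helper names (`junction_window`, `kmers`, `best_epitope`), the toy peptide, the allele labels and the affinity values are all hypothetical, and the actual binding predictions would come from NetMHC rather than from hand-written tuples.

```python
from typing import Iterable, List, Optional, Tuple

Prediction = Tuple[str, str, float]  # (epitope, HLA allele, affinity in nM)


def junction_window(peptide: str, junction_index: int, n: int,
                    at_codon_boundary: bool) -> str:
    """Return the stretch of peptide in which every n-mer overlaps the junction.

    junction_index is the 0-based index of the first residue containing
    3' partner sequence.
    """
    start = max(0, junction_index - (n - 1))
    if at_codon_boundary:
        # Junction lies between two residues: n-1 upstream + n-1 downstream = 2n-2.
        end = junction_index + (n - 1)
    else:
        # One residue spans the junction: n-1 upstream + 1 + n-1 downstream = 2n-1.
        end = junction_index + n
    # Slicing truncates silently if the peptide ends early, so the window
    # may be shorter than 2n-1 (or 2n-2).
    return peptide[start:end]


def kmers(window: str, n: int) -> List[str]:
    """Enumerate all candidate epitopes of length n in the window."""
    return [window[i:i + n] for i in range(len(window) - n + 1)]


def best_epitope(predictions: Iterable[Prediction],
                 threshold_nm: float = 500.0) -> Optional[Prediction]:
    """Keep the strongest binder (lowest nM) among those within the threshold."""
    passing = [p for p in predictions if p[2] <= threshold_nm]
    return min(passing, key=lambda p: p[2]) if passing else None


# Toy peptide; residue 10 ("M") is taken to span the junction.
peptide = "ACDEFGHIKLMNPQRSTVWY"
window = junction_window(peptide, junction_index=10, n=3, at_codon_boundary=False)
print(window, kmers(window, 3))
# KLMNP ['KLM', 'LMN', 'MNP']

# Made-up affinities standing in for NetMHC output.
toy_predictions = [
    ("KLM", "HLA-A*02:01", 820.0),
    ("LMN", "HLA-A*02:01", 35.2),
    ("MNP", "HLA-B*07:02", 410.0),
]
print(best_epitope(toy_predictions))
# ('LMN', 'HLA-A*02:01', 35.2)
```

With n = 3 the window has 2n − 1 = 5 residues when one residue spans the junction and 2n − 2 = 4 residues when the junction falls on a codon boundary, and in both cases every n-mer drawn from the window contains sequence from both fusion partners.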

3 Application to TCGA PRAD cohort

RNA-seq reads from 333 TCGA PRAD tumor samples were used to discover gene fusions and gene fusion neoantigens using INTEGRATE and INTEGRATE-Neo (Supplementary Methods). We discovered 1761 gene fusions in the 333 prostate cancer samples, which generate 2707 fusion transcript isoforms (Supplementary Table S1). Of the 2707 fusion transcripts, 2369 (87.5%) have canonical exon boundaries and 338 (12.5%) have junctions in other (non-exonic or truncated exonic) regions. Of the 1761 gene fusions, 61 (3.5%) are recurrent (occur in ≥ 2 patients; Supplementary Table S2; Supplementary Figs. S1 and S2) and 1700 (96.5%) are singletons (occur in 1 patient). INTEGRATE-Neo predicted 1600 (1300 singleton and 300 recurrent) fusion junction peptides for the 2707 gene fusion transcripts. Of these, 240 (15%) (Supplementary Fig. S3a and Table S3) have epitopes with binding affinity scores ≤ 500 nM. The epitopes spanned all epitope lengths: 2.7%, 60.8%, 33.7% and 2.7% for 8, 9, 10 and 11 amino acids, respectively (Supplementary Fig. S3b). Interestingly, binding affinity scores skewed towards 1 nM rather than 500 nM, with smaller scores indicating stronger binding (Supplementary Fig. S3c). This pattern was consistent across all epitope lengths (Supplementary Fig. S3d). The most frequent gene fusion neoantigen, from TMPRSS2-ERG, is shown in Supplementary Figure S4. Epitope affinities for different HLA alleles and in recurrent gene fusions are shown in Supplementary Figure S5. Analysis of the TCGA PRAD data with the aforementioned parameters on our servers with 2.50 GHz Intel Xeon processors had an average runtime of 75.1 ± 29.2 seconds and an average memory usage of 1.88 ± 0.90 GB per patient using a single thread, highlighting the efficiency of INTEGRATE-Neo (Supplementary Fig. S6).
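As a quick consistency check, the percentages reported above can be recomputed from the stated counts. The short Python sketch below uses only numbers given in the text; the helper name `pct` and the dictionary labels are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# (part, whole) pairs taken directly from the counts reported in the text.
reported = {
    "transcripts with canonical exon boundaries": (2369, 2707),
    "transcripts with other junctions": (338, 2707),
    "recurrent gene fusions": (61, 1761),
    "singleton gene fusions": (1700, 1761),
    "junction peptides with an epitope <= 500 nM": (240, 1600),
}

percentages = {label: pct(part, whole) for label, (part, whole) in reported.items()}
for label, value in percentages.items():
    print(f"{label}: {value}%")

# The subgroup counts also add up to the reported totals.
assert 2369 + 338 == 2707
assert 61 + 1700 == 1761
assert 1300 + 300 == 1600
```

The recomputed values (87.5%, 12.5%, 3.5%, 96.5% and 15.0%) agree with those reported in the text.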

4 Discussion

Here, we described the first automated gene fusion neoantigen discovery pipeline, INTEGRATE-Neo, and demonstrated that it can efficiently process the TCGA prostate cancer patient cohort. This analysis revealed predicted gene fusion neoantigens across a distribution of epitope binding affinities. Overall, INTEGRATE-Neo provides a valuable resource to the cancer community by complementing existing somatic missense mutation-based neoantigen discovery methods to ensure that no potential neoantigen is missed in the search for personalized immunotherapy targets.
References

1.  Gapped sequence alignment using artificial neural networks: application to the MHC class I system.

Authors:  Massimo Andreatta; Morten Nielsen
Journal:  Bioinformatics       Date:  2015-10-29       Impact factor: 6.937

2.  Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells.

Authors:  Beatriz M Carreno; Vincent Magrini; Michelle Becker-Hapak; Saghar Kaabinejadian; Jasreet Hundal; Allegra A Petti; Amy Ly; Wen-Rong Lie; William H Hildebrand; Elaine R Mardis; Gerald P Linette
Journal:  Science       Date:  2015-04-02       Impact factor: 47.728

3.  The cancer antigenome.

Authors:  Bianca Heemskerk; Pia Kvistborg; Ton N M Schumacher
Journal:  EMBO J       Date:  2012-12-21       Impact factor: 11.598

4.  Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting.

Authors:  Hirokazu Matsushita; Matthew D Vesely; Daniel C Koboldt; Charles G Rickert; Ravindra Uppaluri; Vincent J Magrini; Cora D Arthur; J Michael White; Yee-Shiuan Chen; Lauren K Shea; Jasreet Hundal; Michael C Wendl; Ryan Demeter; Todd Wylie; James P Allison; Mark J Smyth; Lloyd J Old; Elaine R Mardis; Robert D Schreiber
Journal:  Nature       Date:  2012-02-08       Impact factor: 49.962

5.  Checkpoint blockade cancer immunotherapy targets tumour-specific mutant antigens.

Authors:  Matthew M Gubin; Xiuli Zhang; Heiko Schuster; Etienne Caron; Jeffrey P Ward; Takuro Noguchi; Yulia Ivanova; Jasreet Hundal; Cora D Arthur; Willem-Jan Krebber; Gwenn E Mulder; Mireille Toebes; Matthew D Vesely; Samuel S K Lam; Alan J Korman; James P Allison; Gordon J Freeman; Arlene H Sharpe; Erika L Pearce; Ton N Schumacher; Ruedi Aebersold; Hans-Georg Rammensee; Cornelis J M Melief; Elaine R Mardis; William E Gillanders; Maxim N Artyomov; Robert D Schreiber
Journal:  Nature       Date:  2014-11-27       Impact factor: 49.962

6.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

7.  Derivation of HLA types from shotgun sequence datasets.

Authors:  René L Warren; Gina Choe; Douglas J Freeman; Mauro Castellarin; Sarah Munro; Richard Moore; Robert A Holt
Journal:  Genome Med       Date:  2012-12-10       Impact factor: 11.117

8.  INTEGRATE: gene fusion discovery using whole genome and transcriptome data.

Authors:  Jin Zhang; Nicole M White; Heather K Schmidt; Robert S Fulton; Chad Tomlinson; Wesley C Warren; Richard K Wilson; Christopher A Maher
Journal:  Genome Res       Date:  2015-11-10       Impact factor: 9.043

9.  pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens.

Authors:  Jasreet Hundal; Beatriz M Carreno; Allegra A Petti; Gerald P Linette; Obi L Griffith; Elaine R Mardis; Malachi Griffith
Journal:  Genome Med       Date:  2016-01-29       Impact factor: 11.117
