Preeti Bais¹, Sandeep Namburi¹, Daniel M Gatti², Xinyu Zhang¹, Jeffrey H Chuang¹,³
¹The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA. ²The Jackson Laboratory, Bar Harbor, ME 04609, USA. ³Department of Genetics and Genome Sciences, University of Connecticut Health, Farmington, CT 06032, USA.
Abstract
SUMMARY: We present CloudNeo, a cloud-based computational workflow for identifying patient-specific tumor neoantigens from next generation sequencing data. Tumor-specific mutant peptides can be detected by the immune system through their interactions with the human leukocyte antigen complex, and neoantigen presence has recently been shown to correlate with anti-tumor T-cell immunity and with the efficacy of checkpoint inhibitor therapy. However, the computing capabilities needed to identify neoantigens from genomic sequencing data are a limiting factor for understanding their role. This challenge has grown as cancer datasets become increasingly abundant, making them cumbersome to store and analyze on local servers. Our cloud-based pipeline provides scalable computation capabilities for neoantigen identification while eliminating the need to invest in local infrastructure for data transfer, storage or compute. The pipeline is a Common Workflow Language (CWL) implementation of human leukocyte antigen (HLA) typing using Polysolver or HLAminer, combined with custom scripts for mutant peptide identification and NetMHCpan for neoantigen prediction. We have demonstrated the efficacy of these pipelines on Amazon cloud instances through the Seven Bridges Genomics implementation of the NCI Cancer Genomics Cloud, which provides graphical interfaces for running and editing, infrastructure for workflow sharing and version tracking, and access to TCGA data. AVAILABILITY AND IMPLEMENTATION: The CWL implementation is available at https://github.com/TheJacksonLaboratory/CloudNeo. For users who have obtained licenses for all internal software, integrated versions in CWL and on the Seven Bridges Cancer Genomics Cloud platform (https://cgc.sbgenomics.com/, recommended version) can be obtained by contacting the authors. CONTACT: jeff.chuang@jax.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Mutations in tumor genomes create specific peptide changes that can be recognized by the immune system and influence sensitivity to immunotherapy (van der Most ; van Rooij ). The underlying mechanism involves binding of the patient's native major histocompatibility complex (MHC) class I and II molecules, also known as human leukocyte antigen (HLA) class I and II molecules, to the novel peptide sequences that result from protein-changing somatic mutations in cancer cells. Cells presenting these neoantigens are recognized as foreign by T cells, which then selectively destroy them. With the arrival of next generation sequencing platforms, it has become possible to interrogate the genomes of patient tumors and computationally predict T-cell reactivity against putative mutation-derived neoantigens (Schumacher ) by estimating the binding of MHC class I molecules to each new peptide sequence.

Several bioinformatics tools are routinely used to predict tumor neoantigen-MHC class I binding from sequencing data. For example, HLAminer (Warren ) and Polysolver (Shukla ) predict patient-specific HLA types from sequencing data, and NetMHCpan (Nielsen ) predicts HLA-peptide binding. Prior studies in cancer immunotherapy have successfully used these tools to predict the efficacy of immuno-oncological therapies in a patient-specific manner (Rizvi ; Van Allen ), underscoring the value of making such methods easily available to the general research community. However, the cost of developing and maintaining the bioinformatics infrastructure for this type of analysis is substantial. In particular, research groups are generating increasing amounts of custom sequencing data or investigating massive consortium datasets such as The Cancer Genome Atlas (Weinstein ), for which data transfer and computing scalability can be significant obstacles to analysis on local compute clusters.
To resolve these problems, we have developed a cloud-based analysis pipeline for tumor neoantigen detection.
2 Description
We developed the CloudNeo pipeline on the Seven Bridges cloud platform as part of the National Cancer Institute's Cancer Genomics Cloud (CGC; http://www.cancergenomicscloud.org/), which uses Docker containers to execute the tasks in the workflow. Briefly, CloudNeo takes a VCF file (for mutations) and a BAM file (for HLA typing) as inputs and outputs HLA binding affinity predictions for all mutated peptides (see Supplementary Fig. S1). The first input is a list of non-synonymous mutations in VCF format. Multiple somatic mutation calling pipelines can be used to generate and filter this VCF file (Alioto ), including several that are available through the CGC. The genomic variants are translated into amino acid changes using the VEP tool (McLaren ) and a custom R script we created, called Protein_Translator. Protein_Translator outputs a FASTA file of N-amino-acid-long peptide sequences in which the single amino acid change sits in the middle of each N-mer. In parallel, it generates a second FASTA file containing the homologous N-mers without the mutation. Users can calculate the HLA types with either HLAminer or Polysolver. Six HLA types are predicted, namely the top two predictions for each of HLA-A, HLA-B and HLA-C. The final step in the pipeline is NetMHCpan, which uses the HLA types and the mutant N-mer sequences to calculate binding affinities for potential neoantigens. Affinities are computed between each of the six HLA molecules (two each of HLA-A, HLA-B and HLA-C) and every (⌊N/2⌋ + 1)-mer peptide subsequence within the N-mers; for odd N, each of these windows contains the mutated residue. The output of the pipeline is a list of peptide subsequences along with their MHC binding affinity scores for each of the six HLA types.
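The N-mer construction and windowing step can be illustrated with a short Python sketch. This is not the authors' Protein_Translator (an R script that works from VEP annotations); it assumes a single missense substitution at a known 0-based protein position and an odd N, and it skips mutations too close to a protein terminus instead of padding them. The function names and FASTA headers are hypothetical.

```python
from typing import List, Optional, Tuple


def centered_nmers(protein: str, pos: int, alt: str, n: int) -> Optional[Tuple[str, str]]:
    """Return (mutant, wild-type) N-mers centred on 0-based `pos`, or None if they don't fit.

    Hypothetical helper: assumes a single-residue substitution and an odd N.
    """
    if n % 2 == 0:
        raise ValueError("N must be odd so that the mutation sits in the middle")
    if len(alt) != 1:
        raise ValueError("only single amino acid substitutions are handled")
    half = n // 2
    start, end = pos - half, pos + half + 1
    if start < 0 or end > len(protein):
        return None  # too close to a protein terminus for a full N-mer
    wild_type = protein[start:end]
    # Swap in the alternate residue at the centre of the window.
    mutant = wild_type[:half] + alt + wild_type[half + 1:]
    return mutant, wild_type


def sliding_windows(nmer: str) -> List[str]:
    """All (floor(N/2) + 1)-mer subsequences of an N-mer, left to right."""
    k = len(nmer) // 2 + 1
    return [nmer[i:i + k] for i in range(len(nmer) - k + 1)]


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    # Toy protein; residue 10 (1-based) is L, substituted here by W.
    protein = "ACDEFGHIKLMNPQRSTVWY"
    mut, wt = centered_nmers(protein, 9, "W", 9)
    print(to_fasta([("toy_L10W_mut", mut), ("toy_L10W_wt", wt)]), end="")
    for window in sliding_windows(mut):
        print(window)  # GHIKW, HIKWM, IKWMN, KWMNP, WMNPQ (one per line)
```

Every window overlaps the centre position of the N-mer, so each scored subsequence carries the mutation. For N = 17, for instance, the windows would be 9-mers, a common peptide length for MHC class I binding prediction.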
The same binding predictions are generated for the homologous unmutated peptide sequences as a comparison.

To test this pipeline, we analyzed 23 melanoma tumor samples (Hugo ), as described earlier, using both the HLAminer and Polysolver versions of the pipeline. We then predicted neoantigens based on three criteria: strong mutant-MHC binding affinity (NetMHCpan predicted affinity < 500 nM), non-zero expression of the transcript containing the mutation, and lack of strong affinity between the non-mutated sequence and the MHC (NetMHCpan predicted affinity for the non-mutant sequence ≥ 500 nM). For each sample, we merged the sets of neoepitopes predicted across the six HLA types. With the HLAminer version of the pipeline, the neoepitope load ranged from 0 to 1244, with an average of 107.89. With the Polysolver version and the same filtering criteria, the neoepitope load ranged from 0 to 1417, with an average of 133.53. The differences between the two pipeline results were due to differing HLA type predictions by Polysolver and HLAminer: 16 HLA type predictions overlapped between the two tools, while there were 102 unique HLA predictions from Polysolver and 122 from HLAminer. Although our HLA type predictions were based on RNA-seq data, CloudNeo can also use DNA sequencing data as input for HLA calling. The average wall time required to run the pipeline for a given tumor on the CGC was 8 h 2 min for the HLAminer version and 7 h 25 min for the Polysolver version (see Supplementary Material 'Pipeline Performance').
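The neoepitope filter and the per-sample merge across HLA types can be sketched as follows. This is a hypothetical illustration, not CloudNeo code. It assumes that each prediction record already pairs a mutant peptide with its wild-type counterpart and with the expression of the mutated transcript, that affinities are NetMHCpan predicted values in nM (lower means stronger binding), and that "merging across the six HLA types" means counting each distinct mutant peptide once per sample. The record layout and the toy values are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Set, Tuple

AFFINITY_CUTOFF_NM = 500.0  # threshold used for both mutant and wild-type peptides


@dataclass(frozen=True)
class Prediction:
    """One peptide/allele prediction (hypothetical record layout)."""
    allele: str
    mutant_peptide: str
    wild_type_peptide: str
    mutant_affinity_nm: float
    wild_type_affinity_nm: float
    expression: float  # expression of the transcript carrying the mutation


def is_neoepitope(p: Prediction, cutoff: float = AFFINITY_CUTOFF_NM) -> bool:
    return (
        p.mutant_affinity_nm < cutoff  # strong mutant-MHC binding
        and p.expression > 0  # mutated transcript is expressed
        and p.wild_type_affinity_nm >= cutoff  # wild-type peptide does not bind strongly
    )


def neoepitope_load(predictions: Iterable[Prediction]) -> int:
    """Distinct mutant peptides that pass the filter for at least one HLA allele."""
    hits: Set[str] = {p.mutant_peptide for p in predictions if is_neoepitope(p)}
    return len(hits)


def load_summary(samples: Mapping[str, Iterable[Prediction]]) -> Tuple[int, int, float]:
    """(min, max, mean) neoepitope load over a non-empty set of samples."""
    loads = [neoepitope_load(preds) for preds in samples.values()]
    return min(loads), max(loads), sum(loads) / len(loads)


if __name__ == "__main__":
    toy = [
        Prediction("HLA-A*02:01", "GHIKW", "GHIKL", 50.0, 5000.0, 3.2),
        Prediction("HLA-B*07:02", "GHIKW", "GHIKL", 120.0, 800.0, 3.2),  # same peptide, merged
        Prediction("HLA-A*02:01", "KWMNP", "KLMNP", 300.0, 200.0, 1.0),  # wild type also binds
    ]
    print(neoepitope_load(toy))  # 1
```

Keying the merge on the mutant peptide string means a peptide bound by several of the six alleles contributes only once to a sample's load.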
3 Discussion
Other recent methods, such as pVAC-Seq (Hundal ), are similar to CloudNeo in providing a computational pipeline for neoantigen prediction. However, to our knowledge CloudNeo is the only such pipeline that has been developed for cloud computing. This allows users to realize the advantages of cloud analysis, including massive computing scalability and access to large datasets on the CGC such as TCGA, which can be analyzed without downloading them to a local server. The cloud approach also lets users scale CloudNeo on demand to match their time and budget constraints, providing a flexible computational approach for the research community. A version of the CloudNeo pipeline is openly available at the GitHub site as a Common Workflow Language (CWL) implementation that can be run using Rabix (Kaushik ), so that it can run on systems including AWS, Google Compute Engine and Azure. Users must obtain licenses for the academically licensed software (HLAminer and NetMHCpan), but simple instructions for doing so are provided at the GitHub site. Users with licenses can also contact the authors to request a version with all software integrated. Full versions are available either in CWL or as a workflow on the Seven Bridges implementation of the CGC. The CGC version is recommended, as it provides additional functionality including graphical interfaces for running and editing, simple workflow sharing and version tracking, improved calling of multiple cloud instances, and access to TCGA data. Full details and documentation are at https://github.com/TheJacksonLaboratory/CloudNeo.
Authors: R G van der Most; A Sette; C Oseroff; J Alexander; K Murali-Krishna; L L Lau; S Southwood; J Sidney; R W Chesnut; M Matloubian; R Ahmed Journal: J Immunol Date: 1996-12-15
Authors: Eliezer M Van Allen; Diana Miao; Bastian Schilling; Sachet A Shukla; Christian Blank; Lisa Zimmer; Antje Sucker; Uwe Hillen; Marnix H Geukes Foppen; Simone M Goldinger; Jochen Utikal; Jessica C Hassel; Benjamin Weide; Katharina C Kaehler; Carmen Loquai; Peter Mohr; Ralf Gutzmer; Reinhard Dummer; Stacey Gabriel; Catherine J Wu; Dirk Schadendorf; Levi A Garraway Journal: Science Date: 2015-09-10
Authors: John N Weinstein; Eric A Collisson; Gordon B Mills; Kenna R Mills Shaw; Brad A Ozenberger; Kyle Ellrott; Ilya Shmulevich; Chris Sander; Joshua M Stuart Journal: Nat Genet Date: 2013-10
Authors: Naiyer A Rizvi; Matthew D Hellmann; Alexandra Snyder; Pia Kvistborg; Vladimir Makarov; Jonathan J Havel; William Lee; Jianda Yuan; Phillip Wong; Teresa S Ho; Martin L Miller; Natasha Rekhtman; Andre L Moreira; Fawzia Ibrahim; Cameron Bruggeman; Billel Gasmi; Roberta Zappasodi; Yuka Maeda; Chris Sander; Edward B Garon; Taha Merghoub; Jedd D Wolchok; Ton N Schumacher; Timothy A Chan Journal: Science Date: 2015-03-12
Authors: Nienke van Rooij; Marit M van Buuren; Daisy Philips; Arno Velds; Mireille Toebes; Bianca Heemskerk; Laura J A van Dijk; Sam Behjati; Henk Hilkmann; Dris El Atmioui; Marja Nieuwland; Michael R Stratton; Ron M Kerkhoven; Can Kesmir; John B Haanen; Pia Kvistborg; Ton N Schumacher Journal: J Clin Oncol Date: 2013-09-16
Authors: Sachet A Shukla; Michael S Rooney; Mohini Rajasagi; Grace Tiao; Philip M Dixon; Michael S Lawrence; Jonathan Stevens; William J Lane; Jamie L Dellagatta; Scott Steelman; Carrie Sougnez; Kristian Cibulskis; Adam Kiezun; Nir Hacohen; Vladimir Brusic; Catherine J Wu; Gad Getz Journal: Nat Biotechnol Date: 2015-11
Authors: René L Warren; Gina Choe; Douglas J Freeman; Mauro Castellarin; Sarah Munro; Richard Moore; Robert A Holt Journal: Genome Med Date: 2012-12-10
Authors: Tyler S Alioto; Ivo Buchhalter; Sophia Derdak; Barbara Hutter; Matthew D Eldridge; Eivind Hovig; Lawrence E Heisler; Timothy A Beck; Jared T Simpson; Laurie Tonon; Anne-Sophie Sertier; Ann-Marie Patch; Natalie Jäger; Philip Ginsbach; Ruben Drews; Nagarajan Paramasivam; Rolf Kabbe; Sasithorn Chotewutmontri; Nicolle Diessl; Christopher Previti; Sabine Schmidt; Benedikt Brors; Lars Feuerbach; Michael Heinold; Susanne Gröbner; Andrey Korshunov; Patrick S Tarpey; Adam P Butler; Jonathan Hinton; David Jones; Andrew Menzies; Keiran Raine; Rebecca Shepherd; Lucy Stebbings; Jon W Teague; Paolo Ribeca; Francesc Castro Giner; Sergi Beltran; Emanuele Raineri; Marc Dabad; Simon C Heath; Marta Gut; Robert E Denroche; Nicholas J Harding; Takafumi N Yamaguchi; Akihiro Fujimoto; Hidewaki Nakagawa; Víctor Quesada; Rafael Valdés-Mas; Sigve Nakken; Daniel Vodák; Lawrence Bower; Andrew G Lynch; Charlotte L Anderson; Nicola Waddell; John V Pearson; Sean M Grimmond; Myron Peto; Paul Spellman; Minghui He; Cyriac Kandoth; Semin Lee; John Zhang; Louis Létourneau; Singer Ma; Sahil Seth; David Torrents; Liu Xi; David A Wheeler; Carlos López-Otín; Elías Campo; Peter J Campbell; Paul C Boutros; Xose S Puente; Daniela S Gerhard; Stefan M Pfister; John D McPherson; Thomas J Hudson; Matthias Schlesner; Peter Lichter; Roland Eils; David T W Jones; Ivo G Gut Journal: Nat Commun Date: 2015-12-09
Authors: Jasreet Hundal; Beatriz M Carreno; Allegra A Petti; Gerald P Linette; Obi L Griffith; Elaine R Mardis; Malachi Griffith Journal: Genome Med Date: 2016-01-29