INTEGRATE-Vis: a tool for comprehensive gene fusion visualization.

Jin Zhang, Teng Gao, Christopher A Maher.

Abstract

Despite the increasing quantity of tools for accurately predicting gene fusion candidates from sequencing data, we are still faced with the critical challenge of visualizing the corresponding gene fusion products to infer their biological consequence (i.e. novel protein and increased gene expression). This is currently accomplished by manually inspecting and inferring the biological consequence of top scoring gene fusion candidates. This labor-intensive process could be made easier by automating the annotation of gene fusion products and generating easily interpretable visualizations. We developed a gene fusion visualization tool, called INTEGRATE-Vis, that generates comprehensive, highly customizable, publication-quality graphics focused on annotating each gene fusion at the transcript- and protein-level and assessing expression within an individual sample or across a patient cohort. INTEGRATE-Vis is the first comprehensive gene fusion visualization tool to help a user infer the potential consequence of a gene fusion event. It has potential utility in both research and clinical settings. INTEGRATE-Vis is available at https://github.com/ChrisMaherLab/INTEGRATE-Vis .

Year:  2017        PMID: 29259323      PMCID: PMC5736641          DOI: 10.1038/s41598-017-18257-2

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Gene fusions have served as highly specific diagnostic markers, prognostic indicators and therapeutic targets[1]. High throughput transcriptome sequencing (RNA-Seq) has accelerated our ability to discover expressed gene fusions[2]. While recent tools, such as INTEGRATE[3], are highly sensitive and specific, we are still faced with the critical challenge of ensuring that a causal gene fusion is not only detected, but that it can be prioritized accordingly amongst passenger events. This is currently accomplished by manually inspecting and inferring the biological consequence (i.e., generation of a novel protein, altered expression levels) of top scoring gene fusion candidates. This labor-intensive process could be made easier and more precise by automating the annotation of gene fusion transcripts and proteins and generating easily interpretable visualizations. Current gene fusion visualization approaches rely on CIRCOS to highlight the genomic locations of the gene fusion partners[4], IGV to assess sequence coverage at fusion junctions[5], or splicing graphs to observe the exons involved in a gene fusion[6]. Individually, each of these methods is insufficient for inferring the consequences of the gene fusion on expression or on the corresponding protein product. To address these limitations, we developed a tool, INTEGRATE-Vis, which generates multiple visualizations for annotating each gene fusion, at the transcript- and protein-level, and assessing gene expression within an individual sample or across a cohort.

Results

The INTEGRATE-Vis pipeline (Fig. 1) generates four types of figures, which were created manually in previous publications, to provide easy-to-interpret visualizations of gene fusion predictions[3,7-10]. To illustrate how INTEGRATE-Vis works, we focused on a prostate cancer patient harboring the most prevalent gene fusion, TMPRSS2-ERG, which results in a marked increase in the expression of the oncogenic transcription factor ERG (Fig. 2). Panels A through D of Fig. 2 correspond to the structure plot, domain plot, exon expression plot, and gene expression plot outputs described in Fig. 1, respectively.
Figure 1

Overview of the INTEGRATE-Vis pipeline. INTEGRATE-Vis contains four major modules to plot four types of figures for gene fusions: isoform structure, domain, exon expression, and gene partner expression. The first three modules are for individual samples, and the last module is for cohort data. Only minimal inputs in standard formats, including BEDPE, TSV, FASTA, BAM, and GTF are needed to make the plots.

Figure 2

INTEGRATE-Vis output illustrated using the TMPRSS2-ERG gene fusion in prostate cancer. INTEGRATE-Vis outputs four visualizations including: (A) gene fusion transcript isoforms, (B) the predicted protein structure of the gene fusion, (C) RNA-Seq read coverage across each gene fusion partner to reveal changes in exon expression (A red line is plotted at the fusion junctions at both gene partners. Exon boundaries are represented by blue lines. A marked expression change occurs between exons 3 and 4 of ERG.), and (D) expression of each gene fusion partner across the TCGA PRAD cohort. Blue is used to represent supporting reads, exons, transcript, and genomic locations for the 5′ gene partner (TMPRSS2), while red is for those of the 3′ gene partner (ERG).

First, in our structure plot we show the predicted gene fusion transcript structure, highlighting sequence reads that encompass and span the fusion junction (Fig. 2A). Both TMPRSS2 and ERG are on the reverse strand of chromosome 21 in two consecutive cytogenetic bands. A genomic deletion between the upstream gene, TMPRSS2, and the downstream partner, ERG, generates the gene fusion event. By default, INTEGRATE-Vis constructs a gene fusion transcript isoform using the most prevalent transcript isoform of the gene partners. Alternatively, a user can designate specific transcript isoforms to display in the reconstructed gene fusion transcript. As shown in Fig. 2A, a 14-exon isoform (ENST00000458356) of TMPRSS2 and a 12-exon isoform (ENST00000398919) of ERG were used for visualization.
The fusion junction is located at the second exon of ENST00000458356 and the fourth exon of ENST00000398919. The gene fusion transcript is shown in the upper panel with the corresponding supporting reads to infer the expression level. To conserve space, INTEGRATE-Vis displays 1, 2, or 3 supporting reads to illustrate 1, 2–10, or >10 supporting sequence reads, respectively.
Second, to predict the potential functional consequences of a gene fusion, INTEGRATE-Vis generates a domain plot that translates the fusion transcript and displays the corresponding protein domains (Fig. 2B). The protein product from the 5′ gene partner (i.e. TMPRSS2) is plotted on the left and the protein product from the 3′ gene partner (i.e. ERG) is plotted on the right. This is also indicated by the annotation of N and C for the N- and C-termini (Fig. 2B). As shown in Fig. 2B, the in-frame gene fusion protein product is composed of a small 5′ regulatory region of TMPRSS2 and the majority of ERG. This includes both the ETS domain and the Pointed domain of ERG. As shown in Figure S1, for out-of-frame gene fusion transcripts, the 3′ end is represented by a white box and a red cross is used to represent the translation termination site.
Third, to determine whether the gene fusion increases the expression of the 3′ partner (i.e., oncogene) or decreases the expression of the 5′ partner (i.e., tumor suppressor), INTEGRATE-Vis generates an exon expression plot. This displays read coverage as a measure of exon-level expression for each gene involved in the fusion (Fig. 2C). As shown in Fig. 2C, the ERG exons involved in the gene fusion (exons 4 through 12) have markedly higher read coverage than the exons that are not included in the gene fusion (exons 1 through 3). Notably, INTEGRATE-Vis automatically selects the scales for the x- and y-axes to show the ranges of read coverage for each gene partner, although they can be adjusted based on user-defined input (Figure S2).
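The space-saving rule described for the structure plot (1, 2, or 3 read glyphs for 1, 2–10, or >10 supporting reads) can be sketched in a few lines of Python. This is an illustration only and is not taken from the INTEGRATE-Vis source; the function name `glyphs_for_support` and the choice to draw nothing for zero reads are assumptions.

```python
def glyphs_for_support(n_reads):
    """Return how many read glyphs to draw for n_reads supporting reads.

    Mirrors the rule stated in the text: 1 read -> 1 glyph,
    2-10 reads -> 2 glyphs, more than 10 reads -> 3 glyphs.
    """
    if n_reads < 1:
        # Assumption: no glyph is drawn when there is no supporting read.
        return 0
    if n_reads == 1:
        return 1
    if n_reads <= 10:
        return 2
    return 3


if __name__ == "__main__":
    for n in (1, 2, 10, 11, 250):
        print(n, "supporting reads ->", glyphs_for_support(n), "glyph(s)")
```

Binning the counts this way keeps the panel height fixed regardless of sequencing depth, at the cost of showing only a coarse indication of support.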
Fourth, to determine whether the gene fusion produces a unique expression change in the sample harboring it relative to a cohort of samples (e.g., prostate cancer patients lacking the gene fusion), INTEGRATE-Vis generates a gene expression plot. This outputs a bar plot of the expression level for both genes involved in the fusion across a patient cohort (Fig. 2D). For example, Fig. 2D highlights the difference in ERG expression levels between patients lacking the TMPRSS2-ERG gene fusion and patients harboring it. In contrast, expression levels of TMPRSS2 do not differ between patients with and without the gene fusion. In addition to determining whether the gene fusion alters the expression of the 5′ or 3′ gene, this visualization can identify additional patients that may harbor gene fusions producing similar expression consequences. While we have demonstrated the utility of INTEGRATE-Vis using the most prevalent TMPRSS2-ERG isoform, INTEGRATE-Vis automatically generates plots for all predicted gene fusion isoforms. INTEGRATE-Vis has been implemented with reasonable default parameters to help best interpret the functions of the gene fusion products. It also provides ample options to enhance user-friendliness (Figures S2 and S3). INTEGRATE-Vis executes efficiently; figure generation takes a few seconds (Figure S4).
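The cohort comparison behind the gene expression plot can be approximated numerically. The sketch below is a hypothetical illustration, not the INTEGRATE-Vis implementation: it splits a cohort into fusion-positive and fusion-negative samples and compares median expression. The function names, the pseudocount, and the sample identifiers and values in the usage example are all invented for illustration.

```python
from statistics import median


def split_by_fusion(expression, fusion_samples):
    """Split expression values into (fusion-positive, fusion-negative) lists.

    expression: dict mapping sample id -> expression value for one gene.
    fusion_samples: set of sample ids in which the fusion was called.
    """
    positive = [v for s, v in expression.items() if s in fusion_samples]
    negative = [v for s, v in expression.items() if s not in fusion_samples]
    return positive, negative


def median_ratio(expression, fusion_samples, pseudocount=1.0):
    """Ratio of median expression, fusion-positive over fusion-negative.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when the gene is silent in fusion-negative samples.
    """
    positive, negative = split_by_fusion(expression, fusion_samples)
    if not positive or not negative:
        raise ValueError("both groups need at least one sample")
    return (median(positive) + pseudocount) / (median(negative) + pseudocount)


if __name__ == "__main__":
    # Invented toy values, not TCGA data.
    erg = {"s1": 99.0, "s2": 79.0, "s3": 3.0, "s4": 1.0, "s5": 2.0}
    with_fusion = {"s1", "s2"}
    print("median ratio:", median_ratio(erg, with_fusion))
```

A ratio well above 1 for the 3′ partner, alongside a ratio near 1 for the 5′ partner, would correspond to the pattern described for ERG and TMPRSS2 in Fig. 2D.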

Discussion

Overall, we developed the first comprehensive gene fusion visualization tool, INTEGRATE-Vis, which generates publication-quality graphics to help a user infer the potential consequence of a gene fusion event. We have implemented INTEGRATE-Vis to utilize standardized input files, including the SMC-RNA BEDPE format for gene fusion predictions, therefore making it widely accessible to the larger research community independent of the gene fusion discovery tool being used.

Methods

The INTEGRATE-Vis pipeline was implemented in Python and C++, and requires a minimal set of dependencies (CMake, GCC, Matplotlib, and gtfToGenePred) to install and execute. The input to INTEGRATE-Vis includes a list of gene fusion candidates in a standard BEDPE format as well as other common standardized inputs (i.e. FASTA, GTF), including a reference genome and gene models. A total of 333 BEDPE files can be downloaded from https://github.com/ChrisMaherLab/INTEGRATE-Vis, including gene fusions previously discovered[10]. Additional input files in TSV format (i.e. a protein domain table and an ideogram table for cytogenetic bands) and the commands to generate these TSV files are all included at https://github.com/ChrisMaherLab/INTEGRATE-Vis. Read counts for the samples were calculated using featureCounts[11]. INTEGRATE-Vis performs a series of annotation and calculation steps before generating figures summarizing the gene fusion in PDF format (Fig. 1).
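To make the BEDPE input concrete, the following sketch reads the ten standard BEDPE columns (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) from one tab-separated line. It is not the parser used by INTEGRATE-Vis. The `FusionCall` type and both function names are hypothetical, the coordinates in the usage example are placeholders, and the `>>` separator between partner gene names in the name column is an assumption about the SMC-RNA convention that should be checked against the repository documentation.

```python
from collections import namedtuple

# The ten standard BEDPE columns, in order.
FusionCall = namedtuple(
    "FusionCall",
    "chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2",
)


def parse_bedpe_line(line):
    """Parse one tab-separated BEDPE record; extra columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 BEDPE columns, got %d" % len(fields))
    c1, s1, e1, c2, s2, e2, name, score, strand1, strand2 = fields[:10]
    # Score is kept as text because BEDPE allows non-numeric placeholders.
    return FusionCall(c1, int(s1), int(e1), c2, int(s2), int(e2),
                      name, score, strand1, strand2)


def partner_genes(name, sep=">>"):
    """Split the name column into (5' gene, 3' gene).

    Assumption: partners are joined by `sep`; verify for your input files.
    """
    five_prime, _, three_prime = name.partition(sep)
    return five_prime, three_prime


if __name__ == "__main__":
    # Placeholder coordinates, not real junction positions.
    record = "chr21\t100\t101\tchr21\t200\t201\tTMPRSS2>>ERG\t0\t-\t-"
    call = parse_bedpe_line(record)
    print(call.chrom1, call.strand1, partner_genes(call.name))
```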

Availability and requirements

The INTEGRATE-Vis pipeline has been tested using Python version 2.7 and requires CMake, GCC, Matplotlib, and gtfToGenePred to install and execute. It is available from https://github.com/ChrisMaherLab/INTEGRATE-Vis, which also contains instructions and links for downloading the required tools and packages, step-by-step instructions for installing the INTEGRATE-Vis pipeline, and sample command lines for executing INTEGRATE-Vis from either BEDPE files or raw RNA-seq reads. Raw sequence reads of the TCGA PRAD cohort can be downloaded from the Genomic Data Commons (https://gdc.cancer.gov).
  11 in total

1.  featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.

Authors:  Yang Liao; Gordon K Smyth; Wei Shi
Journal:  Bioinformatics       Date:  2013-11-13       Impact factor: 6.937

2.  Detecting and visualizing gene fusions.

Authors:  Jochen Supper; Claudia Gugenmus; Johannes Wollnik; Tanja Drueke; Matthias Scherf; Alexander Hahn; Korbinian Grote; Nancy Bretschneider; Bernward Klocke; Christian Zinser; Kerstin Cartharius; Martin Seifert
Journal:  Methods       Date:  2012-10-02       Impact factor: 3.608

3.  Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts.

Authors:  Shunqiang Li; Dong Shen; Jieya Shao; Robert Crowder; Wenbin Liu; Aleix Prat; Xiaping He; Shuying Liu; Jeremy Hoog; Charles Lu; Li Ding; Obi L Griffith; Christopher Miller; Dave Larson; Robert S Fulton; Michelle Harrison; Tom Mooney; Joshua F McMichael; Jingqin Luo; Yu Tao; Rodrigo Goncalves; Christopher Schlosberg; Jeffrey F Hiken; Laila Saied; Cesar Sanchez; Therese Giuntoli; Caroline Bumb; Crystal Cooper; Robert T Kitchens; Austin Lin; Chanpheng Phommaly; Sherri R Davies; Jin Zhang; Megha Shyam Kavuri; Donna McEachern; Yi Yu Dong; Cynthia Ma; Timothy Pluard; Michael Naughton; Ron Bose; Rama Suresh; Reida McDowell; Loren Michel; Rebecca Aft; William Gillanders; Katherine DeSchryver; Richard K Wilson; Shaomeng Wang; Gordon B Mills; Ana Gonzalez-Angulo; John R Edwards; Christopher Maher; Charles M Perou; Elaine R Mardis; Matthew J Ellis
Journal:  Cell Rep       Date:  2013-09-19       Impact factor: 9.423

4.  Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia.

Authors:  Malachi Griffith; Obi L Griffith; Kilannin Krysiak; Zachary L Skidmore; Matthew J Christopher; Jeffery M Klco; Avinash Ramu; Tamara L Lamprecht; Alex H Wagner; Katie M Campbell; Robert Lesurf; Jasreet Hundal; Jin Zhang; Nicholas C Spies; Benjamin J Ainscough; David E Larson; Sharon E Heath; Catrina Fronick; Shelly O'Laughlin; Robert S Fulton; Vincent Magrini; Sean McGrath; Scott M Smith; Christopher A Miller; Christopher A Maher; Jacqueline E Payton; Jason R Walker; James M Eldred; Matthew J Walter; Daniel C Link; Timothy A Graubert; Peter Westervelt; Shashikant Kulkarni; John F DiPersio; Elaine R Mardis; Richard K Wilson; Timothy J Ley
Journal:  Exp Hematol       Date:  2016-05-13       Impact factor: 3.084

5.  Integrative genomics viewer.

Authors:  James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal:  Nat Biotechnol       Date:  2011-01       Impact factor: 54.908

6.  ClicO FS: an interactive web-based service of Circos.

Authors:  Wei-Hien Cheong; Yung-Chie Tan; Soon-Joo Yap; Kee-Peng Ng
Journal:  Bioinformatics       Date:  2015-07-29       Impact factor: 6.937

7.  INTEGRATE-neo: a pipeline for personalized gene fusion neoantigen discovery.

Authors:  Jin Zhang; Elaine R Mardis; Christopher A Maher
Journal:  Bioinformatics       Date:  2017-02-15       Impact factor: 6.937

8.  State-of-the-art fusion-finder algorithms sensitivity and specificity.

Authors:  Matteo Carrara; Marco Beccuti; Fulvio Lazzarato; Federica Cavallo; Francesca Cordero; Susanna Donatelli; Raffaele A Calogero
Journal:  Biomed Res Int       Date:  2013-02-17       Impact factor: 3.411

9.  A genomic case study of mixed fibrolamellar hepatocellular carcinoma.

Authors:  O L Griffith; M Griffith; K Krysiak; V Magrini; A Ramu; Z L Skidmore; J Kunisaki; R Austin; S McGrath; J Zhang; R Demeter; T Graves; J M Eldred; J Walker; D E Larson; C A Maher; Y Lin; W Chapman; A Mahadevan; R Miksad; I Nasser; D W Hanto; E R Mardis
Journal:  Ann Oncol       Date:  2016-03-30       Impact factor: 32.976

10.  INTEGRATE: gene fusion discovery using whole genome and transcriptome data.

Authors:  Jin Zhang; Nicole M White; Heather K Schmidt; Robert S Fulton; Chad Tomlinson; Wesley C Warren; Richard K Wilson; Christopher A Maher
Journal:  Genome Res       Date:  2015-11-10       Impact factor: 9.043
