Natasha S Latysheva, M Madan Babu.
Abstract
Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. The computational work on gene fusions has been vastly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that rarely appear to cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases; and (iii) clinical research that either seeks to therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different, yet highly complementary and symbiotic, approaches with the view that increased synergy will catalyze advancements in gene fusion identification, characterization and significance evaluation.
Year: 2016 PMID: 27105842 PMCID: PMC4889949 DOI: 10.1093/nar/gkw282
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Mechanisms of gene fusion formation. (A) Structural rearrangements of chromosomes, such as translocations, inversions, deletions and insertions, can lead to the formation of gene fusions. These hybrid genes may then be transcribed and translated into fusion transcripts and proteins. (B) Non-structural rearrangement mechanisms, such as transcription read-through of neighboring genes or splicing of mRNA molecules, are increasingly recognized as leading to the formation of a large proportion of gene fusions.
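Panel (B) notes that read-through transcription of neighboring genes can yield fusion transcripts. As a rough illustration only, a 5′/3′ candidate pair can be screened as a possible read-through event by checking whether the two partners are adjacent genes on the same strand. The sketch below assumes a hypothetical in-memory `Gene` record with half-open coordinates and a hypothetical 200 kb maximum gap; neither comes from the review or from any specific tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    """Minimal gene annotation record (hypothetical; half-open coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_readthrough_candidate(five_prime: Gene, three_prime: Gene,
                             genes: Iterable[Gene], max_gap: int = 200_000) -> bool:
    """Heuristic read-through screen (assumed rules and threshold).

    True when both partners share chromosome and strand, the 5' partner lies
    upstream in the direction of transcription, the intergenic gap is at most
    max_gap, and no other same-strand gene sits entirely inside the gap.
    """
    a, b = five_prime, three_prime
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    # On the minus strand transcription runs toward lower coordinates,
    # so the upstream (5') partner is the one with higher coordinates.
    if a.strand == "+":
        gap_start, gap_end = a.end, b.start
    else:
        gap_start, gap_end = b.end, a.start
    if gap_end < gap_start or gap_end - gap_start > max_gap:
        return False  # overlapping, wrongly ordered, or too far apart
    for g in genes:
        if g in (a, b) or g.chrom != a.chrom or g.strand != a.strand:
            continue
        if g.start >= gap_start and g.end <= gap_end:
            return False  # an intervening gene breaks adjacency
    return True


# Hypothetical placeholder annotations, not real genes.
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000, "+"),
    Gene("GENE_B", "chr1", 8_000, 12_000, "+"),
    Gene("GENE_C", "chr1", 20_000, 25_000, "+"),
]
a, b, c = genes
print(is_readthrough_candidate(a, b, genes))  # True
print(is_readthrough_candidate(a, c, genes))  # False: GENE_B lies in between
print(is_readthrough_candidate(b, a, genes))  # False: wrong 5'->3' order
```

Real read-through annotation would also weigh splice-site usage and expression evidence; this check only captures the genomic-adjacency part of the idea.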
Figure 2. Trends in fusion functionality. (A) Recent surveys have uncovered the diverse gene fusion landscapes present in a variety of cancers. (B) The frequency of gene fusions varies by cancer type and appears to anti-correlate with the frequencies of other somatic mutations, both across cancer types and across individual tumor samples. (C) Gene fusions tend to involve genes with kinase, DNA-binding and chromatin-modifying activity. (D) Network studies of fusions have identified global and cancer-type-specific patterns in gene partnerships, such as the tendency for most fusion genes to fuse with only one other partner.
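Panel (D) describes fusion networks in which genes are nodes and fusion partnerships are edges. A minimal sketch of how the "most genes have a single partner" pattern can be measured from a list of fusion pairs is shown below. The pair list uses hypothetical placeholder gene names and is not data from any survey; treating pairs as undirected edges is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def partner_degrees(fusion_pairs):
    """Count distinct fusion partners per gene.

    Pairs are (5' gene, 3' gene) tuples treated as undirected edges, so the
    same two genes fused in either orientation count as one partnership.
    Self-pairs (intragenic events) are skipped because they add no partner.
    """
    partners = defaultdict(set)
    for five_prime, three_prime in fusion_pairs:
        if five_prime == three_prime:
            continue
        partners[five_prime].add(three_prime)
        partners[three_prime].add(five_prime)
    return {gene: len(p) for gene, p in partners.items()}


def fraction_with_single_partner(degrees):
    """Share of genes in the network that have exactly one partner."""
    if not degrees:
        return 0.0
    return sum(1 for d in degrees.values() if d == 1) / len(degrees)


# Hypothetical placeholder pairs, not data from any study.
pairs = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G5", "G6"), ("G2", "G1")]
degrees = partner_degrees(pairs)
print(Counter(degrees.values()))              # Counter({1: 5, 3: 1})
print(fraction_with_single_partner(degrees))  # 0.8333333333333334
```

Here one "hub" gene (G1) has three partners while every other gene has exactly one, which is the shape of degree distribution the caption describes.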
Figure 3. Structural features of fusion proteins. (A) Genes that form fusions tend to have fewer domains, but fusion transcript sequences have been shown to have more domains than expected by chance. (B) Fusion proteins are enriched for specific domains and domain permutations, which are occasionally proteomically novel. (C) Fusion breakpoints are biased toward locations that preserve fusion protein reading frames and structural viability. (D) Increased intrinsic disorder in fusion proteins may allow the protein to fold so that its constituent domains are brought into proximity with each other.
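Panel (C) refers to breakpoints that preserve the reading frame. The underlying arithmetic is simple: the 3′ partner keeps its original frame only if the number of coding bases contributed by the 5′ partner and the number of coding bases the 3′ partner loses upstream of its breakpoint leave the same remainder modulo 3. The sketch below assumes both breakpoints fall inside coding sequence and ignores UTR breakpoints, junction stop codons and splicing changes; the function name is hypothetical.

```python
def classify_frame(five_prime_cds_nt: int, three_prime_cds_offset: int) -> str:
    """Classify a fusion junction as in-frame or frameshift (simplified).

    five_prime_cds_nt: coding nucleotides retained from the 5' partner,
        counted from its start codon up to the breakpoint.
    three_prime_cds_offset: coding nucleotides of the 3' partner's CDS that lie
        upstream of its breakpoint (and are therefore lost from the fusion).
    """
    if five_prime_cds_nt < 0 or three_prime_cds_offset < 0:
        raise ValueError("coding lengths must be non-negative")
    # The first retained 3' base must occupy the same codon position in the
    # fusion as it did in the parent gene for the downstream frame to survive.
    if five_prime_cds_nt % 3 == three_prime_cds_offset % 3:
        return "in-frame"
    return "frameshift"


print(classify_frame(300, 150))  # in-frame   (both phases 0)
print(classify_frame(301, 150))  # frameshift (phase 1 vs 0)
print(classify_frame(301, 151))  # in-frame   (both phases 1)
```

For example, joining the first 6 coding bases of a toy 5′ CDS `ATGAAACCC` to a toy 3′ CDS `ATGGGGTTTTAA` after its first 3 coding bases gives `ATGAAAGGGTTTTAA`, which keeps the 3′ codons `GGG TTT TAA` intact, matching `classify_frame(6, 3) == "in-frame"`.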
Figure 4. Expression and regulation of fusion proteins. (A) Although the parent proteins that constitute fusion proteins tend to be more highly expressed than average, the expression of fusion proteins tends to be low. Fusion protein expression is highly tissue-specific and tends to follow the tissue distribution of the parent proteins. (B) 5′ translocation partners tend to have highly active promoters and 3′ partners have especially stable UTRs, which suggests an optimization for increasing fusion transcript and protein levels. (C) An increasing number of reports demonstrate that cancer-associated fusions can also be present in healthy, non-diseased tissue. (D) The translation of fusion transcripts into fusion proteins is confirmed relatively rarely, which may be partially due to false positive hits from fusion transcript detection algorithms.
Software packages, algorithms and tools for identifying gene fusions from sequencing data
| Name | Notable features | URL | PMID | Year |
|---|---|---|---|---|
| | An ensemble of the three fusion transcript detection algorithms (SOAPfuse, FusionCatcher and JAFFA) with the best performance on three synthetic and three real PE RNA-seq cancer data sets. R package. | | 26582927 | 2015 |
| | Combines WGS for structural variant detection with RNA-seq to detect expressed gene fusion transcripts. Emphasizes minimizing the false positive rate. | | 26556708 | 2015 |
| | Detects gene fusions, identifies junctions and quantifies fusion isoforms by integrating long reads from third-generation sequencing with short reads from second-generation sequencing. | | 26040699 | 2015 |
| | Fusion transcript detection algorithm optimized for reads of 100 base pairs or longer. Uses a known transcriptome instead of the genome as the alignment reference. | | 26019724 | 2015 |
| | Detects fusion transcripts from PE RNA-seq data. Performs split read mapping and assembly of potential breakpoint regions. Filters include thresholds on repeat content and number of supporting reads. | | 25650807 | 2015 |
| | Detects several predefined pathognomonic gene fusions in childhood sarcomas from RNA-seq data. Operates on a cloud-computing platform. Part of the FUSIONCloud commercial analytical platform. | | 24517889 | 2014 |
| | Detects somatic fusion transcripts. Uses an ensemble approach of four different methods and aligners to identify fusion junctions. Discordantly mapping reads are filtered on gene identity and positioning. | | | 2014 |
| | Uses a dual-mapping strategy, aligning paired-end reads to a combined genome and transcriptome reference, to detect fusion transcripts. Outputs fusion frame classification, homology scores and other summary features. | | 24695405 | 2014 |
| | Method for | | 23457040 | 2013 |
| | Detects gene fusions from PE RNA-seq data, reconstructs features of fusion transcripts and estimates their abundances. Uses a residual sequence extension method to extend short reads. | | 23768108 | 2013 |
| | Web-based visualization tool for structural variants that prioritizes breaks likely to be associated with gene fusions. Provides candidate transcript and protein sequences resulting from the identified gene fusions. | | 23661695 | 2013 |
| | Identifies fusion transcripts through discordant PE reads and junction-spanning reads. Features an improved algorithm for constructing the putative junction library and relatively high computational efficiency. | | 23409703 | 2013 |
| | Part of the SOAP software suite for genome-wide detection of gene fusions from PE RNA-seq data. Focuses on high sensitivity and low false discovery rates at low coverage. | | 24123671 | 2013 |
| | Identifies fusion transcripts from PE data. Selects among fusion candidates using a 'gene fusion model', and features splice site and abundance analyses that yield a more accurate set of junction reads. | | 22711792 | 2012 |
| | Detects fusion transcripts using a targeted transcriptome assembly strategy. Introduces a single statistical chimeric score that summarizes the likelihood that a junction sequence contains a true breakpoint. | | 22563071 | 2012 |
| | Commercial software for identifying fusions from paired-end RNA-seq reads. Filters on fusion structure and read support. Introduces the Transcriptome Viewer, a tool for visualizing gene fusions. | | 23036331 | 2012 |
| EricScript | Detects fusion transcripts from PE data. Can create synthetic gene fusions with the EricScript simulator, and EricScript CalcStats can generate summary statistics for scoring fusion detection methods. | | 23093608 | 2012 |
| | A graphical tool for detecting fusion transcripts from PE data that provides a user-friendly GUI and filtering system for non-programmers. | | 22570408 | 2012 |
| | Identifies gene fusion partners from either SE or PE RNA-seq data. Filters on features including read-through transcripts, homology and antisense information. | | 22761941 | 2012 |
| | GUI-based method for detecting splice junctions and fusions from RNA-seq data. Available within the LifeScope software package. | | 22496636 | 2012 |
| | Detects fusion transcripts and related chromosomal rearrangements from matched RNA-seq and whole genome shotgun sequencing data. | | 22745232 | 2012 |
| | Detects fusion transcripts from PE RNA-seq data. Automatically generates HTML reports to facilitate results analysis. | | 21840877 | 2011 |
| | Performs an integrated analysis of RNA-seq and WGS data to detect genomic rearrangements and fusion transcripts. Handles low-coverage genome data. | | 21478487 | 2011 |
| | Uses discordant paired-end alignments to guide split read analysis. Does not discard ambiguously mapping reads, but considers all possible alignments and fusion boundaries and resolves the most probable position. | | 21625565 | 2011 |
| | Detects fusion transcripts from PE data. Can identify transcript fragments without known annotations. Filters on anchor length, read-through transcripts, junction reads and PCR artifacts. | | 21546395 | 2011 |
| | Fusion gene detection from either SE or PE RNA-seq or gDNA-seq data. Focuses on improving the accuracy of mapping junction-spanning single reads. | | 21593131 | 2011 |
| | Fusion transcript detection from PE RNA-seq data. Focuses on accurately identifying fusion transcripts when many read pairs map ambiguously. | | 21330288 | 2011 |
| | Fusion transcript detection from PE RNA-seq data. Includes prediction of genomic rearrangements, fusion protein sequence reconstruction and generation of fusion-spanning sequences for PCR validation. | | 21622959 | 2011 |
| | A version of TopHat specialized for the detection of fusion transcripts. Implements a two-stage process: reads are aligned to the genomic reference with an altered version of TopHat, then a processing step incorporates annotation and filters candidates. | | 21835007 | 2011 |
| | Fusion transcript detection from PE RNA-seq data. Considers annotated exons during the mapping procedure and reports read-through fusions in addition to other fusions. Offers a variety of filters, including comparing fusion expression with general expression. | | 20964841 | 2010 |
GUI = graphical user interface, PCR = polymerase chain reaction, PE = paired-end, RNA-seq = RNA sequencing, SE = single end, WGS = whole genome sequencing.
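Several of the tools above combine two kinds of evidence, discordant read pairs whose mates map to different genes and split (junction-spanning) reads that cross the breakpoint, and then filter candidates on criteria such as the number of supporting reads and anchor length. A minimal sketch of that aggregation-and-filtering step is shown below. The `ReadEvidence` record, the function name and the default thresholds are hypothetical illustrations, not the data model or parameters of any tool listed in the table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEvidence:
    """One piece of fusion evidence (hypothetical record, not a tool's format)."""
    gene5: str       # gene matched by the 5' side
    gene3: str       # gene matched by the 3' side
    kind: str        # "discordant" (mates in different genes) or "split"
    anchor: int = 0  # bases on the shorter side of a split read; 0 for pairs


def filter_candidates(evidence, min_discordant=2, min_split=1, min_anchor=10):
    """Aggregate support per (5' gene, 3' gene) and apply illustrative thresholds.

    Keys keep 5'/3' orientation. Split reads whose shorter side is below
    min_anchor are not counted, since short anchors align spuriously.
    """
    support = defaultdict(lambda: {"discordant": 0, "split": 0})
    for ev in evidence:
        key = (ev.gene5, ev.gene3)
        if ev.kind == "discordant":
            support[key]["discordant"] += 1
        elif ev.kind == "split" and ev.anchor >= min_anchor:
            support[key]["split"] += 1
    return {
        key: counts
        for key, counts in support.items()
        if counts["discordant"] >= min_discordant and counts["split"] >= min_split
    }


# Hypothetical placeholder evidence, not real reads.
evidence = (
    [ReadEvidence("GENE_A", "GENE_B", "discordant")] * 3
    + [ReadEvidence("GENE_A", "GENE_B", "split", anchor=25),
       ReadEvidence("GENE_A", "GENE_B", "split", anchor=5),
       ReadEvidence("GENE_C", "GENE_D", "discordant"),
       ReadEvidence("GENE_C", "GENE_D", "split", anchor=30)]
)
print(filter_candidates(evidence))
# {('GENE_A', 'GENE_B'): {'discordant': 3, 'split': 1}}
```

Requiring both evidence types is one way to trade sensitivity for a lower false positive rate; real tools add further filters (repeat content, homology, read-through status) listed in the table.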
Databases of gene fusions
| Database name | Description | Data sources | URL | PMID | Database size (in current release or as of October 2015) | First published and current database release |
|---|---|---|---|---|---|---|
| | Relates gene fusions and other chromosomal aberrations to tumor characteristics, based either on individual cases or associations. | Manual literature curation. | | 17361217 (review) | 10 026 gene fusions; 65 975 patient cases | 1994–2015. Current release: August 2014 |
| COSMIC | Catalog of translocations and fusions between gene pairs supplemented with extensive clinical data. | Manual literature curation. | | 25355519 (full 2015 COSMIC db) | 10 534 gene fusions | 2004–2015. Current release: v70 (2014) |
| | Database of functional and regulatory elements in gene fusion events related to cancer. | Integration of diverse data sets, including fusion events and molecular and regulatory features. | | 26384373 | 1587 gene fusions | 2015 |
| | Repository of the results of a landscape study of cancer-associated fusions carried out using the PRADA pipeline. | Integrated analysis of paired-end RNA sequencing and DNA copy number data from TCGA. | | 25500544 | 7887 fusion transcripts | 2015. Current release: December 2014 |
| | Fusion gene database derived exclusively from cancer RNA-seq data. | Compiled from 591 recently published RNA-seq datasets covering 15 types of human cancer. | | 26215638 | 11 839 gene fusions | 2015 |
| ChiTaRS | Catalog of fusion transcripts in humans, mice, fruit flies, zebrafish, cows, rats, pigs and yeast. | Bioinformatic analysis of ESTs and mRNAs from GenBank. | | 25414346 (2.1); 23143107 | 29 159 fusion transcripts | 2013. Current release: ChiTaRS 2.1 (2014) |
| | Curated database of human chromosomal rearrangements, associated diseases and clinical symptoms. | Manual literature curation. | | 21051346 | 2643 chromosome rearrangements | 2011. Current release: v0.9 (2010) |
| | Database of conjoined genes (transcription read-through fusions). | Manual literature curation and bioinformatic analysis of EST and mRNA sequences from GenBank. | | 20967262 | 800 conjoined genes from 1542 parent genes | 2010 |
| | Database of hybrid genes in the human genome. | Analysis of mRNA, EST, cDNA and genomic DNA sequences in the INSDC resource. | | 17519042 | 3404 gene fusions | 2007 |
| | Finely mapped translocation breakpoints in cancer. | Manual literature curation and analysis of public databases (Mitelman, GenBank). | | 17257420 | 1374 fusion sequences from 431 different genes | 2007. Current release: release 3.3 (2013) |
| ChimerDB | Knowledgebase of fusion transcripts across multiple species, as well as information on cancer breakpoints. | Bioinformatic analysis of Sanger CGP, OMIM, PubMed, the Mitelman database and transcript sequences in GenBank. | | 19906715 (2.0); 16381848 | 2699 fusion transcripts | 2006. Current release: ChimerDB 2.0 (2010) |
| | Database of all published chromosomal rearrangements that are associated with an abnormal phenotype. | Online searches of PubMed, Scopus and OMIM. | | Unpublished | 965 translocations | NA |
Databases are annotated with source data types, URLs, estimates of database content and size and first and current releases. EST = expressed sequence tag, INSDC = International Nucleotide Sequence Database Collaboration, OMIM = Online Mendelian Inheritance in Man, Sanger CGP = Sanger Cancer Genome Project, TCGA = The Cancer Genome Atlas.