Chimera: a Bioconductor package for secondary analysis of fusion products.

Marco Beccuti¹, Matteo Carrara¹, Francesca Cordero¹, Fulvio Lazzarato², Susanna Donatelli¹, Francesca Nadalin¹, Alberto Policriti¹, Raffaele A Calogero¹.

Abstract

SUMMARY: Chimera is a Bioconductor package that organizes, annotates, analyses and validates fusions reported by different fusion detection tools; the current implementation can handle output from bellerophontes, chimeraScan, deFuse, fusionCatcher, FusionFinder, FusionHunter, FusionMap, mapSplice, Rsubread, tophat-fusion and STAR. The core of Chimera is a fusion data structure that can store fusion events detected with any of the aforementioned tools. Fusions can then be manipulated with standard R functions or through the set of functions specifically developed in Chimera to support the user in managing fusions and discriminating false-positive results.

Year: 2014 PMID: 25286921 PMCID: PMC4253834 DOI: 10.1093/bioinformatics/btu662

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Fusion genes, also known as chimeras, have become crucial in the investigation of biomarkers and therapeutic targets. The emergence of deep sequencing of the transcriptome has opened many opportunities for the identification of this class of genomic alterations, leading to the discovery of novel chimeric transcripts in cancers. A significant number of bioinformatics algorithms have been developed to detect fusion genes (Beccuti ). Recently, we have shown that the performance of such tools varies considerably and that their false-positive detection rate can be a critical issue (Carrara et al., 2013). Most tools detect fusion events in two steps: (i) alignment to a reference genome to detect discordant alignments, and (ii) refinement of fusion candidates with removal of false-positive results. The second step is particularly critical, as each tool implements a different set of filters (Beccuti ; Wang ), and users have limited control over them. As each fusion detection tool has a specific combination of alignment and filtering approaches, the results produced by each tool are only partially comparable with those of the others (Carrara ,b). Thus, although combining the results from different tools can increase the number of true fusions detected, it also significantly increases the number of false-positive results (see Supplementary Material). The Bioconductor package Chimera, reported in this note, provides a common framework to manipulate, analyze and filter fusion events detected by a variety of fusion detection tools. Chimera is based on typical R data structures, which also allows its users to take advantage of the full set of R functionalities.

2 DESCRIPTION

The flow of analysis supported by Chimera is described in Figure 1, and it is exemplified in the note’s Supplementary Material. With the functions made available by Chimera, the user can import data from 11 different fusion detection tools, shown at the top of Figure 1, into a list of fusions. Each fusion stores the genomic break point and all the information that can be derived from the output of fusion detection tools (e.g. number of spanning and encompassing reads supporting the fusion junction, splicing pattern used and transcripts involved in the fusion). As each fusion detection tool might rely on different gene annotations, the import function recovers the HUGO (Seal ) symbols for the genes involved in the fusion by overlapping the fusion break points with the genes’ genomic locations stored in the Bioconductor package (Gentleman ) org.Hs.eg.db, which contains a genome-wide annotation based on Entrez Gene identifiers.

Fig. 1. The information flow in Chimera

Thus, a fusion involving two annotated genes is described as SymbolX:SymbolY. If a break point is located in a non-coding genomic region, it is annotated using its genomic coordinates (e.g. NBPF1:chr14:81937890-81937920). A list of fusions can be manipulated, as in Figure 1, using basic R list functions (e.g. concatenation) or with the set of Chimera ‘fusion queries’: the function fusionName extracts the fusion name in the format described above (e.g. SPDYE8P:SLC24A5), the function prettyPrint stores fusion descriptions in a tab-delimited file and the function supportingReads extracts the supporting reads for a given fusion.
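
The naming convention can be illustrated with a minimal Python sketch. It is not Chimera's R implementation: the `GENES` table, its symbols and coordinates, and the helpers `symbol_at`, `partner_label` and `fusion_name` are hypothetical names introduced only for illustration, whereas the package itself draws gene locations from org.Hs.eg.db.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int


# Hypothetical annotation table; symbols and coordinates are made up.
GENES = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr2", 20_000, 30_000),
]

BreakPoint = Tuple[str, int, int]  # (chromosome, start, end)


def symbol_at(chrom: str, pos: int) -> Optional[str]:
    """Return the symbol of the first gene overlapping the position, if any."""
    for gene in GENES:
        if gene.chrom == chrom and gene.start <= pos <= gene.end:
            return gene.symbol
    return None


def partner_label(chrom: str, start: int, end: int) -> str:
    """Gene symbol if the break point region touches a gene, else coordinates."""
    symbol = symbol_at(chrom, start) or symbol_at(chrom, end)
    return symbol if symbol else f"{chrom}:{start}-{end}"


def fusion_name(bp1: BreakPoint, bp2: BreakPoint) -> str:
    """Join the two partner labels with a colon, as in SymbolX:SymbolY."""
    return f"{partner_label(*bp1)}:{partner_label(*bp2)}"
```

With this table, two break points inside annotated genes yield a name of the form `GENE_A:GENE_B`, while a second break point outside every gene falls back to its coordinates, mirroring the NBPF1:chr14:81937890-81937920 example.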

2.1 Fusion filtering

Once the user has created and inspected the fusion list with the above functionalities, they might decide to filter the fusions through a select/exclude mechanism. The available filtering criteria are as follows: (i) a user-defined threshold for the number of encompassing/spanning reads; (ii) a specific subset of fusion names; (iii) the presence of annotated gene names in fusions (e.g. SPDYE8P:SLC24A5 is kept, NBPF1:chr14:81937890-81937920 is excluded); (iv) fusions joining distant exons of the same gene (e.g. SPDYE8P:SPDYE8P is discarded) and (v) the presence of an intron at the fusion junction. This last filter is justified by the fact that the analysis is performed on mature mRNAs. Thus, if long introns are retained at the fusion break point, it is likely that the fusion event will not be translated into a functional protein.
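
The select/exclude mechanism for criteria (i), (iii) and (iv) can be sketched in Python as follows. This is an illustration under stated assumptions rather than Chimera's own filtering function: the `Fusion` record, the predicate helpers and `filter_fusions` are hypothetical, and the read counts in `FUSIONS` are invented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fusion:
    name: str          # "SymbolX:SymbolY", or a symbol plus genomic coordinates
    spanning: int      # reads spanning the junction (invented values below)
    encompassing: int  # read pairs encompassing the junction (invented values)


Predicate = Callable[[Fusion], bool]


def has_min_reads(min_spanning: int = 0, min_encompassing: int = 0) -> Predicate:
    """Criterion (i): user-defined thresholds on supporting reads."""
    def predicate(f: Fusion) -> bool:
        return f.spanning >= min_spanning and f.encompassing >= min_encompassing
    return predicate


def is_annotated(f: Fusion) -> bool:
    """Criterion (iii): both partners are gene symbols (exactly one colon)."""
    return f.name.count(":") == 1


def is_intragenic(f: Fusion) -> bool:
    """Criterion (iv): both partners are the same gene."""
    parts = f.name.split(":")
    return len(parts) == 2 and parts[0] == parts[1]


def filter_fusions(fusions: List[Fusion], predicate: Predicate,
                   mode: str = "select") -> List[Fusion]:
    """Keep fusions matching the predicate ('select') or drop them ('exclude')."""
    if mode not in ("select", "exclude"):
        raise ValueError("mode must be 'select' or 'exclude'")
    keep_matching = mode == "select"
    return [f for f in fusions if predicate(f) == keep_matching]


# Fusion names come from the text; the read counts are made up.
FUSIONS = [
    Fusion("SPDYE8P:SLC24A5", spanning=5, encompassing=10),
    Fusion("NBPF1:chr14:81937890-81937920", spanning=1, encompassing=2),
    Fusion("SPDYE8P:SPDYE8P", spanning=3, encompassing=4),
]
```

Filters compose by chaining calls, for example `filter_fusions(filter_fusions(FUSIONS, is_annotated), is_intragenic, mode="exclude")` keeps only SPDYE8P:SLC24A5.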

2.2 Annotation

A user wishing to complete or enrich fusion data can use the Chimera annotation functions (‘Annotations’ block in Fig. 1). The annotation facility allows the user to generate (i) the nucleotide sequence of one or more fusions (chimeraSeqSet); (ii) the number of supporting reads; and (iii) the list of fusion-encoded peptides. This additional information can be used as final output or to enrich the information present in a fusion object. In particular, the output of chimeraSeqSet is an R DNAStringSet object that can be exported as a FASTA file and that can enrich the fusion information (with the method addRNA). This output can also be used by the function fusionPeptides to explore the fusion at the protein level (the function retrieves the peptide regions involved and evaluates whether the fusion will produce an in-frame polypeptide). Chimera also embeds Oncofuse (Shugay ), a naive Bayes Network Classifier that provides a varied annotation of fusions, as well as a score for the probability that a fusion acts as a tumor driver.
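
The idea behind an in-frame evaluation can be shown with a small Python sketch. It does not reproduce how fusionPeptides works internally, which the note does not detail; the function `is_in_frame` and its two parameters are hypothetical and assume that the break point falls inside the coding sequence of both partners.

```python
def is_in_frame(upstream_coding_nt: int, downstream_coding_offset: int) -> bool:
    """Check whether a fusion preserves the reading frame of the 3' partner.

    upstream_coding_nt: number of coding nucleotides the 5' partner
        contributes, counted from its start codon to the break point.
    downstream_coding_offset: number of coding nucleotides of the 3' partner
        that precede its first retained nucleotide (0-based offset).

    The first retained nucleotide of the 3' partner has codon phase
    downstream_coding_offset % 3 in its own transcript and phase
    upstream_coding_nt % 3 in the chimeric transcript; the frame is
    preserved only when the two phases agree.
    """
    if upstream_coding_nt < 0 or downstream_coding_offset < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return upstream_coding_nt % 3 == downstream_coding_offset % 3
```

For instance, a 5' partner contributing 300 coding nucleotides joined to a 3' partner at coding offset 150 stays in frame, whereas 301 nucleotides joined at the same offset does not.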

2.3 Validation

Finally, a user can take advantage of Chimera to perform a first step of validation, which can be useful given the high number of false-positive results reported, in most cases, by fusion detection tools. Fusion break points are assessed by de novo assembly using GapFiller (Nadalin ). GapFiller is a seed-and-extend method able to correctly fill the gap between paired reads, and thus generates accurate sequences that are longer than the input reads. The supporting reads of a fusion (whether made available by the fusion detection tool or through the Chimera annotation functions) are assembled with GapFiller into a reference. The nucleotide sequence of the fusion break point, reconstructed by chimeraSeqSet, is then checked for inclusion in the de novo assembled reference. Validation is particularly useful for the correct evaluation of fusion events supported by few reads (see Supplementary Material).
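
The final inclusion check can be sketched in Python as a substring search on both strands. This is a simplified stand-in, assuming exact matching: the note does not specify how Chimera compares the junction with the assembled reference, and the names `reverse_complement` and `junction_supported` are hypothetical. The assembly step performed by GapFiller is not reproduced here.

```python
from typing import Iterable

# Complement table for uppercase DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_supported(junction: str, contigs: Iterable[str]) -> bool:
    """True if the junction sequence occurs in any contig, on either strand.

    Exact matching is an assumption of this sketch; a real check might
    tolerate mismatches or require a minimum overlap on each side.
    """
    forward = junction.upper()
    reverse = reverse_complement(forward)
    return any(forward in contig.upper() or reverse in contig.upper()
               for contig in contigs)
```

A junction such as `AAGGCT` is reported as supported by a contig containing either `AAGGCT` or its reverse complement `AGCCTT`.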

3 DISCUSSION

To the best of our knowledge, Chimera is the only available software able to integrate and compare the data produced by different fusion detection tools. Thus, it addresses the lack of a standard data structure for representing fusion data. Furthermore, as combining the results from more than one fusion detection tool increases the probability of identifying fusion events (see Supplementary Material), Chimera provides a common framework to manipulate, filter and prioritize fusion events. Moreover, Chimera offers a validation procedure based on de novo assembly (thanks to the GapFiller tool) and enhances fusion annotation by exploiting the Oncofuse results.

Funding: This work was supported by the Epigenomics Flagship Project EPIGEN and the European 7th framework program, Health.2012.1.2-1, NGS-PTL grant n. 306242.

Conflict of interest: none declared.

REFERENCES

1. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions.

Authors: Mikhail Shugay; Iñigo Ortiz de Mendíbil; José L Vizmanos; Francisco J Novo
Journal: Bioinformatics Date: 2013-08-16 Impact factor: 6.937

2. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. (Review)

Authors: Qingguo Wang; Junfeng Xia; Peilin Jia; William Pao; Zhongming Zhao
Journal: Brief Bioinform Date: 2012-08-09 Impact factor: 11.622

3. Bioconductor: open software development for computational biology and bioinformatics.

Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol Date: 2004-09-15 Impact factor: 13.583

4. State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues?

Authors: Matteo Carrara; Marco Beccuti; Federica Cavallo; Susanna Donatelli; Fulvio Lazzarato; Francesca Cordero; Raffaele A Calogero
Journal: BMC Bioinformatics Date: 2013-04-22 Impact factor: 3.169

5. genenames.org: the HGNC resources in 2011.

Authors: Ruth L Seal; Susan M Gordon; Michael J Lush; Mathew W Wright; Elspeth A Bruford
Journal: Nucleic Acids Res Date: 2010-10-06 Impact factor: 16.971

6. GapFiller: a de novo assembly approach to fill the gap within paired reads.

Authors: Francesca Nadalin; Francesco Vezzi; Alberto Policriti
Journal: BMC Bioinformatics Date: 2012-09-07 Impact factor: 3.169

7. State-of-the-art fusion-finder algorithms sensitivity and specificity.

Authors: Matteo Carrara; Marco Beccuti; Fulvio Lazzarato; Federica Cavallo; Francesca Cordero; Susanna Donatelli; Raffaele A Calogero
Journal: Biomed Res Int Date: 2013-02-17 Impact factor: 3.411