Brian J Haas, Alexander Dobin, Bo Li, Nicolas Stransky, Nathalie Pochet, Aviv Regev.
Abstract
BACKGROUND: Accurate fusion transcript detection is essential for comprehensive characterization of cancer transcriptomes. Over the last decade, multiple bioinformatic tools have been developed to predict fusions from RNA-seq, based on either read mapping or de novo fusion transcript assembly.
Keywords: Benchmarking; Cancer; Fusion; RNA-seq; STAR-Fusion; TrinityFusion
Year: 2019 PMID: 31639029 PMCID: PMC6802306 DOI: 10.1186/s13059-019-1842-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Methods for fusion transcript prediction and accuracy evaluation. a The two general paradigms for fusion transcript identification include (left) mapping reads to the genome and capturing discordant read pairs and chimeric read alignments and (right) performing genome-free de novo transcript assembly followed by identification of chimeric transcript alignments. b Given a well-defined truth set of fusions, true- and false-positive predictions are tallied according to the minimum threshold for fusion-supporting reads. F1 accuracy values are computed at each minimum evidence threshold to determine the threshold that yields peak prediction accuracy for each method. Similarly, precision and recall values are computed at each minimum evidence threshold and plotted as a precision-recall curve, and the area under the curve (AUC) is computed as a measure of overall prediction accuracy.
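The evaluation described in panel b can be sketched in a few lines of Python. This is a minimal illustration, not the authors' benchmarking code: the function names, the fusion labels, and the read counts are all hypothetical, and the AUC here is a simple trapezoidal sum over the observed precision-recall points with no extrapolation to zero recall.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, float, float, float]  # (threshold, precision, recall, F1)


def score_at_thresholds(predictions: Dict[str, int], truth: Set[str]) -> List[Row]:
    """Score predictions at every distinct minimum read-support threshold.

    Thresholds are visited from strictest to most lenient, so recall never
    decreases from one row to the next.
    """
    rows: List[Row] = []
    for threshold in sorted(set(predictions.values()), reverse=True):
        called = {name for name, reads in predictions.items() if reads >= threshold}
        tp = len(called & truth)   # predicted and in the truth set
        fp = len(called - truth)   # predicted but not in the truth set
        fn = len(truth - called)   # in the truth set but not predicted
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((threshold, precision, recall, f1))
    return rows


def pr_auc(rows: List[Row]) -> float:
    """Trapezoidal area under the precision-recall points, in row order."""
    area = 0.0
    for (_, p0, r0, _), (_, p1, r1, _) in zip(rows, rows[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical predictions: fusion name -> number of fusion-supporting reads.
predictions = {"GENE1--GENE2": 10, "GENE3--GENE4": 5, "GENE5--GENE6": 2, "GENE7--GENE8": 1}
# Hypothetical truth set; the third fusion is never predicted.
truth = {"GENE1--GENE2", "GENE3--GENE4", "GENE9--GENE10"}

rows = score_at_thresholds(predictions, truth)
best = max(rows, key=lambda row: row[3])  # row with peak F1
```

In this toy input, requiring at least five supporting reads removes both false positives while keeping both detectable true fusions, so `best` is the row for threshold 5.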
RNA-seq-based fusion transcript predictors evaluated
| Method | Class* | Brief overview of methodology |
|---|---|---|
| Arriba | R | Arriba extracts gene fusions from the chimeric alignments reported by STAR [ |
| ChimeraScan | R | Identifies candidate fusions from discordant Bowtie [ |
| ChimPipe | R | The GEMtools RNA-seq pipeline [ |
| deFuse | R | Aligns reads to spliced and unspliced gene sequences using Bowtie [ |
| EricScript | R | BWA [ |
| FusionCatcher | R | Leverages a collection of alignment utilities including Bowtie [ |
| FusionHunter | R | First uses Bowtie to align reads to the genome and identify candidate fusions based on discordant read pairs. Then creates a “pseudoreference” by positioning candidate fusion genes with canonical ordering, realigns reads using a custom algorithm, and identifies both split and spanning reads providing evidence for gene fusions. |
| InFusion | R | Reads are first aligned to the reference transcriptome using Bowtie2. Unaligned and discordantly aligned reads are further examined in the context of the genome and transcriptome to cluster evidence and define candidate fusions. |
| JAFFA-Assembly | A | After removing reads that align to intronic and intergenic regions, as defined by Bowtie genome alignments, the remaining reads are assembled using Oases [ |
| JAFFA-Direct | R | After removing reads that align to intronic and intergenic regions, as defined by Bowtie genome alignments, the remaining reads are mapped directly to the transcriptome using BLAT. Chimeric BLAT alignments are further assessed as fusion candidates. |
| JAFFA-Hybrid | R,A | After removing reads that align to intronic and intergenic regions, as defined by Bowtie genome alignments, the remaining reads are assembled using Oases. Both the assembled transcripts and the original reads that failed to map to the genome are then mapped directly to the transcriptome using BLAT. Chimeric BLAT alignments are further assessed as fusion candidates. |
| MapSplice | R | An RNA-seq aligner based on Bowtie similar to TopHat [ |
| nFuse | R | Designed for use with WGS-seq and RNA-seq but can be executed with RNA-seq only, leveraging its included deFuse with Bowtie2. |
| Pizzly | R | Uses a k-mer based strategy to examine reads that do not map to isoforms consistently via kallisto [ |
| PRADA | R | Reads are aligned to a combined genome and transcriptome reference using BWA. Discordant reads identify fusion candidates, and junction reads are identified by mapping to a database of all possible 5′-3′ chimeric exon junctions. |
| SOAP-fuse | R | The SOAP2 aligner [ |
| STARChip | R | Uses chimeric reads reported by STAR. Aimed primarily at identifying circular RNAs, but also reports fusion candidates. |
| STAR-Fusion | R | Uses chimeric read alignments reported by STAR in its Chimeric.out.junction file to identify candidate fusions, followed by extensive filtering of likely artifacts. |
| STAR-SEQR | R | Uses chimeric reads reported by STAR to find fusions. |
| TopHat-Fusion | R | A modified execution of the TopHat aligner [ |
| TrinityFusion-C | A | De novo assembles only the chimeric reads defined by STAR using the Trinity assembler [ |
| TrinityFusion-D | A | De novo assembles all input reads using Trinity, and subsequently leverages GMAP for chimera candidate detection. |
| TrinityFusion-UC | A | De novo assembles both chimeric and unmapped reads defined by STAR using the Trinity assembler, and subsequently leverages GMAP for chimera candidate detection. |
*Class of fusion detection method: R, read mapping; A, assembly followed by alignment
Fig. 2 Fusion prediction accuracy on simulated fusion RNA-seq data. a Distribution of AUC values across replicates for both the 50 base length (PE 50) and 101 base length (PE 101) simulated paired-end RNA-seq fusion data sets. JAFFA-Hybrid and JAFFA-Direct were incompatible with the shorter PE 50 data set, so only results for the longer PE 101 data are shown. b Heatmaps illustrating sensitivity for fusion detection according to fusion expression levels. Fusions were divided into bins based on log2(TPM) expression levels, and the percent of fusions identified within each expression bin is indicated by color and intensity.
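The per-bin sensitivity shown in panel b could be computed roughly as follows. This is a sketch under stated assumptions: the caption does not give the bin boundaries, so unit-width bins on floor(log2(TPM)) are assumed here, and the function name, fusion labels, and TPM values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Set


def sensitivity_by_expression_bin(truth_tpm: Dict[str, float], detected: Set[str]) -> Dict[int, float]:
    """Percent of true fusions detected within each floor(log2(TPM)) bin."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for fusion, tpm in truth_tpm.items():
        if tpm <= 0:
            continue  # log2 is undefined for non-positive expression values
        expression_bin = math.floor(math.log2(tpm))  # assumed unit-width bins
        totals[expression_bin] += 1
        if fusion in detected:
            hits[expression_bin] += 1
    return {b: 100.0 * hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical simulated fusions with their expression levels in TPM.
truth_tpm = {
    "GENE1--GENE2": 1.5,
    "GENE3--GENE4": 1.0,
    "GENE5--GENE6": 8.0,
    "GENE7--GENE8": 12.0,
    "GENE9--GENE10": 0.3,
}
# Hypothetical set of fusions one method reported.
detected = {"GENE1--GENE2", "GENE5--GENE6", "GENE7--GENE8"}

sensitivity = sensitivity_by_expression_bin(truth_tpm, detected)
```

Each key of `sensitivity` is the lower edge of a log2(TPM) bin, and each value corresponds to one heatmap cell for a single method.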
Fig. 3 Identification of experimentally validated fusions in breast cancer cell lines BT474, KPL4, MCF7, and SKBR3. a All fusions identified by at least three different methods are shown, ranked from those predicted by the fewest methods to those predicted by the most, in an UpSetR [61] style plot (UpSetR code forked and modified to show individual fusion group memberships here [62]). Previously reported experimentally validated fusions are shaded to facilitate identification. b Bar plot showing the number of experimentally validated fusions (left axis) contained within the union of all predictions supported by at least the specified number of fusion prediction methods. Also shown is the corresponding percent of the union of predictions containing experimentally validated fusions (blue line, right axis).
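The tally behind panel b can be illustrated with a short Python sketch. It is an assumption-laden example rather than the published analysis code: the method and fusion names are placeholders, and `consensus_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def consensus_summary(
    calls_by_method: Dict[str, Set[str]], validated: Set[str]
) -> List[Tuple[int, int, int, float]]:
    """For each minimum number of agreeing methods k, return
    (k, size of union, validated fusions in union, percent validated)."""
    support: Counter = Counter()
    for fusions in calls_by_method.values():
        support.update(set(fusions))  # each method votes once per fusion
    rows = []
    for k in range(1, len(calls_by_method) + 1):
        union_k = {fusion for fusion, votes in support.items() if votes >= k}
        n_validated = len(union_k & validated)
        percent = 100.0 * n_validated / len(union_k) if union_k else 0.0
        rows.append((k, len(union_k), n_validated, percent))
    return rows


# Hypothetical predictions from three placeholder methods.
calls_by_method = {
    "method_a": {"GENE1--GENE2", "GENE3--GENE4", "GENE5--GENE6"},
    "method_b": {"GENE1--GENE2", "GENE3--GENE4"},
    "method_c": {"GENE1--GENE2", "GENE7--GENE8"},
}
# Hypothetical set of experimentally validated fusions.
validated = {"GENE1--GENE2", "GENE3--GENE4"}

summary = consensus_summary(calls_by_method, validated)
```

In this toy input, raising the required agreement from one method to two shrinks the union from four fusions to two and raises the validated fraction from 50% to 100%, which is the kind of trade-off the bar plot and blue line display together.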
Fig. 4 Fusion prediction accuracy on 56 cancer cell lines. a The distribution of leaderboard rankings for accuracies assessed using the varied truth sets. Methods are ranked from left to right according to median accuracies. b The distributions of execution times for all cancer cell lines are shown. All methods were run on the Broad Institute computing grid with commodity hardware and allocated single cores, with the exception of the two slowest methods, TrinityFusion-UC and TrinityFusion-D, which were each given four cores. c Median rankings are plotted vs. median run times, with a black dashed box drawn around the most accurate and fastest methods. d The PPV and TPR are shown at maximum point accuracy (F1) for an example trial involving the truth set defined as requiring at least seven methods to agree. The most accurate methods are found to cluster into groups of high sensitivity (top dashed rectangle) or high precision (right dashed rectangle).
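One way to picture the leaderboard in panels a and c is to rank methods within each truth set and then take each method's median rank. The sketch below assumes that simple scheme; the caption does not specify how ties or rankings were handled, and the method names, truth-set labels, and accuracy values are invented purely for illustration.

```python
from statistics import median
from typing import Dict, List


def median_ranks(accuracy_by_truth_set: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Rank methods (1 = most accurate) within each truth set, then
    report each method's median rank across truth sets."""
    ranks: Dict[str, List[int]] = {}
    for scores in accuracy_by_truth_set.values():
        # Stable sort: tied methods keep their input order (an assumption).
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(rank)
    return {method: median(method_ranks) for method, method_ranks in ranks.items()}


# Illustrative accuracies for placeholder methods under three truth sets,
# each defined by a different minimum number of agreeing methods.
accuracy_by_truth_set = {
    "min_3_methods": {"method_a": 0.90, "method_b": 0.80, "method_c": 0.70},
    "min_5_methods": {"method_a": 0.85, "method_b": 0.90, "method_c": 0.60},
    "min_7_methods": {"method_a": 0.95, "method_b": 0.70, "method_c": 0.80},
}

leaderboard = median_ranks(accuracy_by_truth_set)
```

Plotting these median ranks against each method's median run time would give a scatter of the kind described for panel c.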