Liqing Tian, Yongjin Li, Michael N Edmonson, Xin Zhou, Scott Newman, Clay McLeod, Andrew Thrasher, Yu Liu, Bo Tang, Michael C Rusch, John Easton, Jing Ma, Eric Davis, Austyn Trull, J Robert Michael, Karol Szlachta, Charles Mullighan, Suzanne J Baker, James R Downing, David W Ellison, Jinghui Zhang.
Abstract
To discover driver fusions beyond canonical exon-to-exon chimeric transcripts, we develop CICERO, a local assembly-based algorithm that integrates RNA-seq read support with extensive annotation for candidate ranking. CICERO outperforms commonly used methods, achieving a 95% detection rate for 184 independently validated driver fusions including internal tandem duplications and other non-canonical events in 170 pediatric cancer transcriptomes. Re-analysis of TCGA glioblastoma RNA-seq unveils previously unreported kinase fusions (KLHL7-BRAF) and a 13% prevalence of EGFR C-terminal truncation. Accessible via standard or cloud-based implementation, CICERO enhances driver fusion detection for research and precision oncology. The CICERO source code is available at https://github.com/stjude/Cicero.
Keywords: Cloud computing; Fusion visualization; Gene fusion; Precision oncology; RNA-seq
Year: 2020 PMID: 32466770 PMCID: PMC7325161 DOI: 10.1186/s13059-020-02043-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Examples of complex fusion cases missed by commonly used fusion detection tools. a A 3-segment C11orf95-MAML2 fusion in an ependymoma (SJEPD001509_D). The fusion breakpoints are shown at the top; the fusion introduces a new splice site (reverse complement sequence AG|GT) within intron 1 of MAML2 (red arrow at the top). This replaced the last 23 amino acids of C11orf95 with a 36 bp in-frame insertion, which was confirmed by Sanger sequencing (bottom). This fusion can be detected by FusionCatcher but not by the other three public methods. b IGH-EPOR fusion in a B-ALL (SJBALL020824_D1), caused by insertion of the EPOR gene into the highly repetitive IGH locus. The y-axis shows RNA-seq coverage at the two loci, with arrows denoting the fusion breakpoints. None of the four public methods can detect this fusion.
Fig. 2 Fusion detection using CICERO. a Overview of the CICERO algorithm, which consists of fusion detection through analysis of candidate SV breakpoints and splice junctions, fusion annotation, and ranking; key data sets used in each step are labeled. b Workflow of fusion detection. A user can submit an aligned BAM file or a raw FASTQ file as input on a local computer cluster or on St. Jude Cloud. The raw output can be curated using FusionEditor, and the final results can be exported as a text file.
Fig. 3 Visualization interface of FusionEditor for curating fusions predicted from one sample. a Table view showing the five “HQ” (high-quality) in-frame fusions predicted in an infant ALL (SJINF011_D). A 3-gene fusion involving AFF1-RAD51B-KMT2A is recognized automatically and marked by a box labeled “multi-seg.” The reciprocal KMT2A-AFF1 fusion was also identified as an HQ in-frame fusion. Inter- and intra-chromosomal fusions are labeled in red and black text, respectively. Known fusions are labeled in purple text (e.g., KMT2A-AFF1 and FLT3 ITD in this case). b Graphical view depicting the breakpoints on the protein domains of the three partner genes. Additional information, such as the chimeric read ratio for each fusion breakpoint, is shown to help assess the validity of each predicted fusion.
Fig. 4 Comparison of CICERO with other methods on driver fusion detection. a Distribution of leukemias, solid tumors, and brain tumors among the 170 RNA-seq samples used for the benchmark test. b Prevalence of recurrent (≥ 3) gene fusions in the benchmark data sets, stratified into four classes: chimeric transcripts caused by exon-to-exon fusion expressed at a high (> 5 FPKM) or low level, internal tandem duplications (ITD), and other non-canonical fusions involving intronic or intergenic regions. c Comparison of sensitivity (top panel) and of the ranking of driver fusions among all predicted fusions (bottom panel) for CICERO and five other methods (ChimeraScan, deFuse, FusionCatcher, STAR-Fusion, and Arriba) across the four categories of driver fusion. The CICERO ranking, labeled CICERO_raw, is based on the fusion score alone, without incorporating known-fusion status. Error bars in the top panel represent the standard deviation of detection sensitivity, calculated by bootstrapping samples with 100 iterations. d True positives (dark blue) and false positives (light blue) among predicted somatic fusions identified by CICERO and other fusion detection programs. The exact number of events is shown as (true positives/total predictions) under the name of each method. CICERO’s high-quality predictions are compared with those of STAR-Fusion and Arriba (left panel), while all CICERO predictions are compared with those of FusionCatcher, deFuse, and ChimeraScan (right panel).
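The error bars in panel c come from bootstrapping samples with 100 iterations. The sketch below illustrates that kind of estimate under stated assumptions: each sample is reduced to a hypothetical (validated drivers, detected drivers) pair, whole samples are resampled with replacement, and sensitivity is pooled over each resample. The function name `bootstrap_sensitivity`, the input layout, and the toy data are illustrative only and are not taken from the CICERO code or the benchmark.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_sensitivity(
    samples: List[Tuple[int, int]], n_iter: int = 100, seed: int = 0
) -> Tuple[float, float]:
    """Return (sensitivity, bootstrap SD) for one detection method.

    `samples` holds one (n_validated_drivers, n_detected) pair per RNA-seq
    sample; this input layout is a hypothetical choice for illustration.
    """
    total = sum(t for t, _ in samples)
    point = sum(d for _, d in samples) / total

    rng = random.Random(seed)  # fixed seed so the error bar is reproducible
    estimates = []
    for _ in range(n_iter):
        # Resample whole samples (not individual fusions) with replacement.
        draw = rng.choices(samples, k=len(samples))
        drawn_total = sum(t for t, _ in draw)
        if drawn_total == 0:
            continue  # no validated drivers drawn; sensitivity is undefined
        estimates.append(sum(d for _, d in draw) / drawn_total)
    return point, statistics.stdev(estimates)


# Toy input (not data from the paper): 10 samples, one driver each, 9 detected.
toy = [(1, 1)] * 9 + [(1, 0)]
sens, sd = bootstrap_sensitivity(toy)
print(f"sensitivity={sens:.2f} +/- {sd:.3f}")
```

Resampling at the sample level, rather than at the fusion level, keeps fusions from the same transcriptome together, which matches the caption's description of bootstrapping samples.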
Fig. 5 Examples of additional fusions identified by CICERO in the TCGA-GBM cohort. Protein domains of the cancer gene involved in each fusion are indicated by the color legend. a Comparison of gene fusions truncating the EGFR C-terminal autophosphorylation domain that were discovered only by CICERO with those reported by both CICERO and the TCGA Research Network. Sites marked “Closs” are out-of-frame C-terminal truncation fusions, while sites marked with a gene symbol are in-frame fusions. b Gene fusions likely to cause kinase activation. For the KLHL7-BRAF fusion, we selected the protein encoded by the KLHL7 short isoform NM_001172428 because the fusion breakpoint occurs in the last exon unique to this transcript. c CCDC127-TERT fusion in TCGA-06-2564 leading to TERT over-expression. The right panel shows the FPKM values of CCDC127 and TERT across the entire GBM cohort, with the red dot marking the fusion sample TCGA-06-2564.
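The distinction between in-frame fusions and out-of-frame truncations comes down to codon phase at the junction. The sketch below shows that arithmetic for a simplified model that we assume here: the 5' partner contributes a known number of coding bases, optional inserted bases sit at the junction (such as the 36 bp insertion in Fig. 1a), and the 3' partner resumes at a known 0-based CDS offset. The function `is_in_frame` and all coordinates are hypothetical. This is not CICERO's annotation logic, and it does not check for premature stop codons.

```python
def is_in_frame(five_prime_cds_len: int, three_prime_cds_offset: int,
                inserted_bases: int = 0) -> bool:
    """Return True if the 3' partner is read in its native frame.

    five_prime_cds_len: coding bases retained from the 5' partner, counted
        from its start codon up to the breakpoint.
    three_prime_cds_offset: 0-based position, within the 3' partner's CDS,
        of the first base retained after the breakpoint.
    inserted_bases: extra (e.g. intronic) bases inserted at the junction.
    """
    if min(five_prime_cds_len, three_prime_cds_offset, inserted_bases) < 0:
        raise ValueError("lengths and offsets must be non-negative")
    # Codon phase reached after the 5' partner plus any junction insertion.
    upstream_phase = (five_prime_cds_len + inserted_bases) % 3
    # Native codon phase of the first retained 3' partner base.
    downstream_phase = three_prime_cds_offset % 3
    return upstream_phase == downstream_phase


# Hypothetical coordinates, chosen only to illustrate the arithmetic.
print(is_in_frame(300, 150))                     # True
print(is_in_frame(301, 150))                     # False: frame shifted by 1
print(is_in_frame(300, 150, inserted_bases=36))  # True: 36 bp = 12 codons
```

A phase match alone does not make a fusion productive. An in-frame insertion can still contain a stop codon, so a real annotator would also translate the junction sequence.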