Krutika S Gaonkar1,2,3, Federico Marini4,5, Komal S Rathi1,2,3, Payal Jain1,3, Yuankun Zhu1,3, Nicholas A Chimicles1,2, Miguel A Brown1,3, Ammar S Naqvi1,2,3, Bo Zhang1,3, Phillip B Storm1,3, John M Maris6, Pichai Raman1,2,3, Adam C Resnick1,2,3, Konstantin Strauch4, Jaclyn N Taroni7, Jo Lynne Rokita8,9,10.
Abstract
BACKGROUND: Gene fusion events are significant sources of somatic variation across adult and pediatric cancers and are some of the most clinically effective therapeutic targets, yet low consensus among RNA-Seq fusion prediction algorithms makes therapeutic prioritization difficult. In addition, events such as polymerase read-throughs, mis-mapping due to gene homology, and fusions occurring in healthy normal tissue require informed filtering, making it difficult for researchers and clinicians to rapidly discern gene fusions that might be true underlying oncogenic drivers of a tumor and, in some cases, appropriate targets for therapy.
Keywords: Annotation tool; Cancer; Gene fusions; Oncogenes; RNA-seq; Shiny web application
Year: 2020 PMID: 33317447 PMCID: PMC7737294 DOI: 10.1186/s12859-020-03922-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Available fusion annotation and prioritization tools
| Tools | Software implementation | Compatible fusion algorithms | Annotations | Prioritization | Visualization | Code availability | Last Updated |
|---|---|---|---|---|---|---|---|
| annoFuse | R | Arriba, STAR-Fusion for annoFuse; agnostic for shinyFuse | Protein Domains, Kinase, Transcription Factor, Oncogene, Tumor Suppressor Gene, Kinase Domain Retention, Reciprocality, CTAT_HumanFusionLib, Gene Homolog, Expression | Based on quality/artifact filtering, functional, and oncogenic annotation | Genomic view of single or multiple fusion events with domains; barplots of recurrent fusions and recurrently fused genes | | September 2020+ |
| AGFusion | Python | Algorithm agnostic; requires two fusion gene partners (gene symbol, Ensembl ID, or Entrez gene ID), predicted fusion junctions in genomic coordinates, genome build | Protein domains | | Amino acid structure of fusion | | September 2019 |
| chimera | R package | chimeraScan, bellerophontes, deFuse, FusionFinder, FusionHunter, mapSplice, tophat-fusion, FusionMap, STAR, Rsubread, fusionCatcher | | Uses Oncofuse to prioritize oncogenic fusions | | | March 2015 |
| chimeraviz | R package | deFuse, EricScript, InFusion, JAFFA, FusionCatcher, FusionMap, PRADA, SOAPFuse, STAR-Fusion | | | Genomic view of fusion event (breakpoints required) | | October 2019 |
| co-Fuse | R | FusionCatcher, SOAPFuse, TopHat, deFuse | | Recurrence detection | | | May 2017 |
| FusionAnnotator | Perl | Algorithm agnostic; requires fusion gene partners | CTAT_HumanFusionLib, Gene Homolog | | | | July 2019 |
| FusionHub | Web-based | Algorithm agnostic; requires fusion gene partners to search databases for fusion and uses breakpoints from ChimerSeq to visualize | | Uses Oncofuse for oncogenicity | Circos, domain, network view of fusions | | June 2018 |
| FusionPathway | R | Algorithm agnostic; Fusion gene Entrez IDs and gene symbols, Pfam protein domain IDs of parental genes | | Oncogenic pathway association using GSEA | | | September 2019 |
| Oncofuse | Java/Groovy | Tophat-fusion and TopHat2, FusionCatcher, RNASTAR | Protein domains, expression | Bayesian probability of oncogenicity | | | February 2016 |
| Pegasus | Java/Perl/Python/Bash | chimerascan 0.4.5, defuse 0.4.3, bellerophontes 0.4.0 | Kinase, Protein domains | Driver Score for oncogenicity | | | February 2016 |
| confFuse | Python | deFuse | | Feature dependent scoring | | | November 2017 |
List of ten openly available fusion annotation and prioritization software tools, compared to annoFuse. Only AGFusion, FusionAnnotator, FusionPathway, and certain functions of FusionHub are algorithm agnostic, and most tools require input from outdated fusion algorithms.
Fig. 1 Graphical representation of the annoFuse pipeline. RNA-seq data processed through STAR-RSEM and fusion calls generated by Arriba v1.1.0 and/or STAR-Fusion 1.5.0 are inputs for the pipeline. The fusion_standardization function standardizes calls from fusion callers to retain information regarding fused genes, breakpoints, and reading frame, as well as annotation from FusionAnnotator. Standardized fusion calls are passed to fusion_filtering_QC, which removes false positives such as fusions with low read support, fusions annotated as read-throughs, and fusions found in normal and gene homolog databases; expression_filter_fusion then removes non-expressed fusions. Calls are annotated with annotate_fusion_calls to include useful biological features of interest (e.g., kinase, tumor suppressor). Project-specific filtering captures recurrently fused genes, using the filtering functions shown in boxes, as well as putative driver fusions. Outputs available from annoFuse include TSV files of annotated and prioritized fusions, a PDF summary of fusions, recurrently-fused gene/fusion plots, and an HTML report. Finally, users can explore fusion data interactively using shinyFuse. (Created with BioRender.com)
Fusion filtering and annotation criteria
| Order | Description | Filtering type | Rationale | Output type | Filtering criteria |
|---|---|---|---|---|---|
| 1 | Artifact filtering for readthroughs (readthrough fusions in the Mitelman database are not filtered) | QC | To filter artifacts | All | General |
| 2 | Artifact filtering for fusions found in normal datasets and gene homologs (red flag database from FusionAnnotator) | QC | To filter artifacts | All | General |
| 3 | JunctionReadCount == 0 | QC | To filter out false calls | All | General |
| 4 | SpanningFragCount - JunctionReadCount >= 100 | QC | To filter false calls from low mapping | All | General |
| 5 | Both gene partners with FPKM < 1 | QC | To filter out not expressed fused genes | All | General |
| 6 | Fused genes with either gene in TSGs, COSMIC, Oncogenic, TCGA fusion list | Gene-list | To capture cancer-specific fusions | Putative-driver | Project-specific |
| 7 | Local Rearrangement | QC | To remove local rearrangement within neighbouring genes | Filtered-fusion | Project-specific |
| 8 | Fusion is called by both callers | QC | To filter out calls from only 1 caller | Filtered-fusion | Project-specific |
| 9 | Fusion is called in at least 2 samples per histology | Recurrence | To gather recurrent fusion calls | Filtered-fusion | Project-specific |
| 10 | Fusion in Filtered-fusions found in more than 1 histology | QC | To remove fusions from Filtered-fusion list that are found in more than 1 histology | Filtered-fusion | Project-specific |
| 11 | Fused genes in Filtered-fusion list found to be multi-fused more than 5 times in a sample | QC | To remove fusions from Filtered-fusion list that are found to be multi-fused | Filtered-fusion | Project-specific |
| 12 | Add recurrent fusions that pass QC from steps 7–11 | Recurrence | To add non-oncogenic fusions to putative-driver fusion list | Putative-driver + recurrent non-oncogenic fusion | Project-specific |
Fusion filtering criteria were developed to gather high-quality recurrent fusion calls while retaining fusions containing oncogenes and/or tumor suppressor genes. Filtering is divided into three types: (1) QC, which filters known causes of false positives; (2) Gene-list, which retains additional fusions found in the lists of genes and fusions of interest; and (3) Recurrence, which filters out non-recurrent fusions in genes not annotated as putative oncogenic. Annotation lists are also described.
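The general QC criteria in steps 3 to 5 of the table above can be illustrated with a minimal Python sketch. This is not the annoFuse R API: the `FusionCall` record, its field names, the `passes_qc` helper, and all example values are hypothetical and exist only to make the three thresholds concrete.

```python
from dataclasses import dataclass


@dataclass
class FusionCall:
    """Hypothetical standardized fusion call (field names are assumptions)."""
    fusion_name: str
    junction_read_count: int
    spanning_frag_count: int
    fpkm_gene1: float
    fpkm_gene2: float


def passes_qc(call: FusionCall, spanning_delta: int = 100, min_fpkm: float = 1.0) -> bool:
    """Apply the general QC criteria from steps 3-5 of the table."""
    # Step 3: drop calls with no junction-spanning reads.
    if call.junction_read_count == 0:
        return False
    # Step 4: drop calls where spanning fragments exceed junction reads
    # by the cutoff or more.
    if call.spanning_frag_count - call.junction_read_count >= spanning_delta:
        return False
    # Step 5: drop calls where both gene partners are not expressed.
    if call.fpkm_gene1 < min_fpkm and call.fpkm_gene2 < min_fpkm:
        return False
    return True


# Made-up example calls; only KIAA1549--BRAF appears in the source text.
calls = [
    FusionCall("KIAA1549--BRAF", 25, 10, 12.0, 30.5),  # passes all checks
    FusionCall("GENE_A--GENE_B", 0, 8, 5.0, 4.0),      # fails step 3
    FusionCall("GENE_C--GENE_D", 3, 150, 5.0, 4.0),    # fails step 4
    FusionCall("GENE_E--GENE_F", 6, 4, 0.2, 0.5),      # fails step 5
]
kept = [c.fusion_name for c in calls if passes_qc(c)]
print(kept)
```

Steps 1 and 2 (read-through and normal/homolog artifact filtering) depend on FusionAnnotator annotations and are deliberately left out of this sketch.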
Fig. 4 Breakpoint distribution for KIAA1549-BRAF fusion. Displayed are each fusion gene, all known transcripts, and their genomic coordinates. Red dotted lines in each gene panel are the locations of breakpoints detected for KIAA1549 (3′) and BRAF (5′) compiled from both Arriba and STAR-Fusion. Strand is depicted by an arrow in each gene, and domains are shown as colored boxes. Black boxes represent exons for each transcript.
Sensitivity of TCGA fusion calls
| spanningDelta cutoff | Sensitivity |
|---|---|
| 10 | 0.709784411 |
| 20 | 0.807628524 |
| 30 | 0.870646766 |
| 40 | 0.893864013 |
| 50 | 0.922056385 |
| 100 | 0.963515755 |
| 150 | 0.973466003 |
| 200 | 0.983416252 |
Fusion standardization and fusion artifact filtering were conducted on a subset of TCGA samples and compared to published filtered fusion calls from The Fusion Analysis Working Group. SpanningFragCountFilter cutoffs of 10, 20, 30, 40, 50, 100, 150, and 200 were assessed to determine the sensitivity of annoFuse-prioritized fusion calls.
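To show how such a cutoff sweep relates to sensitivity, the following Python sketch computes the fraction of truth-set fusions that survive the spanning-fragment filter at each cutoff. It assumes sensitivity is defined as retained truth-set calls divided by all truth-set calls, and that a call is retained when SpanningFragCount minus JunctionReadCount is below the cutoff. The helper name and the toy read counts are hypothetical and do not reproduce the TCGA values in the table.

```python
def sensitivity_at_cutoff(calls, truth, spanning_delta):
    """Fraction of truth-set fusions retained at a given cutoff.

    calls: iterable of (fusion_name, junction_read_count, spanning_frag_count)
    truth: set of fusion names treated as true positives
    """
    retained = {
        name
        for name, junction, spanning in calls
        if spanning - junction < spanning_delta
    }
    return len(retained & truth) / len(truth)


# Toy calls with made-up read counts (not TCGA data).
toy_calls = [
    ("GENE_A--GENE_B", 10, 5),    # delta = -5
    ("GENE_C--GENE_D", 4, 30),    # delta = 26
    ("GENE_E--GENE_F", 2, 80),    # delta = 78
    ("GENE_G--GENE_H", 1, 160),   # delta = 159
]
toy_truth = {name for name, _, _ in toy_calls}

sweep = {
    cutoff: sensitivity_at_cutoff(toy_calls, toy_truth, cutoff)
    for cutoff in (10, 50, 100, 200)
}
print(sweep)
```

As in the table, a looser cutoff can only retain more calls, so sensitivity is non-decreasing in the cutoff.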
Validation of annoFuse prioritization using PPTC PDX fusion calls
| Histology | PPTC STAR-Fusion/FusionCatcher/SOAPFuse/deFuse (n detected) | PPTC STAR-Fusion/Arriba (n detected) | annoFuse STAR-Fusion/Arriba (n retained) | % Retained with annoFuse |
|---|---|---|---|---|
| ALL | 117 | 75 | 72 | 96 |
| CNS Embryonal | 4 | 3 | 3 | 100 |
| Ependymoma | 2 | 0 | 0 | NA |
| Ewing Sarcoma | 11 | 10 | 10 | 100 |
| Glioblastoma | 1 | 1 | 0 | 0 |
| Osteosarcoma | 18 | 15 | 15 | 100 |
| Other Brain | 1 | 1 | 1 | 100 |
| Other Sarcoma | 4 | 3 | 3 | 100 |
| Rhabdomyosarcoma | 7 | 6 | 6 | 100 |
| Wilms | 1 | 0 | 0 | NA |
| Total | 166 | 114 | 110 | 96 |
| ALL truth set | 27 | 23 | 23 | 100 |
Retention of high-confidence, putative oncogenic calls was 96% across the entire PPTC PDX dataset (110 of 114 fusions) and 100% for the ALL truth set (ALL = acute lymphoblastic leukemia). Column 1 = PPTC histology; Column 2 = fusion calls from STAR-Fusion, FusionCatcher, deFuse, and SOAPFuse that were filtered and reported as high-confidence in the PPTC dataset; Column 3 = PPTC-reported fusions detected by STAR-Fusion and Arriba; Column 4 = fusions retained following annoFuse filtering; Column 5 = percent of fusions retained after applying annoFuse.
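The percentages in the last column can be reproduced from columns 3 and 4 of the table. The short Python sketch below does this arithmetic, assuming percentages are rounded to the nearest integer and reported as NA when no fusions were detected by STAR-Fusion/Arriba; the `percent_retained` helper is illustrative only.

```python
# (n detected by STAR-Fusion/Arriba, n retained by annoFuse) per histology,
# copied from columns 3 and 4 of the table above.
counts = {
    "ALL": (75, 72),
    "CNS Embryonal": (3, 3),
    "Ependymoma": (0, 0),
    "Ewing Sarcoma": (10, 10),
    "Glioblastoma": (1, 0),
    "Osteosarcoma": (15, 15),
    "Other Brain": (1, 1),
    "Other Sarcoma": (3, 3),
    "Rhabdomyosarcoma": (6, 6),
    "Wilms": (0, 0),
}


def percent_retained(detected, retained):
    """Rounded percent retained, or None (NA) when nothing was detected."""
    if detected == 0:
        return None
    return round(100 * retained / detected)


per_histology = {h: percent_retained(d, r) for h, (d, r) in counts.items()}
total_detected = sum(d for d, _ in counts.values())
total_retained = sum(r for _, r in counts.values())
overall = percent_retained(total_detected, total_retained)
print(total_detected, total_retained, overall)
```

The overall figure is a pooled percentage over all fusions rather than a mean of the per-histology percentages.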
Fig. 2 Fusion annotations generated by annoFuse. a Distribution of intra- and inter-chromosomal fusions across histologies. b Transcript frame distribution of fusions detected by Arriba and STAR-Fusion algorithms. c Bubble plot of gene partner distribution with respect to ENSEMBL biotype annotation (size of circle proportional to number of genes). d Barplots representing the distribution of kinase groups in the PBTA cohort, annotated by gene partner (Alpha_kinase = Alpha-kinase family, Hexokinase_2 = Hexokinase, PI3_PI4_kinase = Phosphatidylinositol 3- and 4-kinase, Pkinase = Protein kinase domain, Pkinase_C = Protein kinase C terminal domain, Pkinase_Tyr = Protein tyrosine kinase). e Bubble plot representing the distribution of fused genes as oncogenes, tumor suppressor genes, kinases, COSMIC, and predicted and curated transcription factors (size of circle proportional to number of genes). Genes belonging to more than one category are represented in each. In all panels except b, fusion calls were merged from both STAR-Fusion and Arriba.
Fig. 3 Recurrent fusion plots generated by annoFuse. Bar plots, grouped by histology, show recurrent fusion calls by number of patients (a) and recurrently-fused genes by number of patients (b) after filtering and annotation.
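The counts behind such recurrence plots can be sketched in a few lines of Python. The example below assumes a fusion is recurrent when it is seen in at least two distinct patients within one histology, and that repeat calls for the same patient (for instance, from two callers) count once. The function name, the patient and histology labels, and the toy calls are all hypothetical; this is not the annoFuse plotting code.

```python
from collections import defaultdict


def recurrent_fusions(calls, min_patients=2):
    """Count distinct patients per (histology, fusion) and keep recurrent ones.

    calls: iterable of (patient_id, histology, fusion_name)
    """
    patients = defaultdict(set)
    for patient_id, histology, fusion in calls:
        # A set ensures each patient is counted once per fusion.
        patients[(histology, fusion)].add(patient_id)
    return {
        key: len(ids)
        for key, ids in patients.items()
        if len(ids) >= min_patients
    }


# Made-up calls; P1 has the same fusion reported twice (e.g. by two callers).
toy_calls = [
    ("P1", "Histology_1", "KIAA1549--BRAF"),
    ("P1", "Histology_1", "KIAA1549--BRAF"),
    ("P2", "Histology_1", "KIAA1549--BRAF"),
    ("P3", "Histology_1", "GENE_A--GENE_B"),
    ("P4", "Histology_2", "GENE_A--GENE_B"),
]
recurrent = recurrent_fusions(toy_calls)
print(recurrent)
```

GENE_A--GENE_B is seen in two patients overall but only once per histology, so it is not counted as recurrent under this definition.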