Stephanie LaHaye1, James R Fitch1, Kyle J Voytovich1, Adam C Herman1, Benjamin J Kelly1, Grant E Lammi1, Jeremy A Arbesfeld1, Saranga Wijeratne1, Samuel J Franklin1, Kathleen M Schieffer1, Natalie Bir1, Sean D McGrath1, Anthony R Miller1, Amy Wetzel1, Katherine E Miller1, Tracy A Bedrosian1, Kristen Leraas1, Elizabeth A Varga1, Kristy Lee1, Ajay Gupta2, Bhuvana Setty2,3, Daniel R Boué4,5, Jeffrey R Leonard3,6, Jonathan L Finlay2,3, Mohamed S Abdelbaki2,3, Diana S Osorio2,3, Selene C Koo4,5, Daniel C Koboldt1, Alex H Wagner1,3,7, Ann-Kathrin Eisfeld8,9,10, Krzysztof Mrózek9,10, Vincent Magrini1,3, Catherine E Cottrell1,3,4, Elaine R Mardis1,3, Richard K Wilson1,3, Peter White11,12.
Abstract
BACKGROUND: Pediatric cancers typically have a distinct genomic landscape when compared to adult cancers and frequently carry somatic gene fusion events that alter gene expression and drive tumorigenesis. Sensitive and specific detection of gene fusions through the analysis of next-generation-based RNA sequencing (RNA-Seq) data is computationally challenging and may be confounded by low tumor cellularity or underlying genomic complexity. Furthermore, numerous computational tools are available to identify fusions from supporting RNA-Seq reads, yet each algorithm demonstrates unique variability in sensitivity and precision, and no clearly superior approach currently exists. To overcome these challenges, we have developed an ensemble fusion calling approach to increase the accuracy of identifying fusions.
Keywords: Cancer; Gene fusions; Genomics; Pediatric neoplasms; RNA-Seq; Transcriptomics
Year: 2021 PMID: 34863095 PMCID: PMC8642973 DOI: 10.1186/s12864-021-08094-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Performance comparison of individual fusion calling algorithms. Fusion calling algorithms utilized by EnFusion and their contributions to fusion calling in the NCH pediatric cancer and hematologic disease cohort
| Tool | Version | Aligner | Reference | Average fusions called per case | Sensitivity (clinically relevant fusions called out of 67) |
|---|---|---|---|---|---|
| Arriba | v1.2.0 | STAR aligner | Uhrig et al., 2021 | 54 | 88.1% (59) |
| CICERO | v0.3.0 | candidate SV (structural variant) breakpoints and splice junction | Tian et al., 2020 | 1909 | 92.5% (62) |
| FusionMap | v mono-2.10.9 | GSNAP (Genomic Short-read Nucleotide Alignment Program) - 12mer based | Ge et al., 2011, Bioinformatics | 34 | 86.6% (58) |
| FusionCatcher | v0.99.7c | 4 aligners to identify junctions (Bowtie, BLAT, STAR, and Bowtie2) | Nicorici et al. | 1554 | 89.6% (60) |
| JAFFA | direct v1.09 | BLAT; uses k-mers to select reads that do not map to known transcripts | Davidson et al., 2015 | 1134 | 97.0% (65) |
| MapSplice | v2.2.1 | approximate sequence alignment combined with a local search | Wang et al., 2010 | 37 | 85.1% (57) |
| STAR-Fusion | v1.6.0 | STAR aligner | Haas et al., 2019 | 71 | 94.0% (63) |
Fig. 1 The EnFusion pipeline identifies true positive fusions. A The EnFusion approach identifies fusions in RNA-Seq data by overlapping results from Arriba, CICERO, FusionCatcher, FusionMap, JAFFA, MapSplice, and STAR-Fusion. It hierarchically prioritizes and filters the fusions utilizing an in-house PostgreSQL database and knowledge-base, prior to producing an output list of predicted fusions. In many cases, detected fusions were orthogonally tested by clinical confirmation in order to return a medically meaningful result. B The EnFusion pipeline was tested on a dilution series of a reference control reagent (SeraCare) to determine sensitivity and limit of detection. We optimized the pipeline using the undiluted reference control reagent, identifying that by requiring ≥3 callers to have overlap for a detected fusion, and by utilizing filtering of known false positive fusion calls and cross-referencing a list of known fusions, all 14 fusions were identified. Colors representing different fusions present in the Seraseq v2 reagent are ordered by their absolute proportions. We then applied the optimized pipeline to the dilution series, showing that the numbers of identified fusions were reduced in serial dilutions, and no fusions were identified in the negative control. All images depicted in the associated figure are the authors’ own and not taken from another source
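The consensus step described in panels A and B can be illustrated with a minimal Python sketch. It is not the authors' implementation (which relies on an in-house PostgreSQL database and knowledge-base); the function name `ensemble_fusions`, its parameters, and the placeholder gene pairs are hypothetical. The sketch also assumes that a fusion on the known-fusion list bypasses the false-positive filter but must still reach the caller threshold, which the caption does not specify.

```python
from collections import defaultdict


def ensemble_fusions(calls_by_caller, min_callers=3,
                     false_positive_list=frozenset(),
                     known_fusion_list=frozenset()):
    """Keep fusions supported by at least `min_callers` callers.

    calls_by_caller maps a caller name to a set of (5' gene, 3' gene) pairs.
    Assumption: known fusions are exempt from the false-positive filter.
    """
    # Record which callers reported each fusion.
    support = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for fusion in fusions:
            support[fusion].add(caller)

    consensus = {}
    for fusion, callers in support.items():
        if len(callers) < min_callers:
            continue  # not enough independent support
        if fusion in false_positive_list and fusion not in known_fusion_list:
            continue  # recurrent artifact, not rescued by the known list
        consensus[fusion] = sorted(callers)
    return consensus


# Toy input; GENE_A..GENE_D are placeholders, not real calls.
calls = {
    "Arriba": {("BCAN", "NTRK1"), ("GENE_A", "GENE_B")},
    "STAR-Fusion": {("BCAN", "NTRK1"), ("GENE_A", "GENE_B")},
    "JAFFA": {("BCAN", "NTRK1"), ("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")},
    "CICERO": {("GENE_C", "GENE_D")},
}
result = ensemble_fusions(calls, min_callers=3,
                          false_positive_list={("GENE_A", "GENE_B")})
print(result)  # {('BCAN', 'NTRK1'): ['Arriba', 'JAFFA', 'STAR-Fusion']}
```

In this toy input, the placeholder pair flagged as a false positive is dropped despite support from three callers, and the pair seen by only two callers fails the threshold.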
Improved precision in fusion detection, utilizing Seraseq controls, achieved by EnFusion. Data shown is from undiluted Seraseq v3 RNA-Seq, experiments performed in duplicate, averages are shown. Individual algorithms are listed by precision, in descending order. Seraseq fusions identified (true positive) are out of a possible 14 fusions
| Algorithm | Total fusions identified | Seraseq fusions identified | Sensitivity | Precision |
|---|---|---|---|---|
| Arriba | 23.5 | 13 | 92.9% | 55.3% |
| MapSplice | 22 | 12 | 85.7% | 54.6% |
| STAR-Fusion | 32 | 14 | 100.0% | 43.6% |
| FusionMap | 30 | 12.5 | 89.3% | 41.7% |
| FusionCatcher | 299.5 | 13 | 92.9% | 4.3% |
| JAFFA | 470.5 | 12.5 | 89.3% | 2.7% |
| CICERO | 1323 | 14 | 100.0% | 1.1% |
| EnFusion 2 callers | 40 | 14 | 100.0% | 35.0% |
| EnFusion 2 callers + filter | 15.5 | 12 | 85.7% | 77.4% |
| EnFusion 2 callers + filter + known fusion list | 17.5 | 14 | 100.0% | 80.0% |
| EnFusion 3 callers | 15.5 | 14 | 100.0% | 90.3% |
| EnFusion 3 callers + filter | 12 | 12 | 85.7% | 100.0% |
| EnFusion 3 callers + filter + known fusion list | 14 | 14 | 100.0% | 100.0% |
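The sensitivity and precision columns follow from the two count columns: sensitivity is the number of Seraseq fusions identified divided by the 14 expected fusions, and precision is the same count divided by the total fusions identified. The short Python sketch below recomputes three rows under that reading; the function names are illustrative only.

```python
def sensitivity(true_positives, total_true=14):
    """Percentage of the expected Seraseq fusions that were identified."""
    return 100.0 * true_positives / total_true


def precision(true_positives, total_called):
    """Percentage of all identified fusions that are Seraseq fusions."""
    return 100.0 * true_positives / total_called


# (total fusions identified, Seraseq fusions identified), from the table
rows = {
    "Arriba": (23.5, 13),
    "CICERO": (1323, 14),
    "EnFusion 3 callers": (15.5, 14),
}
for name, (total, tp) in rows.items():
    print(f"{name}: sensitivity {sensitivity(tp):.1f}%, "
          f"precision {precision(tp, total):.1f}%")
# Arriba: sensitivity 92.9%, precision 55.3%
# CICERO: sensitivity 100.0%, precision 1.1%
# EnFusion 3 callers: sensitivity 100.0%, precision 90.3%
```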
Fig. 2 Clinically relevant fusions identified by the EnFusion approach in a pediatric cancer and hematologic disease cohort. A The EnFusion approach, with automated filtering, identifies significantly fewer fusions compared to individual callers. The number of fusions is plotted as log10(x + 1) to account for 0 fusions identified in some cases. Callers are sorted by the lowest median number of fusions identified to the highest. B Sixty-seven clinically relevant fusions were identified, represented as a bar graph with decreasing fusions per individual algorithm, highlighting the sensitivity of the ensemble approach compared to individual algorithms. No individual algorithm was able to identify all 67 fusions. C Of the 67 clinically relevant fusions identified, 30 were interchromosomal chimeric (blue), 29 were intrachromosomal chimeric (orange), 3 were loss of function (green), and 5 were promoter swapping (yellow) fusions. D Of the 67 clinically relevant fusions identified, 7 are novel events (red asterisk), while the remaining 60 fusion partners had been described previously in the literature. E A stacked bar graph represents the individual fusion callers that contributed to each clinically relevant fusion. EnFusion 3 refers to the EnFusion approach with ≥3 callers and EnFusion 2 refers to the EnFusion approach with ≥2 callers.
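The log10(x + 1) scaling mentioned for panel A keeps cases with zero fusions on the plot, since log10(0) is undefined. A minimal Python illustration, with a hypothetical helper name and made-up counts:

```python
import math


def log_count(n_fusions):
    """Plotting transform: log10(x + 1), defined even when x is 0."""
    return math.log10(n_fusions + 1)


# Illustrative per-case fusion counts, not data from the cohort.
counts = [0, 9, 99]
transformed = [log_count(n) for n in counts]
print(transformed)  # [0.0, 1.0, 2.0]
```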
Fig. 3 An RBPMS-MET fusion identified in a patient with an infantile fibrosarcoma-like tumor. A An RBPMS-MET fusion was identified by all seven fusion callers in the filtered overlap results. The number of fusions identified by each caller is in the outer Venn diagram sections, while internal numbers indicate overlapping fusions found post-filtering (0 overlaps between callers are not shown). B The RBPMS-MET fusion is an interchromosomal event, occurring between 8p12 and 7q31.2 and joining exon 5 of RBPMS (blue) to exon 15 of MET (red). C The fusion protein product includes the RNA recognition motif domain of RBPMS and the tyrosine kinase catalytic domain of MET. D The RBPMS-MET fusion is predicted to cause constitutive phosphorylation and activation of MET, targetable using cabozantinib.
Fig. 4 Targetable NTRK1 fusion identified in an infiltrating glioma. A The BCAN-NTRK1 fusion was identified by 5 of 7 fusion callers, and was the only fusion returned by the filtered overlap results. Total fusions identified by each caller are shown; FusionMap and MapSplice identified no overlapping fusions that passed filtering (0 overlaps between callers are not shown). B The BCAN-NTRK1 fusion is an intrachromosomal event occurring on 1q23.1, joining exon 6 of BCAN (blue) and exon 8 of NTRK1 (red). C This fusion results in the juxtaposition of the tyrosine kinase catalytic domain of the NTRK1 gene to the 5′ end of the BCAN gene. D NTRK1 is highly expressed in this patient (red) compared to CNS tumors (black) in the NCH cohort (CNS tumors: n = 138), with a normalized read count that is 7.70 standard deviations above the mean (131.2). E The BCAN-NTRK1 fusion is predicted to increase expression and activation of the tyrosine kinase NTRK1, which may be inhibited by TRK inhibitor therapy (green).
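The "standard deviations above the mean" figure in panel D is a z-score of the patient's normalized read count against the cohort distribution. The sketch below shows the calculation in Python with made-up numbers; the caption does not give the cohort standard deviation or the patient's count, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption.

```python
import statistics


def z_score(value, cohort_values):
    """Number of standard deviations `value` lies above the cohort mean.

    Assumption: population standard deviation; the source does not say
    whether a sample or population estimate was used.
    """
    mean = statistics.mean(cohort_values)
    sd = statistics.pstdev(cohort_values)
    return (value - mean) / sd


# Hypothetical normalized read counts (mean 5, standard deviation 2).
cohort = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(10, cohort))  # 2.5
```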
Fig. 5 Identification of a novel BRAF fusion in a mixed neuronal-glial tumor. A The TRIM22-BRAF fusion was identified by all seven fusion callers and in the filtered overlap results. Total fusions identified by each caller and overlapping fusions are shown (0 overlaps between callers are not shown). B The TRIM22-BRAF fusion is an interchromosomal event between 11p15.4 and 7q34, joining exon 2 of TRIM22 (blue) to exon 9 of BRAF (red). C The resulting fusion product contains the 5′ TRIM22 zinc finger binding domains and BRAF tyrosine kinase catalytic domain. D Single sample gene set enrichment analysis (ssGSEA) indicates a trend toward an enrichment of the MEK (above the 75th percentile, 0.68 standard deviations above the mean of 22,756.87), RAF (above the 75th percentile, 0.60 standard deviations above the mean of 22,635.74), and mTOR (above the 75th percentile, 0.72 standard deviations above the mean of 22,191.50) upregulated gene sets in the TRIM22-BRAF sample (red) compared to the pan-cancer NCH cohort (black) (pan-cancer cohort: n = 229). E The TRIM22-BRAF fusion is predicted to cause constitutive dimerization and activation of the BRAF kinase domain, shown in D, which could be targeted by RAF, MEK, and mTOR inhibitors (green).