Trung Nghia Vu, Wenjiang Deng, Quang Thinh Trac, Stefano Calza, Woochang Hwang, Yudi Pawitan.
Abstract
BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While methods are available, routine analyses of large numbers of samples are still limited by high computational demands.
Keywords: Fusion equivalence class; Fusion gene; Quasi-mapping; RNA sequencing
Year: 2018 PMID: 30382840 PMCID: PMC6211471 DOI: 10.1186/s12864-018-5156-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 FuSeq pipeline for fusion gene detection: quasi-mapping of read pairs to extract mapped reads and split reads; statistical tests and filtering to eliminate false positive fusion genes; collecting and merging fusion gene candidates from mapped reads and split reads; de novo assembly to verify and determine fusion sequences; and exporting information of final candidates to files
Fusion discoveries in the cancer datasets. The results for TopHat-Fusion, SOAPfuse and JAFFA are collected from a recent study [14]
| Dataset | Metric | FusionMap | TRUP | TopHat-Fusion | JAFFA | SOAPfuse | FuSeq |
|---|---|---|---|---|---|---|---|
| Breast cancer (TP27) | Total | 47 | 0 | 261 | 42 | 61 | 53 |
| | TP | 12 | 0 | 24 | 20 | 24 | 22 |
| | Recall | 0.44 | - | 0.89 | 0.74 | 0.89 | 0.81 |
| | Precision | 0.26 | - | 0.09 | 0.48 | 0.39 | 0.42 |
| | F1 | 0.32 | - | 0.17 | 0.58 | 0.55 | 0.55 |
| | P-value | 0.32 | - | 9.1e-06 | 0.71 | 1 | - |
| Breast cancer (TP99) | Total | 47 | 0 | 261 | 42 | 61 | 53 |
| | TP | 22 | 0 | 35 | 28 | 41 | 36 |
| | Recall | 0.22 | - | 0.35 | 0.28 | 0.41 | 0.36 |
| | Precision | 0.47 | - | 0.13 | 0.67 | 0.67 | 0.68 |
| | F1 | 0.30 | - | 0.19 | 0.40 | 0.51 | 0.47 |
| | P-value | 0.32 | - | 1.1e-08 | 1 | 1 | - |
| Melanoma | Total | 19 | 0 | 29 | 4 | 108 | 21 |
| | TP | 3 | 0 | 4 | 2 | 10 | 7 |
| | Recall | 0.27 | - | 0.36 | 0.18 | 0.91 | 0.64 |
| | Precision | 0.16 | - | 0.14 | 0.5 | 0.09 | 0.33 |
| | F1 | 0.20 | - | 0.2 | 0.27 | 0.17 | 0.44 |
| | P-value | 0.48 | - | 0.32 | 0.62 | 0.02 | - |
| Glioma | Total | 191 | 209 | 308 | 904 | 299 | 188 |
| | TP | 28 | 20 | 29 | 30 | 22 | 29 |
| | Recall | 0.90 | 0.65 | 0.94 | 0.97 | 0.71 | 0.94 |
| | Precision | 0.15 | 0.10 | 0.09 | 0.03 | 0.07 | 0.15 |
| | F1 | 0.25 | 0.17 | 0.12 | 0.05 | 0.13 | 0.26 |
| | P-value | 0.89 | 0.13 | 0.09 | 5.5e-08 | 0.02 | - |
We select the best result from the different runs for comparison. TP = true positive fusion genes, Total = total discovered fusion-gene candidates, P-value = two-sided p-value of Fisher's exact test of the difference in precision between FuSeq and each of the other methods
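The Recall, Precision and F1 rows follow from the TP and Total counts once the number of validated fusions per dataset is known. The sketch below is illustrative only: the function name `fusion_metrics` is hypothetical, and the value 27 for the breast-cancer validated set is an assumption read off the "TP27" label rather than stated in the table.

```python
def fusion_metrics(tp, total, n_validated):
    """Return (precision, recall, f1) for one method on one dataset.

    tp          -- true positive fusion genes found by the method
    total       -- all fusion-gene candidates reported by the method
    n_validated -- size of the validated fusion set (assumed known)
    """
    if total == 0:
        # No candidates reported: the table prints "-" in this case.
        return None
    precision = tp / total
    recall = tp / n_validated
    if tp == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# FuSeq on breast cancer (TP27): TP = 22, Total = 53, 27 validated fusions (assumed).
precision, recall, f1 = fusion_metrics(22, 53, 27)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# prints: precision=0.42 recall=0.81 F1=0.55
```

With these inputs the rounded values agree with the FuSeq column of the breast cancer (TP27) block.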
Fig. 2 Discovery of AKAP9-BRAF fusion by FuSeq in the spike-in dataset SRR1659964. Also shown are the contig from the de novo assembly and a sample of 4 read-pairs near the junction break
Fusion discoveries in the spike-in dataset. The results for TopHat-Fusion, SOAPfuse and JAFFA for sample SRR1659964 are collected from a recent study [9]
| Method | One sample (SRR1659964): TP | One sample: Total | One sample: Precision | One sample: Recall | One sample: F1 | One sample: P-value | 20 samples: TP | 20 samples: Other | 20 samples: P-value |
|---|---|---|---|---|---|---|---|---|---|
| FusionMap | 7 | 26 | 0.27 | 0.78 | 0.40 | 0.78 | 142/180 | 283 | 0.04 |
| TRUP | 4 | 9 | 0.44 | 0.44 | 0.44 | 1 | 80/180 | 63 | 0.15 |
| JAFFA | 5 | 13 | 0.39 | 0.56 | 0.46 | 1 | 138/180 | 114 | 0.12 |
| SOAPfuse | 4 | 13 | 0.31 | 0.44 | 0.36 | 1 | NA | NA | NA |
| TopHat-Fusion | 1 | 6 | 0.17 | 0.11 | 0.13 | 0.66 | 133/180 | 925 | 1.6e-22 |
| FuSeq | 9 | 25 | 0.36 | 1 | 0.53 | - | 179/180 | 228 | - |
TP = true positive fusion genes, Other = unvalidated fusion genes, Total = total discovered fusion-gene candidates, P-value = two-sided p-value of Fisher's exact test of the difference in precision between FuSeq and each of the other methods
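A two-sided Fisher's exact test on a 2x2 table of counts can be sketched with the standard library alone. The function name `fisher_exact_two_sided` is hypothetical. Arranging the table as (TP, Other) per method is an assumption about how the comparison was set up, so the result is not expected to reproduce the published P-values exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    lo = max(0, col1 - row2)   # smallest feasible value of the top-left cell
    hi = min(row1, col1)       # largest feasible value of the top-left cell
    pmf = {k: comb(row1, k) * comb(row2, col1 - k) / denom
           for k in range(lo, hi + 1)}

    p_obs = pmf[a]
    # Small relative tolerance so ties are not lost to rounding.
    p_value = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


# Assumed layout for the 20-sample comparison: rows are methods,
# columns are (TP, Other). Counts are taken from the table above.
fuseq = (179, 228)
tophat_fusion = (133, 925)
p = fisher_exact_two_sided(*fuseq, *tophat_fusion)
print(f"two-sided p-value: {p:.3g}")
```

Exact integer binomial coefficients avoid the overflow that factorial-based floating-point formulas run into at these counts.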
Fig. 3 Comparisons of the operating characteristics in validated datasets. In each panel the FuSeq result is given as a solid curve, and the other results as dashed curves or points. The results of JAFFA, SOAPfuse and TopHat-Fusion for the breast-cancer and glioma datasets and for the melanoma and spike-in datasets are taken from Davison et al. [14] and Liu et al. [9], respectively. For the purpose of visual comparison, the x-axis of the plots is limited mostly by the curves of FuSeq
Verification of fusion genes by de novo assembly
| Dataset | Metric | FuSeq | FuSeq + de novo assembly |
|---|---|---|---|
| Breast cancer | Total | 22 | 4 |
| | TP27 | 9 | 3 |
| | TP99 | 16 | 4 |
| Melanoma | Total | 10 | 1 |
| | TP | 3 | 1 |
| Glioma | Total | 28 | 18 |
| | TP | 4 | 4 |
| Spike-in | Total | 25 | 12 |
| | TP | 9 | 9 |
TP = true positive fusion genes, Total = total discovered fusion-gene candidates
Fig. 4 Comparison of fusion gene detection methods in computational time. Four methods (JAFFA, SOAPfuse, TopHat-Fusion and FuSeq) are compared according to the average computational time per sample over a range of sample sizes. Comparison results of TRUP, FusionMap and FuSeq for sample SRR1659964 from the spike-in dataset are located at ∼94M on the x-axis. Note that both axes are in log scale