Wenjiang Deng, Sarath Murugan, Johan Lindberg, Venkatesh Chellappa, Xia Shen, Yudi Pawitan, Trung Nghia Vu.
Abstract
Several fusion genes are directly involved in the initiation and progression of cancers. Numerous bioinformatics tools have been developed to detect fusion events, but they are mainly based on RNA-seq data. Whole-exome sequencing (WES) is a powerful technology that is widely used to detect disease-related DNA variants. In this study, we build a novel analysis pipeline called Fuseq-WES to detect fusion genes at the DNA level based on WES data. The same method also applies to targeted panel sequencing data. We apply the method to real datasets of acute myeloid leukemia (AML) and prostate cancer patients. The results show that two of the main AML fusion genes discovered in RNA-seq data, PML-RARA and CBFB-MYH11, are detected in the WES data in 36% and 63% of the available samples, respectively. For the targeted deep sequencing of prostate cancer patients, detection of the TMPRSS2-ERG fusion, the most frequent chimeric alteration in prostate cancer, is 91% concordant with a manually curated procedure based on four other methods. In summary, the overall results indicate that it is challenging to detect fusion genes in WES data with a standard coverage of ∼15-30x: fusion candidates discovered in the RNA-seq data are often not detected in the WES data, and vice versa. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.
Keywords: acute myeloid leukemia; discordant read; fusion gene; prostate cancer; split read; whole exome sequencing
Year: 2022 PMID: 35251131 PMCID: PMC8888970 DOI: 10.3389/fgene.2022.820493
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Workflow of Fuseq-WES to detect fusion genes from whole-exome sequencing data.
FIGURE 2. Construction of fusion equivalence class and fusion transcripts; prediction of fusion genes.
Number of supporting reads for PML-RARA in BeatAML samples.
| Sample ID | Fusion | Discordant Read | Split Read | Total |
|---|---|---|---|---|
| Sample_13-00226 | PML-RARA | 2 | 2 | 4 |
| Sample_14-00831 | PML-RARA | 0 | 2 | 2 |
| Sample_20-00147 | PML-RARA | 5 | 7 | 12 |
| Sample_20-00566 | PML-RARA | 1 | 1 | 2 |
Number of samples harboring fusion genes detected from RNA-seq data, number of those samples with matched exome sequencing data, and number of samples in which the fusion gene was identified from the WES data by Fuseq-WES, for the BeatAML and TCGA datasets.
| Dataset | Fusion | RNA-seq data | WES data | Fuseq-WES |
|---|---|---|---|---|
| BeatAML | PML-RARA | 16 | 11 | 4 |
| BeatAML | CBFB-MYH11 | 25 | 24 | 15 |
| BeatAML | RUNX1-RUNX1T1 | 9 | 6 | 0 |
| TCGA | PML-RARA | 16 | 6 | 3 |
| TCGA | CBFB-MYH11 | 11 | 6 | 0 |
| TCGA | RUNX1-RUNX1T1 | 7 | 4 | 2 |
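The percentages quoted in the abstract appear to correspond to the BeatAML rows of the table above: Fuseq-WES detections divided by the number of matched WES samples. The following minimal sketch recomputes those rates for both datasets. The variable and function names are illustrative and not part of Fuseq-WES; the counts are copied from the table.

```python
# Counts copied from the table above:
# (RNA-seq positive samples, matched WES samples, Fuseq-WES detections)
counts = {
    "BeatAML": {
        "PML-RARA": (16, 11, 4),
        "CBFB-MYH11": (25, 24, 15),
        "RUNX1-RUNX1T1": (9, 6, 0),
    },
    "TCGA": {
        "PML-RARA": (16, 6, 3),
        "CBFB-MYH11": (11, 6, 0),
        "RUNX1-RUNX1T1": (7, 4, 2),
    },
}


def detection_rate(detected, matched):
    """Fraction of matched WES samples in which the fusion was detected."""
    return detected / matched if matched else float("nan")


# Detection rate per dataset and fusion, relative to matched WES samples.
rates = {
    dataset: {
        fusion: detection_rate(detected, wes)
        for fusion, (_rna, wes, detected) in fusions.items()
    }
    for dataset, fusions in counts.items()
}

for dataset, fusions in rates.items():
    for fusion, rate in fusions.items():
        print(f"{dataset:8s} {fusion:14s} {rate:.1%}")
```

For BeatAML this gives 4/11 (about 36%) for PML-RARA and 15/24 (62.5%, reported as 63%) for CBFB-MYH11.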
Comparison of detection results for TMPRSS2-ERG fusion in ProBio patients using the IGV and Fuseq-WES methods. Overall there is 91% concordance between the two methods.
| | Positive IGV | Negative IGV | Total |
|---|---|---|---|
| Positive Fuseq-WES | 36 | 5 | 41 |
| Negative Fuseq-WES | 1 | 23 | 24 |
| Total | 37 | 28 | 65 |
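The 91% concordance follows directly from the 2x2 table: the two methods agree on 36 + 23 of the 65 samples. The sketch below recomputes it. It also derives sensitivity and specificity under the assumption, made here only for illustration, that the IGV-based calls are treated as the reference; the source reports only the concordance.

```python
# Cell counts copied from the 2x2 table above.
both_positive = 36  # Fuseq-WES positive, IGV positive
fuseq_only = 5      # Fuseq-WES positive, IGV negative
igv_only = 1        # Fuseq-WES negative, IGV positive
both_negative = 23  # Fuseq-WES negative, IGV negative

total = both_positive + fuseq_only + igv_only + both_negative

# Concordance: share of samples on which the two methods agree.
concordance = (both_positive + both_negative) / total

# Assumption for illustration: IGV calls are taken as the reference.
sensitivity = both_positive / (both_positive + igv_only)
specificity = both_negative / (both_negative + fuseq_only)

print(f"n = {total}")
print(f"concordance = {concordance:.1%}")
print(f"sensitivity vs IGV = {sensitivity:.1%}")
print(f"specificity vs IGV = {specificity:.1%}")
```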
Fuseq-WES detection results (in terms of the number of supporting reads) for TMPRSS2-ERG fusion in 10 ProBio samples. For each ProBio sample, we obtained random subsamples of the reads of the original data at various lower-coverage levels.
| Sample | 150x (10%) | 75x (5%) | 30x (2%) | 15x (1%) | 7.5x (0.5%) |
|---|---|---|---|---|---|
| 1 | 17 | 12 | 6 | 2 | 0 |
| 2 | 3 | 1 | 0 | 0 | 0 |
| 3 | 38 | 17 | 9 | 5 | 2 |
| 4 | 5 | 3 | 0 | 0 | 0 |
| 5 | 5 | 3 | 1 | 0 | 0 |
| 6 | 2 | 1 | 2 | 0 | 0 |
| 7 | 1 | 1 | 0 | 0 | 0 |
| 8 | 160 | 64 | 28 | 12 | 4 |
| 9 | 5 | 2 | 2 | 2 | 1 |
| 10 | 65 | 31 | 16 | 7 | 5 |
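One way to read the subsampling table is to count, for each coverage level, how many of the ten samples still have enough supporting reads for the fusion to be called. The sketch below does this. The threshold `min_reads` is a hypothetical parameter chosen for illustration; the table does not state the cutoff that Fuseq-WES uses.

```python
# Supporting-read counts copied from the table above (rows = samples 1-10).
coverage_levels = ["150x", "75x", "30x", "15x", "7.5x"]
supporting_reads = [
    [17, 12, 6, 2, 0],
    [3, 1, 0, 0, 0],
    [38, 17, 9, 5, 2],
    [5, 3, 0, 0, 0],
    [5, 3, 1, 0, 0],
    [2, 1, 2, 0, 0],
    [1, 1, 0, 0, 0],
    [160, 64, 28, 12, 4],
    [5, 2, 2, 2, 1],
    [65, 31, 16, 7, 5],
]


def samples_detected(min_reads=1):
    """Count samples with at least `min_reads` supporting reads per level.

    `min_reads` is a hypothetical threshold, not a documented Fuseq-WES cutoff.
    """
    return {
        level: sum(row[i] >= min_reads for row in supporting_reads)
        for i, level in enumerate(coverage_levels)
    }


for threshold in (1, 2):
    print(f"min_reads={threshold}: {samples_detected(threshold)}")
```

With a threshold of one supporting read, all ten samples are still detected at 150x and 75x, while seven remain at 30x, five at 15x, and four at 7.5x, which is in line with the abstract's suggestion that at least 75x coverage is needed.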