Hai Xu, Xiaojin Wu, Dawei Sun, Shijun Li, Siwen Zhang, Miao Teng, Jianlong Bu, Xizhe Zhang, Bo Meng, Weitao Wang, Geng Tian, Huixin Lin, Dawei Yuan, Jidong Lang, Shidong Xu.
Abstract
With the development and application of next-generation sequencing (NGS) and target capture technology, demand is growing for an effective analysis method that accurately detects gene fusions from high-throughput data. We therefore developed single-end gene fusion (SEGF), a fusion gene analysis method that starts from single-end DNA-seq data. The approach takes raw sequencing data as input and integrates two commonly used alignment tools, the basic local alignment search tool (BLAST) and the short oligonucleotide analysis package (SOAP), with stringent filters to detect fusion genes. To evaluate SEGF, we compared it with four other fusion gene detection methods on sequencing results from 23 standard DNA samples and from DNA extracted from 286 lung cancer formalin-fixed paraffin-embedded (FFPE) samples. SEGF detected fusion genes in both the standard and the clinical samples, and had the highest accuracy and sensitivity among the five methods compared. In addition, unlike the other methods, SEGF was able to detect complex gene fusion types from single-end NGS data. Because SEGF obtains gene fusion information at the DNA level, more information can be retrieved from a DNA panel or other DNA sequencing methods without generating RNA sequencing data, which benefits clinical diagnosis and medication guidance. This makes it a timely and cost-effective option for research and diagnosis. Overall, SEGF is a straightforward method that does not require tuning complicated arguments, and it provides a useful approach for the precise detection of gene fusion variation.
Keywords: fusion detection; single-end gene fusion; single-end next-generation sequencing data
Year: 2018 PMID: 30004447 PMCID: PMC6070977 DOI: 10.3390/genes9070331
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Information of genes for gene fusion detection evaluation.
| Chromosome | Start Position | End Position | Gene Symbol |
|---|---|---|---|
| Chr2 | 29415640 | 30144477 |  |
| Chr2 | 42396490 | 42559688 |  |
| Chr6 | 117609530 | 117747018 |  |
| Chr5 | 149781200 | 149792499 |  |
| Chr4 | 25657435 | 25680368 |  |
| Chr10 | 43572517 | 43625797 |  |
| Chr10 | 61548506 | 61666414 |  |
| Chr10 | 32297938 | 32345371 |  |
| Chr20 | 43953929 | 43977064 |  |
| Chr6 | 159186773 | 159239340 |  |
| Chr1 | 154134289 | 154164611 |  |
| Chr12 | 59265937 | 59314319 |  |
| Chr6 | 117881433 | 117923705 |  |
| ChrX | 133594175 | 133634698 |  |
Figure 1. Workflow of the single-end gene fusion (SEGF) method. Raw sequencing data are first pre-processed: the first and last N bp of each read are trimmed (red part), and the first and last M bp of the remaining sequence are merged as paired soft-clipped contigs (PSCs) (green and blue parts); the remaining middle part (black part) is discarded and not used in the subsequent analysis. Basic local alignment search tool (BLAST) and Short Oligonucleotide Analysis Package (SOAP) are then used to align the PSCs to target gene references (yellow part) and genomic references (orange part) separately, keeping only unique, fully mapped results to reduce the influence of repetitive genomic regions. Sequences shared by the two filtered results are considered fusion sequences. If the number of shared (mutual) reads is larger than three, the sample is considered fusion positive; otherwise, it is considered fusion negative.
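The pre-processing step and the final decision rule in Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `make_psc` and `call_fusion` are hypothetical, the values of N and M are not given here and are passed in as parameters, and the BLAST and SOAP alignment and filtering steps are not reproduced (their outputs are represented simply as sets of read identifiers).

```python
from typing import Optional, Set, Tuple


def make_psc(read: str, n: int, m: int) -> Optional[Tuple[str, str]]:
    """Build one paired soft-clipped contig (PSC) from a single-end read.

    Hypothetical helper: trims n bases from each end, then keeps the first
    and last m bases of what remains; the middle part is discarded.
    """
    if n < 0 or m <= 0:
        raise ValueError("n must be >= 0 and m must be > 0")
    trimmed = read[n:len(read) - n]
    if len(trimmed) < 2 * m:
        return None  # too short to yield two non-overlapping ends
    return trimmed[:m], trimmed[-m:]


def call_fusion(target_hits: Set[str], genome_hits: Set[str],
                min_reads: int = 3) -> bool:
    """Apply the decision rule from the caption.

    target_hits / genome_hits are assumed to be the read IDs that survived
    filtering (unique, fully mapped) against the target gene references and
    the genomic references. A sample is positive only if the number of
    mutual reads is strictly larger than min_reads.
    """
    mutual = target_hits & genome_hits
    return len(mutual) > min_reads
```

For example, `make_psc("AACCCCGGGGTTTTAA", n=2, m=4)` returns `("CCCC", "TTTT")`, and a sample with exactly three mutual reads is called fusion negative because the caption requires more than three.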
Comparison of gene fusion detection results for reference standards.
| Method | TP | TN | FP | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| BWA-ALN + FACTERA | 7 | 0 | 0 | 16 | 30.43% | - |
| BWA-ALN + CREST | 0 | 0 | 0 | 23 | 0.00% | - |
| BWA-MEM + FACTERA | 11 | 0 | 0 | 12 | 47.83% | - |
| BWA-MEM + CREST | 0 | 0 | 0 | 23 | 0.00% | - |
| SEGF | 22 | 0 | 0 | 1 | 95.65% | - |
TP: true positive; TN: true negative; FP: false positive; FN: false negative; BWA: Burrows-Wheeler Aligner; FACTERA: Fusion and Chromosomal Translocation Enumeration and Recovery Algorithm; CREST: Clipping REveals STructure.
Comparison of ALK–EML4 fusion detection results for clinical samples.
| Method | Sample | TP Rate | TP | TN | FP | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
| BWA-ALN + FACTERA | 286 | 0.35% | 1 | 270 | 0 | 15 | 6.25% | 100.00% |
| BWA-ALN + CREST | 286 | 1.40% | 4 | 270 | 0 | 12 | 25.00% | 100.00% |
| BWA-MEM + FACTERA | 286 | 0.70% | 2 | 270 | 0 | 14 | 12.50% | 100.00% |
| BWA-MEM + CREST | 286 | 2.80% | 8 | 270 | 0 | 8 | 50.00% | 100.00% |
| SEGF | 286 | 3.85% | 11 | 270 | 0 | 5 | 68.75% | 100.00% |
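The sensitivity and specificity columns in the two comparison tables follow the standard definitions, TP / (TP + FN) and TN / (TN + FP). The short Python sketch below (the helper names are illustrative, not from the paper) recomputes them from the reported counts. It returns `None` when the denominator is zero, which appears to be why specificity is shown as "-" for the reference standards, where no negative samples were included.

```python
from typing import Optional


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return None if denominator == 0 else numerator / denominator


def sensitivity(tp: int, fn: int) -> Optional[float]:
    """Fraction of true fusion-positive samples that were detected."""
    return rate(tp, tp + fn)


def specificity(tn: int, fp: int) -> Optional[float]:
    """Fraction of true fusion-negative samples that were called negative."""
    return rate(tn, tn + fp)


# SEGF on the 286 clinical samples: TP=11, TN=270, FP=0, FN=5
print(sensitivity(11, 5))    # 0.6875, i.e. 68.75%
print(specificity(270, 0))   # 1.0, i.e. 100.00%

# SEGF on the 23 reference standards: TP=22, TN=0, FP=0, FN=1
print(specificity(0, 0))     # None: no negative samples, so undefined
```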