Readman Chiu1, Ka Ming Nip1, Justin Chu1, Inanc Birol2,3.
Abstract
BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data.
Keywords: Acute myeloid leukemia; Alternative splicing; Clinical genomics; Gene fusion; Internal tandem duplication; Partial tandem duplication; RNA-seq; Transcriptome assembly
Year: 2018 PMID: 30200994 PMCID: PMC6131862 DOI: 10.1186/s12920-018-0402-6
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 TAP Pipeline. A Bloom filter is generated from reference transcript sequences of a target list and then applied to full-transcriptome RNA-seq sequences to extract gene-specific read pairs. Reads classified to each target are segregated into separate bins and assembled using two k-mer values independently in parallel. Contigs from each k-mer assembly of each gene are merged, and the extracted reads are aligned to them (r2c). Gene-level assemblies are combined into a single file and aligned to the genome (c2g) and transcriptome (c2t). PAVFinder uses the c2g and c2t alignments together with contig sequences and annotation (reference sequences and gene models) to identify structural variants and novel splicing events. The r2c alignments are used for determining event support and for coverage estimation.
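The read-extraction step in the caption above can be illustrated with a minimal Python sketch. It is not the TAP implementation: the `BloomFilter` class, the one-filter-per-gene layout, the `min_hits` threshold, and all function names are assumptions made for illustration, and reverse-complement k-mers are ignored for brevity.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filters(targets, k):
    """Build one filter per target gene from its reference transcripts."""
    filters = {}
    for gene, transcripts in targets.items():
        bf = BloomFilter()
        for transcript in transcripts:
            for kmer in kmers(transcript, k):
                bf.add(kmer)
        filters[gene] = bf
    return filters


def bin_read_pairs(pairs, filters, k, min_hits=2):
    """Assign each read pair to every gene whose filter matches enough k-mers."""
    bins = {gene: [] for gene in filters}
    for r1, r2 in pairs:
        for gene, bf in filters.items():
            hits = sum(kmer in bf for read in (r1, r2) for kmer in kmers(read, k))
            if hits >= min_hits:  # hypothetical threshold
                bins[gene].append((r1, r2))
    return bins


# Toy data: two made-up "genes" and two read pairs.
targets = {
    "GENE_A": ["ACGTACGTTAGCCGATAGGCTA"],
    "GENE_B": ["TTTTGGGGCCCCAAAATTTTGG"],
}
filters = build_filters(targets, k=8)
pairs = [("ACGTACGTTAGC", "CGATAGGCTA"), ("GAGAGAGAGAGAGA", "CTCTCTCTCTCT")]
bins = bin_read_pairs(pairs, filters, k=8)
```

Each bin would then be handed to an assembler separately, which is what allows the per-gene assemblies described in the caption to run in parallel.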
Fig. 2 PAVFinder detects both (a) structural rearrangements and (b) novel splicing variants. Numbers indicate reference transcript exon numbers. Dotted red lines represent novel adjacencies (joins between non-adjacent transcript sequences) and red blocks represent novel sequences. For splicing variants, canonical splice site motifs are indicated because they are checked when calling potential novel splicing events. Dotted vertical lines depict the algorithm for detecting novel splicing variants by aligning contig sequences against the annotated gene model.
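The splice-motif check mentioned in the caption above can be sketched in Python as follows. This is a hedged illustration, not PAVFinder's code: the function names, the 0-based half-open coordinates, and the restriction to the GT-AG motif are assumptions; a real caller may accept additional motifs.

```python
# Assumption: only the major GT-AG intron motif counts as canonical here.
CANONICAL_MOTIFS = {("GT", "AG")}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def intron_motif(genome, start, end, strand="+"):
    """Return the (donor, acceptor) dinucleotides of the intron genome[start:end].

    Coordinates are 0-based and half-open; for minus-strand genes the intron
    is reverse-complemented so the motif reads in transcript orientation.
    """
    intron = genome[start:end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    return intron[:2], intron[-2:]


def is_canonical(genome, start, end, strand="+"):
    """True if the putative intron is flanked by a canonical splice motif."""
    if end - start < 4:  # too short to hold both dinucleotides
        return False
    return intron_motif(genome, start, end, strand) in CANONICAL_MOTIFS


# Toy genome: exon, a 12-bp intron starting with GT and ending with AG, exon.
genome = "AAACCC" + "GTAAGTTTTCAG" + "GGGTTT"
plus_ok = is_canonical(genome, 6, 18)
shifted = is_canonical(genome, 7, 18)
minus_ok = is_canonical(reverse_complement(genome), 6, 18, strand="-")
```

A contig whose alignment implies an unannotated intron would pass such a check before being reported as a candidate novel splicing event.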
Fig. 3 Simulation experiment to assess PAVFinder fusion calling performance in relation to sequencing coverage and other software. (a) Design of experiment: reads simulated from fusion breakpoints and corresponding reference transcript sequences at different read depths are combined to simulate the titration series. (b) Receiver Operating Characteristic (ROC) plots of PAVFinder, Tophat-Fusion [12], and deFuse [13] on 448 fusion events reported on TCGA data [39].
Leucegene AML samples analyzed in this study
| Cohort | Number of samples analyzed | GEO Accession |
|---|---|---|
| core-binding factor (CBF) | 46 | GSE62190, GSE67039, GSE52656 |
| | 7 | GSE49642, GSE67039, GSE52656 |
| | 31 | GSE52656, GSE49642, GSE67039, GSE52656 |
| | 12 | GSE67039, GSE52656, GSE49642, GSE62190, GSE66917 |
| | 1 | GSE67039 |
Previously identified aberrant splice events [45, 46] detected in the Leucegene samples analyzed
| Variant | Number of positive samples |
|---|---|
| | 27 |
| | 53 |
| | 54 |
| | 77 |
| | 92 |
| | 51 |
| | 3 |
FLT3-Vb* – skipped exon 5 and a 13-bp deletion at the 3′ end of exon 4, instead of skipped exons 5 and 7 and partial deletions of exons 6 and 8
FLT3-Vc* – skipped exons 5, 6 and 7, a 13-bp deletion at the 3′ end of exon 4, and a 76-bp deletion at the 5′ end of exon 8, instead of a 26-bp deletion at the 5′ end of exon 8