Shao-Ke Lou, Jing-Woei Li, Hao Qin, Aldrin Kay-Yuen Yim, Leung-Yau Lo, Bing Ni, Kwong-Sak Leung, Stephen Kwok-Wing Tsui, Ting-Fung Chan.
Abstract
BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners can map millions of sequencing reads onto a reference genome. For reads that map to multiple positions along the reference genome (multireads), these aligners may either assign them to a random location or discard them altogether. Either approach can bias downstream analyses. Meanwhile, aligning reads that span splice junctions remains challenging. Existing splicing-aware aligners that rely on a read-count method to identify junction sites are inevitably affected by sequencing depth.
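The bias from either multiread strategy can be seen in a toy example. The sketch below is not the authors' code: the `count_reads` helper, the gene names, and the read counts are all hypothetical. Gene A is truly expressed, while gene B is silent but shares a repeated segment with A, so reads from that segment map to both genes.

```python
import random
from collections import Counter

def count_reads(alignments, strategy, rng=None):
    """Count reads per gene (hypothetical helper).

    alignments: one list of candidate genes per read; a read with more than
    one candidate is a multiread.
    strategy: "random" assigns each multiread to one candidate uniformly at
    random; "discard" drops multireads entirely.
    """
    rng = rng or random.Random(0)  # fixed seed so the toy run is reproducible
    counts = Counter()
    for candidates in alignments:
        if len(candidates) == 1:
            counts[candidates[0]] += 1          # unique read: no ambiguity
        elif strategy == "random":
            counts[rng.choice(candidates)] += 1  # pick one location blindly
        elif strategy == "discard":
            continue                             # multiread thrown away
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return counts

# 60 reads map uniquely to A; 40 reads from the shared repeat map to A and B.
reads = [["A"]] * 60 + [["A", "B"]] * 40

print(count_reads(reads, "discard"))  # Counter({'A': 60})
print(count_reads(reads, "random"))
```

Discarding loses 40 of gene A's 100 reads, while random assignment credits roughly half of the shared reads to the silent gene B; neither recovers the true profile.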
Year: 2011 PMID: 21988959 PMCID: PMC3226252 DOI: 10.1186/1471-2105-12-S5-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Alignment of two paired-end (PE) reads. Read 1 and read 2 could be spliced into two fragments, $l_{sp0}$ and $l_{sp1}$, respectively. The gap length between the two PE reads mapped onto the reference genome is denoted $l_{is}$.
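The quantities in the Figure 1 caption can be made concrete with a minimal sketch. It assumes each mapped read is stored as a list of aligned blocks in 0-based, half-open reference coordinates, so a spliced read has two blocks separated by an intron; the helper names and coordinates are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # aligned block on the reference: [start, end), 0-based

def block_lengths(blocks: List[Block]) -> List[int]:
    """Lengths of the aligned fragments of one read (e.g. l_sp0, l_sp1)."""
    return [end - start for start, end in blocks]

def intron_lengths(blocks: List[Block]) -> List[int]:
    """Reference bases skipped between consecutive fragments of a spliced read."""
    return [nxt[0] - cur[1] for cur, nxt in zip(blocks, blocks[1:])]

def insert_gap(read1: List[Block], read2: List[Block]) -> int:
    """Gap l_is between the end of the upstream mate and the start of the
    downstream mate on the reference (negative if the mates overlap)."""
    left, right = sorted([read1, read2], key=lambda blocks: blocks[0][0])
    return right[0][0] - left[-1][1]

# Read 1 is spliced into two fragments across a 500-bp intron; read 2 is unspliced.
r1 = [(1000, 1030), (1530, 1550)]
r2 = [(1700, 1750)]
print(block_lengths(r1))   # [30, 20]
print(intron_lengths(r1))  # [500]
print(insert_gap(r1, r2))  # 150
```

Measured this way, the gap is purely a reference-coordinate distance; if an intron lies between the two mates, it is included in the gap, which is one reason an insert-size model and an intron-length model interact.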
Figure 2. Geometric-tail (GT) distribution and insert-size distribution. A) Distribution of intron length and its GT estimate; the upper-right panel enlarges the region from 1 to 500 bp. B) Insert-size density, with S4 in red and S5 in green.
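One way to fit a geometric-tail model of intron length is sketched below. The exact parameterization used in the paper is not given in this excerpt, so this form is an assumption: lengths up to a cutoff `k` keep their empirical probabilities, and the remaining probability mass decays geometrically beyond `k`, with the geometric parameter `p` set by maximum likelihood on the shifted tail. The function name, cutoff, and toy data are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

def fit_geometric_tail(lengths: List[int], cutoff: int) -> Tuple[Callable[[int], float], float]:
    """Fit a hypothetical geometric-tail (GT) model of intron length.

    Lengths <= cutoff get their empirical probability; lengths > cutoff share
    the remaining tail mass, which decays geometrically with parameter p.
    """
    n = len(lengths)
    head = Counter(x for x in lengths if x <= cutoff)
    tail = [x - cutoff - 1 for x in lengths if x > cutoff]  # shift tail to start at 0
    tail_mass = len(tail) / n
    # MLE of p for a geometric distribution on {0, 1, 2, ...} is 1 / (1 + mean).
    p = 1.0 / (1.0 + sum(tail) / len(tail)) if tail else 1.0

    def pmf(x: int) -> float:
        if x <= cutoff:
            return head[x] / n  # empirical part
        return tail_mass * p * (1.0 - p) ** (x - cutoff - 1)  # geometric tail

    return pmf, p

lengths = [80, 80, 90, 100, 150, 250]
pmf, p = fit_geometric_tail(lengths, cutoff=100)
print(p)         # 0.01
print(pmf(101))  # about 0.00333 (tail mass 1/3 times p)
```

The geometric tail keeps the model normalized while assigning small but nonzero probability to long introns that are rare in the training data; a probability like `pmf(intron_length)` could then be used to rank candidate spliced alignments, though that use is an illustration rather than the authors' stated procedure.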
Figure 3. Comparison of splice junctions detected by the GT-based model, TopHat, and SpliceMap, and of each with the ASTD. A) S4, GT-based model vs. SpliceMap; B) S5, GT-based model vs. SpliceMap; C) S4, GT-based model vs. TopHat; D) S5, GT-based model vs. TopHat.
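A junction comparison like the one in Figure 3 reduces to set operations on junction coordinates. The minimal sketch below assumes each junction is keyed by (chromosome, donor position, acceptor position) and that two junctions match only on exact coordinates; the `compare_junctions` helper and the toy junction sets are hypothetical, and the "annotation" set merely stands in for a reference such as the ASTD.

```python
from typing import Dict, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, donor position, acceptor position)

def compare_junctions(ours: Set[Junction], other: Set[Junction],
                      annotated: Set[Junction]) -> Dict[str, int]:
    """Venn-style counts for two junction sets and their overlap with an annotation."""
    shared = ours & other
    return {
        "only_ours": len(ours - other),
        "only_other": len(other - ours),
        "shared": len(shared),
        "ours_in_annotation": len(ours & annotated),
        "other_in_annotation": len(other & annotated),
        "shared_in_annotation": len(shared & annotated),
    }

gt = {("chr1", 100, 600), ("chr1", 900, 1400), ("chr2", 50, 300)}
tophat = {("chr1", 100, 600), ("chr2", 50, 300), ("chr2", 700, 950)}
annotation = {("chr1", 100, 600), ("chr1", 900, 1400), ("chr2", 700, 950)}
print(compare_junctions(gt, tophat, annotation))
# {'only_ours': 1, 'only_other': 1, 'shared': 2, 'ours_in_annotation': 2, 'other_in_annotation': 2, 'shared_in_annotation': 1}
```

Exact-coordinate matching is the simplest choice; a real comparison might allow a few bases of tolerance at donor and acceptor sites.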
Splice-site comparison and EST validation

| Dataset / metric | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| S4 | | | |
| Total splice sites | 102074 | 102784 | 101974 |
| Confirmed by EST | 98109 | 98819 | 98008 |
| % confirmed by EST | 96.12% | 96.14% | 96.11% |
| Splice sites by TopHat | 66988 | | |
| Splice sites by SpliceMap | 66452 | | |
| S5 | | | |
| Total splice sites | 89249 | 90269 | 89189 |
| Confirmed by EST | 85938 | 86958 | 85878 |
| % confirmed by EST | 96.29% | 96.33% | 96.29% |
| Splice sites by TopHat | 60659 | | |
| Splice sites by SpliceMap | 60625 | | |
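The "% confirmed by EST" rows are the EST-confirmed count divided by the total number of splice sites for each model. The short check below recomputes them from the table's counts; it assumes the second block of rows belongs to dataset S5, and the `pct_confirmed` helper is hypothetical.

```python
# (total splice sites, confirmed by EST) for Models 1-3, copied from the table.
# Assumption: the second block of rows belongs to dataset S5.
counts = {
    "S4": [(102074, 98109), (102784, 98819), (101974, 98008)],
    "S5": [(89249, 85938), (90269, 86958), (89189, 85878)],
}

def pct_confirmed(total: int, confirmed: int) -> str:
    """EST-confirmation rate as a percentage string with two decimals."""
    return f"{100.0 * confirmed / total:.2f}%"

for dataset, rows in counts.items():
    rates = [pct_confirmed(total, confirmed) for total, confirmed in rows]
    print(dataset, rates)
# S4 ['96.12%', '96.14%', '96.11%']
# S5 ['96.29%', '96.33%', '96.29%']
```

All six recomputed rates match the reported percentages.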
Figure 4. Workflow of this study: comparison of the GT-based model with TopHat and SpliceMap.