Hua Bao, Yuanyan Xiong, Hui Guo, Renchao Zhou, Xuemei Lu, Zhen Yang, Yang Zhong, Suhua Shi.
Abstract
BACKGROUND: Next-generation sequencing technologies provide exciting avenues for studies of transcriptomics and population genomics. There is an increasing need to conduct spliced and unspliced alignments of short transcript reads onto a reference genome and to estimate minor allele frequency from sequences of population samples.
Year: 2009 PMID: 19958476 PMCID: PMC2788365 DOI: 10.1186/1471-2164-10-S3-S13
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Benchmark results of short reads alignment programs.
| Program | Unspliced: true positive (%) | Unspliced: false positive (%) | Unspliced: running time | Spliced: true positive (%) | Spliced: false positive (%) | Spliced: running time |
|---|---|---|---|---|---|---|
| SHRiMP | 94.79 | 8.97 | 809 | N/A | N/A | N/A |
| SeqMap | 96.50 | 6.71 | 447 | N/A | N/A | N/A |
| SOAP | 96.41 | 6.72 | 101 | N/A | N/A | N/A |
| MAQ | 96.53 | 6.73 | 138 | N/A | N/A | N/A |
| Qpalma | N/A | N/A | N/A | 85.17 | 4.45 | 557 |
| MapNext | 96.51 | 6.72 | 209 | 83.89 | 4.31 | 231 |
A query dataset of 1893118 reads (35 bp each; 134274 spliced and 1758844 unspliced) was simulated from 5796 coding DNA sequences on chromosome I of Arabidopsis thaliana. Accuracy measures were calculated under the same threshold, allowing at most two mismatches. For the unspliced alignment, the true and false positive rates are the numbers of short reads with correct and incorrect alignment positions, respectively, divided by 1758844. For the spliced alignment, the same counts are divided by 134274.
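The rate definitions in the note above can be sketched in a few lines of Python. This is only an illustration: the function name `alignment_rates` and the example counts are hypothetical, since the record reports percentages rather than the underlying numbers of correct and incorrect alignments.

```python
# Read counts reported in the table note.
TOTAL_READS = 1_893_118
SPLICED_READS = 134_274
UNSPLICED_READS = 1_758_844


def alignment_rates(n_correct, n_incorrect, n_reads):
    """Return (true positive %, false positive %) for one read class.

    Both rates use the same denominator: the number of simulated reads
    in that class (spliced or unspliced), as the table note describes.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    true_positive = 100.0 * n_correct / n_reads
    false_positive = 100.0 * n_incorrect / n_reads
    return true_positive, false_positive


# The two read classes add up to the reported total.
assert SPLICED_READS + UNSPLICED_READS == TOTAL_READS

# Hypothetical counts, for illustration only (not taken from the paper).
tp, fp = alignment_rates(n_correct=900, n_incorrect=50, n_reads=1000)
print(f"TP = {tp:.2f}%, FP = {fp:.2f}%")
```

To reproduce a table row, `n_reads` would be 1758844 for the unspliced columns and 134274 for the spliced columns.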
Accuracy of SNP detection produced by MapNext.
| Coverage | True Positives | False Positives |
|---|---|---|
| 4× | 1961 (90.70%) | 690 (29.51%) |
| 6× | 1998 (92.41%) | 23 (1.06%) |
| 8× | 2015 (93.20%) | 8 (0.37%) |
| 10× | 2043 (94.50%) | 0 (0.00%) |
| 12× | 2068 (95.65%) | 0 (0.00%) |
There were 2162 true SNPs in 50 individuals (haploid) in our simulation. Coverage is the sequencing depth per individual. QV, NQV, MMAF and MC were set to 25, 20, 0.01 and 50 (1× per individual), respectively.
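As a quick arithmetic check, the percentages in the True Positives column can be recomputed from the counts. The sketch below assumes the denominator is the 2162 true SNPs stated in the note; the record does not say so explicitly, and the helper name `detection_rate` is hypothetical.

```python
TRUE_SNPS = 2162  # true SNPs in the simulation, from the table note

# True-positive counts per coverage level (x per individual), from the table.
true_positives = {4: 1961, 6: 1998, 8: 2015, 10: 2043, 12: 2068}


def detection_rate(count, total=TRUE_SNPS):
    """Percentage of true SNPs detected (assumed denominator: total true SNPs)."""
    return 100.0 * count / total


rates = {cov: detection_rate(n) for cov, n in true_positives.items()}
for cov, rate in rates.items():
    print(f"{cov}x: {true_positives[cov]} ({rate:.2f}%)")
```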
Figure 1. The accuracy of minor allele frequency estimation produced by MapNext. The mean (SD) of the minor allele frequencies is given. QV, NQV, MMAF and MC are set to 25, 20, 0.01 and 50, respectively. (a) The sequencing coverage is 2× per individual. (b) The sequencing coverage is 12× per individual.