Andrew D Smith, Zhenyu Xuan, Michael Q Zhang.
Abstract
BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from approximately 25-50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.
Year: 2008 PMID: 18307793 PMCID: PMC2335322 DOI: 10.1186/1471-2105-9-128
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparison of mapping accuracy of RMAPM criterion under different parameter combinations. Comparison of mapping accuracy for reads of different lengths, and allowing different numbers of mismatches without using quality scores. Both the target (BAC) region coverage (a) and the mapping selectivity (b) are displayed. The mean of these two measures is presented in (c) as mapping accuracy. Standard error of displayed values was always ≤ 1.0% and usually < 0.1%, as estimated by mapping reads obtained from the second lane of the same sequencing run of the same BAC regions (this applies also to values in Figure 2).
Figure 2. Mapping accuracy of RMAPQ criterion under varying parameters. Reads with length from 25–36 nt were mapped, and 0, 1, or 2 mismatches were allowed at high-quality bases defined by quality score cutoffs of 4 (d) or 8 (a-d). For reference, mapping performance of RMAPM criterion with at most 2 mismatches is also shown. (a) The BAC coverage; (b) the mapping selectivity; (c) the overall mapping accuracy (equal to the mean of the BAC coverage and selectivity). (d) 2-D performance comparison in both BAC coverage and selectivity of RMAPM and RMAPQ.
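The two criteria compared in the captions above can be illustrated with a minimal Python sketch. It is not the RMAP implementation: the function names are hypothetical, the read is assumed to be already aligned to a reference window of equal length, and treating a base as "high quality" when its score is at or above the cutoff (rather than strictly above) is an assumption. The only relation taken directly from the captions is that mapping accuracy is the mean of BAC coverage and selectivity.

```python
from typing import Sequence


def mismatches_all(read: str, ref: str) -> int:
    """Count every mismatching position, ignoring quality scores (RMAPM-style)."""
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    return sum(1 for r, g in zip(read, ref) if r != g)


def mismatches_high_quality(read: str, ref: str,
                            quals: Sequence[int], cutoff: int) -> int:
    """Count mismatches only at high-quality bases (RMAPQ-style).

    Assumption: a base is high quality when its score is >= cutoff.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, reference window and qualities must have equal length")
    return sum(1 for r, g, q in zip(read, ref, quals) if q >= cutoff and r != g)


def mapping_accuracy(coverage: float, selectivity: float) -> float:
    """Mapping accuracy as the mean of BAC coverage and selectivity."""
    return (coverage + selectivity) / 2.0
```

For example, a read with two mismatches, one of them at a base whose quality score falls below the cutoff, would be rejected under a one-mismatch limit that ignores quality scores but accepted under the same limit applied only to high-quality bases.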