Nils Homer, Stanley F Nelson, Barry Merriman.
Abstract
BACKGROUND: DNA sequence comparison is a well-studied problem in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence rather than each DNA base individually. The encoded DNA sequence may contain technical errors, so encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.
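To make the notion of a weighted edit distance concrete, the following is a minimal sketch of the classic dynamic-programming recurrence for two plain (unencoded) DNA sequences. The function name `weighted_edit_distance` and the `mismatch` and `gap` weights are illustrative assumptions, not the scoring scheme used by the authors, and the sketch does not model encoded reads or encoding errors.

```python
def weighted_edit_distance(a, b, mismatch=1.0, gap=1.0):
    """Global weighted edit distance between sequences a and b.

    `mismatch` and `gap` are hypothetical weights chosen for illustration.
    """
    # prev[j] holds the cost of aligning the first i-1 bases of a
    # with the first j bases of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning i bases of a against nothing
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap      # base of a aligned to a gap
            insert = curr[j - 1] + gap  # base of b aligned to a gap
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(weighted_edit_distance("ACGT", "AGGT"))           # one mismatch
    print(weighted_edit_distance("ACGT", "ACT", gap=2.0))   # one gap
```

Keeping only two rows of the table holds memory linear in the length of `b`, at the cost of not being able to trace back the alignment itself.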
Year: 2010 PMID: 20576157 PMCID: PMC2911458 DOI: 10.1186/1471-2105-11-347
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Power of k-base encoding. Power is calculated as the fraction of reads that align correctly. 10,000 simulated reads from the E. coli genome were generated.
Power of k-base encoding assuming a real-world per-base error rate
| k | Power (0 SNPs) | Power (1 SNP) | Power (2 SNPs) |
|---|---|---|---|
| 1 | 0.877 | 0.847 | 0.820 |
| 2 | 0.931 | 0.824 | 0.706 |
| 3 | 0.963 | 0.876 | 0.784 |
| 4 | 0.964 | 0.911 | 0.834 |
| 5 | 0.965 | 0.911 | 0.840 |
Power is calculated as the fraction of reads that align correctly. 10,000 simulated 50 bp reads from the E. coli genome were generated with an estimated real-world error rate.
Figure 2. False positive SNP discovery rate for k-base encoding. The false positive SNP discovery rate is calculated as the fraction of reads that have a SNP call after alignment when no SNP call is expected. 10,000 simulated reads from the E. coli genome were generated.
Figure 3. False negative SNP discovery rate for k-base encoding. The false negative SNP discovery rate is calculated as the fraction of reads that do not call a SNP after alignment when a SNP call is expected. 10,000 simulated reads from the E. coli genome were generated.
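The three rates defined in the captions above can be computed from per-read outcomes roughly as follows. This is a sketch under stated assumptions: the `ReadResult` record and its field names are hypothetical, and the false positive and false negative rates are read here as conditional on reads where no SNP, or a SNP, is expected, respectively.

```python
from dataclasses import dataclass


@dataclass
class ReadResult:
    """Hypothetical per-read outcome of a simulation; not from the paper."""
    correctly_aligned: bool
    snp_expected: bool
    snp_called: bool


def _fraction(flags):
    # Fraction of True values; NaN when there are no reads to count.
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def power(results):
    """Fraction of reads that align correctly."""
    return _fraction(r.correctly_aligned for r in results)


def false_positive_rate(results):
    """Fraction of reads with a SNP call among reads where none is expected."""
    return _fraction(r.snp_called for r in results if not r.snp_expected)


def false_negative_rate(results):
    """Fraction of reads without a SNP call among reads where one is expected."""
    return _fraction(not r.snp_called for r in results if r.snp_expected)


if __name__ == "__main__":
    demo = [
        ReadResult(True, False, False),
        ReadResult(True, False, True),
        ReadResult(False, True, False),
        ReadResult(True, True, True),
    ]
    print(power(demo), false_positive_rate(demo), false_negative_rate(demo))
```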
Figure 4. Flexibility of scoring systems for 5-base encoding. Power of scoring system evaluation for 5-base encoding. 1,000 simulated reads from the E. coli genome were generated.
Performance of k-base encoding
| k | Time in s (0 SNPs) | Time in s (1 SNP) | Time in s (2 SNPs) |
|---|---|---|---|
| 1 | 7 | 7 | 7 |
| 2 | 65 | 65 | 65 |
| 3 | 403 | 346 | 403 |
| 4 | 2178 | 2166 | 2178 |
| 5 | 23464 | 23460 | 23466 |
Running time (in seconds) of k-base encoding, assuming a real-world per-base error rate, on the 50 bp reads presented in Table 1.