Jason D Gans, Murray Wolinsky.
Abstract
Nucleic acid-based biochemical assays are crucial to modern biology. Key applications, such as detection of bacterial, viral and fungal pathogens, require detailed knowledge of assay sensitivity and specificity to obtain reliable results. Improved methods to predict assay performance are needed for exploiting the exponentially growing amount of DNA sequence data and for reducing the experimental effort required to develop robust detection assays. Toward this goal, we present an algorithm for the calculation of sequence similarity based on DNA thermodynamics. In our approach, search queries consist of one to three oligonucleotide sequences representing either a hybridization probe, a pair of Padlock probes or a pair of PCR primers with an optional TaqMan™ probe (i.e. in silico or 'virtual' PCR). Matches are reported if the query and target satisfy both the thermodynamics of the assay (binding at a specified hybridization temperature and/or change in free energy) and the relevant biological constraints (assay sequences binding to the correct target duplex strands in the required orientations). The sensitivity and specificity of our method are evaluated by comparing predicted to known sequence tagged sites in the human genome. Free energy is shown to be a more sensitive and specific match criterion than hybridization temperature.
Year: 2008 PMID: 18515842 PMCID: PMC2475610 DOI: 10.1093/nar/gkn301
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Optimized surrogate function parameters for computing thermodynamic alignments
The bases W, X, Y and Z represent any of the four bases (A, T, G or C), with the constraint that neither X and Y nor W and Z form a Watson and Crick base pair. The ‘–’ symbol represents a gap. Each parameter is a linear function of temperature: ΔH − TΔS. While the ΔS and ΔH values have units of entropy and enthalpy, they are for alignment purposes only.
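As a minimal sketch of the temperature dependence described in the note above, the snippet below evaluates one alignment parameter as ΔH − TΔS. The class name `LinearParameter` and the numeric values are hypothetical and chosen only for illustration; they are not entries from the paper's table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearParameter:
    """One alignment parameter of the form dH - T*dS (hypothetical container)."""

    delta_h: float  # enthalpy-like term, kcal/mol (illustrative value only)
    delta_s: float  # entropy-like term, kcal/(mol*K) (illustrative value only)

    def score(self, temperature_kelvin: float) -> float:
        """Return dH - T*dS at the given absolute temperature."""
        return self.delta_h - temperature_kelvin * self.delta_s


# Illustrative numbers, NOT taken from the paper's parameter table.
example = LinearParameter(delta_h=-8.0, delta_s=-0.02)
score_at_37c = example.score(310.15)  # 37 degrees C expressed in kelvin
```

Because each parameter is linear in T, the same (ΔH, ΔS) pair can be reused at any hybridization temperature without refitting.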
Figure 1. The temperature-dependent thermodynamic alignment approximation error evaluated using randomly generated test sets with (A) 25% G + C content, (B) 50% G + C content and (C) 75% G + C content. The solid black lines show the average error in the free energy computed using only the non-optimized dynamic programming parameters of Leber et al. The dotted red lines show the average error in the two-stage free energy computed by first producing an alignment using the dynamic programming parameters of Leber et al. and then evaluating the free energy using the full duplex free-energy function of SantaLucia. Finally, the solid red lines show the average error in the two-stage free energy computed by first producing an alignment using optimized dynamic programming parameters and then evaluating the free energy using the full duplex free-energy function of SantaLucia. For all curves, the approximation error, ΔΔG, is defined as the approximate ΔG minus the exact ΔG computed by the DINAmelt server (14). The standard deviations about each point in all graphs are less than or equal to 1.5 × 10⁻² kcal/mol. Each randomly generated test set of 10⁴ 25-base sequence pairs (containing insertions, deletions and mutations) was pre-screened to ensure that no sequence pairs in the test set were also in the training set used to optimize the dynamic programming parameters.
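The error definition in the Figure 1 caption (approximate ΔG minus exact ΔG, averaged over a test set) can be sketched as follows. The function names and the sample values are assumptions made for illustration; the paper's actual evaluation code is not shown in this record.

```python
from statistics import mean


def approximation_errors(approx_dg, exact_dg):
    """Return per-duplex ddG = approximate dG - exact dG (kcal/mol)."""
    if len(approx_dg) != len(exact_dg):
        raise ValueError("approximate and exact lists must have equal length")
    return [a - e for a, e in zip(approx_dg, exact_dg)]


def mean_approximation_error(approx_dg, exact_dg):
    """Average ddG over a test set, as plotted per temperature in Figure 1."""
    return mean(approximation_errors(approx_dg, exact_dg))


# Made-up free energies for two duplexes, for illustration only.
errors = approximation_errors([-10.0, -12.5], [-10.5, -13.0])
average_error = mean_approximation_error([-10.0, -12.5], [-10.5, -13.0])
```

With this sign convention, a positive ΔΔG means the approximation predicts a less stable duplex than the exact calculation.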
Figure 2. Comparing the time to compute single-stage (fixed temperature, dynamic programming only) and two-stage thermodynamic alignments to the time to compute O(N³) exact alignments. Speedup is computed as the average of the wall clock time required to compute an exact O(N³) alignment divided by the wall clock time required to compute either a one- or two-stage thermodynamic alignment. Wall clock time was computed using the Unix time command to measure the run time required to align 10⁵ random duplexes (containing insertions, deletions and mutations) of the specified approximate length (insertions and deletions can change the duplex length). The exact alignments were performed using the hybrid-min program that is included with the DINAmelt server (UNAfold) package (14). Error bars were computed by averaging the evaluation times of 10 different randomly generated sets of duplexes for each duplex length.
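One possible reading of the speedup calculation in the Figure 2 caption is sketched below: a ratio of wall clock times is formed for each randomly generated set, and the mean and sample standard deviation across sets give the plotted point and its error bar. Whether the authors averaged per-set ratios or took a ratio of averaged times is not stated here, so this is an assumption, as are the function name and the timings.

```python
from statistics import mean, stdev


def speedup_summary(exact_seconds, approx_seconds):
    """Return (mean, sample standard deviation) of per-set speedup ratios.

    Assumption: one exact and one approximate timing per duplex set.
    """
    if len(exact_seconds) != len(approx_seconds):
        raise ValueError("timing lists must have equal length")
    if len(exact_seconds) < 2:
        raise ValueError("need at least two sets to estimate a spread")
    ratios = [e / a for e, a in zip(exact_seconds, approx_seconds)]
    return mean(ratios), stdev(ratios)


# Made-up timings (seconds) for three duplex sets, for illustration only.
mean_speedup, spread = speedup_summary([10.0, 20.0, 30.0], [2.0, 4.0, 6.0])
```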
Figure 3. Comparing e-PCR and ThermonucleotideBLAST in STS search specificity and sensitivity. Comparisons were made by searching a set of 2185 STS primer pairs against the complete human genome (build 35, version 1). Specificity is defined as TP/(TP + FP) and sensitivity is defined as TP/(TP + FN), where TP is the number of true positives, FP is the number of false positives and FN is the number of false negatives. The e-PCR was run using W = 12 (word size) and M = 200 (allowed deviation from the expected amplicon size) and discontiguous words (DW), as opposed to contiguous words (CW), were activated with F = 3, as described in ref. (1). In order of increasing sensitivity, the plotted e-PCR points were computed using the following parameters: (CW, g = 0, n = 0), (CW, g = 0, n = 1), (CW, g = 1, n = 1), (DW, g = 0, n = 1), (DW, g = 1, n = 1), (DW, g = 2, n = 1), (DW, g = 1, n = 2) and (DW, g = 2, n = 2), where g is the number of allowed gaps and n is the number of allowed mismatches within the 12 base word. ThermonucleotideBLAST was run using single-primer-PCR = False (to disable searching for PCR amplicons produced by a single primer), W = 7 (requiring an exact match of seven bases to initiate a thermodynamic alignment), s = 0.05 (salt concentration in M), t = 9 × 10⁻⁷ (strand concentration in M), l = 1000 (maximum allowed amplicon size) and dangle5 = False and dangle3 = False (to disable the use of dangling end bases at both ends of a thermodynamic alignment). Each of the five graphs shows the effect of requiring an increasing number of exact matches at the 3′ end of each PCR primer; however, the same e-PCR curve is reproduced in each graph. To compare the specificity and sensitivity of different match criteria, ThermonucleotideBLAST searches were performed using ΔG (solid black curve), TM at 37°C (solid red curve) and TM computed using the Dinkelbach algorithm (solid blue curve).
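The two metrics defined in the Figure 3 caption can be computed as in the short sketch below. The function names and the counts are hypothetical and are not results from the paper. Note that the caption's specificity, TP/(TP + FP), is the quantity often called precision elsewhere.

```python
def specificity(true_positives: int, false_positives: int) -> float:
    """Specificity as defined in the caption: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no predicted matches; specificity is undefined")
    return true_positives / total


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity as defined in the caption: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no known sites; sensitivity is undefined")
    return true_positives / total


# Hypothetical counts for illustration only, not data from the paper.
example_specificity = specificity(true_positives=90, false_positives=10)
example_sensitivity = sensitivity(true_positives=90, false_negatives=30)
```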