Dan Tulpan, Mirela Andronescu, Seo Bong Chang, Michael R Shortreed, Anne Condon, Holger H Hoos, Lloyd M Smith.
Abstract
We describe a new algorithm for the design of strand sets for use in DNA computations or universal microarrays. Our algorithm can design sets that satisfy any of several thermodynamic and combinatorial constraints, which aim to maximize desired hybridizations between strands and their complements while minimizing undesired cross-hybridizations. To search heuristically for good strand sets, our algorithm uses a conflict-driven stochastic local search approach, which is known to be effective in solving comparable search problems. The PairFold program of Andronescu et al. [M. Andronescu, Z. C. Zhang and A. Condon (2005) J. Mol. Biol., 345, 987-1001; M. Andronescu, R. Aguirre-Hernandez, A. Condon, and H. Hoos (2003) Nucleic Acids Res., 31, 3416-3422.] is used to calculate the minimum free energy of hybridization between two mismatched strands. We describe new thermodynamic measures of the quality of strand sets. With respect to these measures of quality, our algorithm consistently finds, within reasonable time, sets that are significantly better than previously published sets.
Year: 2005 PMID: 16145053 PMCID: PMC1199561 DOI: 10.1093/nar/gki773
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Control sets used as a basis for comparison with the results obtained from our algorithm
| | S1 Braich | S2 Brenner | S3 Faulhammer | S4 Frutos | S5 Penchovsky and Ackermann | S6 Random | S7 Shortreed | S8 Shortreed |
|---|---|---|---|---|---|---|---|---|
| Control set: original constraints | ||||||||
| Word length | 15 | 4 | 15 | 8 | 16 | 16 | 12 | 16 |
| No. of words | 40 | 8 | 20 | 108 | 24 | 24 | 64 | 64 |
| C1 | ✓ | ✓ | ✓ | ✓ | ||||
| ≥4 mismatches | ≥3 mismatches | maximize mismatches | ≥4 mismatches | |||||
| C2 | ✓ | ✓ | ||||||
| ≥4 mismatches | ≥4 mismatches | |||||||
| C3 | ✓ | |||||||
| ≤7 matches | ||||||||
| C4 | ✓ | |||||||
| ≤7 matches | ||||||||
| C5 | ✓ | ✓ | ||||||
| 35.33% | 25.00% | |||||||
| C6 | ✓ | ✓ | ✓ | ✓ | ||||
| CCC | CCC | |||||||
| 5H | CCC (5′,3′ ends) | CC (5′,3′ ends) | CC (5′,3′ ends) | |||||
| C7 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| A,C,T | A,C,T | A,C,T | A,C,T | A,C,T | A,C,T | A,C,T | ||
| T1 | ✓ | |||||||
| Implicit | ✓ | ✓ | ||||||
| T2 | ✓ | ✓ | ✓ | |||||
| T3 | min value: −6.46 ≥ −8.24 | min value: −2.62 ≥ −2.79 | min value: −4.50 ≥ −5.48 | ✓ | ✓ | |||
| Implicit | max value: −0.07 ≥ 0.00 | |||||||
| T5 | ✓ | ✓ | ✓ | ✓ | ||||
| average = 45°C | range of ±1.5°C | 1°C range | 1°C range | |||||
| Control sets: extrapolated constraints | ||||||||
| T1 range (kcal/mol) | [−16.80, −13.57] | [−1.96, −1.01] | [−17.56, −11.63] | [−8.95, −6.50] | [−17.94, −16.84] | [−19.31, −12.74] | [−12.66, −11.78] | [−16.42, −15.45] |
| T2 range (kcal/mol) | [−9.29, −0.22] | [−1.86, 0.00] | [−7.98, −1.29] | [−6.83, 0.00] | [−8.72, −2.26] | [−8.35, −0.86] | [−8.94, −0.93] | [−7.62, −1.10] |
| T3 range (kcal/mol) | [−4.48, 0.00] | [−0.64, 0.00] | [−5.02, 0.00] | [−8.24, 0.00] | [−2.79, 0.00] | [−5.48, −0.00] | [−5.06, 0.00] | [−9.06, 0.00] |
| T5 range (°C) | [46.64, 55.75] | [−67.19, −48.07] | [40.87, 58.20] | [17.06, 27.34] | [55.15, 58.65] | [43.56, 62.23] | [42.38, 43.50] | [51.81, 52.77] |
Each column of the table, other than the leftmost, corresponds to one set. Each column header gives a set ID and, for all but set S6, a reference to the paper in which the set was published. The next two rows report the strand length and the number of strands in each set. Following these, there is one row per type of combinatorial or thermodynamic constraint C1 to T5 (except for T4; see Materials and Methods for details). A check mark in a column indicates that a constraint of the row's type was enforced when designing the set, and further information about the composition of the strands is given where available. The second part of the table describes thermodynamic properties of the control sets. For each set, the rows of the table give the free energy ranges corresponding to constraints T1 (desired word–complement interactions), T2 (undesired word–complement interactions) and T3 (undesired complement–complement interactions), respectively, and the melting temperature range corresponding to constraint T5. Note that these thermodynamic constraints were not used as design constraints for the control sets, other than those of the accompanying paper of Shortreed et al. (6).
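The combinatorial entries in the table, such as "≥4 mismatches" and the restricted alphabet "A,C,T", can be checked mechanically for any candidate word set. The following is a minimal sketch, assuming that a mismatch count is the Hamming distance between two equal-length words; the exact definitions of C1 to C7 are given in Materials and Methods and are not reproduced here, and the function names `hamming_mismatches` and `check_word_set` are illustrative only.

```python
from itertools import combinations


def hamming_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def check_word_set(words, min_mismatches, alphabet="ACT"):
    """Return a list of violations found in a candidate word set.

    Assumption (not taken from the paper): a pair of words violates the
    mismatch constraint when its Hamming distance is below min_mismatches.
    """
    violations = []
    allowed = set(alphabet)
    # Alphabet check: every base of every word must be allowed.
    for word in words:
        if set(word) - allowed:
            violations.append(("alphabet", word))
    # Pairwise check: every pair must differ in enough positions.
    for a, b in combinations(words, 2):
        if hamming_mismatches(a, b) < min_mismatches:
            violations.append(("mismatches", a, b))
    return violations
```

An empty result means that the set satisfies both checks; each tuple otherwise names the violated check and the word or pair responsible.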
Figure 1. Positive free energy gaps of correct and incorrect word–complement pairs for set S5 Penchovsky. The three curves represent (from left to right) the cumulative distribution of the free energy values of all correct word–complement hybrids, of all incorrect word–complement hybrids and of all (incorrect) complement–complement hybrids. The two dots represent the specific values of i and j that determine the free energy gap as defined in Equation 1.
Figure 5. Outline of the stochastic local search procedure for DNA strand design.
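The general shape of a conflict-driven stochastic local search can be illustrated with a short sketch. This is not the procedure outlined in the figure: it assumes, purely for illustration, that a conflict is a pair of words with fewer than `min_mm` Hamming mismatches, whereas the algorithm described here also handles thermodynamic constraints. The names `local_search`, `noise` and `max_steps` are hypothetical.

```python
import random
from itertools import combinations

ALPHABET = "ACT"  # three-letter alphabet, as in the A,C,T entries of the table


def mismatches(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def conflicts(words, min_mm):
    """Index pairs of words that are too similar to each other."""
    return [(i, j) for i, j in combinations(range(len(words)), 2)
            if mismatches(words[i], words[j]) < min_mm]


def word_conflicts(words, k, candidate, min_mm):
    """Number of conflicts the candidate would have if placed at index k."""
    return sum(1 for i, w in enumerate(words)
               if i != k and mismatches(candidate, w) < min_mm)


def neighbours(word):
    """All words that differ from the given word in exactly one base."""
    for pos, base in enumerate(word):
        for b in ALPHABET:
            if b != base:
                yield word[:pos] + b + word[pos + 1:]


def local_search(n_words, length, min_mm, max_steps=10000, noise=0.2, seed=0):
    """Return a conflict-free word set, or None if max_steps is exhausted."""
    rng = random.Random(seed)
    words = ["".join(rng.choice(ALPHABET) for _ in range(length))
             for _ in range(n_words)]
    for _ in range(max_steps):
        bad = conflicts(words, min_mm)
        if not bad:
            return words
        # Conflict-driven step: only words involved in a conflict are changed.
        k = rng.choice(rng.choice(bad))
        candidates = list(neighbours(words[k]))
        if rng.random() < noise:
            # Random walk step, to escape local minima.
            words[k] = rng.choice(candidates)
        else:
            # Greedy step: pick the one-base change with the fewest conflicts.
            words[k] = min(candidates,
                           key=lambda c: word_conflicts(words, k, c, min_mm))
    return None
```

A call such as `local_search(6, 8, 3)` either returns six 8-mers that pairwise differ in at least three positions or returns `None` when the step budget runs out; the mixture of greedy and random steps is one common design choice among several.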
Comparison of the quality of the control sets and our improved and enlarged sets
| Set | Word length | No. of words | δ [δ*] (kcal/mol) | ρ (°C) | Pairwise sensitivity | Pairwise specificity | Pairwise discrimination | CombFold MFE (kcal/mol) | τ [τ*] (kcal/mol) | Run time (CPU s/m/h) |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 Braich | 15 | 40 | 6.25 [4.28] | 9.11 | 0.95 | 1.00 | 354.36 | −1.25 | 4.10 [1.84] | |
| S1-opt | 15 | 40 | | | | | | | 4.97 [3.78] | |
| S1-1 | 15 | 40 | 7.57 [7.55] | 1.79 | 1.00 | 1.00 | 939.63 | −0.09 | 1.72 [1.36] | 2.6 h |
| S1-1-opt | 15 | 40 | | | | | | | 3.93 | |
| S1-2 | 15 | 114 | 6.42 [6.33] | 5.20 | 0.99 | 1.00 | 377.32 | −0.64 | 1.53 [1.53] | 1.8 h |
| S1-2-opt | 15 | 114 | | | | | | | 2.44 | |
| S2 Brenner | 4 | 8 | −0.19 [−0.85] | 19.12 | 0.00 | 1.00 | 0.73 | −3.19 | | |
| S2-1 | 4 | 8 | −0.08 [−0.08] | 13.49 | 0.00 | 1.00 | 1.30 | 0.00 | | 4 s |
| S2-2 | 4 | 13 | −0.47 | 17.23 | 0.00 | 1.00 | 0.82 | −3.04 | | 12 s |
| S3 Faulhammer | 15 | 20 | 6.13 [3.65] | 17.33 | 0.78 | 1.00 | 968.14 | −0.46 | 3.93 [−0.06] | |
| S3-opt | 15 | 20 | | | | | | | 5.43 [0.94] | |
| S3-1 | 15 | 20 | 8.59 [8.46] | 3.81 | 0.99 | 1.00 | 5324.62 | −1.63 | 5.84 [5.32] | 15.6 min |
| S3-1-opt | 15 | 20 | | | | | | | 6.24 [6.16] | |
| S3-2 | 15 | 110 | 6.25 [6.13] | 11.56 | 0.97 | 1.00 | 986.16 | −4.28 | 2.28 | 4.7 h |
| S3-2-opt | 15 | 110 | | | | | | | 3.33 | |
| S4 Frutos | 8 | 108 | −0.21 [−2.22] | 10.28 | 0.00 | 0.95 | 6.25 | −11.19 | | |
| S4-1 | 8 | 108 | 1.28 [0.89] | 6.67 | 0.04 | 0.99 | 7.70 | −11.58 | | 4 min |
| S4-2 | 8 | 173 | 1.59 [0.73] | 5.71 | 0.06 | 0.99 | 15.67 | −13.07 | | 1 h |
| S5 Penchovsky | 16 | 24 | 8.79 [8.12] | 3.50 | 1.00 | 1.00 | 3569.76 | 0.00 | 6.88 [5.78] | |
| S5-opt | 16 | 24 | | | | | | | 7.09 [6.14] | |
| S5-1 | 16 | 24 | 9.27 [9.20] | 1.95 | 1.00 | 1.00 | 5704.45 | 0.00 | 5.83 | 1.5 h |
| S5-1-opt | 16 | 24 | | | | | | | 6.14 | |
| S5-2 | 16 | 44 | 8.90 [8.82] | 3.45 | 1.00 | 1.00 | 4163.51 | 0.00 | 4.57 [4.16] | 17 h |
| S5-2-opt | 16 | 44 | | | | | | | 5.61 [5.20] | |
| S6 Random | 16 | 24 | 6.10 [4.39] | 18.67 | 0.90 | 1.00 | 522.73 | −2.34 | 3.74 [1.90] | 0.01 s |
| S7 Shortreed1 | 12 | 64 | 2.87 [2.84] | 1.11 | 0.79 | 0.97 | 23.20 | −1.03 | 2.91 [2.76] | |
| S7-opt | 12 | 64 | | | | | | | 2.91 [2.76] | |
| S7-1 | 12 | 64 | 3.72 [3.65] | 1.07 | 0.88 | 0.98 | 49.59 | 0.00 | 0.23 [−0.64] | 1.9 h |
| S7-1-opt | 12 | 64 | | | | | | | 0.23 [−0.64] | |
| S7-2 | 12 | 144 | 3.01 [2.85] | 1.11 | 0.79 | 0.97 | 27.74 | −3.94 | 0.00 [−1.83] | 2.7 h |
| S7-2-opt | 12 | 144 | | | | | | | 0.00 [−1.83] | |
| S8 Shortreed2 | 16 | 64 | 6.77 [6.39] | 0.96 | 0.99 | 1.00 | 4739.83 | −5.50 | 5.59 [5.25] | |
| S8-opt | 16 | 64 | | | | | | | 5.59 [5.25] | |
| S8-1 | 16 | 64 | 8.15 [8.09] | 0.94 | 0.99 | 1.00 | 5057.76 | −2.52 | 2.78 [2.74] | 1.7 h |
| S8-1-opt | 16 | 64 | | | | | | | 3.55 [3.49] | |
| S8-2 | 16 | 80 | 7.91 [7.85] | 0.95 | 0.99 | 1.00 | 4093.02 | −4.32 | 3.69 [3.50] | 8.2 h |
| S8-2-opt | 16 | 80 | | | | | | | 3.69 [3.50] | |
Each pair of rows (between delimiting lines) corresponds to one set, and lists quality measures for the control set, the improved set (indicated by the suffix ‘−1’) and the enlarged set (indicated by the suffix ‘−2’). Improvements over the control set are highlighted in boldface. The columns, from the left, give (i) set ID, (ii) the strand length for the set, (iii) the number of strands in the set, (iv) the free energy gaps δ and δ*, (v) the melting temperature interval width, (vi) the pairwise sensitivity, (vii) the pairwise specificity, (viii) the pairwise discrimination, (ix) the minimum free energy value as computed with CombFold v1.0 (19), (x) the minimum free energy gaps τ and τ* for junctions, and (xi) the run time, measured as total CPU time on our reference machine for running the respective experimental protocol until the given set was obtained (see Materials and Methods for details). In the melting temperature column, the measurements were obtained using a function of the PairFold v1.1 package (18). Si-opt MFE values for junctions have been obtained after optimizing the arrangement of strands in subsets such that τ* is maximized (where τ* is the free energy gap that accounts for junctions, as defined in Equation 20).
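Two columns of this table can be cross-checked against the extrapolated ranges listed for the control sets. The sketch below is an arithmetic consistency check on the tabulated values, not the definition given by the paper's equations, which are not reproduced in this section; the function names are illustrative. It takes the width of the T5 range as the melting temperature interval width, and the difference between the most negative undesired free energy (T2 or T3) and the least negative desired free energy (T1) as a worst-case gap.

```python
def interval_width(t5_range):
    """Width of a melting temperature range given as (low, high) in °C."""
    low, high = t5_range
    return high - low


def worst_case_gap(t1_range, t2_range, t3_range):
    """Gap between the weakest desired and the strongest undesired duplex.

    All ranges are (min, max) free energies in kcal/mol; more negative
    values correspond to more stable duplexes.
    """
    weakest_desired = max(t1_range)
    strongest_undesired = min(min(t2_range), min(t3_range))
    return strongest_undesired - weakest_desired


if __name__ == "__main__":
    # Extrapolated ranges for control set S5, copied from the control-set table.
    s5 = {"t1": (-17.94, -16.84), "t2": (-8.72, -2.26),
          "t3": (-2.79, 0.00), "t5": (55.15, 58.65)}
    print(round(worst_case_gap(s5["t1"], s5["t2"], s5["t3"]), 2))
    print(round(interval_width(s5["t5"]), 2))
```

For S5 the two results are 8.12 kcal/mol and 3.50 °C, which coincide with the bracketed δ* entry and the ρ entry in the S5 Penchovsky row; the same holds for S8 (6.39 kcal/mol and 0.96 °C). Agreement for other sets should be checked individually rather than assumed.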
Figure 2. Correlation between pairwise discrimination values and duplex free energy gaps for sets S5 Penchovsky (control) and S5-1 (improved); each point in the plot corresponds to the discrimination and free energy gap values of a given word and an incorrect complement.
Figure 3. Concentration of words, complements, perfect and imperfect matches as a function of duplex free energy gaps for sets (a) S5 Penchovsky (control), and (b) S5-1 (improved); each point in the plot corresponds to the equilibrium concentration of one single strand or duplex.
Figure 4. Outline of the stochastic local search procedure used to partition strands into groups for use in DNA computations. A ‘bad junction’ is a junction with MFE lower than v.