Tobias Mann, Richard Humbert, Michael Dorschner, John Stamatoyannopoulos, William Stafford Noble.
Abstract
We developed a primer design method, Pythia, in which state-of-the-art DNA binding affinity computations are directly integrated into the primer design process. We use chemical reaction equilibrium analysis to integrate multiple binding energy calculations into a conservative measure of polymerase chain reaction (PCR) efficiency, and a precomputed index on genomic sequences to evaluate primer specificity. We show that Pythia can design primers with success rates comparable with those of current methods, but yields much higher coverage in difficult genomic regions. For example, in RepeatMasked sequences in the human genome, Pythia achieved a median coverage of 89% as compared with a median coverage of 51% for Primer3. For parameter settings yielding sensitivities of 81%, our method has a recall of 97%, compared with the Primer3 recall of 48%. Because our primer design approach is based on the chemistry of DNA interactions, it has fewer and more physically meaningful parameters than current methods, and is therefore easier to adjust to specific experimental requirements. Our software is freely available at http://pythia.sourceforge.net.
Year: 2009 PMID: 19528077 PMCID: PMC2715258 DOI: 10.1093/nar/gkp443
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Species accounted for in primer feasibility analysis. The solid line is the top strand of the template; the dashed line is the bottom strand of the template; the arrow with the square end is the left primer; the arrow with the round end is the right primer; three dashed lines indicate binding (or folding) via hydrogen bonding. (A) Desired binding interactions. High rates of binding are desired between the primers and the template priming regions. (B) Undesired binding and folding reactions. Primers should not fold, dimerize or bind to the target outside of the priming regions.
Figure 2. Flowchart of the Pythia algorithm. Inputs are the genomic sequence, locus coordinates and user-specified parameters. In Step 1, Pythia identifies all primer pairs meeting the user-specified requirements and sorts these primer pairs by the sum of the differences between the computed and target primer melting temperatures. In Step 2, Pythia computes the thermodynamic quality metric for the top ranked candidate. If this candidate meets a user-specified metric threshold, then Pythia proceeds to Step 3. If not, the top ranked candidate is removed from the list and Pythia returns to Step 2. In Step 3, Pythia performs a specificity check. If the primer passes the specificity check, it is given to the user, and the program terminates. If not, the top ranked candidate is removed from the list and Pythia returns to Step 2.
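The three steps in the flowchart can be sketched as a simple selection loop. This is a minimal illustration, not the Pythia implementation: the names `PrimerPair`, `tm_deviation` and `select_primer_pair` are hypothetical, the quality metric and specificity check are passed in as placeholder callables, the "sum of the differences" is assumed to mean the sum of absolute differences, and "meets the threshold" is assumed to mean greater than or equal to it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class PrimerPair:
    """Hypothetical container for one candidate primer pair."""
    left: str
    right: str
    left_tm: float   # computed melting temperature of the left primer (deg C)
    right_tm: float  # computed melting temperature of the right primer (deg C)


def tm_deviation(pair: PrimerPair, target_tm: float) -> float:
    # Assumption: "sum of the differences" is taken as a sum of absolute values.
    return abs(pair.left_tm - target_tm) + abs(pair.right_tm - target_tm)


def select_primer_pair(
    candidates: Iterable[PrimerPair],
    target_tm: float,
    quality: Callable[[PrimerPair], float],
    quality_threshold: float,
    is_specific: Callable[[PrimerPair], bool],
) -> Optional[PrimerPair]:
    # Step 1: rank candidates by closeness to the target melting temperature.
    ranked = sorted(candidates, key=lambda p: tm_deviation(p, target_tm))
    for pair in ranked:
        # Step 2: discard the top candidate if its quality metric is too low.
        if quality(pair) < quality_threshold:
            continue
        # Step 3: return the first candidate that also passes the specificity check.
        if is_specific(pair):
            return pair
    # No candidate passed both checks.
    return None
```

Iterating over the sorted list is equivalent to repeatedly removing the top ranked candidate and returning to Step 2, and it evaluates the (presumably more expensive) quality and specificity checks only for as many candidates as needed.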
Training set sizes
| Threshold | Dataset size | ROC score |
|---|---|---|
| 0.8 | 642 | 0.9995 |
| 0.85 | 1474 | 0.9986 |
| 0.9 | 3056 | 0.9951 |
| 0.95 | 10 498 | 0.9937 |
The number of training points for each acceptability threshold. For each threshold, we show the number of examples used to train the SVM, and the ROC and ROC50 scores. We assessed SVM performance using 5-fold cross-validation
Concordance between gel and melting curves
| Gel label | Valid melting curve (stringent) | Valid melting curve (permissive) | Invalid melting curve (stringent) | Invalid melting curve (permissive) |
|---|---|---|---|---|
| Clean | 172 | 199 | 33 | 41 |
| Not clean | 38 | 11 | 16 | 8 |
For a selected set of PCR primers, we compared the results of melting curve analysis to agarose gel analysis of the PCR amplicons. Melting curves were classified as valid or invalid based on melting curve morphology, and gel lanes were classified as clean or not clean at two levels of stringency. Each table entry gives the number of reactions with the corresponding gel and melting curve labels at the stringent and permissive levels of gel scoring stringency
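As a worked reading of the concordance table, the counts can be turned into two simple proportions per stringency level: the fraction of valid melting curves that also gave a clean gel, and the overall agreement between the two assays. The function names below are hypothetical, and treating "clean" as agreeing with "valid" (and "not clean" with "invalid") is an assumption made for this illustration.

```python
# Counts copied from the concordance table, keyed by (gel label, melting curve label).
counts = {
    "stringent": {
        ("clean", "valid"): 172, ("not clean", "valid"): 38,
        ("clean", "invalid"): 33, ("not clean", "invalid"): 16,
    },
    "permissive": {
        ("clean", "valid"): 199, ("not clean", "valid"): 11,
        ("clean", "invalid"): 41, ("not clean", "invalid"): 8,
    },
}


def clean_fraction_given_valid(table):
    """Fraction of reactions with a valid melting curve that gave a clean gel."""
    valid_total = table[("clean", "valid")] + table[("not clean", "valid")]
    return table[("clean", "valid")] / valid_total


def overall_agreement(table):
    """Fraction of all reactions where gel and melting curve labels agree."""
    agree = table[("clean", "valid")] + table[("not clean", "invalid")]
    return agree / sum(table.values())


for level, table in counts.items():
    print(level,
          round(clean_fraction_given_valid(table), 3),
          round(overall_agreement(table), 3))
# stringent 0.819 0.726
# permissive 0.948 0.799
```

Both stringency levels cover the same 259 reactions (210 valid and 49 invalid melting curves); only the gel scoring changes between them.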
Genomic characteristics of selected human genome regions
| Region | Chromosome | Interval start | Interval stop | Length (Kb) | Description |
|---|---|---|---|---|---|
| 1 | 16 | 147 000 | 164 000 | 17 | High GC content |
| 2 | 16 | 181 000 | 215 000 | 34 | Repetitive |
| 3 | 11 | 5 252 000 | 5 277 000 | 25 | Typical |
We compared the abilities of Pythia and the P316 algorithm to tile these regions. We show the location, size and a brief description of each locus
Primer design performance for selected human regions
| Region | P316 PCRs | P316 success rate, permissive (%) | P316 success rate, stringent (%) | Pythia PCRs | Pythia success rate, permissive (%) | Pythia success rate, stringent (%) |
|---|---|---|---|---|---|---|
| 1 | 49 | 94 | 80 | 41 | 94 | 81 |
| 2 | 93 | 94 | 81 | 102 | 94 | 81 |
| 3 | 63 | 92 | 78 | 43 | 94 | 81 |
Shown are the number of PCRs and the extrapolated success rates for permissive and stringent criteria
Figure 3. Primer pair design coverages for interspersed repeat regions. Design coverage is defined as the fraction of an interval covered by PCR product sequences. (A) Histogram of coverages for Pythia (mean 80%). (B) Histogram of coverages for P316 (mean 50%).
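The design coverage defined in the caption can be computed by merging overlapping amplicon intervals and dividing the covered length by the interval length. The sketch below is one way to do this under stated assumptions: the function name `design_coverage` is hypothetical, coordinates are half-open `[start, stop)` integers, and amplicons extending past the interval are clipped to it.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, stop), an assumption of this sketch


def design_coverage(region: Interval, amplicons: Iterable[Interval]) -> float:
    """Fraction of `region` covered by at least one amplicon."""
    region_start, region_stop = region
    if region_stop <= region_start:
        raise ValueError("region must have positive length")

    # Clip each amplicon to the region and drop those that fall outside it.
    clipped = []
    for start, stop in amplicons:
        start, stop = max(start, region_start), min(stop, region_stop)
        if start < stop:
            clipped.append((start, stop))
    clipped.sort()

    # Sweep left to right, merging overlapping or touching amplicons so that
    # bases covered by several PCR products are counted only once.
    covered = 0
    cur_start = cur_stop = None
    for start, stop in clipped:
        if cur_stop is None or start > cur_stop:
            if cur_stop is not None:
                covered += cur_stop - cur_start
            cur_start, cur_stop = start, stop
        else:
            cur_stop = max(cur_stop, stop)
    if cur_stop is not None:
        covered += cur_stop - cur_start

    return covered / (region_stop - region_start)
```

For example, amplicons at `(0, 300)`, `(200, 500)` and `(800, 1200)` in a region `(0, 1000)` merge to 700 covered bases, giving a coverage of 0.7.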
Primer design method acceptability assessments
| Pythia/P316 evaluation | Valid melting curve | Invalid melting curve |
|---|---|---|
| Acceptable | 276/17 | 15/3 |
| Unacceptable | 17/322 | 3/39 |
We show Pythia acceptability assessment of P316 primers and P316 acceptability assessment of Pythia primers. We assessed the ability of the Pythia primer pair quality metric to predict the quality of the P316 primers and vice versa. The first number in each cell shows the Pythia assessment of P316 primers, and the second number shows the P316 assessment of the Pythia primers. For example, 276 primer pairs designed by P316 were acceptable to Pythia and had a valid melting curve, whereas only 17 of the primer pairs designed by Pythia were acceptable to the P316 program and had a valid melting curve