James L Kitchen, Jonathan D Moore, Sarah A Palmer, Robin G Allaby.
Abstract
BACKGROUND: Next-generation sequencing technologies often require numerous primer designs with good target coverage, which can be financially costly. We aimed to develop a system that reuses primers and designs degenerate primers around SNPs, thereby finding the fewest necessary primers at the lowest cost whilst maintaining acceptable coverage. We implemented Metropolis-Hastings Markov Chain Monte Carlo to optimize primer reuse, and we call the method the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm.
Year: 2012 PMID: 23126469 PMCID: PMC3561117 DOI: 10.1186/1471-2105-13-287
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Input file parameters for MCMC-ODPR
| Parameter | Default value |
| Genome file for BLAST searching | hv_mRNA_PUT.fas |
| Max degeneracy (base pairs) | 3 |
| Minimum melting temperature (centigrade) | 50 |
| Maximum melting temperature (centigrade) | 60 |
| Minimum amplicon length (base pairs) | 50 |
| Maximum amplicon length (base pairs) | 250 |
| Initial overlap (base pairs) | 0 |
| Number of optimisations | 10000 |
| Maximum gap between sequences (base pairs) | 10 |
| Save interim optimizations? | 0 |
| Verbose output? | 0 |
| Cost tolerance | 0 |
| Output file name | mcmc_odpr_results.out |
| Restart from previous run? | 0 |
| Probability of removing redundant primer pairs | 0 |
| Proportion of iterations to be considered as 'early' | 0 |
| Weight greedy methods according to optimization? | 1 |
| Remove non-reusable primers from initial design? | 0 |
| Proportion of failed weight check proposals to accept (heating) | 0.2 |
The parameters that the user may enter in the MCMC-ODPR parameter file are listed in the left-most column. If the user does not enter a parameter, the default value in the right-most column is used.
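The fallback-to-default behaviour described above can be illustrated with a minimal sketch. The key names below are hypothetical (the actual parameter file syntax is not given here), and only a subset of the table is transcribed; the values themselves come from the default column.

```python
# Defaults transcribed from the parameter table above.
# The key names are hypothetical; they are NOT the program's real keywords.
DEFAULTS = {
    "genome_file": "hv_mRNA_PUT.fas",
    "max_degeneracy": 3,
    "min_tm": 50,
    "max_tm": 60,
    "min_amplicon_length": 50,
    "max_amplicon_length": 250,
    "initial_overlap": 0,
    "n_optimisations": 10000,
    "max_gap": 10,
    "output_file": "mcmc_odpr_results.out",
    "heating": 0.2,
}


def resolve_parameters(user_values):
    """Return the defaults, overridden by any values the user supplied."""
    unknown = set(user_values) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    # Later mappings win, so user values replace defaults of the same name.
    return {**DEFAULTS, **user_values}


# Example: the user sets two parameters; everything else keeps its default.
params = resolve_parameters({"max_tm": 65, "n_optimisations": 5000})
```

Rejecting unknown keys is a design choice of this sketch, intended to catch misspelled parameter names early; whether MCMC-ODPR itself does so is not stated.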
Figure 1. Cost results of MCMC-ODPR over 10,000 optimizations. The program was run 1000 times: the first 100 runs used 1000 iterations, the second 100 used 2000 iterations, and so on up to 10,000 iterations. A dataset of 30 FASTA sequences was input, with no SNP processing specified and a single melting temperature of 60 degrees centigrade.
The variance in the number of primers obtained from 1–5x redundancy
| 5x | 14 | 14 | 0.01 | 0 |
| 4x | 867 | 881 | 1.28 | 0.03 |
| 3x | 130 | 1005 | 4.39 | 0.05 |
| 2x | 9 | 1020 | 99.54 | 0.1 |
| 1x | 0 | 1020 | 437.41 | 0.22 |
The algorithm was repeated 1020 times with a small subset of 30 sequences. The maximum redundancy achieved by MCMC-ODPR in each run was recorded. When no primers were obtained at a given level of redundancy, a zero was added as an observation, ensuring that the observations at every level of redundancy summed to 1020. Column 2 gives the frequency with which primers were found at each level of redundancy across all 1020 runs, counting only runs in which at least one was found. The average number of primers identified by the algorithm per run and its standard error are also presented.
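The zero-padding step matters because runs with no primers at a given redundancy still count towards the mean and standard error. A minimal sketch of that calculation follows; it assumes the standard error is the sample standard deviation (n - 1 denominator) divided by the square root of the number of runs, which the text does not state, and the input counts are made-up illustrative values, not data from the study.

```python
import math


def mean_and_standard_error(nonzero_counts, n_runs):
    """Mean and standard error over n_runs, padding missing runs with zeros."""
    if len(nonzero_counts) > n_runs:
        raise ValueError("more observations than runs")
    # Runs that produced no primers at this redundancy contribute a zero.
    padded = list(nonzero_counts) + [0] * (n_runs - len(nonzero_counts))
    mean = sum(padded) / n_runs
    # Assumption: sample variance with an (n - 1) denominator.
    variance = sum((x - mean) ** 2 for x in padded) / (n_runs - 1)
    return mean, math.sqrt(variance / n_runs)


# Illustrative only: 3 of 6 runs found primers at this redundancy level.
mean, se = mean_and_standard_error([2, 1, 1], n_runs=6)
```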
Figure 2. Cost of the designed primers over a Tm range. The dataset of 247 FASTA sequences was input into MCMC-ODPR with no SNP processing and 10,000 optimization iterations. The cost of the primers designed by MCMC-ODPR is plotted over a melting temperature range.
Figure 3. Coverage of the designed primers over a Tm range. The dataset of 247 FASTA sequences was input into MCMC-ODPR with no SNP processing and 10,000 optimization iterations. The percentage sequence coverage of target amplicons is plotted over a melting temperature range.
Figure 4. Primer reuse of the designed primers over a Tm range. The dataset of 247 FASTA sequences was input into MCMC-ODPR with no SNP processing and 10,000 optimization iterations. The degree of primer reuse generated by MCMC-ODPR is plotted over a melting temperature range.
Performance of primer design software for single sequence input
| Gene | Program | Sequence length (bp) | Primers (primer sites) | Cost | Bases covered | Coverage (%) | Cost per base covered |
| Dehydrin 9 | Primer3 | 1000 | 4(18) | 80 | 401 | 40.1 | 0.2 |
| | Primer-BLAST | 1000 | 6(12) | 116 | 527 | 52.7 | 0.22 |
| | MCMC-ODPR | 1000 | 9(10) | 197 | 958 | 95.8 | 0.21 |
| Beta-amylase 1 | Primer3 | 3733 | 10(66) | 160 | 691 | 18.51 | 0.23 |
| | Primer-BLAST | 3733 | 14(36) | 334 | 998 | 26.73 | 0.33 |
| | MCMC-ODPR | 3733 | 34(34) | 682 | 3636 | 97.4 | 0.19 |
| C-repeat binding factor 3 like protein | Primer3 | 1515 | 8(26) | 160 | 487 | 32.15 | 0.33 |
| | Primer-BLAST | 1515 | 6(14) | 120 | 473 | 31.22 | 0.25 |
| | MCMC-ODPR | 1515 | 14(14) | 274 | 1500 | 99.01 | 0.18 |
MCMC-ODPR was given one sequence corresponding to one gene (first column) and run for 10,000 iterations. The cost per base covered, in nucleotides, was used to compare performance. MCMC-ODPR performs less optimally with one sequence, yet was able to outperform the single sequence programs Primer3 and Primer-BLAST in all examples but one.
Total number of primer sites identified shown in parentheses.
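The comparison statistic can be recomputed from the table as a check. The sketch below assumes that the three numeric columns used are sequence length, total cost and bases covered; that reading is an inference, although it reproduces the tabulated coverage and cost-per-base values for the Dehydrin 9 rows. The function name is illustrative.

```python
def summarise(length_bp, cost, bases_covered):
    """Return (coverage in percent, cost per base covered), rounded to 2 dp."""
    coverage_pct = 100 * bases_covered / length_bp
    cost_per_base = cost / bases_covered
    return round(coverage_pct, 2), round(cost_per_base, 2)


# Dehydrin 9 rows from the table: (sequence length, cost, bases covered).
dehydrin9 = {
    "Primer3": (1000, 80, 401),
    "Primer-BLAST": (1000, 116, 527),
    "MCMC-ODPR": (1000, 197, 958),
}

for program, values in dehydrin9.items():
    print(program, summarise(*values))
```

A lower cost per base covered is better, so for this gene Primer3 narrowly beats MCMC-ODPR on the ratio while covering far less of the sequence.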
Performance of primer design software for multiple sequence input
| Program | Primers (primer sites) | Cost | Bases covered | Coverage (%) | Cost per base covered | Runtime* |
| MCMC-ODPR | 1521(1786) | 30103 | 162196 | 77.37 | 0.19 | 637 |
| BatchPrimer3 | 372(1153) | 15427 | 62292 | 29.71 | 0.25 | 2.98 |
| PAMPS | 1205(13598) | 25284 | 39240 | 18.72 | 0.64 | 5.19 |
MCMC-ODPR was run with 184 sequences for 10,000 iterations, with the cost per base covered and the runtime used as comparison statistics for performance. MCMC-ODPR outperformed the multiple sequence input programs BatchPrimer3 and PAMPS in terms of cost per base covered, yet was clearly slower in terms of runtime.
Total number of primer sites identified shown in parentheses.
* MCMC-ODPR execution was performed on a 2.26 GHz Intel® Core™ 2 Duo MacBook running Mac OS X version 10.6.8 with 2 GB RAM. Execution time for BatchPrimer3 was provided by the BatchPrimer3 web server (primer design server 1) found at http://probes.pw.usda.gov/batchprimer3/. Execution for PAMPS was performed on a 2.31 GHz AMD Phenom™ 8650 Triple-Core Processor PC running Windows XP with 3.5 GB of RAM.
MCMC-ODPR execution time with different numbers of sequences input
| Number of sequences | Execution time |
| 184 | 637 |
| 100 | 343.34 |
| 30 | 228.9 |
Execution time was measured when 184, 100 and 30 sequences were input into the program. Execution was performed on a 2.26 GHz Intel® Core™ 2 Duo MacBook running Mac OS X version 10.6.8 with 2 GB RAM.
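One way to read these timings is as execution time per input sequence. The short sketch below computes that ratio from the three rows of the table; the time unit is whatever the table uses (it is not stated here), and the variable names are illustrative.

```python
# (number of sequences, execution time) pairs taken from the table above.
timings = [(184, 637), (100, 343.34), (30, 228.9)]

# Execution time divided by the number of input sequences, rounded to 2 dp.
time_per_sequence = {n: round(t / n, 2) for n, t in timings}

for n, per_seq in time_per_sequence.items():
    print(f"{n} sequences: {per_seq} per sequence")
```

On these three points the per-sequence time is similar for 100 and 184 sequences and noticeably higher for 30, which would be consistent with a fixed overhead that does not depend on the number of sequences; three measurements are too few to establish that.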