
PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers.

William A Walters, J Gregory Caporaso, Christian L Lauber, Donna Berg-Lyons, Noah Fierer, Rob Knight.

Abstract

MOTIVATION: PCR amplification of DNA is a key preliminary step in many applications of high-throughput sequencing technologies, yet design of novel barcoded primers and taxonomic analysis of novel or existing primers remains a challenging task.
RESULTS: PrimerProspector is an open-source software package that allows researchers to develop new primers from collections of sequences and to evaluate existing primers in the context of taxonomic data.
AVAILABILITY: PrimerProspector is open-source software available at http://pprospector.sourceforge.net
CONTACT: rob.knight@colorado.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2011        PMID: 21349862      PMCID: PMC3072552          DOI: 10.1093/bioinformatics/btr087

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Using next-generation sequencing methods to characterize hundreds of samples simultaneously in a single sequencing run has revolutionized microbial ecology (Tringe and Hugenholtz, 2008). However, primer design for such studies remains challenging. The primers must amplify an appropriate region of DNA that is the right length for sequencing and also taxonomically informative (Liu et al.; Wang et al., 2007); a linker that is not complementary to the target in any one of many diverse species must be inserted before the barcode to avoid differential amplification (Hamady et al., 2008); and the set of barcodes must be checked to avoid formation of secondary structure within or between primers (i.e. primer-dimers) or between the barcodes and the primers. Additionally, the techniques need to be generic rather than tied to one taxonomic outline or database, so that many different target genes can be studied. Here we present PrimerProspector, an open-source software package for primer design and analysis built using the PyCogent toolkit (Knight et al., 2007), that resolves these issues. We recently applied PrimerProspector to identify the 16S rRNA 515f/806r primer pair as nearly universal to archaea and bacteria, and to optimize this primer pair for increased sensitivity across these domains. This optimized primer pair, applied successfully in several recent studies (Bates et al., 2010; Caporaso et al., 2010; G.Bergmann et al., manuscript in preparation), has provided novel insight into archaeal and bacterial community membership in soils by allowing for more accurate determination of the abundances of taxa missed by many commonly used canonical primer pairs, e.g. the Verrucomicrobia. No existing tools specifically address the issues associated with designing barcoded polymerase chain reaction (PCR) primers for community analysis.
Primer design is a large field and we cannot survey it comprehensively in this article, but among a selection of related tools, Primer Validator (http://bioinfo.unice.fr/454) allows taxonomic assessment but neither generates de novo primers nor allows a customizable 3′-weighted scoring system to predict successful amplification of tested primers. BarCrawl (Frank, 2009) allows design of barcodes for specified PCR primers but not design of the primers themselves, so it is a useful complement to PrimerProspector. RDP's Probe Match (Cole et al., 2005) reports sequences matching a probe, as does Greengenes' probe function (DeSantis et al., 2006), but these tools are tied to their respective 16S rRNA databases and do not support barcodes. Primrose and OligoCheck (Ashelford et al., 2002) are useful for small numbers of target sequences, but do not scale well to the thousands or tens of thousands of sequences needed when designing universal or near-universal primers, and do not incorporate differential weighting of 5′ and 3′ bases in primer scoring. Primer BLAST uses Primer3 software to build primers of a specified length against one target sequence, and then BLASTs the results against other databases to ensure that putative primers do not target BLAST hits. This functionality is also a useful complement to that provided in PrimerProspector. While applications of PrimerProspector to date have focused on SSU rRNA primer design, PrimerProspector can be used for any nucleic acid sequences and allows users to design de novo primers based upon arbitrary multiple sequence alignments. User-specifiable design parameters include primer length, degeneracy and targeted regions for generation of primers. Existing or de novo primers can be analyzed for predicted taxonomic coverage, as shown in Figure 1. Finally, common pitfalls in primer design can be identified, such as likely barcode-primer secondary structure, regions susceptible to primer dimerization and disparate GC content between primer pairs.
Convenient reports show amplicons or simulated reads that cover regions of sequences that are not phylogenetically informative or are of unsuitable lengths for sequencing.
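Two of the quantities mentioned above, primer degeneracy and the GC content of a primer pair, are simple to compute by hand. The following is a minimal Python sketch, not PrimerProspector's own code: the function names (`degeneracy`, `gc_bounds`, `gc_gap`) and the toy primers are hypothetical, and the only outside fact assumed is the standard IUPAC nucleotide ambiguity table.

```python
from math import prod

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
GC = set("GC")


def degeneracy(primer):
    """Number of distinct non-degenerate sequences the primer expands to."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def gc_bounds(primer):
    """Lowest and highest GC fraction over all expansions of the primer."""
    options = [set(IUPAC[base]) for base in primer.upper()]
    low = sum(1 for opts in options if opts <= GC)   # every option is G or C
    high = sum(1 for opts in options if opts & GC)   # some option is G or C
    return low / len(primer), high / len(primer)


def gc_gap(forward, reverse):
    """Smallest possible GC-fraction difference between two primers.

    Returns 0.0 when the two GC ranges overlap.
    """
    f_low, f_high = gc_bounds(forward)
    r_low, r_high = gc_bounds(reverse)
    return max(0.0, f_low - r_high, r_low - f_high)


if __name__ == "__main__":
    fwd, rev = "GCMGCCGC", "ATWTCTAA"   # toy primers, not real ones
    print(degeneracy(fwd), gc_bounds(fwd))
    print(gc_gap(fwd, rev))
```

A large `gc_gap` for a candidate pair is the kind of disparity that would be worth flagging before synthesis; what threshold counts as "disparate" is left to the user here.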
Fig. 1.

Taxonomic coverage summary of the 515f/806r 16S SSU rRNA primer pair at the phylum level for (A) archaea, (B) eukarya and (C) bacteria. The y-axes represent percent coverage and the value on top of each bar is the total number of reference sequences in each taxon. In this analysis, the reference sequences were derived from the Silva database, and filtered at 97% sequence identity with uclust (Edgar, 2010). Archaeal and bacterial sequences shorter than 1450 bases, and eukaryotic sequences shorter than 1800 bases, were excluded from the reference set. As illustrated, this primer pair is nearly universal for archaeal and bacterial 16S but is generally poor for eukaryotic (notably metazoan) 18S sequences. This plot and additional PrimerProspector analyses informed the decision to use this primer pair in Caporaso et al. (2010), Bates et al. (2010) and G.Bergmann et al. (2010). Comparisons with the unoptimized primer pair and with an alternative popular pair (27f/338r) are shown as Supplementary Figures S1 and S2, respectively.


2 METHODS

De novo design of primers is performed by finding short conserved sequences in a given multiple sequence alignment to act as a 3′ binding site for new primers. Once these sites have been identified, full-length forward or reverse de novo primers are generated by incorporating the N upstream or downstream bases, where N is 15 by default. De novo full-length primers can then be sorted according to sensitivity, specificity or degeneracy, and compared with known primers to find matches or significant overlap. Specificity for particular target groups, such as archaea, can be obtained by supplying an optional alignment of sequences from which to exclude matches. Primer analyses, including the prediction of taxonomic coverage, rely upon scoring primers against target sequences. To predict its taxonomic coverage, a primer is locally aligned to full-length target sequences with known taxonomies, and scored based on gap, 3′ mismatch and non-3′ mismatch counts. An example of the graphical output is provided in Supplementary Figure S3. The final five bases are considered to be the 3′ region by default, and are considered to be the most important for PCR amplification. The scoring scheme is parameterizable. The RDP Classifier (Wang et al., 2007) is used to classify the resulting sequence fragments, and the accuracy is displayed both in terms of which taxa are amplified and in terms of classification level of the resulting fragments. PrimerProspector supports retraining of the RDP Classifier for taxa coverage analysis based on different reference taxonomies. Descriptions of the scripts included in PrimerProspector, the various outputs generated by PrimerProspector and an example based on the F515/R806 primer pair are included in the online documentation at http://pprospector.sourceforge.net/.
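The two core steps described above, locating conserved 3′ sites in an alignment and scoring a primer with heavier penalties at its 3′ end, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than PrimerProspector's implementation: all function names are hypothetical, the penalty values and the conservation threshold are placeholders, only forward primers are built, and the scoring is ungapped and treats degenerate bases literally, whereas the tool itself uses local alignment and also counts gaps.

```python
from collections import Counter


def conserved_sites(alignment, k=5, min_fraction=0.9):
    """Return (start, kmer, fraction) for gap-free k-mers that at least
    min_fraction of the aligned sequences share at the same columns."""
    hits = []
    width = len(alignment[0])
    for start in range(width - k + 1):
        counts = Counter(seq[start:start + k] for seq in alignment)
        kmer, count = counts.most_common(1)[0]
        fraction = count / len(alignment)
        if "-" not in kmer and fraction >= min_fraction:
            hits.append((start, kmer, fraction))
    return hits


def extend_forward(aligned_seq, start, k, n_upstream=15):
    """Build a forward primer: the conserved k-mer plus up to n_upstream
    ungapped bases on its 5' side (15 mirrors the default N in the text)."""
    upstream = aligned_seq[:start].replace("-", "")
    upstream = upstream[max(0, len(upstream) - n_upstream):]
    return upstream + aligned_seq[start:start + k]


def weighted_score(primer, site, three_prime_len=5,
                   three_prime_penalty=1.0, other_penalty=0.4):
    """Sum mismatch penalties between a primer and an equal-length site,
    both written 5'->3'. Lower is better. Penalty values are assumptions."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    boundary = len(primer) - three_prime_len  # first index of the 3' region
    score = 0.0
    for i, (p, s) in enumerate(zip(primer, site)):
        if p != s:
            score += three_prime_penalty if i >= boundary else other_penalty
    return score


def best_site(primer, target, **penalties):
    """Slide the primer along the target; return (offset, score) of the
    lowest-scoring ungapped placement."""
    n = len(primer)
    scores = [(weighted_score(primer, target[i:i + n], **penalties), i)
              for i in range(len(target) - n + 1)]
    score, offset = min(scores)
    return offset, score


if __name__ == "__main__":
    toy_alignment = ["ACGTTGCA", "ACGTAGCA", "ACGTCGCA"]
    for start, kmer, frac in conserved_sites(toy_alignment, k=3, min_fraction=1.0):
        print(start, kmer, frac)
    print(best_site("ATGCCGTAAC", "GGGGATGCCGTAATGGGG"))
```

With a scheme like this, a single mismatch in the last five bases costs more than a mismatch further upstream, which is the behaviour the default 3′ region is meant to capture; sequences whose best score falls under a user-chosen cutoff would then be counted as covered in a taxonomic summary.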

3 CONCLUSIONS

PCR amplification continues to be a key step in many high-throughput sequencing applications such as barcoded marker gene-based microbial community analyses. PrimerProspector represents a significant advance over prior work in this area by providing a single tool to facilitate primer design and analysis, including support for barcodes (and associated linkers). PrimerProspector is a fast and extensible framework for primer design and analysis, and has already been successfully applied to help researchers identify the most relevant and useful primers for their application, starting with multiple sequence alignments for any nucleic acid sequence.

Funding: Bill and Melinda Gates Foundation; Crohn's and Colitis Foundation of America; Howard Hughes Medical Institute; National Institutes of Health Signaling and Cell Cycle Regulation Training Grant (T32GM008759) in part.

Conflict of Interest: none declared.
  12 in total

1.  PRIMROSE: a computer program for generating and estimating the phylogenetic range of 16S rRNA oligonucleotide probes and primers in conjunction with the RDP-II database.

Authors:  Kevin E Ashelford; Andrew J Weightman; John C Fry
Journal:  Nucleic Acids Res       Date:  2002-08-01       Impact factor: 16.971

2.  Search and clustering orders of magnitude faster than BLAST.

Authors:  Robert C Edgar
Journal:  Bioinformatics       Date:  2010-08-12       Impact factor: 6.937

3.  Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.

Authors:  J Gregory Caporaso; Christian L Lauber; William A Walters; Donna Berg-Lyons; Catherine A Lozupone; Peter J Turnbaugh; Noah Fierer; Rob Knight
Journal:  Proc Natl Acad Sci U S A       Date:  2010-06-03       Impact factor: 11.205

4.  Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Authors:  T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen
Journal:  Appl Environ Microbiol       Date:  2006-07       Impact factor: 4.792

5.  Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

Authors:  Qiong Wang; George M Garrity; James M Tiedje; James R Cole
Journal:  Appl Environ Microbiol       Date:  2007-06-22       Impact factor: 4.792

6.  Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex.

Authors:  Micah Hamady; Jeffrey J Walker; J Kirk Harris; Nicholas J Gold; Rob Knight
Journal:  Nat Methods       Date:  2008-02-10       Impact factor: 28.547

Review 7.  A renaissance for the pioneering 16S rRNA gene.

Authors:  Susannah G Tringe; Philip Hugenholtz
Journal:  Curr Opin Microbiol       Date:  2008-10-08       Impact factor: 7.934

8.  Examining the global distribution of dominant archaeal populations in soil.

Authors:  Scott T Bates; Donna Berg-Lyons; J Gregory Caporaso; William A Walters; Rob Knight; Noah Fierer
Journal:  ISME J       Date:  2010-11-18       Impact factor: 10.302

9.  The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis.

Authors:  J R Cole; B Chai; R J Farris; Q Wang; S A Kulam; D M McGarrell; G M Garrity; J M Tiedje
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

10.  PyCogent: a toolkit for making sense from sequence.

Authors:  Rob Knight; Peter Maxwell; Amanda Birmingham; Jason Carnes; J Gregory Caporaso; Brett C Easton; Michael Eaton; Micah Hamady; Helen Lindsay; Zongzhi Liu; Catherine Lozupone; Daniel McDonald; Michael Robeson; Raymond Sammut; Sandra Smit; Matthew J Wakefield; Jeremy Widmann; Shandy Wikman; Stephanie Wilson; Hua Ying; Gavin A Huttley
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

  157 in total

1.  Microbial life in Bourlyashchy, the hottest thermal pool of Uzon Caldera, Kamchatka.

Authors:  Nikolay A Chernyh; Andrey V Mardanov; Vadim M Gumerov; Margarita L Miroshnichenko; Alexander V Lebedinsky; Alexander Y Merkel; Douglas Crowe; Nikolay V Pimenov; Igor I Rusanov; Nikolay V Ravin; Mary Ann Moran; Elizaveta A Bonch-Osmolovskaya
Journal:  Extremophiles       Date:  2015-09-08       Impact factor: 2.395

2.  Exoelectrogenic capacity of host microbiota predicts lymphocyte recruitment to the gut.

Authors:  Aaron Conrad Ericsson; Daniel John Davis; Craig Lawrence Franklin; Catherine Elizabeth Hagan
Journal:  Physiol Genomics       Date:  2015-04-07       Impact factor: 3.107

3.  Obese Mice Losing Weight Due to trans-10,cis-12 Conjugated Linoleic Acid Supplementation or Food Restriction Harbor Distinct Gut Microbiota.

Authors:  Laura J den Hartigh; Zhan Gao; Leela Goodspeed; Shari Wang; Arun K Das; Charles F Burant; Alan Chait; Martin J Blaser
Journal:  J Nutr       Date:  2018-04-01       Impact factor: 4.798

4.  Unusual biology across a group comprising more than 15% of domain Bacteria.

Authors:  Christopher T Brown; Laura A Hug; Brian C Thomas; Itai Sharon; Cindy J Castelle; Andrea Singh; Michael J Wilkins; Kelly C Wrighton; Kenneth H Williams; Jillian F Banfield
Journal:  Nature       Date:  2015-06-15       Impact factor: 49.962

5.  Different Amplicon Targets for Sequencing-Based Studies of Fungal Diversity.

Authors:  Francesca De Filippis; Manolo Laiola; Giuseppe Blaiotta; Danilo Ercolini
Journal:  Appl Environ Microbiol       Date:  2017-08-17       Impact factor: 4.792

6.  Comparison of Collection Methods for Fecal Samples in Microbiome Studies.

Authors:  Emily Vogtmann; Jun Chen; Amnon Amir; Jianxin Shi; Christian C Abnet; Heidi Nelson; Rob Knight; Nicholas Chia; Rashmi Sinha
Journal:  Am J Epidemiol       Date:  2016-12-16       Impact factor: 4.897

7.  Linking patterns of net community production and marine microbial community structure in the western North Atlantic.

Authors:  Seaver Wang; Yajuan Lin; Scott Gifford; Rachel Eveleth; Nicolas Cassar
Journal:  ISME J       Date:  2018-05-01       Impact factor: 10.302

8.  A multi-amplicon 16S rRNA sequencing and analysis method for improved taxonomic profiling of bacterial communities.

Authors:  Andrew E Schriefer; Paul F Cliften; Matthew C Hibberd; Christopher Sawyer; Victoria Brown-Kennerly; Lauren Burcea; Elliott Klotz; Seth D Crosby; Jeffrey I Gordon; Richard D Head
Journal:  J Microbiol Methods       Date:  2018-09-29       Impact factor: 2.363

Review 9.  Ancient and modern environmental DNA.

Authors:  Mikkel Winther Pedersen; Søren Overballe-Petersen; Luca Ermini; Clio Der Sarkissian; James Haile; Micaela Hellstrom; Johan Spens; Philip Francis Thomsen; Kristine Bohmann; Enrico Cappellini; Ida Bærholm Schnell; Nathan A Wales; Christian Carøe; Paula F Campos; Astrid M Z Schmidt; M Thomas P Gilbert; Anders J Hansen; Ludovic Orlando; Eske Willerslev
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2015-01-19       Impact factor: 6.237

10.  Effects of exposure to bisphenol A and ethinyl estradiol on the gut microbiota of parents and their offspring in a rodent model.

Authors:  Angela B Javurek; William G Spollen; Sarah A Johnson; Nathan J Bivens; Karen H Bromert; Scott A Givan; Cheryl S Rosenfeld
Journal:  Gut Microbes       Date:  2016-09-13
