Christopher A Gulvik, T Chad Effler, Steven W Wilhelm, Alison Buchan.
Abstract
Development and use of primer sets to amplify nucleic acid sequences of interest is fundamental to studies spanning many life science disciplines. As such, the validation of primer sets is essential. Several computer programs have been created to aid in the initial selection of primer sequences that may or may not require multiple nucleotide combinations (i.e., degeneracies). Conversely, validation of primer specificity has remained largely unchanged for several decades, and there are currently few available programs that allow for an evaluation of primers containing degenerate nucleotide bases. To address this gap, we developed the program De-MetaST, which performs an in silico amplification using user-defined nucleotide sequence dataset(s) and primer sequences that may contain degenerate bases. The program returns an output file that contains the in silico amplicons. When De-MetaST is paired with NCBI's BLAST (De-MetaST-BLAST), the program also returns the top 10 nr NCBI database hits for each recovered in silico amplicon. While the original motivation for development of this search tool was degenerate primer validation using the wealth of nucleotide sequences available in environmental metagenome and metatranscriptome databases, this search tool has potential utility in many data mining applications.
Year: 2012 PMID: 23189198 PMCID: PMC3506598 DOI: 10.1371/journal.pone.0050362
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. De-MetaST transformation of nucleotide sequences into a binary representation.
The binary representation for each of the 16 possible nucleotide character inputs is shown in the upper box. The lower box provides an example of the transformation using a mock primer sequence. Spaced gaps are shown for instructional purposes and do not occur in the De-MetaST search routine.
Figure 2. Flowchart outlining De-MetaST-BLAST user actions and corresponding computational processes.
Fwd, Forward; Rev, Reverse; NCBI, National Center for Biotechnology Information.
boxB and 16S rRNA gene in silico amplicons identified in representative metagenomes using De-MetaST-BLAST.
| CAMERA Metagenome Database Queried | boxB | Unique boxB reads | 16S rRNA gene | Unique 16S rDNA reads | Database Size [Mbp] | Number of Reads | Average Read Length [bp] | Sequencing Method(s) |
| CAM_PROJ_FarmSoil.read.fa | 2 | 2 | 6 | 6 | 155 | 1.38E+05 | 1117 | dideoxysequencing (Sanger) |
| CAM_PROJ_GOS.read.fa | 100 | 86 | 3710 | 965 | 11598 | 1.36E+07 | 915 | dideoxysequencing (Sanger) and pyrosequencing (454) |
| CAM_PROJ_AntarcticaAquatic.read.fa | 44 | 43 | 4758 | 1665 | 23819 | 6.46E+07 | 369 | dideoxysequencing (Sanger) and pyrosequencing (454) |
The primers boxB171F (5′ CARGGNGAYACNGARCC 3′) and boxB265R (5′ YTTNCCRTCNCKRTCNGT 3′) were used to target an approximately 300 bp region of boxB.
Unique reads were identified using MOTHUR (v.1.27.0) [46].
The primers 358f (5′ CCTACGGGAGGCAGCAG 3′) and 517r (5′ ATTACCGCGGCTGCTGG 3′) [47] were used to target an approximately 190 bp amplicon in the 16S rRNA gene.
Average read length was estimated by dividing the database size by the number of reads. The AntarcticaAquatic database is dominated by pyrosequencing-derived reads (98% of all reads), while the GOS dataset is dominated by Sanger-derived reads; the exact distribution for GOS reads is not available.
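The primers listed in these footnotes differ considerably in how many concrete sequences they stand for. As a small worked example, the degeneracy of a primer is the product of the number of bases each IUPAC code allows. The sketch below uses the standard IUPAC definitions; the names `IUPAC_FOLD` and `degeneracy` are illustrative and do not come from De-MetaST.

```python
from math import prod

# Number of canonical bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Count the distinct non-degenerate sequences a primer represents."""
    return prod(IUPAC_FOLD[base] for base in primer.upper())


primers = {
    "boxB171F": "CARGGNGAYACNGARCC",
    "boxB265R": "YTTNCCRTCNCKRTCNGT",
    "358f": "CCTACGGGAGGCAGCAG",
    "517r": "ATTACCGCGGCTGCTGG",
}

for name, seq in primers.items():
    print(f"{name}: length {len(seq)}, degeneracy {degeneracy(seq)}")
```

Under these definitions boxB171F represents 128 sequences and boxB265R represents 1024, while the two 16S rRNA gene primers are non-degenerate.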
Figure 3. Example of De-MetaST-BLAST output.
Text within the box denotes the spreadsheet output for a boxB primer set search against the WASECA Farm Soil Metagenome (AAFX01000000) [41] that recovers two in silico amplicons. Column descriptors are shown in color; select columns have been truncated due to space constraints. For the “excision info” column, the first alphanumeric character reports the “hit” number within a read (i.e. “1” indicates it is the first in silico amplicon found within a single read). The subsequent alphanumeric characters denote the primer orientation yielding the amplicon (F = forward, R = reverse). Whether a unique read identifier is returned is contingent upon the database itself.
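To make the notions of hit number and primer orientation concrete, the following is a rough sketch of an in silico amplification on a single read. It uses regular expressions in place of the binary representation that De-MetaST uses, and the tuple layout `(hit_number, orientation, amplicon)`, the `max_len` limit, and all function names are assumptions for illustration only; the actual "excision info" format may differ.

```python
import re

# IUPAC codes as regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including degenerate IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(primer):
    return "".join(IUPAC_RE[base] for base in primer.upper())


def in_silico_amplicons(read, fwd, rev, max_len=2000):
    """Return (hit_number, orientation, amplicon) tuples for one read.

    "F": forward primer, then the reverse complement of the reverse primer.
    "R": reverse primer, then the reverse complement of the forward primer.
    max_len is a hypothetical cap on the bases between the two primer sites.
    """
    read = read.upper()
    layouts = [("F", fwd, revcomp(rev)), ("R", rev, revcomp(fwd))]
    found = []
    for orientation, left, right in layouts:
        # The lookahead lets hits start at every position, overlaps included;
        # the lazy quantifier keeps the shortest amplicon per start position.
        pattern = re.compile(
            f"(?=({to_regex(left)}.{{0,{max_len}}}?{to_regex(right)}))"
        )
        for match in pattern.finditer(read):
            found.append((match.start(), orientation, match.group(1)))
    found.sort()  # number hits by their position within the read
    return [(i + 1, o, amp) for i, (_, o, amp) in enumerate(found)]


# Made-up read and shortened, made-up primers for demonstration.
read = "GGCAAGGACGTACGTGGTAAGTT"
for hit in in_silico_amplicons(read, fwd="CARGG", rev="YTTNCC"):
    print(hit)
```

Running the same search on the reverse complement of the read reports the amplicon in the "R" orientation, which is why a read's strand does not need to be known in advance.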
Runtime duration of De-MetaST.
| Files Input | Database size [Mbytes] | Sequences in database [×10^5] | Nucleotides in database [Mbp] | Hits | Real Time [s] | User Time [s] | System Time [s] |
| 1 | 206.1 | 1.4 | 154 | 2 | 11.7 | 11.7 | 0.02 |
| 2 | 412.2 | 2.8 | 309 | 4 | 23.5 | 23.4 | 0.05 |
| 3 | 618.3 | 4.2 | 463 | 6 | 35.2 | 35.1 | 0.07 |
| 4 | 824.4 | 5.5 | 618 | 8 | 47.6 | 47.5 | 0.10 |
| 5 | 1030.5 | 7.0 | 772 | 10 | 58.6 | 58.5 | 0.12 |
| 1 | 206.1 | 1.4 | 154 | 4 | 11.9 | 11.9 | 0.02 |
| 2 | 412.2 | 2.8 | 309 | 8 | 23.3 | 23.3 | 0.05 |
| 3 | 618.3 | 4.2 | 463 | 12 | 35.6 | 35.5 | 0.08 |
| 4 | 824.4 | 5.5 | 618 | 16 | 47.3 | 47.1 | 0.10 |
| 5 | 1030.5 | 7.0 | 772 | 20 | 58.2 | 58.0 | 0.12 |
The datasets used for benchmarking were manipulations of the Waseca Farm Soil metagenome (AAFX01000000); the average sequence read length in these datasets is 1117 bp.
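The runtime figures suggest that search time grows roughly in proportion to database size. As a quick check, the sketch below divides nucleotides searched by real time for the first five rows of the table; the variable and function names are illustrative, and the values are copied from the table as printed, so the results carry its rounding.

```python
# (files input, nucleotides in database [Mbp], real time [s]),
# copied from the first five rows of the runtime table.
rows = [
    (1, 154, 11.7),
    (2, 309, 23.5),
    (3, 463, 35.2),
    (4, 618, 47.6),
    (5, 772, 58.6),
]


def throughput(mbp, seconds):
    """Search rate in Mbp per second of real time."""
    return mbp / seconds


rates = [throughput(mbp, seconds) for _, mbp, seconds in rows]
for (files, mbp, seconds), rate in zip(rows, rates):
    print(f"{files} file(s): {mbp} Mbp in {seconds} s = {rate:.2f} Mbp/s")

# A near-constant rate across rows indicates approximately linear scaling.
print(f"spread: {max(rates) - min(rates):.2f} Mbp/s")
```

For these rows the rate stays close to 13 Mbp per second regardless of how many files are searched.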