Tiayyba Riaz, Wasim Shehzad, Alain Viari, François Pompanon, Pierre Taberlet, Eric Coissac.
Abstract
Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software tool for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. It optimizes two quality indices, measuring taxonomic range and discrimination, to select the most efficient markers from a set of reference sequences under specific experimental constraints such as marker length or specifically targeted taxa. The key step of the algorithm is the identification of conserved regions among reference sequences for anchoring primers. We propose an efficient algorithm based on data mining that allows the analysis of very large sets of sequences. We evaluate the efficiency of ecoPrimers by running it on three different sequence sets: mitochondrial, chloroplast and bacterial genomes. The identified barcode markers correspond either to barcode regions already in use for plants or animals, or to new potential barcodes. Results from empirical experiments carried out on a promising new barcode for analyzing vertebrate diversity fully agree with expectations based on bioinformatics analysis. These tests demonstrate the efficiency of ecoPrimers for inferring new barcodes suited to diverse experimental contexts. ecoPrimers is available as an open source project at: http://www.grenoble.prabi.fr/trac/ecoPrimers.
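The abstract mentions two quality indices, one for taxonomic range and one for discrimination, without defining them here. The Python sketch below shows one plausible reading, assuming that the range index is the fraction of reference taxa that a primer pair amplifies and that the discrimination index is the fraction of amplified taxa whose barcode is shared with no other taxon. The function name `quality_indices` and its input shape are hypothetical illustrations, not the ecoPrimers interface.

```python
from collections import defaultdict


def quality_indices(amplicons, n_taxa):
    """Hypothetical coverage and discrimination indices for one primer pair.

    amplicons: dict taxon -> set of barcode sequences amplified in silico
               from that taxon (taxa that failed to amplify are omitted or
               map to an empty set)
    n_taxa:    number of taxa in the reference set
    Returns (coverage, discrimination) as fractions in [0, 1].
    """
    amplified = {t: bs for t, bs in amplicons.items() if bs}
    if n_taxa == 0 or not amplified:
        return 0.0, 0.0
    # Index every barcode by the set of taxa that produce it.
    taxa_by_barcode = defaultdict(set)
    for taxon, barcodes in amplified.items():
        for b in barcodes:
            taxa_by_barcode[b].add(taxon)
    # Assumption: a taxon is discriminated if none of its barcodes is
    # shared with any other taxon.
    discriminated = sum(
        1
        for taxon, barcodes in amplified.items()
        if all(taxa_by_barcode[b] == {taxon} for b in barcodes)
    )
    coverage = len(amplified) / n_taxa
    discrimination = discriminated / len(amplified)
    return coverage, discrimination
```

For example, a pair that amplifies three of four taxa, two of which yield the same barcode, scores 0.75 on coverage but only 1/3 on discrimination, which is why the two indices are optimized together rather than separately.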
Year: 2011 PMID: 21930509 PMCID: PMC3241669 DOI: 10.1093/nar/gkr732
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Strict primer algorithm (SPA) used for finding strict repeats.
Figure 2. Comparison of time and memory usage for both versions of the SPA. (a) Memory usage as a function of the number of sequences processed, without the data-mining step. Memory usage increases rapidly until the strict quorum (70%) starts taking effect, after 271 sequences (30% of 905) have been processed. (b) Same as (a), but with the data-mining step. Only a small number of 13-base prefixes (for 18-base primers) pass the strict quorum, so memory usage is much smaller. (c) Without the data-mining step, processing time increases exponentially until the strict quorum starts taking effect, and linearly after that. (d) With the data-mining step added, processing time becomes linear.
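The two ideas in the Figure 2 caption can be illustrated with a minimal Python sketch, assuming that a "strict repeat" is a word occurring without mismatches and that the quorum is the fraction of sequences that must contain it. Quorum pruning drops a word as soon as it could not reach the quorum even if every remaining sequence contained it; the data-mining step first keeps only short prefixes that pass the quorum, since any longer word that passes must have a passing prefix. The function names are hypothetical, and this is not the ecoPrimers SPA implementation.

```python
def kmers(seq, k):
    """Distinct k-mers of seq; a word is counted at most once per sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strict_repeats(seqs, k, quorum_pct, prefixes=None, prefix_len=0):
    """Exact k-mers present in at least quorum_pct % of the sequences.

    If `prefixes` is given, only words whose first `prefix_len` bases are
    in that set are counted (the prefix pre-filter).
    """
    n = len(seqs)
    needed = (quorum_pct * n + 99) // 100  # integer ceiling of pct * n / 100
    counts = {}
    for i, seq in enumerate(seqs):
        remaining = n - i - 1  # sequences not processed yet
        for w in kmers(seq, k):
            if prefixes is not None and w[:prefix_len] not in prefixes:
                continue
            if w in counts:
                counts[w] += 1
            elif 1 + remaining >= needed:  # a new word can still make it
                counts[w] = 1
        # Quorum pruning: discard words that can no longer reach `needed`.
        counts = {w: c for w, c in counts.items() if c + remaining >= needed}
    # After the last sequence, remaining == 0, so every survivor has c >= needed.
    return counts


def strict_repeats_prefiltered(seqs, k, prefix_len, quorum_pct):
    """Two passes: frequent short prefixes first, then full-length words."""
    frequent = set(strict_repeats(seqs, prefix_len, quorum_pct))
    return strict_repeats(seqs, k, quorum_pct, frequent, prefix_len)
```

With the caption's settings, a call would look like `strict_repeats_prefiltered(seqs, k=18, prefix_len=13, quorum_pct=70)`. The pre-filter returns the same words as the single pass; it only reduces how many candidates are held in memory at once.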
The five best primer pairs proposed by ecoPrimers to amplify potential barcode markers specific to vertebrates

| Primer name | Direct sequence | Reverse sequence | Tm P1 (°C) | Tm P2 (°C) | Amplified E | Amplified C | Bc (coverage) | Bs (specificity) | Min size (bp) | Max size (bp) | Average size (bp) | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | 52.6 | 52.3 | 1221 | 31 | 0.968 | 0.858 | 85 | 117 | 105.38 | 16S RNA |
| 12 | | | 52.3 | 50.7 | 1236 | 7 | 0.980 | 0.720 | 73 | 110 | 98.32 | 12S RNA |
| | | | 55.6 | 54.4 | 1256 | 18 | 0.996 | 0.459 | 63 | 84 | 82.03 | 12S RNA |
| similar to 16Sr | | | 56.1 | 52.1 | 1253 | 59 | 0.994 | 0.196 | 53 | 59 | 58.22 | 16S RNA |
| | | | 52.1 | 56.1 | 1253 | 35 | 0.994 | 0.195 | 54 | 60 | 57.22 | 16S RNA |
16Sr primers were proposed by Palumbi et al. (14) for mammal identification (37). The Amplified E and C columns give the numbers of species electronically amplified from the vertebrate example set and from the non-vertebrate counterexample set, respectively.
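The Amplified E and C columns count species that are "electronically amplified", i.e. for which a simulated PCR yields a product. The sketch below is a minimal electronic PCR under stated assumptions: ungapped matching with at most `max_mm` mismatches per primer, the reverse primer matched as its reverse complement downstream of the forward site, only the given strand scanned, and fragment length measured between the primers. The function names and parameters are hypothetical and do not describe how ecoPrimers itself performs this step.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def in_silico_pcr(seq, fwd, rev, max_mm, min_len, max_len):
    """Return the first fragment between the two primer sites, or None."""
    site = revcomp(rev)  # where the reverse primer binds on this strand
    for i in range(len(seq) - len(fwd) + 1):
        if mismatches(seq[i:i + len(fwd)], fwd) > max_mm:
            continue
        start = i + len(fwd)
        # Fragment length j - start must lie in [min_len, max_len].
        last = min(start + max_len, len(seq) - len(site))
        for j in range(start + min_len, last + 1):
            if mismatches(seq[j:j + len(site)], site) <= max_mm:
                return seq[start:j]
    return None


def count_amplified(taxa_seqs, fwd, rev, max_mm, min_len, max_len):
    """Number of taxa with at least one sequence that yields a fragment."""
    return sum(
        any(in_silico_pcr(s, fwd, rev, max_mm, min_len, max_len) is not None
            for s in seqs)
        for seqs in taxa_seqs.values()
    )
```

Under these assumptions, E would be `count_amplified` run on the example set and C the same call on the counterexample set; a primer pair specific to the target group keeps C small while E stays high.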
The five best primer pairs proposed by ecoPrimers to amplify potential barcode markers specific to eubacteria

| Direct sequence | Reverse sequence | Tm P1 (°C) | Tm P2 (°C) | Amplified E | Bc (coverage) | Bs (specificity) | Min size (bp) | Max size (bp) | Average size (bp) | Region |
|---|---|---|---|---|---|---|---|---|---|---|
| | | 60.5 | 60.8 | 603 | 1.00 | 0.927 | 668 | 987 | 699.07 | 16S RNA |
| | | 60.8 | 47.5 | 603 | 1.00 | 0.910 | 392 | 708 | 417.52 | 16S RNA |
| | | 60.8 | 64.9 | 603 | 1.00 | 0.907 | 525 | 844 | 556.49 | 16S RNA |
| | | 61.1 | 64.9 | 603 | 1.00 | 0.842 | 370 | 666 | 380.21 | 16S RNA |
| | | 69.6 | 60.8 | 603 | 1.00 | 0.819 | 128 | 598 | 152.66 | 16S RNA |
The Amplified E column gives the number of species in the Eubacteria data set that were electronically amplified.
The five best primer pairs proposed by ecoPrimers to amplify potential barcode markers specific to vascular plants

| Primer name | Direct sequence | Reverse sequence | Tm P1 (°C) | Tm P2 (°C) | Amplified E | Bc (coverage) | Bs (specificity) | Min size (bp) | Max size (bp) | Average size (bp) | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| similar to | | | 56.1 | 53.5 | 114 | 0.966 | 0.711 | 10 | 90 | 45.65 | |
| similar to | | | 52.7 | 58.4 | 114 | 0.966 | 0.658 | 13 | 93 | 48.65 | |
| similar to | | | 53.0 | 58.4 | 111 | 0.941 | 0.649 | 20 | 100 | 55.96 | |
| | | | 41.9 | 48.9 | 116 | 0.983 | 0.647 | 100 | 103 | 100.3 | |
| | | | 54.8 | 53.4 | 112 | 0.949 | 0.652 | 17 | 97 | 52.73 | |
g/h primers were proposed by Taberlet et al. (15) for vascular plant identification. The Amplified E column gives the number of species in the vascular plant example set that were electronically amplified.
Number of vertebrate species exhibiting 0 to 3 mismatches with the forward and reverse 12S-V5 primers

| Number of mismatches | Species (forward primer) | Species (reverse primer) |
|---|---|---|
| 0 | 3272 | 4592 |
| 1 | 2031 | 1021 |
| 2 | 465 | 291 |
| 3 | 158 | 20 |
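A per-species mismatch tally like the one above can be sketched in Python as follows. The sketch assumes that each species is scored by the fewest mismatches between the primer and any ungapped window of its reference sequences, and that species above `max_mm` are left out; for a reverse primer one would pass its reverse complement. The function names are hypothetical, and the sketch ignores where in the primer the mismatches fall.

```python
from collections import Counter


def best_site_mismatches(primer, target):
    """Fewest mismatches between primer and any ungapped window of target."""
    k = len(primer)
    if len(target) < k:
        return None  # target too short to contain a binding site
    return min(
        sum(a != b for a, b in zip(primer, target[i:i + k]))
        for i in range(len(target) - k + 1)
    )


def mismatch_histogram(primer, species_seqs, max_mm=3):
    """Map mismatch count -> number of species, using each species' best sequence."""
    hist = Counter()
    for seqs in species_seqs.values():
        scores = [m for m in (best_site_mismatches(primer, s) for s in seqs)
                  if m is not None]
        if scores and min(scores) <= max_mm:
            hist[min(scores)] += 1
    return hist
```

Reading such a histogram cumulatively shows how many species remain amplifiable as the tolerated number of mismatches grows.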
Number of sequences observed per sample after Solexa sequencing of four PCR amplicons

| | Species | Common leopard feces | Snow leopard feces | Leopard cat feces 1 | Leopard cat feces 2 |
|---|---|---|---|---|---|
| Predator | Common leopard | 2460 | – | – | – |
| | Snow leopard | – | 10 807 | – | – |
| | Leopard cat | – | – | 1982 | 9765 |
| Prey | Domestic goat | 2969 | – | – | – |
| | Siberian ibex | – | 1256 | – | – |
| | Shrew | – | – | – | 964 |
| | Chukar partridge | – | – | 1711 | – |
| | Murree hill frog | – | – | – | 982 |

Each sample corresponds to the feces of a single predator.
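A count table of this kind can be built by tallying reads per sample and per assigned species. The Python sketch below assumes, for illustration only, that a read is assigned when it matches a reference barcode exactly and is otherwise counted as "unassigned"; the procedure actually used for this table is not described here, and real pipelines usually tolerate some sequence differences. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_reads(reads, reference):
    """Tally reads per sample and per species.

    reads:     iterable of (sample_id, read_sequence) pairs
    reference: dict mapping an uppercase reference barcode -> species name
    Reads matching no barcode exactly are tallied as "unassigned".
    """
    table = defaultdict(Counter)
    for sample, seq in reads:
        species = reference.get(seq.upper(), "unassigned")
        table[sample][species] += 1
    return table


def format_row(table, species, samples):
    """One table row, using an en dash where a species has no reads."""
    cells = [str(table[s][species]) if table[s][species] else "–"
             for s in samples]
    return " | ".join([species] + cells)
```

The en dash for zero counts mirrors the convention used in the table above.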