Aron J Fazekas, Kevin S Burgess, Prasad R Kesanakurti, Sean W Graham, Steven G Newmaster, Brian C Husband, Diana M Percy, Mehrdad Hajibabaei, Spencer C H Barrett.
Abstract
A universal barcode system for land plants would be a valuable resource, with potential utility in fields as diverse as ecology, floristics, law enforcement and industry. However, the application of plant barcoding has been constrained by a lack of consensus regarding the most variable and technically practical DNA region(s). We compared eight candidate plant barcoding regions from the plastome and one from the mitochondrial genome for how well they discriminated the monophyly of 92 species in 32 diverse genera of land plants (N = 251 samples). The plastid markers comprise portions of five coding (rpoB, rpoC1, rbcL, matK and 23S rDNA) and three non-coding (trnH-psbA, atpF-atpH, and psbK-psbI) loci. Our survey included several taxonomically complex groups, and in all cases we examined multiple populations and species. The regions differed in their ability to discriminate species, and in ease of retrieval, in terms of amplification and sequencing success. Single locus resolution ranged from 7% (23S rDNA) to 59% (trnH-psbA) of species with well-supported monophyly. Sequence recovery rates were related primarily to amplification success (85-100% for plastid loci), with matK requiring the greatest effort to achieve reasonable recovery (88% using 10 primer pairs). Several loci (matK, psbK-psbI, trnH-psbA) were problematic for generating fully bidirectional sequences. Setting aside technical issues related to amplification and sequencing, combining the more variable plastid markers provided clear benefits for resolving species, although with diminishing returns, as all combinations assessed using four to seven regions had only marginally different success rates (69-71%; values that were approached by several two- and three-region combinations). This performance plateau may indicate fundamental upper limits on the precision of species discrimination that is possible with DNA barcoding systems that include moderate numbers of plastid markers. 
Resolution to the contentious debate on plant barcoding should therefore involve increased attention to practical issues related to the ease of sequence recovery, global alignability, and marker redundancy in multilocus plant DNA barcoding systems.
Year: 2008 PMID: 18665273 PMCID: PMC2475660 DOI: 10.1371/journal.pone.0002802
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sequencing success and efficacy for six coding and three non-coding regions.
| Region: | cox1 | 23S rDNA | rpoB | rpoC1 | rbcL | matK | trnH-psbA | atpF-atpH | psbK-psbI |
| Aligned sequence length (bp) | 656 | 363 | 590 | 487 | 607 | 946 | 200–760 | 242–735 | 260–673 |
| Unaligned length; mean (range), including end gaps | 656 (-) | 362.7 (359–363) | 470.5 (429–481) | 487 (-) | 606.8 (588–607) | 735.3 (325–895) | 392.7 (142–699) | 545.5 (240–589) | 403.0 (172–629) |
| Position in Arabidopsis thaliana reference genome, bp (length) | 42–697 (656) | 2091–2453 (363) | 1704–2175 (472) | 1895–2381 (487) | 27–633 (607) | 525–1309 (785) | | | |
| No. of species successfully amplified and sequenced | 69 | 90 | 87 | 89 | 92 | 84 | 92 | 88 | 79 |
| No. of samples successfully amplified and sequenced | 170 | 236 | 231 | 238 | 251 | 220 | 249 | 239 | 214 |
| % sequencing success | 72.0 | 100.0 | 92.0 | 94.8 | 100 | 87.6 | 99.2 | 95.2 | 85.3 |
| Total no. primer pairs used | 1 | 1 | 5 | 3 | 2 | 10 | 1 | 1 | 1 |
| Mean number of reads in contig per sample | 2.00 | 2.00 | 2.27 | 2.48 | 2.27 | 2.96 | 2.83 | 2.44 | 2.67 |
| % of sequences that are <80% bidirectional | 1.1 | 0 | 4.7 | 6.9 | 2.4 | 25.5 | 19.7 | 6.3 | 27.1 |
Sequences from seven of the regions were sought for 251 samples representing 92 species; cox1 and 23S rDNA were attempted for 236 samples and 90 species. The sequence ranges used in the analysis are given relative to the complete plastid and mitochondrial genomes of Arabidopsis thaliana (GenBank accessions NC_000932 and NC_001284).
Aligned across angiosperms and gymnosperms only.
Aligned across individual genera only.
Based on trimmed alignments for coding regions.
92 species attempted, except 23S rDNA and cox1 (90 species).
251 individuals attempted for all genes except 23S rDNA and cox1 (236 individuals).
Percentage sequencing success (i.e., number of individuals successfully sequenced/number of individuals attempted).
The number of reads represents the mean number of unidirectional sequences from successful amplifications that are required to establish a reliable sequence for each sample.
Sequences with less than 80% bidirectional coverage are primarily due to the presence of homopolymer runs.
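The sequencing-success row can be recomputed from the sample counts using the definition given in the footnotes (individuals successfully sequenced divided by individuals attempted). The sketch below is illustrative only: the counts are copied from the table, the region labels assume the column order cox1, 23S rDNA, rpoB, rpoC1, rbcL, matK, trnH-psbA, atpF-atpH, psbK-psbI, and the names `SEQUENCED`, `attempted` and `percent_success` are hypothetical.

```python
# Samples successfully amplified and sequenced, copied from the table above.
# Assumption: region labels follow the column order stated in the lead-in.
SEQUENCED = {
    "cox1": 170, "23S rDNA": 236, "rpoB": 231, "rpoC1": 238, "rbcL": 251,
    "matK": 220, "trnH-psbA": 249, "atpF-atpH": 239, "psbK-psbI": 214,
}


def attempted(region):
    """Samples attempted: 236 for cox1 and 23S rDNA, 251 for all other regions."""
    return 236 if region in ("cox1", "23S rDNA") else 251


def percent_success(region):
    """Percentage sequencing success, rounded to one decimal place."""
    return round(100.0 * SEQUENCED[region] / attempted(region), 1)


for region in SEQUENCED:
    print(f"{region:10s} {percent_success(region):5.1f}")
```

Under these assumptions the recomputed percentages agree with the "% sequencing success" row to one decimal place.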
Number of species per genus resolved as monophyletic for each of nine candidate barcoding regions.
| Major clade | Genus | No. of species surveyed | cox1 | 23S rDNA | rpoB | rpoC1 | rbcL | matK | trnH-psbA | atpF-atpH | psbK-psbI |
| Angiosperms | | 5 | 0 | 0 | 0 | 0 | 2 | 3 | 3 | 0 | 0 |
| | | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | M | 0 |
| | | 4 | 0 | 0 | 1 | 0 | 1 | 2 | 2 | 1 | 3 |
| | | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 2 | 0 | 0 | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | NA | NA | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 3 | 3 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 |
| | | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 4 | 0 | 0 | 2 | 1 | 3 | 3 | 3 | 3 | 3 |
| | | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 1 |
| | | 3 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
| | | 3 | 0 | 0 | 2 | 1 | 2 | 3 | 3 | 3 | 2 |
| | | 4 | 1 | 0 | 3 | 1 | 3 | 4 | 2 | 4 | 2 |
| | | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 2 | 1 | 2 |
| | | 2 | M | 0 | 2 | 0 | 0 | 2 | 2 | 2 | 2 |
| | | 2 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| | | 2 | NA | NA | M | 0 | 0 | M | 0 | 0 | 0 |
| | | 7 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| | | 2 | M | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 3 | 0 | 0 | 1 | 0 | 2 | 1 | 3 | 1 | 1 |
| Gymnosperms | | 2 | M | 0 | 2 | 2 | 0 | 2 | 2 | 0 | 2 |
| | | 2 | NA | NA | 0 | M | 2 | 2 | 0 | 0 | 2 |
| | | 3 | 1 | 0 | 2 | 1 | 3 | 3 | 3 | 2 | 2 |
| Monilophytes | | 3 | M | 0 | M | 1 | 1 | M | 1 | 1 | 1 |
| | | 2 | M | 0 | M | M | 2 | M | 2 | M | M |
| Lycophytes | | 2 | M | 2 | 2 | M | 2 | 2 | 2 | 2 | M |
| Bryophytes | | 3 | M | 1 | 2 | 2 | 2 | 1 | 3 | 3 | M |
| | | 2 | NA | NA | 2 | 2 | 2 | M | 2 | 2 | M |
| | | 2 | NA | NA | M | 2 | 2 | M | 2 | 2 | M |
| | | 2 | M | 0 | 2 | 2 | 2 | M | 2 | 2 | M |
| No. of species amplified for at least two populations | | | 58 | 82 | 85 | 88 | 92 | 80 | 92 | 88 | 78 |
| Percent species monophyletic (of those amplified and sequenced) | | - | 10 | 7 | 43 | 29 | 48 | 56 | 59 | 45 | 44 |
| Percent species monophyletic (of those attempted) | | - | 6.7 | 6.7 | 39.1 | 27.2 | 47.8 | 47.8 | 58.7 | 43.5 | 35.9 |
Values indicate the number of species for each genus identified as monophyletic with at least 70% bootstrap support. ‘M’ indicates that none of the species in that genus had more than one sample, or that only one species was amplified; ‘NA’: amplifications not attempted in all species for this genus.
Low resolution (1 of 3 species) attributed to partial amplification success in this genus, rather than failure to form a monophyletic group.
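As a worked check of the summary rows, the per-genus counts for one region can be summed and divided by the number of species attempted. This is a minimal sketch, assuming the seventh region column is trnH-psbA (the 59% single-locus figure quoted in the abstract is consistent with that reading); the names `TRNH_PSBA`, `SPECIES_ATTEMPTED` and `percent_monophyletic` are hypothetical.

```python
# Per-genus counts of species resolved as monophyletic for one region column,
# transcribed from the table above (32 genera, top to bottom).
# Assumption: this is the trnH-psbA column.
TRNH_PSBA = [
    3, 0, 2, 0, 2, 2, 2, 2, 3, 2, 1, 3, 2, 2, 2, 2, 0, 0, 0, 2, 0, 3,  # angiosperms
    2, 0, 3,        # gymnosperms
    1, 2,           # monilophytes
    2,              # lycophytes
    3, 2, 2, 2,     # bryophytes
]
SPECIES_ATTEMPTED = 92


def percent_monophyletic(counts, attempted):
    """Percentage of attempted species resolved as monophyletic."""
    return 100.0 * sum(counts) / attempted


resolved = sum(TRNH_PSBA)
print(resolved, round(percent_monophyletic(TRNH_PSBA, SPECIES_ATTEMPTED), 1))
```

With these values the column sums to 54 of 92 species, or 58.7%, which matches the "of those attempted" entry for that column.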
Figure 1. Relation between sequence variation (PICs = parsimony-informative characters summed across genus-level comparisons) and percentage species resolution (species supported as monophyletic within genera with at least 70% bootstrap support) for a selection of single and multilocus combinations.
The number of regions used per combination is indicated by different symbols and colors (see legend). The specific regions used in each combination are noted in Table S2 (note that the combinations exclude cox1 and 23S rDNA). Circled symbols correspond to combinations proposed in the recent plant barcoding literature (see text): 1) rbcL+trnH-psbA; 2) matK+rpoC1+rpoB; 3) matK+rpoC1+trnH-psbA; 4) matK+atpF-atpH+psbK-psbI; 5) matK+atpF-atpH+trnH-psbA. All regions are from the plastid genome (except cox1; mitochondrial genome).
Figure 2. Relation between the number of plastid regions used and mean percentage species resolution (species supported as monophyletic within genera with at least 70% bootstrap support).
Means (±SD) for two to six regions are based on the somewhat arbitrarily chosen combinations of regions considered here (note that the plastid 23S rDNA locus and the mitochondrial locus cox1 were not considered in these combinations). Least-squares regression: $R^2 = 0.73$; $y = 0.52 + 0.11 \ln x - 0.05 (\ln x - 1.1)^2$; $F_{1,43} = 56.8$, $P < 0.0001$. Note: cpDNA = plastid DNA.
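The diminishing returns described by the fitted curve can be seen by evaluating it for one to seven regions. The sketch below is illustrative, assuming the quadratic term is subtracted, that x is the number of plastid regions, and that y is the proportion of species resolved; the function name `predicted_resolution` is hypothetical.

```python
import math


def predicted_resolution(n_regions):
    """Fitted proportion of species resolved for a given number of regions.

    Assumed form: y = 0.52 + 0.11 ln x - 0.05 (ln x - 1.1)^2.
    """
    ln_x = math.log(n_regions)
    return 0.52 + 0.11 * ln_x - 0.05 * (ln_x - 1.1) ** 2


for n in range(1, 8):
    print(n, round(predicted_resolution(n), 3))
```

Under this reading the curve rises from about 0.46 for a single region to about 0.70 for seven regions, and the gain from the sixth to the seventh region is roughly half a percentage point.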
Number of species per genus resolved as monophyletic for each of five proposed multilocus barcoding combinations.
| Major clade | Genus | No. of species surveyed | rbcL + trnH-psbA (Kress and Erickson) | matK + rpoC1 + rpoB (Chase et al.) | matK + rpoC1 + trnH-psbA (Chase et al.) | matK + atpF-atpH + trnH-psbA (Lee et al.) | matK + atpF-atpH + psbK-psbI (Lee et al.) |
| Angiosperms | | 5 | 3 | 3 | 3 | 3 | 3 |
| | | 2 | 0 | 0 | 0 | M | M |
| | | 4 | 4 | 2 | 4 | 4 | 4 |
| | | 2 | 0 | 0 | 0 | 0 | 0 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 3 | 2 | 3 | 3 | 3 | 3 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 4 | 4 | 3 | 4 | 4 | 3 |
| | | 4 | 3 | 1 | 2 | 2 | 3 |
| | | 3 | 1 | 1 | 1 | 1 | 1 |
| | | 3 | 3 | 3 | 3 | 3 | 3 |
| | | 4 | 3 | 4 | 4 | 4 | 4 |
| | | 3 | 2 | 2 | 2 | 2 | 3 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 6 | 0 | 0 | 0 | 0 | 0 |
| | | 2 | 0 | M | M | M | M |
| | | 7 | 0 | 2 | 2 | 1 | 2 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 0 | 0 | 0 | 0 | 0 |
| | | 3 | 3 | 2 | 3 | 3 | 3 |
| Gymnosperms | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 0 | 0 | 0 | 2 | 2 |
| | | 3 | 3 | 3 | 3 | 3 | 3 |
| Monilophytes | | 3 | 1 | 1 | 1 | 1 | 1 |
| | | 2 | 2 | M | 0 | 0 | M |
| Lycophytes | | 2 | 2 | 2 | 2 | 2 | 2 |
| Bryophytes | | 3 | 3 | 3 | 3 | 3 | 3 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| | | 2 | 2 | 2 | 2 | 2 | 2 |
| No. of species amplified for at least two populations | | | 92 | 90 | 92 | 92 | 90 |
| Percent species monophyletic (of those amplified and sequenced) | | | 64 | 61 | 65 | 66 | 69 |
| Percent species monophyletic (of those attempted) | | | 64 | 60 | 65 | 66 | 67 |
Values indicate the number of species for each genus identified as monophyletic with at least 70% bootstrap support for each multi-locus combination; ‘M’ indicates that none of the species in that genus had more than one sample, or that only one species was amplified and sequenced.