Ingrid Parmentier, Jérôme Duminil, Maria Kuzmina, Morgane Philippe, Duncan W Thomas, David Kenfack, George B Chuyong, Corinne Cruaud, Olivier J Hardy.
Abstract
BACKGROUND: DNA barcoding of rain forest trees could potentially help biologists identify species and discover new ones. However, DNA barcodes cannot always distinguish between closely related species, and the size and completeness of barcode databases are key parameters for their successful application. We test the ability of rbcL, matK and trnH-psbA plastid DNA markers to identify rain forest trees at two sites in Atlantic central Africa under the assumption that a database is exhaustive in terms of species content, but not necessarily in terms of haplotype diversity within species. METHODOLOGY/PRINCIPAL
Year: 2013 PMID: 23565134 PMCID: PMC3615068 DOI: 10.1371/journal.pone.0054921
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sequencing success and intra-specific sequence polymorphism of trees from two African rain forests for rbcL, matK and trnH-psbA.
| | rbcL | matK | trnH-psbA |
| Korup National Park (Cameroon) – 272 sp. | | | |
| N ind. tested | 708 | 620 | 772 |
| Final sequencing success: N ind. (% ind.) | 595 (84%) | 397 (64%) | 618 (80%) |
| Final sequencing success: N sp. (% sp.) | 266 (98%) | 230 (85%) | 264 (97%) |
| Sequencing success at first trial (% ind./% sp.) | 77%/94% | 48%/63% | 71%/92% |
| N ind. per sp.: mean ± SD (min. – max.) | 2.2±0.7 (1–4) | 1.7±0.7 (1–4) | 2.3±0.8 (1–4) |
| N sp. with sequences for ≥2 samples | 237 | 139 | 226 |
| Number of species with several haplotypes: | | | |
| - all samples per marker: N sp. (% sp.) | 5 (2%)a | 7 (5%)a | 42 (19%)b |
| - 219 samples from 102 sp. with seq. for the 3 barcodes and ≥2 ind./sp.: N sp. (% sp.) | 3 (3%)a | 4 (5%)a | 28 (27%)b |
| Korup National Park (Cameroon) and Monts de Cristal (Gabon) – 24 shared sp. | | | |
| N sp. with good seq. in both forests | 23 | 12 | 13 |
| Species with several haplotypes: N sp. (% sp.) | 6 (26%)a | 1 (8%)a | 9 (70%)b |
sp.: species, ind.: individuals, seq.: sequences, N: number.
Shared superscript letters indicate markers that do not differ significantly in the proportion of species with several haplotypes (χ² tests).
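The footnote above refers to χ² tests comparing proportions between markers. As a minimal sketch of such a comparison, the function below computes a Pearson χ² statistic for two proportions (one degree of freedom, no continuity correction). The function name and the choice to omit a continuity correction are assumptions made for illustration; the source does not state which variant of the test was used.

```python
import math
from typing import Tuple


def chi2_two_proportions(k1: int, n1: int, k2: int, n2: int) -> Tuple[float, float]:
    """Pearson chi-square test comparing k1/n1 with k2/n2.

    Returns (chi2, p_value) for 1 degree of freedom.
    Assumption: no Yates continuity correction is applied.
    """
    a, b = k1, n1 - k1          # group 1: with / without the trait
    c, d = k2, n2 - k2          # group 2: with / without the trait
    n = n1 + n2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the table: 5 of 237 species vs 42 of 226 species
# with several haplotypes (first and third marker columns).
chi2, p = chi2_two_proportions(5, 237, 42, 226)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

With small expected counts an exact test may be more appropriate than this approximation.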
Barcoding success of African rain forest trees at a local scale for species identification (a) and genus identification (b).
| (a) Species identification | Correct (%): GD | Correct (%): PI | Multiple/Wrong (%): GD | Multiple/Wrong (%): PI | Query samples: N. ind. | Query samples: N. sp. | Query samples: N. ge. | Barcoding database: N. ind. | Barcoding database: N. sp. | Barcoding database: N. ge. |
| All samples with good quality sequences | | | | | | | | | | |
| rbcL | 71.9a | 71.2a | 28.0/0.2 | 25.7/3.2 | 565 | 237 | 88 | 594 | 266 | 161 |
| matK | 76.5a | 75.5a | 22.2/1.3 | 17.6/6.9 | 306 | 139 | 100 | 396 | 230 | 145 |
| trnH-psbA | / | 84.3b | / | 12.6/3.1 | 579 | 226 | 144 | 617 | 264 | 157 |
| Samples with good quality sequences available for the 3 markers | | | | | | | | | | |
| rbcL | 72.6a | 71.7a | 27.4/0 | 26.9/1.4 | 219 | 102 | 72 | 325 | 211 | 141 |
| matK | 74.9a | 74.9ab | 22.8/2.3 | 15.1/10 | 219 | 102 | 72 | 325 | 211 | 141 |
| trnH-psbA | / | 80.8bc | / | 14.6/4.6 | 219 | 102 | 72 | 325 | 211 | 141 |
| rbcL+matK | 83.1b | 79.5abc | 15.1/1.8 | 16.4/4.1 | 219 | 102 | 72 | 325 | 211 | 141 |
| rbcL+trnH-psbA | / | 85.6bcd | / | 9.6/4.6 | 219 | 102 | 72 | 325 | 211 | 141 |
| matK+trnH-psbA | / | 85.8cd | / | 7.8/6.4 | 219 | 102 | 72 | 325 | 211 | 141 |
| rbcL+matK+trnH-psbA | / | 87.7d | / | 5/7.3 | 219 | 102 | 72 | 325 | 211 | 141 |
Shared superscript letters indicate which pDNA regions or combinations of pDNA regions do not differ significantly in their barcoding success (% correct identification, χ² tests).
Two methods were used to evaluate barcoding identification success: the minimum genetic distance between sequence pairs (GD) and the maximal percentage identity in a basic local alignment search tool (PI). Correct identification = the individual is assigned to the correct species or genus only, multiple identification = the individual is assigned to several species or genera including the right one, wrong identification = the individual is assigned to one or several species or genera not including the right one. Only those individuals with at least one other individual of their species in the database were tested against databases containing all the available samples, except the query individual. N.: number, sp.: species, ge.: genera, ind.: individual. Note that six morpho-species belonging to unknown genera were excluded from the reference databases for genus-level identification but were kept for species-level identification.
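The three outcome categories defined above (correct, multiple, wrong) can be illustrated with a small best-match classifier. This is only a sketch under stated assumptions: `classify_identification` is a hypothetical helper, the `scores` dictionary is assumed to already hold the best score of the query against each reference taxon (with the query individual itself excluded from the database), and ties are detected by exact equality of scores.

```python
from typing import Dict, Set


def classify_identification(
    true_taxon: str,
    scores: Dict[str, float],
    higher_is_better: bool = True,
) -> str:
    """Return 'correct', 'multiple' or 'wrong' for one query individual.

    scores: best score of the query against each reference taxon
            (species or genus). Use higher_is_better=True for a
            percentage identity (PI) and False for a genetic distance (GD).
    """
    if not scores:
        raise ValueError("reference database is empty")
    best = max(scores.values()) if higher_is_better else min(scores.values())
    # All taxa that tie for the best score are retained as candidate names.
    best_taxa: Set[str] = {taxon for taxon, s in scores.items() if s == best}
    if best_taxa == {true_taxon}:
        return "correct"      # assigned to the right taxon only
    if true_taxon in best_taxa:
        return "multiple"     # several taxa, including the right one
    return "wrong"            # one or several taxa, none of them right


# Percentage identity: two species tie at 100%, one of them is the true one.
print(classify_identification("sp_A", {"sp_A": 100.0, "sp_B": 100.0, "sp_C": 98.4}))
# Genetic distance: the smallest distance wins.
print(classify_identification("sp_A", {"sp_A": 0.0, "sp_B": 0.003}, higher_is_better=False))
```

The same function applies to genus-level identification when the keys of `scores` are genera rather than species.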
Determinants of the barcoding success of African rain forest trees: Spearman's correlation coefficients between the proportion of individuals correctly identified at the species level and the number of individuals (N. indiv), number of haplotypes (N. haplotypes), or clade richness (Clade R. genus, Clade R. 99% PI) per species in the database.
| Mean Barcoding success | N. indiv | N. haplotypes | Clade R. genus | Clade R. PI |
| rbcL | −0.103 NS | −0.178* | −0.431*** | −0.477*** |
| matK | −0.114 NS | 0.015 NS | −0.485*** | −0.417*** |
| trnH-psbA | 0.054 NS | −0.123 NS | −0.364*** | −0.627*** |
P-values of tests: * P≤0.05, *** P<0.001, NS non-significant (P>0.05). The barcoding success is calculated for each species as the mean barcoding success of all individuals belonging to that species (1: assigned to the correct species only, 0: assigned to several species including the correct one, −1: assigned to one or several species not including the right species). Clade richness is either measured as the number of species in the database belonging to the same genus as the query individual (Clade R. genus), or as the number of species in the database that have samples with a percentage identity in a BLAST ≥99% with the query sample (Clade R. PI).
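The per-species score (1, 0 or −1 averaged over individuals) and its rank correlation with a predictor such as clade richness can be sketched as follows. The helper names are hypothetical, the Spearman coefficient is computed here as a Pearson correlation on average ranks, and no significance test is included; the source does not say which software was used for these correlations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Scoring described in the table note.
SCORE = {"correct": 1, "multiple": 0, "wrong": -1}


def mean_success_per_species(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average the 1/0/-1 outcome scores of all individuals of each species."""
    per_species: Dict[str, List[int]] = defaultdict(list)
    for species, outcome in results:
        per_species[species].append(SCORE[outcome])
    return {sp: sum(v) / len(v) for sp, v in per_species.items()}


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


# Made-up illustration data, not values from the study.
success = mean_success_per_species(
    [("sp_A", "correct"), ("sp_A", "multiple"), ("sp_B", "wrong"), ("sp_C", "correct")]
)
clade_richness = {"sp_A": 3, "sp_B": 7, "sp_C": 1}
species = sorted(success)
rho = spearman([clade_richness[s] for s in species], [success[s] for s in species])
print(success, rho)
```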
Barcoding success of African rain forest trees at the regional scale for genus identification using the PI method.
| | Correct (%) | Multiple/Wrong (%) | N |
| All samples | | | |
| rbcL | 83.9 | 6.3/9.8 | 143 |
| matK | 85.0 | 10.0/5.0 | 80 |
| trnH-psbA | 88.6 | 0.0/11.4 | 88 |
| Samples available for the 3 barcodes | | | |
| rbcL | 84.3 | 5.9/9.8 | 51 |
| matK | 90.2 | 2.0/7.8 | 51 |
| trnH-psbA | 88.2 | 0.0/11.8 | 51 |
| rbcL+matK | 86.3 | 3.9/9.8 | 51 |
| rbcL+trnH-psbA | 90.2 | 0.0/9.8 | 51 |
| matK+trnH-psbA | 90.2 | 0.0/9.8 | 51 |
| rbcL+matK+trnH-psbA | 90.2 | 0.0/9.8 | 51 |
All χ² tests for differences in barcoding success (% correct identification) among markers or combinations of markers were non-significant.
These results are obtained from an analysis of the highest percentage identity resulting from a BLAST of DNA sequences from samples in Gabon on a local reference database from Cameroon. The reference database contains at least one individual of the genus of the individuals from Gabon, but not always one individual of their species. Correct = the percentage of samples assigned to the correct genus only, multiple = the percentage of samples assigned to several genera including the right one, wrong = the percentage of samples assigned to one or several genera not including the right one. N: number of query samples tested.
Impact of sequence length differences, ambiguous bases or missing data on K2P distance and the output of the BLAST algorithm (Percentage Identity and Bit-Score).
| Query sample | Subject sample | K2P | BLAST: PI (%) | BLAST: alignment length (bp) | BLAST: mismatches | BLAST: Bit-Score | BLAST: % Bit-Score max. |
| seq_ok | seq_ok | 0 | 100 | 368 | 0 | 729 | 100 |
| seq_ok | seq_N | 0 | 99 | 368 | 3 | 712 | 98 |
| seq_ok | seq_Y | 0 | 99 | 368 | 3 | 718 | 98 |
| seq_ok | seq_short | 0 | 100 | 354 | 0 | 702 | 96 |
seq_ok is a 368 bp long sequence without missing data or ambiguous bases. It is compared to that same sequence with slight modifications representative of the limits of sequencing techniques: seq_N has three “N” within the sequence (internal missing data), seq_Y has three “C” or “T” bases replaced by a “Y” (ambiguous bases), seq_short is 14 bp shorter (missing data at each end). K2P: K2P distance obtained with the PAUP software. PI (Percentage Identity) and Bit-Score result from a BLAST analysis obtained with the BLASTCLUST software. % Bit-Score max. is the percentage of the Bit-Score obtained compared to the maximum Bit-Score (when seq_ok is blasted on itself).
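The contrast in the table (a K2P distance of 0 alongside a percentage identity below 100) can be reproduced with a simplified sketch. It assumes that the distance calculation skips every site where either sequence carries a character other than A, C, G or T, while the identity measure counts any differing character as a mismatch. Both functions are illustrative stand-ins written for this example; they are not the PAUP or BLAST implementations, and they expect pre-aligned sequences of equal length.

```python
import math

# Purine-purine and pyrimidine-pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance with pairwise deletion.

    Assumption: sites with N, IUPAC ambiguity codes or gaps in either
    sequence are ignored rather than counted as differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip missing or ambiguous sites
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no unambiguous sites to compare")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); math.log raises
    # ValueError for saturated (very divergent) pairs.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def percent_identity(seq1: str, seq2: str) -> float:
    """Naive identity: every differing character counts as a mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1.upper(), seq2.upper()))
    return 100.0 * matches / len(seq1)


seq_ok = "ACGTACGTAC"
seq_n = "ACNTACGTAC"  # one internal missing base
print(k2p_distance(seq_ok, seq_n), percent_identity(seq_ok, seq_n))
```

Under these assumptions, a sequence that differs only by missing or ambiguous bases stays at distance 0 from the original but no longer reaches 100% identity, which is the pattern reported for seq_N and seq_Y.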