Jeremy C Andersen, Peter Oboyski, Neil Davies, Sylvain Charlat, Curtis Ewing, Christopher Meyer, Henrik Krehenwinkel, Jun Ying Lim, Suzuki Noriyuki, Thibault Ramage, Rosemary G Gillespie, George K Roderick.
Abstract
New genetic diagnostic approaches have greatly aided efforts to document global biodiversity and improve biosecurity. This is especially true for organismal groups in which species diversity has historically been underestimated because of difficulties associated with sampling, the lack of clear morphological characteristics, and/or limited availability of taxonomic expertise. Among these methods, DNA sequence barcoding (also known as "DNA barcoding") and, by extension, metabarcoding of biological communities, has emerged as one of the most frequently used methods for DNA-based species identification. Unfortunately, the use of DNA barcoding is limited by the availability of complete reference libraries (i.e., collections of DNA sequences from morphologically identified species), and by the fact that the vast majority of species do not have sequences in reference databases. These limitations are especially acute in tropical locations, which are simultaneously rich in biodiversity and poorly explored and characterized by trained taxonomic specialists. To facilitate efforts to document biodiversity in regions lacking complete reference libraries, we developed a novel statistical approach that categorizes unidentified species as either likely native or likely nonnative based solely on measures of nucleotide diversity. We demonstrate the utility of this approach by categorizing a large sample of specimens of terrestrial insects and spiders (collected as part of the Moorea BioCode project) using a generalized linear mixed model (GLMM). Using a training data set of known endemic (n = 45) and known introduced (n = 102) species, we then estimated the likely native/nonnative status for 4,663 specimens representing an estimated 1,288 species (412 identified species), including specimens that were unidentified or whose endemic/introduced status was uncertain.
Using this approach, we were able to increase the number of categorized specimens by a factor of 4.4 (from 794 to 3,497), and the number of categorized species by a factor of 4.8 (from 147 to 707), at a rate much greater than chance (77.6% accuracy). The study identifies phylogenetic signatures of both native and nonnative species and suggests several practical applications for this approach, including monitoring biodiversity and facilitating biosecurity.
Keywords: DNA barcoding; Moorea BioCode; alien invasive species; biomonitoring; biosecurity; community barcoding; metabarcoding
Year: 2019 PMID: 31050090 PMCID: PMC7079013 DOI: 10.1002/eap.1914
Source DB: PubMed Journal: Ecol Appl ISSN: 1051-0761 Impact factor: 4.657
Model choice
| Model | LL | K | AICc | ∆AICc | w |
|---|---|---|---|---|---|
| Average similarity + average distance + (1|order) | −63.90 | 4 | 136.15 | 0 | 0.71 |
| Average similarity × average distance + (1|order) | −63.89 | 5 | 138.30 | 2.16 | 0.24 |
| Average distance + (1|order) | −67.58 | 3 | 141.38 | 5.23 | 0.05 |
| Average similarity + (1|order) | −74.80 | 3 | 155.77 | 19.62 | 0 |
Akaike information criterion corrected for sample size (AICc) scores for the GLMMs, including log‐likelihood (LL) values, number of parameters (K), AICc score, difference in AICc relative to the best‐supported model (∆AICc), and model weight (w).
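The columns of the model-choice table are linked by standard formulas, which can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sample size `n = 119` is an assumption (it is not stated in this record; it is the value that reproduces the reported AICc of the top model, and it matches the 147 training species minus the 28 single-specimen species listed as NA).

```python
import math

def aicc(log_lik, k, n):
    """Small-sample corrected AIC: -2*LL + 2K + 2K(K+1)/(n-K-1)."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Return (deltas, weights) for a list of AICc scores.

    Each delta is the difference from the lowest score; each weight is
    exp(-delta/2) normalized so that the weights sum to 1.
    """
    best = min(scores)
    deltas = [s - best for s in scores]
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    return deltas, [r / total for r in rel_lik]

# AICc values as reported in the table, in row order.
reported_aicc = [136.15, 138.30, 141.38, 155.77]
deltas, weights = akaike_weights(reported_aicc)

# ASSUMPTION: n = 119 is inferred, not taken from the source.
top_model_aicc = aicc(-63.90, 4, 119)
```

Rounded to two decimals, the weights computed this way agree with the w column (0.71, 0.24, 0.05, 0). The deltas computed from the rounded AICc values can differ from the reported ∆AICc in the last digit.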
GLMM scores for known introduced and endemic species
| Order | Introduced | Endemic |
|---|---|---|
| Araneae | 0.5695 (0.5601,0.5788) | 0.5624 (0.5477,0.5772) |
| Blattodea | | |
| Coleoptera | 0.1341 (0.1001,0.1681) | 0.1824 (0.1639,0.2009) |
| Diptera | 0.3267 (0.2887,0.3647) | 0.3518 (0.3138,0.3899) |
| Hemiptera | 0.1941 (0.1360,0.2522) | 0.2210 (0.0791,0.3629) |
| Hymenoptera | 0.1959 (0.1601,0.2317) | 0.2305 (0.0257,0.4352) |
| Lepidoptera | 0.3546 (0.3148,0.3945) | 0.3845 (0.3439,0.4250) |
| Odonata | 0.3841 (0.3350,0.4331) | |
| Psocoptera | | |
| Thysanoptera | 0.0511 (0.0465,0.0557) | |
| Grand total | 0.2582 (0.2291,0.2873) | 0.3839 (0.3444,0.4233) |
Values are reported as the mean value for species in each order followed by the lower 99% CI and the upper 99% CI in parentheses.
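One way such interval bounds could be turned into a categorization rule is sketched below. This is a hedged illustration under several assumptions, not the authors' procedure: the function name and arguments are hypothetical, the thresholds are simply the Grand total bounds from the table above, higher scores are assumed to indicate native status because the endemic mean is the higher of the two, and whether the thresholds were applied globally or per order is not stated in this record.

```python
# ASSUMPTION: thresholds taken from the "Grand total" row of the table above.
INTRODUCED_UPPER = 0.2873  # upper bound of the interval for introduced species
ENDEMIC_LOWER = 0.3444     # lower bound of the interval for endemic species

def categorize(score, n_specimens,
               introduced_upper=INTRODUCED_UPPER,
               endemic_lower=ENDEMIC_LOWER):
    """Assign a hypothetical category to a species from its GLMM score.

    Species with a single specimen are not scored (NA); scores that fall
    in the gap between the two intervals are left undetermined (Undet).
    """
    if n_specimens < 2:
        return "NA"
    if score <= introduced_upper:
        return "Nonnative"
    if score >= endemic_lower:
        return "Native"
    return "Undet"

examples = [categorize(0.20, 5), categorize(0.40, 3),
            categorize(0.30, 4), categorize(0.40, 1)]
```

With these placeholder thresholds, the four example calls fall into the Nonnative, Native, Undet, and NA categories, respectively.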
Numbers of endemic, introduced, and unknown species in a data set based on species‐level identifications and the categorizations in Ramage (2017)
| Training | Introduced | | | | | Endemic | | | | | Unknown | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predicted | Native | Nonnative | Undet | NA | Error (%) | Native | Nonnative | Undet | NA | Error (%) | Native | Nonnative | Undet | NA |
| Araneae | 6 | 100 | 7 | 3 | 0 | 25 | 11 | |||||||
| Blattodea | 1 | 0 | 4 | 3 | ||||||||||
| Coleoptera | 7 | 4 | 0 | 2 | 100 | 68 | 3 | 57 | ||||||
| Diptera | 3 | 4 | 3 | 3 | 23.08 | 1 | 1 | 0 | 56 | 91 | 29 | 121 | ||
| Hemiptera | 2 | 10 | 2 | 5 | 10.53 | 1 | 2 | 2 | 40 | 25 | 69 | 7 | 79 | |
| Hymenoptera | 3 | 22 | 1 | 9 | 8.57 | 1 | 1 | 50 | 10 | 49 | 12 | 83 | ||
| Lepidoptera | 9 | 3 | 2 | 1 | 60 | 13 | 5 | 3 | 23.81 | 104 | 35 | 44 | 109 | |
| Neuroptera | 1 | 3 | 1 | |||||||||||
| Odonata | 2 | 0 | 3 | 2 | ||||||||||
| Psocoptera | 1 | 0 | 6 | 11 | 2 | 14 | ||||||||
| Thysanoptera | 2 | 0 | 3 | 1 | ||||||||||
| Grand total | 23 | 49 | 8 | 22 | 22.55 | 24 | 10 | 5 | 6 | 22.22 | 226 | 334 | 100 | 481 |
Under each training category, the table gives the number of species predicted to be likely endemic or likely introduced, the number of species that could not be categorized by the model, either because the GLMM value fell between the 95% confidence intervals for introduced and endemic species (Undet) or because the species was represented by only a single specimen (NA), and the error rate (i.e., the percentage of endemic species assigned as likely introduced, or of introduced species assigned as likely endemic). The global error rate for each order is provided in the rightmost column.
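The Grand total row can be checked against the summary figures in the abstract with a few lines of arithmetic. The sketch below uses only counts from that row; the variable names are hypothetical, and the choice of denominator (all species in a training category, including Undet and NA) is an inference from the reported percentages rather than something stated in the caption.

```python
def error_rate(misassigned, counts):
    """Percentage of a training category assigned to the wrong class.

    ASSUMPTION: the denominator is every species in the category,
    including those left as Undet or NA.
    """
    return 100.0 * misassigned / sum(counts)

# Counts from the "Grand total" row.
introduced = {"Native": 23, "Nonnative": 49, "Undet": 8, "NA": 22}
endemic = {"Native": 24, "Nonnative": 10, "Undet": 5, "NA": 6}
unknown = {"Native": 226, "Nonnative": 334, "Undet": 100, "NA": 481}

introduced_error = error_rate(introduced["Native"], introduced.values())
endemic_error = error_rate(endemic["Nonnative"], endemic.values())

n_training = sum(introduced.values()) + sum(endemic.values())
n_misassigned = introduced["Native"] + endemic["Nonnative"]
overall_accuracy = 100.0 * (1 - n_misassigned / n_training)

# Species with a category after prediction: training species plus
# unknown species predicted as Native or Nonnative.
n_categorized = n_training + unknown["Native"] + unknown["Nonnative"]
fold_increase = n_categorized / n_training
```

Under that assumption the results match the reported values: error rates of 22.55% and 22.22%, 147 training species, an overall accuracy that rounds to 77.6%, and 707 categorized species (a 4.8-fold increase).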
Figure 1. Comparison of the biogeographical categorizations based on Ramage (2017) or based on GLMM predictions. Categorizations are presented so that the biogeographic assignment in Ramage (2017) appears before the colon and the biogeographic assignment based on the GLMM predictions appears after the colon. Categorizations follow Table 3, except that two categories for the GLMM predictions (Undet and NA) have been combined into a single category here labeled “Not Categorized.”