Marko Mutanen, Sami M Kivelä, Rutger A Vos, Camiel Doorenweerd, Sujeevan Ratnasingham, Axel Hausmann, Peter Huemer, Vlad Dincă, Erik J van Nieukerken, Carlos Lopez-Vaamonde, Roger Vila, Leif Aarvik, Thibaud Decaëns, Konstantin A Efetov, Paul D N Hebert, Arild Johnsen, Ole Karsholt, Mikko Pentinsaari, Rodolphe Rougerie, Andreas Segerer, Gerhard Tarmann, Reza Zahiri, H Charles J Godfray.
Abstract
The proliferation of DNA data is revolutionizing all fields of systematic research. DNA barcode sequences, now available for millions of specimens and several hundred thousand species, are increasingly used in algorithmic species delimitations. This is complicated by occasional incongruences between species and gene genealogies, as indicated by situations where conspecific individuals do not form a monophyletic cluster in a gene tree. In two previous reviews, non-monophyly has been reported as being common in mitochondrial DNA gene trees. We developed a novel web service "Monophylizer" to detect non-monophyly in phylogenetic trees and used it to ascertain the incidence of species non-monophyly in COI (a.k.a. cox1) barcode sequence data from 4977 species and 41,583 specimens of European Lepidoptera, the largest data set of DNA barcodes analyzed in this regard. Particular attention was paid to accurate species identification to ensure data integrity. We investigated the effects of tree-building method, sampling effort, and other methodological issues, all of which can influence estimates of non-monophyly. We found a 12% incidence of non-monophyly, a value significantly lower than that observed in previous studies. Neighbor joining (NJ) and maximum likelihood (ML) methods yielded almost equal numbers of non-monophyletic species, but 24.1% of these cases of non-monophyly were found by only one of these methods. Non-monophyletic species tend to show either low genetic distances to their nearest neighbors or exceptionally high levels of intraspecific variability. Cases of polyphyly in COI trees arising as a result of deep intraspecific divergence are negligible, as the detected cases reflected misidentifications or methodological errors. Taking into consideration variation in sampling effort, we estimate that the true incidence of non-monophyly is ∼23%, but with operational factors still being included.
Within the operational factors, we separately assessed the frequency of taxonomic limitations (presence of overlooked cryptic and oversplit species) and identification uncertainties. We observed that operational factors are potentially present in more than half (58.6%) of the detected cases of non-monophyly. Furthermore, we observed that in about 20% of non-monophyletic species and entangled species, the lineages involved are either allopatric or parapatric, conditions under which species delimitation is inherently subjective and particularly dependent on the species concept that has been adopted. These observations suggest that species-level non-monophyly in COI gene trees is less common than previously supposed, with many cases reflecting misidentifications, the subjectivity of species delimitation, or other operational factors.
Keywords: DNA barcoding; Lepidoptera; gene tree; mitochondrial COI; mitochondrial cox1; paraphyly; polyphyly; species delimitation; species monophyly
Year: 2016 PMID: 27288478 PMCID: PMC5066064 DOI: 10.1093/sysbio/syw044
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 9.160
Figure: Schematic overview of the different potential reasons for species to be classified as para- or polyphyletic, or false-positively monophyletic in a gene tree.
Figure: Overlap in species classified as mono-, para-, and polyphyletic using either NJ or ML methods. The number of species is indicated in each partition (the counts for monophyly exclude species represented by singletons).
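As a rough illustration of what classifying a species as monophyletic in a gene tree involves, the following minimal sketch checks whether the smallest clade containing all specimens of a species contains no other species. It is not the Monophylizer implementation: the nested-tuple tree encoding, the `species_of` mapping, and the function names are assumptions made for this example, and the tree is assumed to be rooted.

```python
def tips(node):
    """Return the tip labels under a node (a tip is a str, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [label for child in node for label in tips(child)]


def is_monophyletic(tree, species, species_of):
    """True if the smallest clade holding every tip of `species` holds no other tips.

    tree       -- rooted tree as nested tuples of tip labels (hypothetical encoding)
    species_of -- dict mapping each tip label to its species name
    """
    members = {t for t in tips(tree) if species_of[t] == species}
    if not members:
        raise ValueError(f"no specimens of {species!r} in the tree")
    node = tree
    # Descend as long as a single child clade still contains all members.
    while not isinstance(node, str):
        for child in node:
            if members <= set(tips(child)):
                node = child
                break
        else:
            break  # members are split across children: this node is their MRCA
    return set(tips(node)) == members


# Toy example: species "a" forms a clade, species "b" is split across the tree.
tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
species_of = {"a1": "a", "a2": "a", "b1": "b", "b2": "b", "c1": "c"}
result = {sp: is_monophyletic(tree, sp, species_of) for sp in ("a", "b", "c")}
```

In this toy tree the singleton species "c" passes the check trivially, which is why the counts for monophyly in the caption above exclude singletons. Distinguishing paraphyly from polyphyly requires further criteria that this sketch does not attempt.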
Parameter estimates from a binomial generalized linear model (with a logistic link function) explaining the probability of non-monophyly
| Parameter | Estimate | Std. Error | z | P-value | Empirical P-value |
| --- | --- | --- | --- | --- | --- |
| Intercept | −0.354 | 0.190 | −1.86 | 0.063 | < 0.0001 |
| Dist. to NN | −2.22 | 0.189 | −11.8 | < 0.0001 | < 0.0001 |
| Intraspec. var. | 2.10 | 0.177 | 11.9 | < 0.0001 | < 0.0001 |
| Specimens | 0.0417 | 0.0148 | 2.82 | 0.0048 | 0.00020 |
| Dist. to NN × intraspec. var. | −0.0413 | 0.0258 | −1.60 | 0.11 | 0.0010 |
| Dist. to NN × specimens | −0.0117 | 0.0109 | −1.079 | 0.28 | 0.00010 |
| Intraspec. var. × specimens | −0.0257 | 0.00672 | −3.83 | 0.00013 | 0.00030 |
| Dist. to NN × intraspec. var. × specimens | 0.00464 | 0.00150 | 3.10 | 0.0019 | < 0.0001 |
Notes: The explanatory variables were the genetic distance to the nearest neighbor species (dist. to NN), maximum intraspecific K2P variation (intraspec. var.), and the number of specimens analyzed (specimens). Empirical P-values were derived from a permutation test (see text for details) because some fitted probabilities were numerically either zero or one, which may result in overestimated P-values when using the Wald approximation.
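To make the table easier to read, the sketch below plugs the reported estimates into the logistic link to obtain a predicted probability of non-monophyly. It is only an illustration under two assumptions that the table does not state: that both genetic distances enter the model in percent (as in the figure caption for the fitted surface) and that no variable was centered or otherwise transformed. The function and dictionary names are hypothetical.

```python
import math

# Estimates copied from the table above.
COEF = {
    "intercept": -0.354,
    "dist": -2.22,            # distance to nearest neighbor
    "var": 2.10,              # maximum intraspecific variation
    "n": 0.0417,              # number of specimens
    "dist_var": -0.0413,
    "dist_n": -0.0117,
    "var_n": -0.0257,
    "dist_var_n": 0.00464,
}


def prob_non_monophyly(dist_nn, intra_var, n_specimens):
    """Predicted probability from the linear predictor and inverse logit.

    Assumes distances are in percent and variables are untransformed.
    """
    d, v, n = dist_nn, intra_var, n_specimens
    eta = (
        COEF["intercept"]
        + COEF["dist"] * d
        + COEF["var"] * v
        + COEF["n"] * n
        + COEF["dist_var"] * d * v
        + COEF["dist_n"] * d * n
        + COEF["var_n"] * v * n
        + COEF["dist_var_n"] * d * v * n
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Same species profile, but with a close versus a distant nearest neighbor.
close_neighbor = prob_non_monophyly(dist_nn=0.5, intra_var=1.0, n_specimens=5)
distant_neighbor = prob_non_monophyly(dist_nn=2.0, intra_var=1.0, n_specimens=5)
```

Under these assumptions the predicted probability falls as the distance to the nearest neighbor grows and rises with intraspecific variation, matching the signs of the two main effects in the table.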
Figure: Proportions of species with different minimum K2P distances to their nearest neighbor in mono-, para-, and polyphyletic species. For monophyletic species, singletons were excluded.
Figure: Monophyletic species (215 in total) showing less than 0.01 minimum K2P distance (or fewer than 7 nucleotide substitutions) to their closest species. The numbers of nucleotide substitutions to the nearest neighbors are indicated with arrows. The curve is not cleanly stepped because of slight variation in sequence lengths and because the substitution model employed does not assume equal likelihoods of all substitutions. Forty-eight species having K2P divergence of zero to the closest heterospecific would be rendered non-monophyletic by a single nucleotide substitution.
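The remark that the curve is not cleanly stepped can be illustrated with the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion. The sketch below is a minimal implementation for two aligned sequences; the function name, the handling of gaps and ambiguous bases (skipped), and the toy sequence length are assumptions for this example, not details taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumed handling)
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One substitution in a toy 600-site alignment (length chosen for illustration).
reference = "A" * 600
one_transition = "G" + "A" * 599    # A -> G
one_transversion = "C" + "A" * 599  # A -> C
d_ts = k2p_distance(reference, one_transition)
d_tv = k2p_distance(reference, one_transversion)
```

A single transition and a single transversion over the same number of sites give slightly different distances, and the same substitution count over a shorter sequence gives a larger distance, so equal substitution counts do not map onto one exact K2P value.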
Figure: The estimated probability of species non-monophyly from a generalized linear model including as explanatory factors: genetic distance (in percent, with 1% distance equal to 0.01 K2P divergence) to nearest neighbor (vertical axes), maximum within-species K2P genetic distance (horizontal axis), and the number of specimens included in the analysis (the panels show four values). Probability values are indicated by a grayscale gradient with black = 1 and white = 0.
Figure: Probability of finding non-monophyly as a function of the number of specimens per species included in the analysis. Points indicate proportions of non-monophyletic species in groups of species with equal number of analyzed specimens. The darkness of points indicates weights (inverses of bootstrap standard errors) used in fitting the regression curve, darker colors indicating higher weights. The curve [$y = 0.23 \times (1 - e^{-(\ldots)})$] is fitted with nonlinear asymptotic regression (see text for details).
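The weights mentioned in this caption can be sketched as follows: for each group of species with the same number of specimens, resample the species with replacement, recompute the proportion that is non-monophyletic, and take the inverse of the standard deviation of those proportions. This is a generic bootstrap under assumed settings (number of resamples, seed, and treatment of groups with zero spread are all choices made here), not the authors' procedure, and the function names are hypothetical.

```python
import random


def bootstrap_se(outcomes, n_boot=1000, seed=0):
    """Bootstrap standard error of a proportion.

    outcomes -- list of 0/1 flags, 1 meaning the species is non-monophyletic
    """
    rng = random.Random(seed)
    n = len(outcomes)
    proportions = []
    for _ in range(n_boot):
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        proportions.append(sum(resample) / n)
    mean = sum(proportions) / n_boot
    variance = sum((p - mean) ** 2 for p in proportions) / (n_boot - 1)
    return variance ** 0.5


def regression_weight(outcomes, n_boot=1000, seed=0):
    """Inverse bootstrap standard error; None when the group shows no spread.

    Returning None for a zero standard error is an assumption of this sketch.
    """
    se = bootstrap_se(outcomes, n_boot=n_boot, seed=seed)
    return 1.0 / se if se > 0 else None


# Two hypothetical groups with the same proportion but different sizes.
small_group = [1] * 20 + [0] * 80
large_group = [1] * 200 + [0] * 800
w_small = regression_weight(small_group)
w_large = regression_weight(large_group)
```

Groups containing more species yield smaller standard errors and therefore larger weights, which corresponds to the darker points in the figure.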