P Kapli, S Lutteropp, J Zhang, K Kobert, P Pavlidis, A Stamatakis, T Flouri.
Abstract
MOTIVATION: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet, two weaknesses of PTP impact its accuracy and practicality when applied to large datasets: it does not account for divergent intraspecific variation, and it is slow for a large number of sequences.
Year: 2017 PMID: 28108445 PMCID: PMC5447239 DOI: 10.1093/bioinformatics/btx025
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Visual representation of the mPTP dynamic programming algorithm. Each entry i at node u is computed from information stored at entries j and k of child nodes v and w, for all j and k such that . The dashed branches denote the smallest set S of branches which, by definition, must be part of the speciation process, irrespective of the resolution of other subtrees outside T
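The table-filling idea in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. The condition linking i, j and k is missing from the caption, so the sketch assumes that i counts the speciation branches inside the subtree of u, which gives i = j + k + 2 (the two branches from u to v and w, plus those chosen below them). The scoring is also a simplification: branch lengths are scored with a maximum-likelihood exponential fit, one rate per coalescent subtree and one shared rate for all speciation branches. The names `Node`, `dp_table` and `best_delimitation` are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    length: float = 0.0                            # branch leading to this node
    children: list = field(default_factory=list)   # empty for tips, two for inner nodes

def exp_loglik(n, total):
    """Log-likelihood of n branch lengths summing to `total` under a single
    exponential with its ML rate n / total. Returns 0 for empty or zero-length
    sets (a simplification made for this sketch)."""
    if n == 0 or total <= 0.0:
        return 0.0
    return n * math.log(n / total) - n

def subtree_branches(u):
    """Number and total length of all branches strictly below u."""
    n, total = 0, 0.0
    for c in u.children:
        cn, ct = subtree_branches(c)
        n += cn + 1
        total += ct + c.length
    return n, total

def score(i, entry):
    """Coalescent log-likelihood plus the fit of the i speciation branches."""
    coal, spec_len, _ = entry
    return coal + exp_loglik(i, spec_len)

def dp_table(u):
    """Map i -> (coalescent log-lik, speciation length sum, species count)."""
    n, total = subtree_branches(u)
    table = {0: (exp_loglik(n, total), 0.0, 1)}    # entry 0: T_u is one species
    if not u.children:
        return table
    v, w = u.children
    tv, tw = dp_table(v), dp_table(w)
    for j, (cv, sv, nv) in tv.items():
        for k, (cw, sw, nw) in tw.items():
            i = j + k + 2                          # assumed index rule
            cand = (cv + cw, sv + sw + v.length + w.length, nv + nw)
            if i not in table or score(i, cand) > score(i, table[i]):
                table[i] = cand
    return table

def best_delimitation(root):
    """Return (i, number of species, score) for the best-scoring root entry."""
    table = dp_table(root)
    i = max(table, key=lambda idx: score(idx, table[idx]))
    return i, table[i][2], score(i, table[i])

# Toy tree: two tight pairs of tips separated by long branches.
tips = [Node(0.01) for _ in range(4)]
root = Node(0.0, [Node(0.5, tips[:2]), Node(0.5, tips[2:])])
i, n_species, _ = best_delimitation(root)
print(i, n_species)  # prints "2 2": two speciation branches, two species
```

Keeping only the best candidate per entry makes this a heuristic, because the speciation term depends on branches chosen elsewhere in the tree.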
Main characteristics of empirical test datasets
| Genus | Phylum | NSp | NMSp | NSeq | AL | APD (%) |
|---|---|---|---|---|---|---|
| | Annelida | 43 | 28 | 265 | 1539 | 16.7 |
| | Arthropoda | 16 | 14 | 195 | 659 | 11.8 |
| | Arthropoda | 84 | 60 | 399 | 1076 | 14.2 |
| | Arthropoda | 15 | 11 | 493 | 713 | 10.2 |
| | Arthropoda | 20 | 17 | 286 | 659 | 6.2 |
| | Arthropoda | 49 | 34 | 514 | 880 | 11.5 |
| | Arthropoda | 97 | 80 | 532 | 1495 | 12.9 |
| | Arthropoda | 29 | 24 | 641 | 679 | 16.6 |
| | Arthropoda | 35 | 33 | 861 | 1238 | 8.9 |
| | Arthropoda | 53 | 27 | 755 | 1082 | 21.4 |
| | Arthropoda | 35 | 32 | 1127 | 1256 | 8.4 |
| | Arthropoda | 100 | 76 | 1252 | 1485 | 17.5 |
| | Arthropoda | 4 | 3 | 1775 | 1548 | 2.3 |
| | Arthropoda | 148 | 109 | 2303 | 1589 | 11.2 |
| | Arthropoda | 121 | 68 | 2741 | 1544 | 10.5 |
| | Chordata | 41 | 35 | 181 | 709 | 18.9 |
| | Chordata | 12 | 10 | 282 | 658 | 14.7 |
| | Chordata | 54 | 37 | 789 | 1542 | 11.7 |
| | Cnidaria | 5 | 4 | 92 | 1002 | 12.7 |
| | Echinodermata | 13 | 8 | 75 | 1605 | 16.8 |
| | Echinodermata | 18 | 12 | 355 | 1553 | 13.6 |
| | Mollusca | 20 | 11 | 304 | 708 | 10.9 |
| | Mollusca | 24 | 12 | 686 | 675 | 11.1 |
| | Platyhelminthes | 7 | 5 | 316 | 1608 | 5.1 |
NSp, number of species; NMSp, number of monophyletic species; NSeq, number of sequences; AL, alignment length; APD, average P-distance.
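As an illustration of the APD column, the following sketch computes an average pairwise p-distance, that is, the mean proportion of differing sites over all pairs of aligned sequences, expressed as a percentage. How gaps and ambiguous bases were treated for the table is not stated here; skipping any site where either sequence is not A, C, G or T is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of comparable sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: sites with a gap or ambiguity code in either sequence are skipped.
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)

def average_p_distance(seqs) -> float:
    """Mean p-distance over all sequence pairs, as a percentage."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    if not dists:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(dists) / len(dists)

# Toy alignment (made up): pairwise distances are 1/4, 2/4 and 1/4.
print(round(average_p_distance(["ACGT", "ACGA", "ACCA"]), 2))  # prints 33.33
```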
Fig. 2. Average performance over all datasets of the five delimitation methods (mPTP, PTP, Usearch, Crop and ABGD) for the (A) number of species, (B) F-scores and (C) number of RTS
Percentage of RTS, F-scores and number of delimited species for the five delimitation methods (mPTP, PTP, Usearch, Crop and ABGD) for five of the empirical datasets
| Genus | mPTP | PTP | Usearch | Crop | ABGD |
|---|---|---|---|---|---|
| Percentage of RTS | | | | | |
| | 51 | 40 | 44 | 40 | 44 |
| | 41 | 39 | 30 | 31 | 40 |
| | 64 | 60 | 57 | 55 | 60 |
| | 47 | 51 | 41 | 34 | 53 |
| | 60 | 27 | 47 | 53 | 47 |
| F-scores | | | | | |
| Amynthas | 0.784 | 0.638 | 0.649 | 0.674 | 0.673 |
| | 0.787 | 0.704 | 0.730 | 0.681 | 0.730 |
| | 0.839 | 0.844 | 0.836 | 0.832 | 0.850 |
| | 0.728 | 0.559 | 0.747 | 0.729 | 0.765 |
| | 0.882 | 0.717 | 0.812 | 0.852 | 0.828 |
| Number of delimited species | | | | | |
| | 64 | 104 | 95 | 91 | 90 |
| | 126 | 218 | 193 | 154 | 118 |
| | 85 | 100 | 95 | 95 | 91 |
| | 139 | 444 | 183 | 217 | 157 |
| | 21 | 38 | 26 | 20 | 24 |
Fig. 3. For each method, a regression line is fitted to the percentage of RTS (y-axis) as a function of the percentage of monophyletic species (x-axis). The Pearson correlation coefficient (r) is given for each method in the corresponding color
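The fit described in the Fig. 3 caption can be reproduced in outline with an ordinary least-squares line and a Pearson coefficient. The sketch below is a generic illustration and may not match the exact procedure used for the figure. The function name and the input values are hypothetical and are not taken from the study.

```python
from math import sqrt

def fit_line_and_pearson(xs, ys):
    """Least-squares line y = slope * x + intercept and Pearson r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical values: % monophyletic species (x) and % RTS (y) for one method.
pct_monophyletic = [60.0, 70.0, 80.0, 90.0]
pct_rts = [35.0, 42.0, 55.0, 61.0]
slope, intercept, r = fit_line_and_pearson(pct_monophyletic, pct_rts)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```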