Jun Feng1, Zeyun Liu1, Hongwei Feng1, Richard F E Sutcliffe1, Jianni Liu2, Jian Han2.
Abstract
To address the instability of phylogenetic trees in morphological datasets caused by missing values, we present a phylogenetic inference method based on a concept decision tree (CDT) in conjunction with attribute reduction. First, a reliable initial phylogenetic seed tree is created from a few species with relatively complete morphological information, either by using biologists' prior knowledge or by applying existing tools such as MrBayes. Second, using a top-down data processing approach, we construct concept-sample templates by performing attribute reduction at each node in the initial phylogenetic seed tree. In this way, each node is turned into a decision point with multiple concept-sample templates, providing decision-making functions for grafting. Third, we apply a novel matching algorithm to evaluate the degree of similarity between the species' attributes and their concept-sample templates and to determine the location of the species in the initial phylogenetic seed tree. In this manner, the phylogenetic tree is established step by step. We apply our algorithm to several datasets and compare it with the maximum parsimony, maximum likelihood, and Bayesian inference methods using the two evaluation criteria of accuracy and stability. The experimental results indicate that as the proportion of missing data increases, the accuracy of the CDT method remains at 86.5%, outperforming all other methods and producing a reliable phylogenetic tree.
Keywords: attribute reduction; information entropy; morphological analysis; phylogenetic tree
Year: 2019 PMID: 33267027 PMCID: PMC7514795 DOI: 10.3390/e21030313
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
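The third step of the abstract scores how well a species' attributes agree with the concept-sample templates at a node. The paper's own matching algorithm is not reproduced in this record, so the following is only a minimal sketch of one plausible scoring rule: the fraction of agreeing attributes among those observed in both vectors. The `?` missing-value marker, the function names `similarity` and `best_template`, and the example data are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

MISSING = "?"  # assumed marker for a missing attribute value


def similarity(species: Sequence[str], template: Sequence[str]) -> Optional[float]:
    """Fraction of agreeing attributes among those observed in both vectors.

    Returns None when no attribute is observed in both, so that a species
    with no usable data is never scored as a perfect or a zero match.
    """
    compared = [
        (s, t) for s, t in zip(species, template)
        if s != MISSING and t != MISSING
    ]
    if not compared:
        return None
    matches = sum(1 for s, t in compared if s == t)
    return matches / len(compared)


def best_template(species: Sequence[str],
                  templates: Dict[str, Sequence[str]]) -> Optional[str]:
    """Name of the template with the highest similarity, or None if none is comparable."""
    scored = {name: similarity(species, tpl) for name, tpl in templates.items()}
    scored = {name: score for name, score in scored.items() if score is not None}
    if not scored:
        return None
    return max(scored, key=scored.get)


# Hypothetical species with one missing attribute and two hypothetical templates.
species = ["1", "?", "0", "2"]
templates = {
    "left": ["1", "0", "0", "2"],
    "right": ["0", "0", "1", "2"],
}
chosen = best_template(species, templates)
```

Skipping positions where either value is missing is one simple way to keep incomplete species comparable; other treatments of missing values would give different scores.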
Figure 1. An example of phylogenetic inference. Photographs (A–F) show examples of Cambrian Chengjiang Lagerstätte fossils. (G) is a morphological attribute matrix, where the rows represent species and the columns represent attributes. In the column labels of the matrix, the first row represents the attribute number and the second row corresponds to the attribute name. (H) is a phylogenetic tree for selected lobopodians and arthropods from the early Cambrian era [1].
Figure 2. The framework of phylogenetic inference based on the Concept Decision Tree.
The number of possible values.
| No. Attributes | No. Values | Possible Value |
|---|---|---|
| 1 | 2 | |
| 2 | | |
| 3 | | |
Setting the chromosome bit and code.
| Site | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Code | 1 | 3 | 0 | 4 | 5 | 7 | 10 | 9 | 9 | 8 |
Figure 3. The strategy of species grafting in a single decision node.
Figure 4. An example of the species grafting algorithm. The red dot indicates the final graft position of the species G.
Figure 5. An example of handling polymorphic trees.
Experimental data sets.
| Datasets | No. Species | No. Attributes | Reference |
|---|---|---|---|
| Pharyngodonidae | 25 | 30 | Bouamer and Morand (2003) |
| Hibiscus | 40 | 38 | Tang et al. (2014) |
| Meligethes | 42 | 60 | Lin et al. (2015) |
| Nemesiid spiders | 77 | 60 | Goloboff (1995) |
| Phrynosomatid lizards | 115 | 59 | Reeder and Wiens (1996) |
| liebherr | 160 | 136 | Hawaiian Platynini (Carabidae), Liebherr (1998) |
Figure 6. Accuracies of phylogenetic analysis for different proportions of missing data.
Average accuracies of the different methods for different data sets. The bold numbers indicate the highest accuracy in the column.
| Method | Pharyngodonidae | Hibiscus | Meligethes | Nemesiid Spiders | Phrynosomatid Lizards | liebherr | Avg. |
|---|---|---|---|---|---|---|---|
| BI | 0.8919 | 0.8714 | 0.7828 | 0.8672 | 0.8567 | 0.8355 | 0.851 |
| ML | 0.8400 | 0.8250 | 0.7461 | 0.8659 | 0.8501 | 0.8428 | 0.828 |
| MP | | 0.8778 | 0.7905 | 0.8730 | 0.8618 | 0.8283 | 0.855 |
| CDT | 0.8983 | | | | | | |
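The Avg. column appears to be the unweighted mean of the six per-dataset accuracies, rounded to three decimals. The short check below recomputes it for the BI and ML rows, the only two whose six values all survive in this record. The variable and function names are illustrative, and the rounding rule is an assumption inferred from the reported figures.

```python
from statistics import mean

# Per-dataset accuracies copied from the BI and ML rows of the table.
accuracies = {
    "BI": [0.8919, 0.8714, 0.7828, 0.8672, 0.8567, 0.8355],
    "ML": [0.8400, 0.8250, 0.7461, 0.8659, 0.8501, 0.8428],
}

# Values reported in the Avg. column for the same rows.
reported_avg = {"BI": 0.851, "ML": 0.828}


def row_average(values):
    """Unweighted mean of a row, rounded to three decimals (assumed rounding rule)."""
    return round(mean(values), 3)


for method, values in accuracies.items():
    print(method, row_average(values), reported_avg[method])
```

Both recomputed means agree with the reported averages, which supports reading the seven numeric columns as six datasets followed by their mean.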
Figure 7. The tree length for different proportions of missing data and for different methods.
The variance between the tree length obtained with the CDT algorithm and the tree lengths calculated by the other three methods.
| Comparison | Pharyngodonidae | Hibiscus | Meligethes | Nemesiid Spiders | Phrynosomatid Lizards | liebherr | Avg. |
|---|---|---|---|---|---|---|---|
| CDT vs. BI | 0.0282 | 0.0800 | 0.4278 | 0.0282 | 0.0800 | 0.0500 | 0.1157 |
| CDT vs. ML | 0.0153 | 0.0957 | 0.0488 | 0.0153 | 0.0957 | 0.0180 | 0.0481 |
| CDT vs. MP | 0.1250 | 0.1128 | 0.0282 | 0.1250 | 0.1128 | 0.0821 | 0.0977 |
Figure 8. A paleontological phylogenetic tree. The red solid dot is the node position and the position of the red square is the grafting position of the species.