Mark N Puttick, Joseph E O'Reilly, Alastair R Tanner, James F Fleming, James Clark, Lucy Holloway, Jesus Lozano-Fernandez, Luke A Parry, James E Tarver, Davide Pisani, Philip C J Donoghue.
Abstract
Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape and maximum-likelihood estimation using the Mk model, as well as analysing data composed of both binary and multistate characters. We find that all methods struggle to correctly resolve deep clades within asymmetric trees, and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but with lower resolution than other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation using the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations in reanalysing several empirical morphological character matrices. A consequence of our finding is that high levels of resolution or the ability to classify species or groups with much confidence should not be expected when using small datasets. It is now necessary to depart from the traditional parsimony paradigms of constructing character matrices, towards datasets constructed explicitly for Bayesian methods.
Keywords: Bayesian; cladistics; morphology; palaeontology; parsimony; phylogeny
Year: 2017 PMID: 28077778 PMCID: PMC5247500 DOI: 10.1098/rspb.2016.2290
Source DB: PubMed Journal: Proc Biol Sci ISSN: 0962-8452 Impact factor: 5.349
Figure 2. Accuracy of nodes is higher for those closer to the tips in the asymmetrical trees. The percentage of times a node was accurately reconstructed is shown as a proportion of a quarter of a circle in anticlockwise order for Bayesian, maximum likelihood, EW-Parsimony and IW-Parsimony at each node. Accuracy of reconstructions is significantly lower in the 100-character dataset (a), and increases in the 350-character (b) and 1000-character datasets (c). (Online version in colour.)
Figure 3. Accuracy of nodes is high for all nodes in the symmetrical phylogeny. The percentage of times a node was accurately reconstructed is shown as a proportion of a quarter of a circle in anticlockwise order for Bayesian, maximum likelihood, EW-Parsimony and IW-Parsimony at each node. Accuracy of reconstructions is high in each dataset size, but there is a non-significant increase in accuracy as dataset size increases (a–c). (Online version in colour.)
Figure 1. Contour plots of Robinson–Foulds distance against phylogenetic resolution, indicating the higher accuracy of Bayesian implementations compared with all other methods for data generated on the asymmetrical phylogeny. The spectrum from red to yellow reflects lower to higher density of trees. As the number of characters increases, all methods converge on the correct phylogeny, although Bayesian phylogenies are generally the least resolved. The other methods achieve higher resolution but at the cost of lower accuracy. Data generated on the symmetrical phylogeny show similar patterns but with much less variance and higher accuracy for all iterations; this lack of variance means point estimates cannot be shown as density estimates. (Online version in colour.)
Bayesian approaches produce the most accurate trees for all character sets. Mean and range (in brackets) of Robinson–Foulds distances are lower for topologies estimated using Bayesian methods for both the symmetrical and asymmetrical generating trees. Maximum likelihood is generally the least accurate method for the symmetrical generating tree, and implied weights parsimony generally performs worst for the asymmetrical generating tree.
| number of characters | equal weights parsimony | implied weights parsimony | maximum likelihood | Bayesian |
|---|---|---|---|---|
| asymmetrical generating phylogeny | | | | |
| 100 | 34.89 (22–56) | 37.85 (22–56) | 45.84 (20–58) | 28.1 (18–39) |
| 350 | 26.57 (11–51) | 29.2 (12–51) | 26.49 (6–58) | 19.21 (7–35) |
| 1000 | 17.82 (3–40) | 19.16 (2–33) | 11.94 (0–58) | 9.34 (0–31) |
| symmetrical generating phylogeny | | | | |
| 100 | 8.08 (0–33) | 9.29 (0–29) | 10.1 (0–58) | 7.51 (0–29) |
| 350 | 1.33 (0–28) | 1.43 (0–28) | 1.8 (0–52) | 1.2 (0–28) |
| 1000 | 0.32 (0–26) | 0.31 (0–26) | 0.51 (0–52) | 0.31 (0–26) |
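The Robinson–Foulds distances and resolution values reported here can be illustrated with a minimal Python sketch. It assumes trees are written as nested tuples of tip labels and treated as unrooted; the function names, the eight-tip example trees and the definition of resolution as the fraction of possible internal splits that are resolved are illustrative assumptions, not the authors' pipeline.

```python
def tip_labels(tree):
    """Return the frozenset of tip labels below a node (tips are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tip_labels(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, treated as unrooted."""
    all_tips = tip_labels(tree)
    anchor = min(all_tips)  # fixed reference tip used to orient every split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        # Always store the side of the split that excludes the anchor tip,
        # so the same bipartition is represented identically in both trees.
        side = all_tips - tips if anchor in tips else tips
        if 1 < len(side) < len(all_tips) - 1:
            found.add(side)
        return tips

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


def resolution(tree):
    """Resolved internal splits as a fraction of the maximum (n - 3)."""
    n = len(tip_labels(tree))
    return len(splits(tree)) / (n - 3)


# Hypothetical eight-tip trees, for illustration only.
generating = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
resolved_wrong = ((("A", "B"), ("C", "D")), (("E", "G"), ("F", "H")))
partly_resolved = ((("A", "B"), ("C", "D")), ("E", "F", "G", "H"))

print(robinson_foulds(generating, resolved_wrong))
print(robinson_foulds(generating, partly_resolved), resolution(partly_resolved))
```

In this toy case the partly resolved tree lies closer to the generating tree than the fully resolved but incorrect one, which mirrors the trade-off between resolution and accuracy described for the Bayesian estimates.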
p-Values from Spearman's rank correlation between the percentage of nodes being accurately reconstructed and their distance from the root. Nodes closer to the tips are significantly more likely to be accurately reconstructed in asymmetrical trees but this is not generally true for symmetrical phylogenies.
| method and number of characters | asymmetrical tree | symmetrical tree |
|---|---|---|
| MB 100 | <0.001 | 0.09919 |
| maximum likelihood 100 | <0.001 | 0.027295 |
| EW 100 | <0.001 | 0.106712 |
| IW 100 | <0.001 | 0.092736 |
| MB 350 | <0.001 | 0.638242 |
| maximum likelihood 350 | <0.001 | 0.057809 |
| EW 350 | <0.001 | 0.19683 |
| IW 350 | <0.001 | 0.148108 |
| MB 1000 | <0.001 | 0.256976 |
| maximum likelihood 1000 | <0.001 | 0.085987 |
| EW 1000 | <0.001 | 0.179186 |
| IW 1000 | <0.001 | 0.287058 |
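As a rough illustration of the test behind these p-values, the sketch below computes Spearman's rank correlation coefficient between node distance from the root and the percentage of replicates in which the node was recovered. The data values and function names are invented for illustration and are not taken from the study; the coefficient is computed as the Pearson correlation of average ranks. The p-value itself is not computed here; in practice it could be obtained from `scipy.stats.spearmanr`.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: distance from the root (in nodes) and the
# percentage of replicates in which each node was correctly recovered.
distance_from_root = [1, 2, 3, 4, 5, 6]
percent_recovered = [35.0, 42.0, 41.0, 60.0, 78.0, 90.0]

rho = spearman_rho(distance_from_root, percent_recovered)
print(round(rho, 3))
```

A strongly positive coefficient of this kind would correspond to the pattern reported for the asymmetrical tree, where nodes nearer the tips are recovered more often.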
Figure 4. Alternative phylogenetic reconstruction methods alter our understanding of evolution with empirical matrices. The relationships of fossil seed ferns from Hilton & Bateman [19] change according to implementation (a–d), although Caytonia remains sister to angiosperms in all analyses. Alternative analyses change the taxonomic affinity of Kulindroplax from Sutton et al. [22] (e–h). (Online version in colour.)
Figure 5. Alternative phylogenetic reconstruction methods produce generally congruent reconstructions of evolution with empirical matrices. For Luo et al. [20], the relationship between the haramiyids and multituberculates is largely unchanged across analyses (a–d). Bayesian analyses place Nyasasaurus close to the earliest dinosaur (e) and IW-Parsimony places it close to the earliest-diverging taxa (g), but EW-Parsimony and maximum likelihood place the taxon as a derived member of Dinosauria (f,h). (Online version in colour.)