Robert W Scotland, Mike Steel.
Abstract
Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g., morphological data), characters can be too ambiguous, homoplastic, or saturated to develop models that are effective at capturing the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree or is entirely random (but it is not known which class any character belongs to), we are able to derive exact and explicit formulae regarding the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters when the number of taxa is large, even when the number of random characters is large. In contrast, we show that a method that makes more uniform use of all the data, maximum parsimony, can provably estimate trees in which none of the original homoplasy-free characters support splits.
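A minimal sketch of maximum compatibility, assuming binary characters with no missing data, each written as a 0/1 string over a fixed order of taxa. Two binary characters are compatible exactly when at most three of the four state pairs (0,0), (0,1), (1,0), (1,1) occur (the four-gamete test), and for binary characters pairwise compatibility implies that the whole set fits on one tree. Maximum compatibility then amounts to finding a largest pairwise-compatible subset. The function names are hypothetical, this is not the authors' implementation, and the brute-force search is exponential, so it is only meant for small illustrative inputs.

```python
from itertools import combinations


def compatible(a: str, b: str) -> bool:
    """Four-gamete test: binary characters a and b (one state per taxon)
    are compatible iff fewer than four distinct state pairs occur."""
    return len(set(zip(a, b))) < 4


def max_compatible_set(chars: list[str]) -> list[int]:
    """Return indices of a largest pairwise-compatible subset of binary
    characters. Brute force over subsets, largest first; small inputs only."""
    n = len(chars)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(compatible(chars[i], chars[j])
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Three characters that fit on one tree, plus one that conflicts with all of them.
print(max_compatible_set(["11000", "11100", "00011", "10101"]))
```

Here the fourth character shows all four state pairs against each of the others, so the largest compatible subset is the first three characters.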
Keywords: Character compatibility; homoplasy; parsimony; phylogenetic tree
Year: 2015 PMID: 25634097 PMCID: PMC4395848 DOI: 10.1093/sysbio/syv008
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Figure. The tree on the right resolves the tree on the left by the addition of an edge that resolves a vertex of the left tree. The tree on the right displays any character that assigns the taxa on one side of the new edge one state and the remaining taxa a second state, so such a character is compatible with this tree as well as with the tree on the left (even though the left tree does not display it).
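The distinction between a tree displaying a character and a character being compatible with a tree can be checked concretely by representing each tree by its nontrivial splits, each stored as one side of the bipartition. In this sketch (hypothetical names; characters assumed binary and given as taxon-to-state dicts), a tree displays a character when the character's split is one of the tree's splits, and a character is compatible with a tree when its split is compatible with every split of the tree. By the splits-equivalence theorem, the latter is the same as some refinement of the tree displaying the character, which is the situation the figure describes.

```python
def char_split(char: dict[str, int]) -> frozenset[str]:
    """Taxa in state 1 form one side of the split induced by a binary character."""
    return frozenset(t for t, s in char.items() if s == 1)


def splits_compatible(a: frozenset, b: frozenset, taxa: frozenset) -> bool:
    """Splits A|A' and B|B' are compatible iff one of the four intersections is empty."""
    a2, b2 = taxa - a, taxa - b
    return not (a & b and a & b2 and a2 & b and a2 & b2)


def displays(tree_splits: set, char: dict[str, int], taxa: frozenset) -> bool:
    s = char_split(char)
    # Trivial splits (one side with at most one taxon) are displayed by every tree.
    if min(len(s), len(taxa - s)) <= 1:
        return True
    return s in tree_splits or (taxa - s) in tree_splits


def is_compatible(tree_splits: set, char: dict[str, int], taxa: frozenset) -> bool:
    s = char_split(char)
    return all(splits_compatible(s, t, taxa) for t in tree_splits)


taxa = frozenset("abcde")
left = {frozenset("ab")}                   # unresolved vertex joining (a,b), c, d, e
right = {frozenset("ab"), frozenset("cd")}  # adds an edge separating {c,d}
chi = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0}
print(displays(right, chi, taxa), displays(left, chi, taxa), is_compatible(left, chi, taxa))
```

With taxa a to e, the character that gives c and d state 1 is displayed by the resolved tree, is not displayed by the unresolved one, and is still compatible with it.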
Figure. A) The probability that no two characters from among a set of random binary characters are compatible lies above the curves shown.
B) The probability that none of a set of random binary characters on 30 or more taxa is compatible with a sequence of nontrivial binary characters that induce distinct splits lies above the curve shown.
C) The maximal value allowed in Theorem 1 such that, when nontrivial binary characters are sampled at random from a Yule pure-birth tree, a large number of trees are more parsimonious than that tree and yet display none of the characters.
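The quantity in panel A can also be estimated by simulation. This sketch assumes that a random binary character assigns each taxon state 0 or 1 independently with probability 1/2; the paper's exact random model and its bounds are not reproduced here, and the function name is hypothetical. Since the panel A curves are lower bounds, an estimate from the same model should, up to sampling error, lie on or above them.

```python
import random
from itertools import combinations


def compatible(a: tuple, b: tuple) -> bool:
    # Four-gamete test for two binary characters given as tuples of 0/1 states.
    return len(set(zip(a, b))) < 4


def prob_no_two_compatible(n_taxa: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(no two of k random binary characters are compatible).
    Assumption: each taxon's state is drawn i.i.d. uniformly from {0, 1}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        chars = [tuple(rng.randint(0, 1) for _ in range(n_taxa)) for _ in range(k)]
        if not any(compatible(a, b) for a, b in combinations(chars, 2)):
            hits += 1
    return hits / trials


print(prob_no_two_compatible(n_taxa=30, k=10))
```

Small cases make good sanity checks: on two taxa any pair of characters shows at most two state pairs and is therefore compatible, so the estimate is 0 for k of at least 2, while for k = 1 there are no pairs and the estimate is 1.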