| Literature DB >> 22551229 |
Lavanya Kannan1, Ward C Wheeler.
Abstract
BACKGROUND: Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.Entities:
Year: 2012 PMID: 22551229 PMCID: PMC3377548 DOI: 10.1186/1748-7188-7-9
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1Phylogenetic Networks. The figure shows two phylogenetic networks; the one on the left has only a single reticulate vertex and the one on the right has two reticulate vertices that are sisters to each other. Note that removing the edges (7, 9) and (7, 10) from the network on the right does not result in a tree where vertex 7 is a leaf. This shows that removing one incoming edge each per reticulate vertex does not necessarily produce a tree with the same leaf set as the network. The post-ordering and the pre-ordering of vertices of the network on the left are 1, 2, 8, 6, 3, 7, 5, 4, 0 and 0, 5, 6, 1, 8, 2, 7, 3, 4 respectively and for the network on the right, they are 1, 2, 9, 6, 3, 10, 7, 5, 4, 8, 0 and 0, 5, 6, 1, 9, 2, 7, 10, 3, 8, 4 respectively.
Figure 2Dynamic programming solution. The dynamic programming solution applied to a phylogenetic network. The states are 0, 1 and 2. The cost matrix used has all 1s except the diagonal elements which are all 0s. The tables shown on each vertex v are the costs, S(i) (second row) of each state, i (first row) that are computed during the post-order traversal. Also shown at the vertices are the states of the child w, namely the t((i) (third row) that correspond to the costs in the second row; when there are two children for a vertex, the entries in the third row are represented as a pair of states of the left child and the right child respectively. Each edge (v, w) is labelled with s((i) for each state i. During the pre-order traversal, the states for each vertex are selected (shown in bold). The cost of 2 highlighted in bold at the root vertex gives a lower bound, S. The state assigned at each vertex is highlighted in bold. The algorithm finds a total of three substitutions (highlighted by bold edges). This is because the states assigned at the parent vertices of the reticulate vertex give conflicting assignments of 1 and 2 respectively, of which state 1 is assigned at the reticulate vertex. Thus with an extra cost of 1, we get the score of 3 (an upper bound of the optimal score) as the parsimony score corresponding to the assignment shown. Note that the optimum parsimony score on the network is 2 (equal to the lower bound), which can be found by exhaustive search and can be realized by changing the assignment from 1 to 2 for the left parent of the reticulate vertex and from 1 to 2 for the reticulate vertex. Thus the lower bound matches with the optimal score, although the assignment corresponding to the lower bound is not conflict-free and not the same as the assignment corresponding to the optimum.
Figure 3Fitch-type solution. The Fitch-type solution applied to the same phylogenetic network and the leaf data in Figure 2. Each vertex is assigned a set of all possible states, along with a score when the network vertices are traversed in post-order. The score at the root gives a upper bound for the optimal score. The state assignments are given during the pre-order traversal phase and the number of substitutions matches with the score at the root.