Santi Santichaivekin, Ross Mawhorter, Ran Libeskind-Hadas.
Abstract
BACKGROUND: Maximum parsimony reconciliation in the duplication-transfer-loss model is widely used in studying the evolutionary histories of genes and species and in studying coevolution of parasites and their hosts and pairs of symbionts. While efficient algorithms are known for finding maximum parsimony reconciliations, the number of reconciliations can grow exponentially in the size of the trees. An understanding of the space of maximum parsimony reconciliations is necessary to determine whether a single reconciliation can adequately represent the space or whether multiple representative reconciliations are needed.
Keywords: Duplication-transfer-loss model; Maximum parsimony reconciliation; Phylogenetic trees
Year: 2019 PMID: 31842734 PMCID: PMC6915856 DOI: 10.1186/s12859-019-3203-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 DTL reconciliation. (a) An instance of the DTL reconciliation problem comprising a species tree (black), a gene tree (gray), and a leaf mapping. Duplication, transfer, and loss costs are 1, 4, and 1, respectively. (b, c) Two different MPRs, each with total cost 4. (d) The associated reconciliation graph. Mapping nodes are indicated with double-line borders. Event nodes are designated as speciation, duplication, transfer, or loss events. The reconciliation traversal indicated by solid edges corresponds to the MPR in (b), and the reconciliation traversal indicated by dashed edges corresponds to the MPR in (c); bold edges indicate shared elements of the two MPRs. Figure adapted from Haack et al. [11] with permission.
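As a rough illustration of how a total cost such as the 4 in panels (b) and (c) arises, the sketch below scores a reconciliation as the cost-weighted sum of its duplication, transfer, and loss events. It assumes speciation events are free, which is the usual DTL convention but is not stated in the caption. The names `EventCosts` and `reconciliation_cost` and the event counts are hypothetical; the counts are not read from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCosts:
    """Per-event costs in the DTL model (speciation assumed to cost 0)."""
    duplication: int
    transfer: int
    loss: int


def reconciliation_cost(costs, duplications, transfers, losses):
    """Total cost of a reconciliation, given how many events of each type it uses."""
    return (costs.duplication * duplications
            + costs.transfer * transfers
            + costs.loss * losses)


# Costs from the Fig. 1 caption: duplication 1, transfer 4, loss 1.
fig1_costs = EventCosts(duplication=1, transfer=4, loss=1)

# Two hypothetical event mixes that reach the same total cost.
one_transfer = reconciliation_cost(fig1_costs, duplications=0, transfers=1, losses=0)
dup_and_losses = reconciliation_cost(fig1_costs, duplications=1, transfers=0, losses=3)
```

Both hypothetical mixes total 4, which shows how structurally different reconciliations can tie for the minimum cost.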
Running time and average normalized distance with and without loss events
| DTL costs | # Gene families w/ at least 104 MPRs | Running time (seconds): average | Running time (seconds): standard deviation | Running time (seconds): maximum | Normalized distance (w/ loss): average | Normalized distance (w/ loss): standard deviation | Normalized distance (no loss): average | Normalized distance (no loss): standard deviation |
|---|---|---|---|---|---|---|---|---|
| (2,3,1) | 771 | 0.21 | 0.65 | 12.51 | 0.36 | 0.02 | 0.30 | 0.02 |
| (1,2,1) | 913 | 0.38 | 1.3 | 19.02 | 0.42 | 0.02 | 0.39 | 0.02 |
| (1,1,1) | 1492 | 3.53 | 14.25 | 295.90 | 0.42 | 0.03 | 0.41 | 0.03 |
For event costs (1,1,1), 3 of the 1492 gene families caused the algorithm to time out after five minutes; these are not included in the statistics.
Fig. 2 Pairwise distances (with losses) for the phylogenetic trees of three gene families in the Tree of Life dataset. COG0466 has 87 leaves, COG0651 has 84 leaves, and COG0703 has 85 leaves. All are reconciled to a species tree with 100 leaves. Each of the three rows corresponds to one gene family, and the three columns correspond to the DTL cost parameters (2,3,1), (1,2,1), and (1,1,1), respectively. The entry at index 0 of each vector is omitted. These examples demonstrate that the pairwise distance distributions are sensitive to event costs and may be multimodal, indicating the presence of two or more clusters in MPR space.
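To make the notion of a pairwise distance vector concrete, here is a minimal brute-force sketch, not the efficient algorithm the paper concerns. It assumes each MPR is represented as a set of event labels and that the distance between two MPRs is the size of the symmetric difference of those sets; both are assumptions for illustration, as are the function name and the toy labels.

```python
from itertools import combinations


def pairwise_distance_vector(reconciliations):
    """Return a list where entry d counts the unordered pairs at distance d.

    Distance is assumed to be the size of the symmetric difference between
    the two event sets. This enumerates every pair, so it is only practical
    for small collections.
    """
    distances = [len(a ^ b) for a, b in combinations(reconciliations, 2)]
    vector = [0] * (max(distances, default=0) + 1)
    for d in distances:
        vector[d] += 1
    return vector


# Hypothetical toy MPRs, each a set of made-up event labels.
toy_mprs = [
    frozenset({"S1", "D2", "L3"}),
    frozenset({"S1", "T2", "L4"}),
    frozenset({"S1", "D2", "L4"}),
]

vector = pairwise_distance_vector(toy_mprs)
# A histogram in the style of the caption would plot vector[1:], skipping index 0.
```

A distribution with separated peaks in such a vector is what the caption describes as multimodal.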