Kassian Kobert, Leonidas Salichos, Antonis Rokas, Alexandros Stamatakis.
Abstract
We present, implement, and evaluate an approach to calculate the internode certainty (IC) and tree certainty (TC) on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the IC and TC calculations. We implement our methods in RAxML and test them on empirical datasets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any dataset should also include trees containing the full species set.
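As a rough illustration of the quantities named in the abstract, the sketch below computes IC for a single internal branch from the support of the reference bipartition and of its most frequent conflicting bipartition, and TC as the sum of IC over all internal branches. The function names are hypothetical and this is not the RAxML implementation. The sketch takes already-computed (possibly fractional) support values as input and does not perform the adjustment for partial gene trees. Reporting a negative IC when the conflicting bipartition has more support than the reference bipartition is an assumed sign convention.

```python
import math


def internode_certainty(ref_support: float, conflict_support: float) -> float:
    """IC for one internal branch (hypothetical helper, not RAxML code).

    ref_support:      support of the reference bipartition
    conflict_support: support of its most frequent conflicting bipartition
    Supports may be raw counts or fractional (adjusted) values.
    """
    total = ref_support + conflict_support
    if total <= 0:
        raise ValueError("at least one bipartition needs positive support")

    # IC = log2(2) + sum over both bipartitions of p * log2(p)
    ic = 1.0
    for support in (ref_support, conflict_support):
        p = support / total
        if p > 0:  # the limit of p * log2(p) as p -> 0 is 0
            ic += p * math.log2(p)

    # Assumed convention: flag branches where the conflict dominates.
    return -ic if conflict_support > ref_support else ic


def tree_certainty(branch_supports) -> float:
    """TC as the sum of IC over all internal branches.

    branch_supports: iterable of (ref_support, conflict_support) pairs.
    """
    return sum(internode_certainty(r, c) for r, c in branch_supports)
```

With no conflict the branch scores 1, and with equal support for both bipartitions it scores 0, so `tree_certainty([(10, 0), (5, 5)])` evaluates to 1.0.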
Keywords: bipartition frequencies; clade support; gene trees; internode certainty.
Year: 2016 PMID: 26915959 PMCID: PMC4868120 DOI: 10.1093/molbev/msw040
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Overview of the proposed methods.
Figure: Distribution of adjusted support for observed and lossless adjustment scheme.
Figure: Example tree set for IC calculations.
Differences D in IC/ICA Scores between the Scores Calculated by the Adjustment Schemes and the Reference Scores for the Comprehensive Tree Set.
| Probabilistic | 0.31 | 0.20 | 0.18 | 0.08 | 0.26 | 0.18 | 0.18 | 0.12 |
| Observed | 0.42 | 0.27 | 0.15 | 0.07 | 0.39 | 0.25 | 0.19 | 0.08 |
| Lossless | 0.65 | 0.44 | 0.24 | 0.17 | 0.60 | 0.44 | 0.28 | 0.15 |
Differences D in IC/ICA Scores between the Pruned Tree Sets Containing Only Partial Gene Trees and the Reference Values.
| Probabilistic | 0.50 | 0.52 | 0.53 | 0.53 | 0.47 | 0.48 | 0.50 | 0.50 |
| Observed | 0.50 | 0.51 | 0.53 | 0.53 | 0.45 | 0.48 | 0.50 | 0.49 |
| Lossless | 0.61 | 0.48 | 0.50 | 0.52 | 0.46 | 0.43 | 0.47 | 0.49 |
Fraction of Branches for which the Adjusted IC/ICA Scores Are Higher than the IC/ICA Reference Scores.
| All trees | | | | | | | | |
| Probabilistic | 0.4 | 0.35 | 0.35 | 0.15 | 0.25 | 0.25 | 0.2 | 0.15 |
| Observed | 0.15 | 0.3 | 0.4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.1 |
| Lossless | 0.1 | 0.25 | 0.15 | 0.25 | 0.2 | 0.2 | 0.25 | 0.1 |
| Partial trees | | | | | | | | |
| Probabilistic | 0.8 | 0.8 | 0.85 | 0.85 | 0.8 | 0.8 | 0.85 | 0.85 |
| Observed | 0.65 | 0.75 | 0.8 | 0.85 | 0.65 | 0.75 | 0.8 | 0.85 |
| Lossless | 0.3 | 0.65 | 0.75 | 0.8 | 0.25 | 0.65 | 0.75 | 0.8 |
Note.—The top table contains values for all three adjustment schemes if all trees (comprehensive and simulated partial) are included in the analysis. The bottom table shows the values for all three methods if only partial trees are analyzed.
Figure: Distribution of taxon number over trees in the yeast data.
Figure: Distribution of taxon number over trees in the avian data.
Figure 6: Bipartition numbers corresponding to the presented tables, for the yeast data set. Taxon key: Kwal: Kluyveromyces waltii, Kthe: Kluyveromyces thermotolerans, Sklu: Saccharomyces kluyveri, Klac: Kluyveromyces lactis, Egos: Eremothecium gossypii, Zrou: Zygosaccharomyces rouxii, Kpol: Kluyveromyces polysporus, Cgla: Candida glabrata, Scas: Saccharomyces castellii, Sbay: Saccharomyces bayanus, Skud: Saccharomyces kudriavzevii, Smik: Saccharomyces mikatae, Spar: Saccharomyces paradoxus, Scer: Saccharomyces cerevisiae, Clus: Candida lusitaniae, Cdub: Candida dubliniensis, Calb: Candida albicans, Ctro: Candida tropicalis, Cpar: Candida parapsilosis, Lelo: Lodderomyces elongisporus, Psti: Pichia stipitis, Cgui: Candida guilliermondii, Dhan: Debaryomyces hansenii.
IC Scores for All Nontrivial Bipartitions Multiplied by 100 and Rounded Down.
| 23 Taxa | None | 95 | 29 | 9 | 3 | 48 | 27 | 5 | 95 | 2 | 14 | 1 | 56 | 94 | 75 | 71 | 71 | 7 | 1 | <1 | 99 |
| 4–23 Taxa | Probabilistic | 89 | 28 | 8 | 3 | 46 | 28 | 6 | 91 | 2 | 15 | 1 | 52 | 92 | 72 | 65 | 70 | 7 | 2 | <1 | 92 |
| 4–23 Taxa | Observed | 89 | 12 | 12 | 3 | 52 | 24 | 4 | 58 | 1 | 14 | 2 | 36 | 91 | 69 | 64 | 69 | 7 | 2 | 1 | 57 |
| 4–23 Taxa | Lossless | 82 | 2 | 15 | 2 | 39 | 26 | 5 | 41 | <1 | 10 | 3 | 15 | 89 | 61 | 56 | 65 | 7 | 1 | <1 | 68 |
Note.—The bipartition labels are shown in figure 6. The dataset can either consist of only full trees (23 taxa), or partial and full trees (4–23 taxa).
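A per-branch comparison of an adjustment scheme against the full-tree reference can be read off the table above. The following sketch counts the fraction of bipartitions whose Probabilistic IC score exceeds the reference score. The helper names are hypothetical, and treating a `<1` entry as 0 is an assumption made only for this comparison. Because the table values are rounded down, ties in the table may hide small real differences.

```python
def parse_cell(cell: str) -> float:
    """Convert one table cell to a number; '<1' is treated as 0 (assumption)."""
    text = cell.strip()
    if text.startswith("<"):
        return 0.0
    return float(text)


def fraction_higher(adjusted_row: str, reference_row: str) -> float:
    """Fraction of branches where the adjusted score exceeds the reference."""
    adjusted = [parse_cell(c) for c in adjusted_row.split("|")]
    reference = [parse_cell(c) for c in reference_row.split("|")]
    if len(adjusted) != len(reference):
        raise ValueError("rows must list the same number of bipartitions")
    higher = sum(1 for a, r in zip(adjusted, reference) if a > r)
    return higher / len(reference)


# Score cells copied from the IC table above (values are IC * 100, rounded down).
REFERENCE_IC = (
    "95 | 29 | 9 | 3 | 48 | 27 | 5 | 95 | 2 | 14 | "
    "1 | 56 | 94 | 75 | 71 | 71 | 7 | 1 | <1 | 99"
)
PROBABILISTIC_IC = (
    "89 | 28 | 8 | 3 | 46 | 28 | 6 | 91 | 2 | 15 | "
    "1 | 52 | 92 | 72 | 65 | 70 | 7 | 2 | <1 | 92"
)

if __name__ == "__main__":
    print(fraction_higher(PROBABILISTIC_IC, REFERENCE_IC))
```

For these two rows, 4 of the 20 bipartitions have a higher Probabilistic score than the reference, so the function returns 0.2.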
ICA Scores for All Nontrivial Bipartitions Multiplied by 100 and Rounded Down.
| 23 Taxa | None | 95 | 23 | 7 | 8 | 48 | 25 | 14 | 95 | 3 | 12 | 2 | 45 | 94 | 75 | 71 | 71 | 7 | 8 | 9 | 98 |
| 4–23 Taxa | Probabilistic | 89 | 21 | 6 | 13 | 46 | 26 | 14 | 91 | 3 | 11 | 1 | 38 | 92 | 72 | 60 | 70 | 25 | 7 | 11 | 92 |
| 4–23 Taxa | Observed | 89 | 15 | 9 | 12 | 52 | 24 | 12 | 58 | 2 | 11 | 11 | 34 | 91 | 69 | 59 | 69 | 24 | 7 | 11 | 57 |
| 4–23 Taxa | Lossless | 82 | 13 | 10 | 7 | 39 | 27 | 13 | 46 | 3 | 9 | 8 | 29 | 89 | 61 | 49 | 65 | 7 | 5 | 5 | 68 |
Note.—The bipartition labels are shown in figure 6. The datasets again either consist of only full trees (23 taxa), or partial and full trees (4–23 taxa).
IC Scores for All Nontrivial Bipartitions Multiplied by 100 and Rounded Down.
| 23 Taxa | None | 95 | 29 | 9 | 3 | 48 | 27 | 5 | 95 | 2 | 14 | 1 | 56 | 94 | 75 | 71 | 71 | 7 | 1 | <1 | 99 |
| 4–22 Taxa | Probabilistic | 93 | 64 | 61 | 58 | 72 | 66 | 59 | 85 | 39 | 46 | 43 | 64 | 95 | 77 | 83 | 78 | 56 | 49 | 47 | 93 |
| 4–22 Taxa | Observed | 89 | 23 | 58 | 36 | 80 | 75 | 70 | 80 | 1 | 1 | <1 | 20 | 93 | 79 | 82 | 78 | 54 | 13 | 16 | 43 |
| 4–22 Taxa | Lossless | 80 | 24 | 58 | 12 | 66 | 57 | 32 | 68 | 24 | 12 | 12 | 2 | 88 | 54 | 42 | 49 | 43 | 12 | 38 | 7 |
Note.—The bipartition labels are shown in figure 6. Here, the dataset only contains trees with partial taxon sets.
ICA Scores for All Nontrivial Bipartitions Multiplied by 100 and Rounded Down.
| 23 Taxa | None | 95 | 23 | 7 | 8 | 48 | 25 | 14 | 95 | 3 | 12 | 2 | 45 | 94 | 75 | 71 | 71 | 7 | 8 | 9 | 98 |
| 4–22 Taxa | Probabilistic | 93 | 64 | 54 | 51 | 72 | 66 | 59 | 85 | 40 | 46 | 34 | 58 | 95 | 77 | 83 | 78 | 56 | 43 | 45 | 93 |
| 4–22 Taxa | Observed | 89 | 23 | 48 | 33 | 80 | 75 | 70 | 80 | 17 | 20 | 18 | 20 | 93 | 79 | 82 | 78 | 54 | 29 | 24 | 43 |
| 4–22 Taxa | Lossless | 80 | 27 | 58 | 24 | 66 | 57 | 29 | 68 | 24 | 11 | 12 | 2 | 88 | 54 | 42 | 49 | 43 | 12 | 38 | 22 |
Note.—The bipartition labels are shown in figure 6. Again, the dataset only contains trees with partial taxon sets.
IC and ICA Scores for Different Subsets of the Data Set for the Probabilistic and Lossless Distribution Schemes.
| 48 Taxa | None | −3.14 | −3.17 |
| 41–48 Taxa | Probabilistic | −2.44 | 7.72 |
| 41–48 Taxa | Lossless | −5.05 | −1.35 |
| 41–47 Taxa | Probabilistic | 9.34 | 15.75 |
| 41–47 Taxa | Lossless | 6.01 | 6.01 |