Xing-Xing Shen¹, Chris Todd Hittinger², Antonis Rokas¹.
Abstract
Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain, contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
Year: 2017 PMID: 28812701 PMCID: PMC5560076 DOI: 10.1038/s41559-017-0126
Source DB: PubMed Journal: Nat Ecol Evol ISSN: 2397-334X Impact factor: 15.460
Table 1. The 17 contentious branches and 6 well-established branches (controls), together with their alternative hypotheses, in three phylogenomic data matrices from plants, animals, and fungi.
| Kingdom | Branch | Maximum likelihood tree (T1) | Alternative hypothesis (T2) | AU test P-value |
|---|---|---|---|---|
| Plants | | | | 0.001* |
| | Angiosperm | Magnoliids as sister to Eudicots + Chloranthales | Magnoliids + Chloranthales as sister to Eudicots | 0.030* |
| | Bryophyte | Hornworts as sister to all other land plants | Hornworts as sister to mosses + liverworts | 0.012* |
| | Gymnosperm | Gnetales as sister to the Pinaceae, nested within the Coniferales | Gnetales as sister to the Coniferales | 2e-06* |
| | Land plant | Zygnematophyceae as sister to all land plants | Charales as sister to all land plants | 0.003* |
| | Control: Seed plant | Seed plants are monophyletic | Seed plants are paraphyletic | 3e-99* |
| | Control: Moss | Mosses are monophyletic | Mosses are paraphyletic | 1e-43* |
| Animals | Amphibian | Gymnophiona as sister to all other amphibians | Anura as sister to all other amphibians | 6e-13* |
| | Eutherian | Xenarthra + Afrotheria as sister to all other placental mammals | Afrotheria as sister to all other placental mammals | 0.036* |
| | Lungfish | Lungfishes as sister to all tetrapods | Lungfishes + coelacanths as sister to all tetrapods | 7e-41* |
| | Neoavian | Pigeons as sister to all other Neoaves | Falcons as sister to all other Neoaves | 0.322 |
| | Teleost | Elopomorpha + Osteoglossomorpha as sister to all other teleosts | Osteoglossomorpha alone as sister to all other teleosts | 2e-05* |
| | Turtle | Turtles as sister to archosaurs (birds + crocodiles) | Turtles as sister to crocodiles | 1e-29* |
| | Control: Amniote | Amniotes are monophyletic | Amniotes are paraphyletic | 2e-05* |
| | Control: Mammal | Mammals are monophyletic | Mammals are paraphyletic | 1e-06* |
| Fungi | Ascoideaceae | Ascoideaceae as sister to Phaffomycetaceae + Saccharomycodaceae + Saccharomycetaceae | Ascoideaceae as sister to Pichiaceae + CUG-Ser clade + Phaffomycetaceae + Saccharomycodaceae + Saccharomycetaceae | 0.005* |
| | | | | 1e-07* |
| | | | | 0.012* |
| | | | | 2e-59* |
| | | | | 1e-53* |
| | WGD clade | Yeasts of the WGD clade are monophyletic | Yeasts of the WGD clade are paraphyletic | 0.002* |
| | Control: Saccharomycetaceae | Yeasts of the family Saccharomycetaceae are monophyletic | Yeasts of the family Saccharomycetaceae are paraphyletic | 2e-05* |
| | Control: Pichiaceae | Yeasts of the family Pichiaceae are paraphyletic | Yeasts of the family Pichiaceae are monophyletic | 7e-05* |
For each branch, T1 and T2 were compared with the approximately unbiased (AU) test[19], as implemented in CONSEL version 2.0[20], using 1,000 bootstrap replicates. Asterisks (*) indicate cases in which T1 is significantly better than T2 (P < 0.05).
Figure 1. A schematic representation of our approach for quantifying and visualizing phylogenetic signal in a phylogenomic data matrix
a, Two alternative phylogenetic hypotheses (T1, the unconstrained ML tree under concatenation; T2, the ML tree constrained to recover the T2 branch). b, Calculation of the difference in the gene-wise log-likelihood scores (ΔGLS) of T1 versus T2 for each gene in the data matrix. The difference in the site-wise log-likelihood scores, ΔSLS, of T1 versus T2 for each site in the data matrix is also calculated but is not shown here. The gene-wise phylogenetic signal (ΔGLS) for T1 versus T2 can be visualized either by arranging genes in the order of their placements in the data matrix (c) or in descending order of their ΔGLS values (d). Red bars denote genes supporting T1, whereas green bars denote genes supporting T2. The data for panels c and d are the actual values from the analysis of the Ascoideaceae branch in the fungal phylogenomic data matrix (Table 1). The schematic representation of our approach for quantifying and visualizing phylogenetic signal among three alternative phylogenetic hypotheses (T1, T2, and T3) is shown in Supplementary Fig. 1.
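The ΔSLS and ΔGLS calculations in panel b can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline, and it assumes that per-site log-likelihoods under T1 and T2 have already been exported by an ML program, that each gene occupies a contiguous block of columns in the concatenated matrix, and that a gene's log-likelihood is the sum of its site log-likelihoods (as under site-independent substitution models). The difference is taken as lnL(T1) minus lnL(T2), so positive values correspond to the red, T1-supporting bars. All function names and the toy numbers are hypothetical.

```python
from typing import Sequence


def site_wise_delta(lnl_t1: Sequence[float], lnl_t2: Sequence[float]) -> list[float]:
    """ΔSLS per site: lnL(site | T1) - lnL(site | T2); positive favors T1."""
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("T1 and T2 must be scored on the same alignment columns")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def gene_wise_delta(delta_sls: Sequence[float], gene_lengths: Sequence[int]) -> list[float]:
    """ΔGLS per gene: the sum of ΔSLS over that gene's sites.

    Assumes genes are concatenated contiguously, in matrix order, and that a
    gene's log-likelihood is the sum of its site log-likelihoods.
    """
    if sum(gene_lengths) != len(delta_sls):
        raise ValueError("gene lengths must add up to the number of sites")
    out, start = [], 0
    for n in gene_lengths:
        out.append(sum(delta_sls[start:start + n]))
        start += n
    return out


def rank_descending(delta_gls: Sequence[float]) -> list[int]:
    """Gene indices from strongest T1 support to strongest T2 support (panel d order)."""
    return sorted(range(len(delta_gls)), key=lambda i: delta_gls[i], reverse=True)


def count_support(delta_gls: Sequence[float]) -> tuple[int, int]:
    """(genes favoring T1, genes favoring T2); genes with ΔGLS == 0 are not counted."""
    n_t1 = sum(1 for d in delta_gls if d > 0)
    n_t2 = sum(1 for d in delta_gls if d < 0)
    return n_t1, n_t2


# Toy input: 6 sites in 3 genes of 2 sites each (values are made up).
lnl_t1 = [-8.0, -9.0, -10.0, -12.0, -11.0, -7.0]
lnl_t2 = [-8.0, -9.5, -10.5, -12.5, -10.0, -7.25]
dsls = site_wise_delta(lnl_t1, lnl_t2)
dgls = gene_wise_delta(dsls, [2, 2, 2])
print(dgls)                   # [0.5, 1.0, -0.75]
print(rank_descending(dgls))  # [1, 0, 2]
print(count_support(dgls))    # (2, 1)
```

Keeping the site-level and gene-level steps separate mirrors the caption: the same ΔSLS vector feeds both the site-wise analyses and, once summed per gene, the gene-wise plots in panels c and d.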
Figure 2. Distributions of phylogenetic signal for 17 contentious branches in plant, animal, and fungal phylogenomic data matrices
For each branch, ΔGLS values (Y-axis) were calculated as the difference in gene-wise log-likelihood scores for T1 versus T2. The ΔGLS values are shown for all genes in the phylogenomic data matrix, in the order of their placement in the matrix (X-axis; see Supplementary Tables 1–3). As a control, we also examined the distribution of ΔGLS values for two well-established branches in each of the three data matrices (Plants, monophyly of seed plants and monophyly of mosses; Animals, monophyly of amniotes and monophyly of mammals; Fungi, monophyly of the family Saccharomycetaceae and paraphyly of the family Pichiaceae; Table 1). Red bars denote genes supporting T1, whereas green bars denote genes supporting T2. The distributions of ranked ΔGLS values for these 23 branches are provided in Supplementary Fig. 2. The specific T1 and T2 topologies compared for each branch are provided in Table 1.
Figure 3. Quantification of the effect of the removal of tiny amounts of data on the branch’s topology for 17 contentious branches in plant, animal, and fungal phylogenomic data matrices
For each branch, the 1, 5, 10, 50, and 100 genes with the highest absolute ΔGLS values were excluded; we also excluded the genes with outlier ΔGLS values (the number of outlier genes is given above the corresponding bar for each branch). The Y-axis shows the difference in log-likelihood scores (ΔlnL) for the favored topological hypothesis. Bar colors indicate which hypothesis is favored: red bars denote that the ML trees inferred from the corresponding reduced data matrices support T1, green bars denote ML trees supporting T2, and gray bars denote ML trees supporting hypotheses other than T1 and T2. The ΔlnL values on the Y-axis correspond to the difference in log-likelihood values for T1 against either T2 or “others”: if ΔlnL > 0, the ML tree supports T1, whereas if ΔlnL < 0, the ML tree supports T2 or “others”.
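The gene-exclusion step behind Figure 3 amounts to ranking genes by |ΔGLS| and building reduced matrices. The sketch below only selects which genes to drop; re-inferring the ML tree on each reduced matrix is not shown. The caption does not say how outlier genes were defined, so the Tukey-style boxplot rule used here (values beyond 1.5 × IQR from the quartiles) is an assumption, as are all function names and toy values.

```python
import statistics
from typing import Sequence


def top_k_by_abs(delta_gls: Sequence[float], k: int) -> list[int]:
    """Indices of the k genes with the largest |ΔGLS|, returned in matrix order."""
    order = sorted(range(len(delta_gls)), key=lambda i: abs(delta_gls[i]), reverse=True)
    return sorted(order[:k])


def outlier_genes(delta_gls: Sequence[float], whisker: float = 1.5) -> list[int]:
    """Indices of genes whose ΔGLS lies outside [Q1 - w*IQR, Q3 + w*IQR].

    ASSUMPTION: a Tukey-style boxplot rule; the figure caption does not define "outlier".
    """
    q1, _, q3 = statistics.quantiles(delta_gls, n=4)
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, d in enumerate(delta_gls) if d < lo or d > hi]


def drop_genes(genes: Sequence, exclude: Sequence[int]) -> list:
    """Return the genes that remain after excluding the given indices."""
    excluded = set(exclude)
    return [g for i, g in enumerate(genes) if i not in excluded]


# Toy ΔGLS values for 8 genes; gene 7 dominates the signal.
dgls = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.15, 25.0]
print(top_k_by_abs(dgls, 2))  # [2, 7]
print(outlier_genes(dgls))    # [7]
print(drop_genes(["g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7"], outlier_genes(dgls)))
```

For the 1, 5, 10, 50, and 100 gene series, one would call `top_k_by_abs` once per k and rerun the ML search on each reduced matrix.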
Figure 4. Tiny amounts of data exert decisive influence in the resolution of certain contentious branches in phylogenomic studies
The effect of the removal of tiny amounts of data on the branch’s topology and bootstrap support (BS) was quantified for 17 contentious branches and 6 well-established branches (controls) in plant, animal, and fungal phylogenomic data matrices. Different colors indicate different branch topologies and levels of BS. Topologies other than T1 and T2 are collectively referred to as “Others”. (Top panel: Concatenation) The first row depicts the results of the concatenation analysis when the full data matrix is used, the second row when a single random gene is excluded, the third row when the gene with the highest absolute ΔGLS value is excluded, and the fourth row when the genes with outlier ΔGLS values are excluded. (Bottom panel: Coalescence) The first row depicts the results of the coalescence-based analysis when the full data matrix is used, the second row when one random site from every gene’s alignment is excluded, the third row when the site with the highest absolute ΔSLS value from every gene is excluded, and the fourth row when the 1% of sites with the highest absolute ΔSLS values from every gene are excluded. All topologies summarized in this figure are provided in Supplementary Figs. 5–56.
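The three site-removal treatments in the coalescence panel (one random site, the site with the highest |ΔSLS|, or the top 1% of sites by |ΔSLS|, each taken from every gene) can be sketched as a per-gene column filter. This is a hypothetical illustration: the caption does not say how 1% of a gene's sites is rounded, so rounding up (and always removing at least one site) is an assumption, and gene-tree inference plus the coalescence-based species-tree step are not shown.

```python
import random
from typing import Optional, Sequence


def sites_to_drop(delta_sls_gene: Sequence[float], mode: str,
                  percent: int = 1, rng: Optional[random.Random] = None) -> list[int]:
    """Column indices to remove from one gene's alignment.

    mode "random": one random site; "top1": the site with the largest |ΔSLS|;
    "top_percent": the top `percent`% of sites by |ΔSLS|, rounded up
    (ASSUMPTION: the rounding rule is not stated in the source).
    """
    n = len(delta_sls_gene)
    if mode == "random":
        rng = rng or random.Random()
        return [rng.randrange(n)]
    order = sorted(range(n), key=lambda i: abs(delta_sls_gene[i]), reverse=True)
    if mode == "top1":
        return [order[0]]
    if mode == "top_percent":
        k = (n * percent + 99) // 100  # integer ceiling of n * percent / 100
        return sorted(order[:k])
    raise ValueError(f"unknown mode: {mode!r}")


def drop_columns(alignment: dict[str, str], drop: Sequence[int]) -> dict[str, str]:
    """Remove the given columns from every sequence of one aligned gene."""
    dropped = set(drop)
    return {name: "".join(c for i, c in enumerate(seq) if i not in dropped)
            for name, seq in alignment.items()}


gene = {"taxonA": "ACGT", "taxonB": "ACGA", "taxonC": "TCGA"}
dsls = [0.1, -2.0, 0.3, 0.05]                  # toy ΔSLS, one value per column
print(sites_to_drop(dsls, "top1"))             # [1]
print(sites_to_drop(dsls, "top_percent", 50))  # [1, 2]
print(drop_columns(gene, sites_to_drop(dsls, "top1")))
```

Passing a seeded `random.Random` makes the random-site treatment reproducible across reruns.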
Figure 5. The distribution of phylogenetic signal for three alternative topological hypotheses on the earliest-branching metazoan lineage
a, The three alternative topological hypotheses are: ctenophores as the sister group to all other metazoan phyla (Ctenophora-sister; T1), sponges as the sister group to all other metazoans (Porifera-sister; T2), or a clade composed of ctenophores and sponges as the sister group to all other metazoans (Porifera+Ctenophora-sister; T3). All silhouettes come from http://phylopic.org. Human image by Andrew Farke; ctenophore and sponge images by Mali’o Kodis, adapted from a photograph by Derek Keats (http://www.flickr.com/photos/dkeats/). b, Proportions of genes or sites supporting each of the three alternative hypotheses for each of eight data matrices from three phylogenomic studies[11,29,30]. Note that two different non-animal outgroup sets are used in the studies by Ryan et al.[11] and by Whelan et al.[29]: datasets whose labels include the word “Choanoflagellata” use only choanoflagellate taxa as outgroups, whereas datasets labeled “Opisthokonta” use fungal and holozoan taxa, including choanoflagellates, as outgroups. Values in parentheses next to the data matrix names indicate the number of genes in each phylogenomic data matrix. The ΔGLS values for the genes in each data matrix are provided in Supplementary Table 9, and their distributions are shown in Supplementary Figs. 62 and 63. The phylograms of all concatenation ML analyses following the removal of the gene with the highest ΔGLS value, as well as those following the removal of the genes with outlier ΔGLS values, in the eight data matrices can be found in Supplementary Figs. 65a–65h.
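With three hypotheses, a per-gene (or per-site) difference between two trees no longer suffices. One natural way to obtain the proportions in panel b, which we assume here rather than take from the source, is to credit each gene or site to the hypothesis with the highest log-likelihood. The sketch below does that and reports exact ties separately; the function name and toy scores are hypothetical.

```python
from collections import Counter
from typing import Sequence

# T1 = Ctenophora-sister, T2 = Porifera-sister, T3 = Porifera+Ctenophora-sister
HYPOTHESES = ("T1", "T2", "T3")


def support_proportions(lnl_per_unit: Sequence[Sequence[float]]) -> dict[str, float]:
    """Fraction of genes (or sites) whose best-scoring hypothesis is T1, T2, or T3.

    Each entry holds (lnL_T1, lnL_T2, lnL_T3) for one gene or site; exact ties
    for the best score are counted under "tie" (ASSUMPTION: tie handling).
    """
    counts: Counter = Counter()
    for scores in lnl_per_unit:
        best = max(scores)
        winners = [h for h, s in zip(HYPOTHESES, scores) if s == best]
        counts[winners[0] if len(winners) == 1 else "tie"] += 1
    total = len(lnl_per_unit)
    return {label: counts[label] / total for label in HYPOTHESES + ("tie",)}


toy = [(-100.0, -101.0, -102.0),
       (-50.0, -49.5, -49.0),
       (-30.0, -31.0, -30.5),
       (-10.0, -10.0, -12.0)]
print(support_proportions(toy))  # {'T1': 0.5, 'T2': 0.0, 'T3': 0.25, 'tie': 0.25}
```

The same function applies unchanged to gene-wise or site-wise scores, matching the two kinds of proportions shown in panel b.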