Ren Ren, Yazhou Sun, Yue Zhao, David Geiser, Hong Ma, Xiaofan Zhou.
Abstract
A comprehensive and reliable eukaryotic tree of life is important for many aspects of biological studies from comparative developmental and physiological analyses to translational medicine and agriculture. Both gene-rich and taxon-rich approaches are effective strategies to improve phylogenetic accuracy and are greatly facilitated by marker genes that are universally distributed, well conserved, and orthologous among divergent eukaryotes. In this article, we report the identification of 943 low-copy eukaryotic genes and we show that many of these genes are promising tools in resolving eukaryotic phylogenies, despite the challenges of determining deep eukaryotic relationships. As a case study, we demonstrate that smaller subsets of ∼20 and 52 genes could resolve controversial relationships among widely divergent taxa and provide strong support for deep relationships such as the monophyly and branching order of several eukaryotic supergroups. In addition, the use of these genes resulted in fungal phylogenies that are congruent with previous phylogenomic studies that used much larger datasets, and successfully resolved several difficult relationships (e.g., forming a highly supported clade with Microsporidia, Mitosporidium and Rozella sister to other fungi). We propose that these genes are excellent for both gene-rich and taxon-rich analyses and can be applied at multiple taxonomic levels and facilitate a more complete understanding of the eukaryotic tree of life.
Keywords: eukaryotic phylogeny; fungal phylogeny; phylogenomics; single-copy genes
Year: 2016 PMID: 27604879 PMCID: PMC5631032 DOI: 10.1093/gbe/evw196
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of Supermatrix Datasets
| Dataset | Species | Genes | | |
|---|---|---|---|---|
| Euk-S33G478 | 33 | 478 | 129,463 | 16.79 |
| Euk-S33G478-sub1 to -sub7 | 33 | 24–144 | 17,518–18,067 | 11.32–22.86 |
| Euk-S33G465 | 33 | 465 | 142,013 | 17.40 |
| Euk-S33G465-sub1 to -sub7 | 33 | 19–140 | 19,534–19,869 | 13.18–21.17 |
| Euk-S42G478 | 42 | 478 | 121,817 | 17.95 |
| Euk-S42G478-sub1 to -sub7 | 42 | 24–144 | 16,758–17,453 | 13.65–24.38 |
| Euk-S42G465 | 42 | 465 | 133,064 | 17.98 |
| Euk-S42G465-sub1 to -sub7 | 42 | 19–140 | 18,551–19,367 | 14.04–22.38 |
| Euk-S39G20ᵃ | 39 | 20 | 9,819 | 11.07 |
| Euk-S39G25ᵇ | 39 | 25 | 13,886 | 10.12 |
| Fun-S114G24ᶜ | 112 | 24 | 16,106 | 13.07 |
| Euk-S33G52 | 33 | 52 | 30,938 | 13.89 |
| Euk-S35G52 | 35 | 52 | 30,700 | 14.73 |
| Euk-S40G52 | 40 | 52 | 30,645 | 15.99 |
| Euk-S42G52 | 42 | 52 | 30,202 | 16.96 |
a MCM2-9, MLH1/4, MSH2/6, SMC1-6, DMC1, RAD51.
b With the addition of RPA1, RPB1, RPC1, eIF1A, eIF5B.
c MCM2-7, MLH1-4, MSH1-6, SMC1-6, DMC1, RAD51.
*See supplementary tables S1, S2, S3, S5, and S7, Supplementary Material online, for the complete list of species and genes and for support information on important lineages in each dataset.
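
The dataset names in the table above appear to follow a `<prefix>-S<species>G<genes>` pattern, optionally followed by `-sub<k>` for a gene subset. This reading is an assumption inferred from the table rows and figure captions, not a convention stated in the source. A minimal Python sketch that splits a name into those parts (the `DatasetName` type and `parse_dataset_name` function are hypothetical helpers introduced only for illustration):

```python
import re
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    prefix: str            # e.g. "Euk" or "Fun"
    n_species: int         # number following "S"
    n_genes: int           # number following "G"
    subset: Optional[int]  # number following "-sub", if present


# Assumed naming pattern: <prefix>-S<species>G<genes>[-sub<k>]
_NAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)-S(?P<species>\d+)G(?P<genes>\d+)"
    r"(?:-sub(?P<subset>\d+))?$"
)


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into prefix, species count, gene count, subset."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    subset = match.group("subset")
    return DatasetName(
        prefix=match.group("prefix"),
        n_species=int(match.group("species")),
        n_genes=int(match.group("genes")),
        subset=int(subset) if subset is not None else None,
    )


if __name__ == "__main__":
    for name in ("Euk-S33G478", "Euk-S42G465-sub3", "Euk-S39G20"):
        print(name, "->", parse_dataset_name(name))
```

Counts parsed from a name are only a convenience; the values listed in the table remain the reference.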
Fig. 1.—Bayesian analyses of eukaryotic phylogeny with 33 representative species. (A) An unrooted Bayesian tree estimated from Euk-S33G68/138/78/139. (B) The tree estimated from Euk-S33G478. The topologies were estimated by Phylobayes using the CAT model. The five eukaryotic supergroups are colored as follows: red, Amoebozoa; black, Opisthokonta; green, Archaeplastida; blue, SAR; and brown, Excavata. Posterior Probability (PP) support values are shown for each node. Black dots indicate 100% PP support. In (A), black dots indicate nodes receiving 100% support from all four datasets. Dashes indicate the lack of support for the relationship from the relevant dataset(s).
Support for Major Eukaryotic Clades in the Analyses of Subsets of Euk-S33G478 and Euk-S33G465
| Clade | G478-sub1 | G478-sub2 | G478-sub3 | G478-sub4 | G478-sub5 | G478-sub6 | G478-sub7 | G465-sub1 | G465-sub2 | G465-sub3 | G465-sub4 | G465-sub5 | G465-sub6 | G465-sub7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Animals | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Fungi | 100 | 100 | nm | 56 | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 99 |
| Opisthokonta | 100 | 100 | nm | 99 | 100 | 100 | 100 | 100 | 100 | 96 | 100 | 100 | 100 | 100 |
| Amoebozoa | 100 | 100 | nm | nm | 100 | nm | 50 | 50 | 100 | 100 | 99 | 100 | nm | 50 |
| Archaeplastida | nm | 58 | nm | 100 | nm | nm | nm | 85 | 99 | nm | 84 | nm | nm | nm |
| Excavata | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm |
| Alveolates | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Stramenopiles | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Alveolates + Stramenopiles | 100 | 100 | 63 | 100 | nm | 100 | nm | 100 | 100 | 99 | 99 | 100 | 95 | 100 |
Note.—nm - not monophyletic.
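
Rows of the support table above can be condensed into two counts per clade: how many of the 14 subset analyses recovered the clade as monophyletic, and how many of those did so with strong support. The sketch below is illustrative only; the 95 cutoff for "strong" support and the `summarize_support` helper are assumptions introduced here, not values or methods from the source. The two example rows are copied from the table.

```python
from typing import List, Tuple


def summarize_support(cells: List[str], strong: float = 95.0) -> Tuple[int, int]:
    """Return (analyses recovering monophyly, analyses with support >= strong).

    Cells marked "nm" (not monophyletic, in any letter case, with or
    without a trailing asterisk) are excluded from both counts.
    """
    values = []
    for cell in cells:
        token = cell.strip().rstrip("*").lower()
        if token == "nm":
            continue
        values.append(float(token))
    n_monophyletic = len(values)
    n_strong = sum(1 for v in values if v >= strong)
    return n_monophyletic, n_strong


# Rows copied from the table above (14 subset analyses each).
archaeplastida = "nm 58 nm 100 nm nm nm 85 99 nm 84 nm nm nm".split()
amoebozoa = "100 100 nm nm 100 nm 50 50 100 100 99 100 nm 50".split()

if __name__ == "__main__":
    for clade, row in (("Archaeplastida", archaeplastida), ("Amoebozoa", amoebozoa)):
        recovered, strong = summarize_support(row)
        print(f"{clade}: monophyletic in {recovered}/14, strongly supported in {strong}/14")
```

Under the assumed cutoff, Archaeplastida is recovered in 5 of 14 subset analyses (2 with strong support) and Amoebozoa in 10 of 14 (7 with strong support).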
Fig. 2.—Schematic representation of the evolutionary histories of (A) the recA/RAD51 family; (B) the MSH family; (C) the MLH family; (D) the SMC family; and (E) the MCM family. The topologies are adapted from previous phylogenetic analyses of each gene family. Dotted boxes denote ancient duplication events in early eukaryotes. Prokaryotic outgroups are shown in grey.
Fig. 3.—A matrix showing the distribution of selected marker genes in eukaryotes. The presence/absence of genes is highlighted by color: blank, absence; green, single copy; blue, two copies due to lineage-specific duplication; purple, more than two copies due to lineage-specific duplications; red, more than one copy due to duplications shared by more than one species. The Index of Single Copyness (ISC) is defined as , where n is the total number of species; m is the gene copy number for species with a single copy of the gene or more than one copy of terminal paralogs (m > 0; for species that do not have the gene, 1/m = 0); and k is the total number of species with paralogs shared by two or more species. This matrix includes the 18 genes that were included in both the Euk-S39 and the Fun-S114 datasets, and four commonly used eukaryotic marker genes for comparison.
Fig. 4.—A Bayesian tree of 39 eukaryotes using 20 genes. The topology was estimated from the Euk-S39G20 dataset by Phylobayes using the CAT model. The five eukaryotic supergroups are colored as follows: red, Amoebozoa; black, Opisthokonta; green, Archaeplastida; blue, SAR; and brown, Excavata. The branch leading to Giardia is shown at a quarter of its original length. Posterior Probability (PP) support values from Bayesian analyses using Euk-S39G20 (first number) and Euk-S39G25 (second number) are shown for each node. Black dots indicate 100% PP support from both the 20- and 25-gene analyses.
Fig. 5.—Cladogram of 100 fungal species with 14 other eukaryotic species using 24 genes. The topology was estimated from the Fun-S114G24 dataset by Phylobayes using the CAT model. Black dots indicate 100% Posterior Probability (PP) support and 100% bootstrap (BS) support from the ML analysis (based on 100 replicates). Support values are shown only for nodes that do not receive 100% support. Dashes indicate lack of support from the ML analysis.
Fig. 6.—Bayesian analyses of eukaryotic phylogeny using 478 Class I marker genes with 42 species. Topologies were estimated from Euk-S42G478 by Phylobayes using the CAT model. The five eukaryotic supergroups are colored as follows: red, Amoebozoa; black, Opisthokonta; green, Archaeplastida; blue, SAR; and brown, Excavata. Posterior Probability (PP) support values are shown for each node. Black dots indicate 100% PP support.
Support for Major Eukaryotic Clades in the Analyses of Subsets of Euk-S42G478 and Euk-S42G465
| Clade | G478-sub1 | G478-sub2 | G478-sub3 | G478-sub4 | G478-sub5 | G478-sub6 | G478-sub7 | G465-sub1 | G465-sub2 | G465-sub3 | G465-sub4 | G465-sub5 | G465-sub6 | G465-sub7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Animals | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Fungi | 100 | 100 | nm | nm | 100 | 99 | 84 | 100 | 100 | 100 | 100 | 100 | 100 | 99 |
| Opisthokonta | 100 | 100 | nm | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Apusozoa + Opisthokonta | nm | nm | nm | 99 | nm | nm | 59 | nm | nm | 99 | nm | nm | nm | nm |
| Amoebozoa | 99 | 100 | nm | nm | 59 | 79 | 65 | 100 | nm | 100 | 93 | 99 | 99 | nm |
| Archaeplastida | 99 | nm | nm | nm | nm | nm | nm | nm | nm | 91 | nm | nm | nm | nm |
| Excavata | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm | nm |
| Rhizaria | 100 | nm | 100 | nm | nm | 100 | 99 | nm | nm | 100 | 100 | nm | 100 | 100 |
| Alveolates | 100 | 100 | 100 | 100 | nm | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Stramenopiles | 100 | 100 | 50 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| SAR | 100 | nm | nm | nm | nm | 100 | nm | nm | nm | 99 | nm | 50 | nm | 100 |
Note.—nm - not monophyletic.