Arshan Nasir, Kyung Mo Kim, Gustavo Caetano-Anollés.
Abstract
The origins of diversified life remain mysterious despite considerable efforts devoted to untangling the roots of the universal tree of life. Here we reconstructed phylogenies that described the evolution of molecular functions and the evolution of species directly from a genomic census of gene ontology (GO) definitions. We sampled 249 free-living genomes spanning organisms in the three superkingdoms of life, Archaea, Bacteria, and Eukarya, and used the abundance of GO terms as molecular characters to produce rooted phylogenetic trees. Results revealed an early thermophilic origin of Archaea that was followed by genome reduction events in microbial superkingdoms. Eukaryal genomes displayed extraordinary functional diversity and were enriched with hundreds of novel molecular activities not detected in the akaryotic microbial cells. Remarkably, the majority of these novel functions appeared quite late in evolution, synchronized with the diversification of the eukaryal superkingdom. The distribution of GO terms in superkingdoms confirms that Archaea appears to be the simplest and most ancient form of cellular life, while Eukarya is the most diverse and recent.
Year: 2014 PMID: 25249790 PMCID: PMC4164138 DOI: 10.1155/2014/706468
Source DB: PubMed Journal: Archaea ISSN: 1472-3646 Impact factor: 3.273
Figure 1. Overview of the phylogenomic methodology. A matrix holding the raw census of GOTMF terms was normalized, standardized, and rescaled for phylogenetic reconstruction. Trees of functions (ToFs) were polarized by the maximum character state (i.e., V), while trees of life (ToLs) were polarized by the minimum value (0) in the matrix.
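The caption names the processing steps without giving formulas. The following is a minimal sketch of how a census matrix could be turned into ordered character states. It assumes a log transform, division by the matrix maximum, and a 32-symbol alphabet (0-9, then A-V) chosen so that the top state is V as in the caption. The function name `rescale_census`, the log transform, and the toy counts are illustrative assumptions, not the authors' stated procedure.

```python
import math

# Assumed alphabet: 0-9 then A-V gives 32 ordered states, so the
# maximum state is "V" (matching the caption); the size is an assumption.
STATES = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def rescale_census(census):
    """Map raw GO-term counts (rows = genomes, columns = terms) to states."""
    # Assumed normalization: log-transform to damp very large counts.
    logged = [[math.log(count + 1) for count in row] for row in census]
    max_value = max(max(row) for row in logged)
    if max_value == 0:
        # Every count is zero, so every cell gets the minimum state.
        return [[STATES[0]] * len(row) for row in logged]
    top = len(STATES) - 1
    # Rescale to 0..top and round to the nearest discrete state.
    return [[STATES[round(v / max_value * top)] for v in row] for row in logged]


# Toy census: 2 hypothetical genomes x 3 hypothetical GO terms.
census = [[0, 3, 120], [1, 0, 15]]
states = rescale_census(census)
print(states)
```

Under this encoding, polarizing a ToF by the maximum state treats V as ancestral, whereas polarizing a ToL by the minimum treats 0 as ancestral.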
Figure 2. The distribution and evolution of GOTMF terms. (a) A Venn diagram illustrates the sharing patterns of molecular functions in the seven taxonomic groups (reproduced from [28]). Numbers of terms in Venn taxonomic groups and in superkingdoms are given in parentheses and are reflected by the areas of the diagram. (b) A ToF (tree length = 99,594 steps) portraying the evolution of GOTMF terms. Molecular activities present in all three superkingdoms are colored red, while those unique to a superkingdom or shared by at most two are colored blue. The inset displays the most basal taxa. GO:0004715 is the “non-membrane spanning protein tyrosine kinase activity.”
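The seven Venn taxonomic groups follow from which superkingdoms a term occurs in. A minimal sketch of that assignment is given here, using the group labels that appear in the tables (A, B, E, AB, AE, BE, ABE). The genome identifiers and the helper name `venn_group` are hypothetical, and the toy presence sets only mirror the group memberships reported in the tables.

```python
def venn_group(term_genomes, superkingdom_of):
    """Return the Venn group label (e.g. 'AB') for one GO term."""
    # Collect the superkingdoms of all genomes that contain the term.
    present = {superkingdom_of[genome] for genome in term_genomes}
    # Emit letters in a fixed A, B, E order so labels are canonical.
    return "".join(k for k in "ABE" if k in present)


# Hypothetical genomes: one per superkingdom.
superkingdom_of = {"archaeon_1": "A", "bacterium_1": "B", "eukaryote_1": "E"}

# Toy presence sets, consistent with the groups listed in the tables.
term_to_genomes = {
    "GO:0005524": {"archaeon_1", "bacterium_1", "eukaryote_1"},
    "GO:0008658": {"archaeon_1", "bacterium_1"},
    "GO:0008766": {"bacterium_1"},
}

groups = {term: venn_group(g, superkingdom_of) for term, g in term_to_genomes.items()}
print(groups)
```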
Figure 3. Order of the evolutionary appearance of Venn taxonomic groups. (a) Scatter plot highlighting the distribution of GOTMF terms with respect to evolutionary time (nd) and distribution in genomes (f). (b) Boxplots displaying the distribution of GOTMF terms with respect to evolutionary time (nd) in the seven taxonomic groups. The most ancient GOTMF term in each taxonomic group (and outliers) is indexed with numbers: 1, “ATP binding [GO:0005524]”; 2, “DNA replication origin binding [GO:0003688]”; 3, “penicillin binding [GO:0008658]”; 4, “2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase activity [GO:0008666]”; 5, “UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity [GO:0008766]”; 6, “chorismate lyase activity [GO:0008813]”; 7, “CCR1 chemokine receptor binding [GO:0031726]”; 8, “methylenetetrahydromethanopterin dehydrogenase activity [GO:0030268]”; and 9, “nicotianamine synthase activity [GO:0030410]”.
List of universal GOTMF terms present in all 249 sampled genomes, sorted by nd values (ascending) (modified from [28]).
| GO Id | GO Name | Age (nd) | Distribution Index (f) |
|---|---|---|---|
| GO:0005524 | ATP binding | 0 | 1 |
| GO:0008270 | zinc ion binding | 0.005 | 1 |
| GO:0000287 | magnesium ion binding | 0.009 | 1 |
| GO:0005525 | GTP binding | 0.014 | 1 |
| GO:0004222 | metalloendopeptidase activity | 0.023 | 1 |
| GO:0010181 | FMN binding | 0.028 | 1 |
| GO:0030145 | manganese ion binding | 0.033 | 1 |
| GO:0003924 | GTPase activity | 0.038 | 1 |
| GO:0003887 | DNA-directed DNA polymerase activity | 0.042 | 1 |
| GO:0004252 | serine-type endopeptidase activity | 0.047 | 1 |
| GO:0003746 | translation elongation factor activity | 0.052 | 1 |
| GO:0009982 | pseudouridine synthase activity | 0.056 | 1 |
| GO:0004523 | ribonuclease H activity | 0.103 | 1 |
| GO:0004826 | phenylalanine-tRNA ligase activity | 0.108 | 1 |
| GO:0004821 | histidine-tRNA ligase activity | 0.127 | 1 |
| GO:0004820 | glycine-tRNA ligase activity | 0.127 | 1 |
| GO:0004824 | lysine-tRNA ligase activity | 0.136 | 1 |
| GO:0004831 | tyrosine-tRNA ligase activity | 0.150 | 1 |
| GO:0004618 | phosphoglycerate kinase activity | 0.169 | 1 |
| GO:0004634 | phosphopyruvate hydratase activity | 0.174 | 1 |
| GO:0004749 | ribose phosphate diphosphokinase activity | 0.174 | 1 |
| GO:0003952 | NAD+ synthase (glutamine-hydrolyzing) activity | 0.178 | 1 |
| GO:0004815 | aspartate-tRNA ligase activity | 0.183 | 1 |
| GO:0004807 | triose-phosphate isomerase activity | 0.183 | 1 |
| GO:0004813 | alanine-tRNA ligase activity | 0.188 | 1 |
| GO:0003917 | DNA topoisomerase type I activity | 0.192 | 1 |
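The selection behind this table can be sketched in a few lines. The sketch assumes that the distribution index f is the fraction of sampled genomes containing a term, which is consistent with f = 1 for terms present in all 249 genomes. The helper names are hypothetical, and the three records reuse nd and f values from the tables in this record.

```python
TOTAL_GENOMES = 249  # number of sampled genomes stated in the abstract


def distribution_index(genomes_with_term, total_genomes=TOTAL_GENOMES):
    """Assumed definition: fraction of genomes in which the term occurs."""
    return genomes_with_term / total_genomes


def universal_terms(records):
    """Keep terms with f == 1 and sort them by nd, oldest first."""
    return sorted((r for r in records if r[2] == 1), key=lambda r: r[1])


# (GO id, nd, f) triples taken from the tables in this record.
records = [
    ("GO:0005525", 0.014, 1),
    ("GO:0008658", 0.08, 0.76),
    ("GO:0005524", 0, 1),
]

universal = universal_terms(records)
print([go_id for go_id, _, _ in universal])
```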
List of outlier GOTMF terms in superkingdom taxonomic groups.
| Taxonomic group | GO Id | GO Name | Age (nd) | Distribution Index (f) |
|---|---|---|---|---|
| ABE | GO:0003810 | protein-glutamine gamma-glutamyltransferase activity | 0.97 | 0.06 |
| ABE | GO:0004715 | non-membrane spanning protein tyrosine kinase activity | 1 | 0.18 |
| AB | GO:0008658 | penicillin binding | 0.08 | 0.76 |
| AB | GO:0015415 | phosphate ion transmembrane-transporting ATPase activity | 0.21 | 0.85 |
| AB | GO:0009030 | thiamine-phosphate kinase activity | 0.21 | 0.69 |
| AB | GO:0008966 | phosphoglucosamine mutase activity | 0.22 | 0.76 |
| AB | GO:0015412 | molybdate transmembrane-transporting ATPase activity | 0.22 | 0.66 |
| AB | GO:0019134 | glucosamine-1-phosphate N-acetyltransferase activity | 0.23 | 0.66 |
| AB | GO:0008881 | glutamate racemase activity | 0.23 | 0.65 |
| AB | GO:0008763 | UDP-N-acetylmuramate-L-alanine ligase activity | 0.24 | 0.73 |
| AB | GO:0008784 | alanine racemase activity | 0.24 | 0.73 |
| AB | GO:0008760 | UDP-N-acetylglucosamine 1-carboxyvinyltransferase activity | 0.25 | 0.61 |
| AB | GO:0008965 | phosphoenolpyruvate-protein phosphotransferase activity | 0.25 | 0.57 |
| AB | GO:0008984 | protein-glutamate methylesterase activity | 0.25 | 0.59 |
| AB | GO:0000286 | alanine dehydrogenase activity | 0.27 | 0.48 |
| AB | GO:0016960 | ribonucleoside-diphosphate reductase activity, thioredoxin disulfide as acceptor | 0.28 | 0.53 |
| AB | GO:0008855 | exodeoxyribonuclease VII activity | 0.31 | 0.72 |
| AB | GO:0009381 | excinuclease ABC activity | 0.31 | 0.80 |
| B | GO:0008766 | UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity | 0.24 | 0.73 |
| B | GO:0008961 | phosphatidylglycerol-prolipoprotein diacylglyceryl transferase activity | 0.25 | 0.64 |
| B | GO:0008832 | dGTPase activity | 0.26 | 0.55 |
| B | GO:0009002 | serine-type D-Ala-D-Ala carboxypeptidase activity | 0.31 | 0.60 |
| B | GO:0008882 | [glutamate-ammonia-ligase] adenylyltransferase activity | 0.36 | 0.41 |
| B | GO:0008914 | leucyltransferase activity | 0.36 | 0.45 |
| B | GO:0019146 | arabinose-5-phosphate isomerase activity | 0.38 | 0.31 |
| B | GO:0019143 | 3-deoxy-manno-octulosonate-8-phosphatase activity | 0.38 | 0.33 |
| B | GO:0004456 | phosphogluconate dehydratase activity | 0.38 | 0.23 |
| B | GO:0008693 | 3-hydroxydecanoyl-[acyl-carrier-protein] dehydratase activity | 0.38 | 0.22 |
| B | GO:0008918 | lipopolysaccharide 3-alpha-galactosyltransferase activity | 0.66 | 0.01 |
| B | GO:0030733 | fatty acid O-methyltransferase activity | 0.66 | 0.00 |
| AE | GO:0004579 | dolichyl-diphosphooligosaccharide-protein glycotransferase activity | 0.77 | 0.10 |
| AE | GO:0004965 | G-protein coupled GABA receptor activity | 0.93 | 0.05 |
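The record does not say how outliers were identified within each taxonomic group. A common boxplot convention flags values lying more than 1.5 interquartile ranges beyond the quartiles, and the sketch below applies that rule to a list of nd values. The rule, the function name `boxplot_outliers`, and the toy nd values are all assumptions for illustration, not results from the study.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return values outside the Tukey fences (assumed outlier rule)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical nd values for one group: most terms appear late, one early.
nd_values = [0.70, 0.72, 0.75, 0.78, 0.80, 0.82, 0.85, 0.08]
outliers = boxplot_outliers(nd_values)
print(outliers)
```

A group whose terms mostly appear late would, under this rule, report its unusually ancient terms as outliers, which is the pattern the AB and B rows of the table suggest.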
Figure 4. Scatter plots displaying the distribution of GOTMF terms with respect to evolutionary time (nd) in Archaea (a), Bacteria (b), and Eukarya (c).
Figure 5. The tripartite division of the cellular world. (a) A ToL (tree length = 87,892) generated from the genomic census of GOTMF terms in 249 free-living genomes resolves the three primary superkingdoms. Archaeal species (red) occupy the most basal positions in a paraphyletic manner, while monophyletic Bacteria (blue) and Eukarya (green) are evolutionarily derived. Numbers on branches indicate bootstrap support values. (b) A 3D scatter plot separates organisms into the three superkingdoms: Archaea, Bacteria, and Eukarya. Genomes are labeled as in (a).