Gustavo Caetano-Anollés1, Jay E Mittenthal2, Derek Caetano-Anollés1, Kyung Mo Kim3.
Abstract
Time-calibrated phylogenomic trees of protein domain structure produce powerful chronologies describing the evolution of biochemistry and life. These timetrees are built from a genomic census of millions of encoded proteins using models of nested accumulation of molecules in evolving proteomes. Here we show that a primordial stem line of descent, a propagating series of pluripotent cellular entities, populates the deeper branches of the timetrees. The stem line first produced cellular grades ~2.9 billion years (Gy) ago, which slowly turned into lineages of superkingdom Archaea. Prompted by the rise of planetary oxygen and aerobic metabolism, the stem line also produced bacterial and eukaryal lineages. Superkingdom-specific domain repertoires emerged ~2.1 Gy ago, delimiting fully diversified Bacteria. Repertoires specific to Eukarya and Archaea appeared 300 million years later. Results reconcile reductive evolutionary processes leading to the early emergence of Archaea with superkingdom-specific innovations compatible with a tree of life rooted in Bacteria.
Keywords: molecular clock; phylogenetic analysis; protein evolution; protein folds; structure
Year: 2014 PMID: 25309572 PMCID: PMC4161044 DOI: 10.3389/fgene.2014.00306
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. A timetree of protein domain structures describes molecular history within the context of the geological record. A census of protein domain structure in the proteomes of 420 free-living organisms representing the three superkingdoms of life was conducted at the fold superfamily (FSF) level of structural abstraction in SCOP (Kim and Caetano-Anollés, 2011). The color array of the evolutionary heat map in the center describes the distribution of genomic abundances of 1420 FSFs in the 420 organisms that were surveyed. Gray cells indicate an abundance of 0 (the domain structure is absent altogether). Red-to-blue hues represent increasing abundance levels, from 1 to N = 15,112 counts of the same FSF structure. Abundance values in the array were coded as discrete phylogenetic characters using an alphanumeric scheme (0–9 and A–N) and arranged in transposable data matrices for phylogenetic analysis. Characters transform according to linearly ordered and reversible pathways. Maximum parsimony was used as the optimality criterion to generate a tree of life (ToL, left of matrix) and a tree of domains (ToD, below matrix) using a combined parsimony ratchet and iterative search approach. These trees were used to order rows and columns in the heat map matrix. The ages of FSFs are time-calibrated with a global molecular clock of fold structures that spans 3.8 billion years (Gy) of history and associates diagnostic domain structures with multiple geological ages derived from the study of fossils and microfossils and from geochemical, biochemical, and biomarker data (colored circles: red, biochemistries and lineages; orange, organismal diversification; blue, nitrogen assimilation and other biomarkers; black, boundary events). Interpolations of crucial biochemical developments are indicated in the timeline (Kim and Caetano-Anollés, 2011). Below the heat map are evolutionary mappings of FSF sets belonging to Venn distribution groups of domains unique to one superkingdom (A, B, E), shared by two (BE, AB and AE), or ubiquitous among all three (ABE).
The Venn diagram shows a significant number of shared FSFs. A tree of superkingdoms inferred from Venn group appearance in the timeline is overlaid on the heat map matrix and depicts a possible stem line of descent in yellow. We note that the timetree and molecular timelines shown here draw on standard molecular evolutionary techniques (e.g., phylogenies of sequences, physiologies, and morphology), inorganic and organic geochemistry (e.g., distributions of trace elements in shales or banded iron formations, or concentrations of organic compounds such as steroids that are diagnostic of certain taxonomic groups), micropaleontology and paleontology (the distribution of physical fossils, with morphology providing evidence for the presence of specific organisms), and other sources of history.
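The character coding and Venn grouping described in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The caption gives only the 24-state alphabet (0–9 and A–N) and the maximum count of 15,112, so the logarithmic rescaling used here is an assumption, and the function names `encode_abundance` and `venn_group` are hypothetical.

```python
import math

# 24 ordered character states: 0-9 followed by A-N, as in the caption.
STATES = "0123456789ABCDEFGHIJKLMN"


def encode_abundance(count, max_count):
    """Map a raw FSF count to one of the 24 ordered states.

    ASSUMPTION: counts are log-transformed and rescaled to the 0-23 range.
    The caption does not state the rescaling that was actually used.
    """
    if count < 0 or count > max_count:
        raise ValueError("count must lie between 0 and max_count")
    if count == 0:
        return STATES[0]  # state 0 marks absence (gray cells in the heat map)
    scaled = math.log(count + 1) / math.log(max_count + 1)
    index = round(scaled * (len(STATES) - 1))
    return STATES[max(1, index)]  # a present domain never collapses to state 0


def venn_group(superkingdoms):
    """Return the Venn group label for a set drawn from 'A', 'B' and 'E'.

    Letters are joined in a fixed order that reproduces the caption's
    labels for unique (A, B, E) and pairwise (AB, AE, BE) groups.
    """
    return "".join(k for k in "ABE" if k in superkingdoms)


if __name__ == "__main__":
    max_count = 15112  # largest abundance reported in the caption
    for count in (0, 1, 50, 15112):
        print(count, encode_abundance(count, max_count))
    print(venn_group({"B", "E"}))
```

Because the states are linearly ordered, a parsimony analysis can treat a change between distant states as costlier than a change between neighboring ones, which is what the caption means by characters transforming along linearly ordered and reversible pathways.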