Arshan Nasir, Kyung Mo Kim, Gustavo Caetano-Anollés.
Abstract
Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses and the gain-to-loss ratios were much higher in akaryotes than in eukaryotes. Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution, while Eukarya acquired a number of diverse molecular functions including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and of metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains are predicted to redefine the persistence strategies of organisms in superkingdoms, influence the makeup of molecular functions, and enhance organismal complexity by the generation of new domain architectures. These dynamics highlight ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.
Year: 2014 PMID: 24499935 PMCID: PMC3907288 DOI: 10.1371/journal.pcbi.1003452
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Evolutionary dynamics of FFs and organismal persistence strategies.
A) A Venn diagram describes the distribution of FFs in the seven taxonomic groups (reproduced from [27]). B) Boxplots represent the distributions of domain ages (nd) for each taxonomic group. Numbers within each distribution indicate group medians, hollow circles indicate outliers, and shaded regions identify important evolutionary epochs. Geological time (Gy) was inferred from a molecular clock of protein folds [51], [52]. FFs were identified by SCOP css: c.37.1.12, ABC transporter ATPase domain-like; d.122.1.1, Heat shock protein 90, HSP 90, N-terminal domain; c.116.1.4, tRNA(m1G37)-methyltransferase TrmD; g.3.10.1, Colipase-like; a.4.5.41, Transcription factor E/IIe-alpha, N-terminal domain. C) Boxplots represent the distribution index (f-value) of FF domains for each taxonomic group. Numbers within each distribution indicate group medians. Hollow circles represent outliers. D) A 3D scatter plot describes the persistence strategies of Archaea (red), Bacteria (blue), and Eukarya (green). All axes are in logarithmic scale. Numbers in parentheses indicate the total number of proteomes available for study in each superkingdom.
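The distribution index in panel C can be illustrated with a minimal sketch. It assumes that the f-value of an FF is the fraction of proteomes in a group that contain at least one copy of that FF; this definition, the function name `distribution_index`, and the toy data are illustrative assumptions and are not taken from the text above.

```python
def distribution_index(presence, total_proteomes):
    """Return the assumed f-value of each FF: proteomes containing it / all proteomes."""
    return {ff: len(proteomes) / total_proteomes for ff, proteomes in presence.items()}


# Hypothetical toy data: FF identifier -> set of proteomes in which it occurs.
toy_presence = {
    "FF_a": {"p1", "p2", "p3", "p4"},  # present in every proteome
    "FF_b": {"p1", "p3"},              # present in half of the proteomes
    "FF_c": {"p2"},                    # present in a single proteome
}

f_values = distribution_index(toy_presence, total_proteomes=4)
for ff, f in sorted(f_values.items()):
    print(ff, f)
```

Under this assumption, an f-value close to 1 marks a nearly universal FF within the group, while a value close to 0 marks an FF confined to a few proteomes.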
Descriptive statistics on the total number of proteomes (N), minimum (min), maximum (max) and median values for raw counts of occurrence, abundance and ratio of FFs in each superkingdom.
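| Superkingdom | N | Occurrence (min) | Occurrence (max) | Occurrence (median) | Abundance (min) | Abundance (max) | Abundance (median) | Ratio (min) | Ratio (max) | Ratio (median) |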
| Archaea | 48 | 174 | 293 | 236 | 264 | 598 | 377.5 | 1.46 | 2.10 | 1.64 |
| Bacteria | 239 | 239 | 824 | 426 | 376 | 1958 | 883 | 1.52 | 3.40 | 1.98 |
| Eukarya | 133 | 364 | 1089 | 674 | 982 | 19917 | 2875 | 2.24 | 20.41 | 4.04 |
The superscripts identify individual species.
Staphylothermus marinus,
Methanosarcina acetivorans,
Thermoplasma volcanium,
Haloarcula marismortui,
Dehalococcoides sp.,
Citrobacter koseri,
Burkholderia xenovorans,
Nitratiruptor sp.,
Rhodococcus sp.,
Paramecium tetraurelia,
Homo sapiens,
Malassezia globosa,
Takifugu rubripes.
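The summary statistics in the table above can be sketched as follows. The sketch assumes that occurrence is the number of distinct FFs in a proteome, abundance is the total number of FF domain copies, and ratio is abundance divided by occurrence; these definitions and all names and values in the code are illustrative assumptions.

```python
from statistics import median


def summarize(values):
    """Return the minimum, maximum and median of a list of numbers."""
    return {"min": min(values), "max": max(values), "median": median(values)}


# Hypothetical proteomes: FF identifier -> number of domain copies.
toy_proteomes = {
    "proteome_1": {"FF_a": 2, "FF_b": 1},
    "proteome_2": {"FF_a": 4, "FF_b": 1, "FF_c": 1},
    "proteome_3": {"FF_a": 1},
}

# Assumed definitions: distinct FFs, total copies, and their per-proteome ratio.
occurrence = [len(ffs) for ffs in toy_proteomes.values()]
abundance = [sum(ffs.values()) for ffs in toy_proteomes.values()]
ratio = [a / o for a, o in zip(abundance, occurrence)]

table_row = {
    "occurrence": summarize(occurrence),
    "abundance": summarize(abundance),
    "ratio": summarize(ratio),
}
print(table_row)
```

A ratio near 1 would then indicate that most FFs appear once per proteome, whereas larger ratios indicate heavier reuse of the same FFs.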
Figure 2. Functional annotation of FF domains.
A) Stacked bar plots describe the distribution of molecular functions in each of the seven taxonomic groups. The size of each bar is proportional to the percentage of FF domains in each functional category, while the numbers indicate total counts of FFs annotated in that category. B–H) Scatter plots illustrate the emergence of molecular functions in taxonomic groups. The x-axes represent evolutionary time (nd), while the y-axes indicate the distribution index (f-value) of FFs. Evolutionary epochs are identified as in Figure 1. Numbers in parentheses indicate the total number of FF domains in each taxonomic group for which SUPERFAMILY functional annotations (based on SCOP 1.73) were available.
Names, SCOP css, and f-value of informational FF domains present in the AE taxonomic group. FFs are sorted by f-value in a descending manner.
| No. | Name | SCOP css | Distribution index (f-value) |
| 1 | L30e/L7ae ribosomal proteins | d.79.3.1 | 0.99 |
| 2 | Ribosomal protein L3 | b.43.3.2 | 0.99 |
| 3 | L15e family | d.12.1.2 | 0.97 |
| 4 | Ribosomal protein L10e family | d.41.4.1 | 0.92 |
| 5 | TATA-box binding protein (TBP), C-terminal domain family | d.129.1.1 | 0.86 |
| 6 | N-terminal domain of eukaryotic peptide chain release factor subunit 1, ERF1 family | d.91.1.1 | 0.80 |
| 7 | DNA polymerase processivity factor | d.131.1.2 | 0.77 |
| 8 | Sm motif of small nuclear ribonucleoproteins, SNRNP family | b.38.1.1 | 0.76 |
| 9 | Eukaryotic DNA topoisomerase I, N-terminal DNA-binding fragment family | e.15.1.1 | 0.71 |
| 10 | Eukaryotic DNA topoisomerase I, catalytic core family | d.163.1.2 | 0.71 |
| 11 | eEF-1beta-like family | d.58.12.1 | 0.64 |
| 12 | Eukaryotic type KH-domain (KH-domain type I) family | d.51.1.1 | 0.56 |
| 13 | RNA polymerase subunit RPB10 family | a.4.11.1 | 0.55 |
| 14 | RPB5 family | d.78.1.1 | 0.55 |
| 15 | Ribosomal protein L19 (L19e) family | a.94.1.1 | 0.38 |
| 16 | Ribosomal protein L13 family | c.21.1.1 | 0.31 |
| 17 | DNA replication initiator (cdc21/cdc54) N-terminal domain family | b.40.4.11 | 0.27 |
| 18 | Initiation factor IF2/eIF5B, domain 3 family | c.20.1.1 | 0.27 |
| 19 | AlaX-like family | d.67.1.2 | 0.04 |
| 20 | VMA1-derived endonuclease (VDE) PI-SceI protein | d.95.2.2 | 0.02 |
Names, SCOP Id and css, and evolutionary age (nd) of FFs that were identified by keyword search ‘Mitochondria’ on the dataset of 2,397 FF domains.
| SCOP Id | SCOP css | Description | Age (nd) |
| 69533 | c.55.3.7 | Mitochondrial resolvase ydc2 catalytic domain | 0.55 |
| 81422 | f.23.5.1 | Mitochondrial cytochrome c oxidase subunit VIIb | 0.59 |
| 81426 | f.23.6.1 | Mitochondrial cytochrome c oxidase subunit VIIc (aka VIIIa) | 0.59 |
| 81418 | f.23.4.1 | Mitochondrial cytochrome c oxidase subunit VIIa | 0.63 |
| 111358 | f.45.1.1 | Mitochondrial ATP synthase coupling factor 6 | 0.64 |
| 81414 | f.23.3.1 | Mitochondrial cytochrome c oxidase subunit VIc | 0.65 |
| 54530 | d.25.1.1 | Mitochondrial glycoprotein MAM33-like | 0.71 |
| 81410 | f.23.2.1 | Mitochondrial cytochrome c oxidase subunit VIa | 0.71 |
| 81405 | f.23.1.1 | Mitochondrial cytochrome c oxidase subunit IV | 0.73 |
| 47158 | a.23.4.1 | Mitochondrial import receptor subunit Tom20 | 0.74 |
| 103507 | f.42.1.1 | Mitochondrial carrier | 0.96 |
FFs are sorted by nd value in an ascending manner.
Figure 3. Phylogenomic patterns in the three superkingdoms.
A) A ToL reconstructed from the genomic abundance counts of 2,397 FF domains (2,262 parsimony informative, tree length = 128,752, RI = 0.76, g = −0.33) describing the evolution of 420 free-living organisms. Values on branches indicate bootstrap support values. Taxa were colored red for Archaea, blue for Bacteria and green for Eukarya. B) A ToL reconstructed from the presence/absence of 2,397 FF domains (2,249 parsimony informative, tree length = 30,599, RI = 0.79, g = −0.28) describing the evolution of 420 free-living organisms. Values on branches indicate bootstrap support values. Taxa are colored as in A. The difference between trees was calculated using the nodal module of the TOPD/FMTS package [50].
Figure 4. Global patterns of gains and losses in superkingdoms.
A) The sum of gains and losses for each FF domain is represented in boxplots for the Total, Archaea, Bacteria, and Eukarya reconstructions using the abundance and occurrence models. Numbers in parentheses indicate the total number of parsimony informative characters in each analysis. A horizontal red line passes through zero on the x-axis. B) Histograms comparing the relative counts of gains and losses for each FF domain character, plotted on the nd scale. Bars in red and blue indicate gains and losses, respectively. The global gain-to-loss ratios are listed along with the total numbers of gain and loss events. n is the number of parsimony informative characters in each analysis. C) Histograms comparing the distribution of FF gains and losses in Archaea, Bacteria and Eukarya. Bars in red and blue indicate gains and losses, respectively. The x-axes indicate evolutionary time. Numbers in parentheses indicate the total number of proteomes in each dataset.
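The quantities summarized in this figure can be sketched in a few lines, assuming that gain and loss events have already been inferred along the branches of a tree and are available as a flat list. The event format, function names and toy data below are hypothetical; the sketch does not reproduce the character-state reconstruction itself.

```python
from collections import Counter


def gain_loss_summary(events):
    """Count gain and loss events and return (gains, losses, gain-to-loss ratio)."""
    counts = Counter(kind for _, kind in events)
    gains, losses = counts["gain"], counts["loss"]
    ratio = gains / losses if losses else float("inf")
    return gains, losses, ratio


def net_change_per_ff(events):
    """Sum gains (+1) and losses (-1) separately for each FF."""
    net = Counter()
    for ff, kind in events:
        net[ff] += 1 if kind == "gain" else -1
    return dict(net)


# Hypothetical events: (FF identifier, "gain" or "loss").
toy_events = [
    ("FF_a", "gain"), ("FF_a", "gain"), ("FF_a", "loss"),
    ("FF_b", "loss"),
    ("FF_c", "gain"), ("FF_c", "gain"),
]

gains, losses, ratio = gain_loss_summary(toy_events)
net = net_change_per_ff(toy_events)
print(gains, losses, ratio, net)
```

A positive net value for an FF means gains outnumber losses for that character, and a global ratio above 1 means the same holds across all characters.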
Figure 5. Cumulative numbers of gains and losses.
Scatter plots reveal an approximately linear trend in the accumulation of FF gains and losses in both the global analysis (A) and in individual superkingdoms (B). Gains are shown in red and losses in blue. The three evolutionary epochs are marked with corresponding gain-to-loss ratios in italics.
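One way to check for an approximately linear accumulation is to take a running sum of events over evolutionary time and fit a straight line by ordinary least squares. The sketch below does this on hypothetical binned counts; the bin positions, the counts and the choice of least squares are assumptions for illustration only.

```python
from itertools import accumulate


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: nd value of each bin and number of gains recorded in it.
nd_bins = [0.0, 0.25, 0.5, 0.75, 1.0]
gains_per_bin = [4, 4, 4, 4, 4]

cumulative_gains = list(accumulate(gains_per_bin))  # running total over nd
slope, intercept = linear_fit(nd_bins, cumulative_gains)
print(cumulative_gains, slope, intercept)
```

A constant number of events per bin gives a perfectly straight cumulative curve; real data would scatter around the fitted line.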
Figure 6. Equal sampling of proteomes.
Boxplots comparing the distribution of net gains and losses in 100 random phylogenetic trees for both abundance (A) and occurrence (B). Numbers in parentheses indicate group median values.
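The equal-sampling idea can be sketched as repeated random draws of the same number of proteomes from each superkingdom, with one tree then built per draw. The sketch assumes that the sample size equals the size of the smallest group and that 100 replicates correspond to the 100 trees; both points, along with the proteome names, are assumptions rather than details given in the caption.

```python
import random


def equal_sample(groups, size, rng):
    """Draw `size` proteomes without replacement from every group."""
    return {name: rng.sample(sorted(members), size) for name, members in groups.items()}


def replicate_samples(groups, n_replicates=100, seed=0):
    """Return `n_replicates` equally sized random samples of the groups."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    size = min(len(members) for members in groups.values())
    return [equal_sample(groups, size, rng) for _ in range(n_replicates)]


# Hypothetical proteome identifiers per superkingdom.
toy_groups = {
    "Archaea": ["a1", "a2", "a3"],
    "Bacteria": ["b1", "b2", "b3", "b4", "b5"],
    "Eukarya": ["e1", "e2", "e3", "e4"],
}

replicates = replicate_samples(toy_groups)
print(len(replicates), replicates[0])
```

Sampling equal numbers per group removes the imbalance between superkingdoms, so differences in net gains and losses are less likely to reflect sample size alone.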
GO accessions, names and P-values for highly-specific biological processes that were significantly associated (FDR<0.01) with FF gains in Archaea, Bacteria, and Eukarya.
| Superkingdom | No. | GO accession | Biological processes | P-value |
| Archaea | 1 | GO:0006099 | tricarboxylic acid cycle | 5.38E-06 |
| | 2 | GO:0006090 | pyruvate metabolic process | 2.80E-05 |
| | 3 | GO:0006637 | acyl-CoA metabolic process | 4.01E-05 |
| | 4 | GO:0035384 | thioester biosynthetic process | 3.32E-04 |
| | 5 | GO:0006144 | purine nucleobase metabolic process | 5.71E-04 |
| | 6 | GO:0006213 | pyrimidine nucleoside metabolic process | 6.38E-04 |
| Bacteria | 1 | GO:0000272 | polysaccharide catabolic process | 1.26E-04 |
| Eukarya | 1 | GO:0045995 | regulation of embryonic development | 1.44E-06 |
| | 2 | GO:0051588 | regulation of neurotransmitter transport | 3.35E-06 |
| | 3 | GO:0001707 | mesoderm formation | 7.48E-06 |
| | 4 | GO:0001649 | osteoblast differentiation | 1.29E-05 |
| | 5 | GO:0050870 | positive regulation of T cell activation | 3.45E-05 |
| | 6 | GO:0030336 | negative regulation of cell migration | 8.88E-05 |
| | 7 | GO:0048017 | inositol lipid-mediated signaling | 1.05E-04 |
| | 8 | GO:0000165 | MAPK cascade | 1.16E-04 |
| | 9 | GO:0051291 | protein heterooligomerization | 1.21E-04 |
| | 10 | GO:0046620 | regulation of organ growth | 2.43E-04 |
| | 11 | GO:0051099 | positive regulation of binding | 3.00E-04 |
| | 12 | GO:0043627 | response to estrogen stimulus | 3.00E-04 |
| | 13 | GO:0051216 | cartilage development | 2.96E-04 |
| | 14 | GO:0061180 | mammary gland epithelium development | 2.96E-04 |
| | 15 | GO:0030856 | regulation of epithelial cell differentiation | 3.02E-04 |
| | 16 | GO:0051703 | intraspecies interaction between organisms | 4.13E-04 |
| | 17 | GO:0032496 | response to lipopolysaccharide | 4.07E-04 |
| | 18 | GO:0032946 | positive regulation of mononuclear cell proliferation | 5.10E-04 |
| | 19 | GO:0032869 | cellular response to insulin stimulus | 5.10E-04 |
| | 20 | GO:0045580 | regulation of T cell differentiation | 6.59E-04 |
| | 21 | GO:0060191 | regulation of lipase activity | 6.59E-04 |
| | 22 | GO:0045834 | positive regulation of lipid metabolic process | 6.59E-04 |
| | 23 | GO:0050673 | epithelial cell proliferation | 6.59E-04 |
| | 24 | GO:0021761 | limbic system development | 8.39E-04 |
| | 25 | GO:0046634 | regulation of alpha-beta T cell activation | 8.39E-04 |
| | 26 | GO:0045667 | regulation of osteoblast differentiation | 8.39E-04 |
| | 27 | GO:0007492 | endoderm development | 8.39E-04 |
| | 28 | GO:0044089 | positive regulation of cellular component biogenesis | 1.04E-03 |
| | 29 | GO:0007530 | sex determination | 1.04E-03 |
| | 30 | GO:0045598 | regulation of fat cell differentiation | 1.04E-03 |
| | 31 | GO:0051057 | positive regulation of small GTPase mediated signal transduction | 1.25E-03 |
| | 32 | GO:0048749 | compound eye development | 1.31E-03 |
| | 33 | GO:0050773 | regulation of dendrite development | 1.31E-03 |
| | 34 | GO:0060443 | mammary gland morphogenesis | 1.31E-03 |
| | 35 | GO:2001236 | regulation of extrinsic apoptotic signaling pathway | 1.31E-03 |
| | 36 | GO:0016055 | Wnt receptor signaling pathway | 1.31E-03 |
| | 37 | GO:0046488 | phosphatidylinositol metabolic process | 1.31E-03 |
GO accessions, names and P-values for highly-specific biological processes that were significantly associated (FDR<0.01) with FF loss in Bacteria.
| Superkingdom | No. | GO accession | Biological processes | P-value |
| Bacteria | 1 | GO:0042398 | cellular modified amino acid biosynthetic process | 3.10E-04 |
| | 2 | GO:0072528 | pyrimidine-containing compound biosynthetic process | 3.10E-04 |
No significant biological process was lost in either Archaea or Eukarya.
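The FDR<0.01 threshold used for the two GO tables can be illustrated with a minimal sketch of a false discovery rate correction. The text does not name the procedure that was used, so the Benjamini-Hochberg step-up adjustment below is an assumption, and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four GO terms.
toy_p = [0.001, 0.02, 0.03, 0.5]
adjusted_p = benjamini_hochberg(toy_p)
significant = [q < 0.01 for q in adjusted_p]
print(adjusted_p, significant)
```

In this toy case only the first term passes the 0.01 threshold after adjustment, even though three raw P-values are below 0.05.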