Integration of two ancestral chaperone systems into one: the evolution of eukaryotic molecular chaperones in light of eukaryogenesis.

David Bogumil, David Alvarez-Ponce, Giddy Landan, James O McInerney, Tal Dagan.

Abstract

Eukaryotic genomes are mosaics of genes acquired from their prokaryotic ancestors, the eubacterial endosymbiont that gave rise to the mitochondrion and its archaebacterial host. Genomic footprints of the prokaryotic merger at the origin of eukaryotes are still discernable in eukaryotic genomes, where gene expression and function correlate with their prokaryotic ancestry. Molecular chaperones are essential in all domains of life as they assist the functional folding of their substrate proteins and protect the cell against the cytotoxic effects of protein misfolding. Eubacteria and archaebacteria code for slightly different chaperones, comprising distinct protein folding pathways. Here we study the evolution of the eukaryotic protein folding pathways following the endosymbiosis event. A phylogenetic analysis of all 64 chaperones encoded in the Saccharomyces cerevisiae genome revealed 25 chaperones of eubacterial ancestry, 11 of archaebacterial ancestry, 10 of ambiguous prokaryotic ancestry, and 18 that may represent eukaryotic innovations. Several chaperone families (e.g., Hsp90 and Prefoldin) trace their ancestry to only one prokaryote group, while others, such as Hsp40 and Hsp70, are of mixed ancestry, with members contributed from both prokaryotic ancestors. Analysis of the yeast chaperone-substrate interaction network revealed no preference for interaction between chaperones and substrates of the same origin. Our results suggest that the archaebacterial and eubacterial protein folding pathways have been reorganized and integrated into the present eukaryotic pathway. The highly integrated chaperone system of yeast is a manifestation of the central role of chaperone-mediated folding in maintaining cellular fitness. Most likely, both archaebacterial and eubacterial chaperone systems were essential at the very early stages of eukaryogenesis, and the retention of both may have offered new opportunities for expanding the scope of chaperone-mediated folding.

Keywords: molecular chaperones; origin of eukaryotes; protein evolution

Year: 2013 PMID: 24188869 PMCID: PMC3907059 DOI: 10.1093/molbev/mst212

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

The symbiogenic model for the origin of eukaryotes posits that eukaryotes arose via a symbiotic association of two distantly related prokaryotes (Sagan 1967; Rivera and Lake 2004; Embley and Martin 2006; Pisani et al. 2007; Lane 2009; Alvarez-Ponce et al. 2013). Opinions about the precise taxonomic classification and metabolic capacities of the prokaryotes involved are still divided; however, there is wide agreement among scientists that the host was an archaebacterium (Martin and Muller 1998; Cox et al. 2008; Williams et al. 2012) and the endosymbiont was an alpha-proteobacterium (Gray et al. 1999; Gabaldón and Huynen 2003; Esser et al. 2004). The eubacterial endosymbiont subsequently evolved into the mitochondrion, a process that was accompanied by a massive DNA transfer from the symbiont into the host genome, the evolution of a mitochondrial protein import apparatus, a drastic miniaturization of the mitochondrial genome, and an increased complexity of the nuclear genome (Martin and Herrmann 1998; Martin 2003; Timmis et al. 2004). Phylogenomic studies show, accordingly, that eukaryotic genomes are a mosaic of genes of eubacterial and archaebacterial ancestry (Esser et al. 2004; Pisani et al. 2007; Thiergart et al. 2012; Alvarez-Ponce et al. 2013). Evolutionary analysis of genes in the model eukaryote Saccharomyces cerevisiae reveals that about 37% of the genes can be traced back to either an archaebacterial or a eubacterial ancestor (Cotton and McInerney 2010). Thus, eukaryotic innovations probably account for a sizeable fraction of eukaryotic genomes. Yet, the proportion of eukaryotic genes of demonstrable prokaryotic origin is quite substantial considering the complications involved in this kind of analysis. The long divergence time elapsed since the symbiotic event limits our ability to detect prokaryotic homologs to some prokaryote-derived proteins and reduces the accuracy of phylogenetic inference for others.
Furthermore, lateral gene transfer events between the eubacterial and archaebacterial lineages (e.g., Deppenmeier et al. 2002; Large and Lund 2009; Williams et al. 2010; Nelson-Sathi et al. 2012) may have obscured the genetic record of the symbiosis event, leading to an ambiguous classification of eukaryotic genes. The chimerical origin of eukaryotic genomes is imprinted in the functional role of proteins within the cell. Many proteins that perform an informational function (e.g., replication, transcription, and translation) are of archaebacterial origin while many genes of eubacterial origin perform operational functions (e.g., metabolism, amino acid synthesis, and regulatory genes) (Rivera et al. 1998; Esser et al. 2004; Cox et al. 2008; Cotton and McInerney 2010; Alvarez-Ponce and McInerney 2011; Alvarez-Ponce et al. 2013). Eukaryotic genes of archaebacterial origin are more essential regardless of the bias towards informational functions (Cotton and McInerney 2010; Alvarez-Ponce and McInerney 2011). Furthermore, the eukaryotic protein–protein interaction network still bears the markings of a chimerical ancestry, with proteins from the same origin (archaebacterial or eubacterial) being interconnected at a frequency that is significantly above that expected by chance (Alvarez-Ponce and McInerney 2011). Thus, when considered as a whole, the eukaryotic proteome can be described as a partially integrated version of two ancestral ingredients. In this study, we set out to examine the evolution of the eukaryotic protein folding pathway in light of the symbiogenic model. Molecular chaperones are proteins that assist the folding and unfolding of other proteins, as well as the complex assembly and stabilization of protein and nucleic acid interactions (Hartl and Hayer-Hartl 2009; Large et al. 2009).
Chaperones often function in assembly-line-like pathways where various chaperones interact consecutively with the same substrate, driving the transition of the newly synthesized peptide into a functional protein (Young et al. 2004). Chaperones are essential in all living organisms and have been shown to play a role as capacitors of phenotypic variation (Rutherford and Lindquist 1998; Queitsch et al. 2002) and drivers of increased fitness within organisms facing a high mutational load (Fares et al. 2002; Maisnier-Patin et al. 2005). Furthermore, their function as biochemical mediators of protein assembly played an important role in shaping genomic landscapes (Bogumil and Dagan 2010; Williams and Fares 2010; Bogumil et al. 2012). The utility of molecular chaperones is thought to be constrained by a delicate balance between their help in mitigating the effects of protein misfolding and the slower rate of protein production and maturation of their substrate (Bogumil and Dagan 2012). Archaebacteria and eubacteria harbor slightly different repertoires of chaperone families. The Hsp40 and Hsp70 chaperone families are present in both domains (Macario et al. 1991; Macario et al. 1993), whereas other chaperone systems, such as chaperonins, differ in their composition and assembly. Here we study the extent to which the chimeric origin of eukaryotes is detectable in the eukaryotic protein folding pathway of contemporary genomes. We infer the ancestry of yeast chaperones and their substrates, examine the yeast chaperone repertoire, and use a network approach to study the relationship between chaperones and their substrates in light of their origin.

Results

Prokaryotic Ancestry of S. cerevisiae Proteins

To determine the prokaryotic origin of yeast proteins, we searched for their prokaryotic homologs among 82 archaebacterial and 1,074 eubacterial genomes. A total of 1,230 yeast proteins had detectable homologs in one or more prokaryotic genomes. The remaining proteins did not manifest detectable homology with prokaryotic proteins, and we therefore consider them to be eukaryotic innovations. A total of 689 phylogenetic trees were reconstructed for yeast proteins having more than three homologs belonging to both archaebacteria and eubacteria. Yeast proteins were classified according to the prokaryotic domain within which they branch. Our analysis revealed 289 proteins of archaebacterial ancestry, 803 of eubacterial ancestry, and 138 of an unresolved prokaryotic ancestry. All phylogenetic trees are provided in supplementary tables S1 and S2, Supplementary Material online.

The Mosaic Structure of the S. cerevisiae Chaperone Repertoire

Of the 64 known yeast molecular chaperones, 46 had homologs in prokaryotic genomes. These were classified based on their tree topology into 11 chaperones of archaebacterial ancestry and 25 chaperones of eubacterial ancestry. The ancestry of the remaining ten chaperones could not be resolved from the data (fig. 1). The Hsp90 family in yeast includes two paralogs whose sequences are highly similar (96% identity at the amino acid level). Both paralogs are homologous to eubacterial htpG sequences exclusively, and hence the yeast Hsp90 is clearly of eubacterial origin. The prefoldin (PFD) chaperones transfer target proteins to the chaperone-containing T-complex polypeptide 1 (CCT) system for further folding (Vainberg et al. 1998). The yeast genome encodes six PFD paralogs whose protein sequences are 15.2 ± 3.8% identical. Three of the six PFDs have homologs in prokaryotic genomes, all of which are archaebacterial. The remaining three paralogs had no detectable homologs in prokaryotic genomes applying the sequence similarity threshold used in this study (>25% identical amino acids). This indicates that PFD is an archaebacterial contribution to eukaryotic genomes, and the family further diversified within eukaryotes. All five small heat shock proteins (sHsp) were inferred to be of eubacterial ancestry. Hsp26 is homologous to eubacterial sequences only, and the four paralogous genes Hsp31, Hsp32, Hsp33, and Sno4 clearly branch within the eubacterial clade, although homologs in halophilic and methanogenic archaebacteria were found as well. Members of the Hsp100 chaperone family (Clp) play a role in protein disaggregation (Parsell et al. 1994). Of the three Hsp100 proteins in yeast, one is localized in the mitochondria and two are cytosolic (van Dyck et al. 1998). The mitochondrial Clp protein Mcx1 was inferred to be of eubacterial origin. 
The cytosolic Hsp104 was inferred to have been derived from an archaebacterial AAA+ ATPase, while the second cytosolic Hsp78 is of ambiguous ancestry. The Hsp40 and Hsp70 families include chaperones with eubacterial as well as archaebacterial ancestry, although the majority of chaperones from these particular families are of eubacterial descent.

Fig. 1.

Yeast chaperones and their reconstructed ancestries. Archaebacterial ancestry is shown in red and eubacterial ancestry in blue. Chaperones with ambiguous ancestry or no homology to prokaryotic proteins are colored in purple and gray, respectively. Here we use the same structural model for all members of the same family; note that paralogs may deviate in their protein structures. Molecule plots were generated using the PyMOL Molecular Graphics System, version 1.5.0.4 (Schrödinger, LLC).

Eukaryotic genomes typically encode two chaperonin systems: the type I mitochondrial Hsp60/Hsp10 system (GroEL/ES-like) and the type II chaperonin (CCT-like). The type I chaperonin system is usually viewed as a eubacterial set of chaperones; however, it is also encoded in the genomes of several methanogenic and halophilic archaebacteria (e.g., Deppenmeier et al. 2002). The yeast Hsp60 branched in between a purely archaebacterial clade and a purely eubacterial clade. Consequently, it was classified as of ambiguous prokaryotic ancestry. The cochaperone Hsp10 is clearly of eubacterial origin. This classification fits well with its localization in the mitochondrion. The type II eukaryotic chaperonins comprise eight different protein subunits (Archibald et al. 1999; Valpuesta et al. 2002). These chaperones are usually viewed as archaebacterial; however, several Clostridia species encode type II chaperonins as well (Techtmann and Robb 2010; Williams et al. 2010). An archaebacterial ancestry was inferred for Tcp1 and a eubacterial origin was inferred for Cct4 and Cct8. The other five CCT genes were classified as ambiguous as they branch between clostridial and archaebacterial homologs.

Connectivity in the Chaperone Interaction Network and Protein Ancestry

The chaperone–substrate interaction (CSI) network is based on a large-scale screening for proteins that interact with 64 chaperones encoded in Saccharomyces cerevisiae (Gong et al. 2009). The CSI network contains 4,340 substrate proteins that interact with at least one chaperone and a total of 21,428 CSIs. Interactions in the CSI network are unweighted and do not reflect their relative prevalence. We reduced the data set to include only those chaperones and substrates for which prokaryotic ancestry could be determined. This network contained 36 chaperones and 790 substrates. A total of 3,058 interactions included in the network were classified into four classes based on the ancestry of both the chaperones and substrates (inset in fig. 2).
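The four-class breakdown described above, and a test of whether chaperone and substrate ancestry are independent across edges, can be sketched as follows. This is only an illustration on toy data: the function names, the "A"/"B" ancestry codes, and the choice of a Pearson χ² test on a 2 × 2 table (1 degree of freedom, no continuity correction) are assumptions, not the authors' script.

```python
import math
from collections import Counter

def classify_edges(edges, chaperone_ancestry, substrate_ancestry):
    """Count edges per class. Uppercase letter = chaperone ancestry,
    lowercase letter = substrate ancestry (A = archaebacterial, B = eubacterial).
    Edges with an unresolved partner are skipped."""
    counts = Counter()
    for chaperone, substrate in edges:
        c = chaperone_ancestry.get(chaperone)
        s = substrate_ancestry.get(substrate)
        if c in ("A", "B") and s in ("A", "B"):
            counts[c + s.lower()] += 1
    return counts

def chi2_independence_2x2(counts):
    """Pearson chi-square statistic and p-value (1 df) for the 2x2 table
    of chaperone ancestry (rows) by substrate ancestry (columns).
    Assumes every row and column total is non-zero."""
    observed = [[counts["Aa"], counts["Ab"]], [counts["Ba"], counts["Bb"]]]
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy network (hypothetical identifiers); s3 has no resolved ancestry.
toy_edges = [("C1", "s1"), ("C1", "s2"), ("C2", "s1"), ("C2", "s2"), ("C1", "s3")]
toy_counts = classify_edges(toy_edges, {"C1": "A", "C2": "B"}, {"s1": "A", "s2": "B"})
toy_stat, toy_p = chi2_independence_2x2(toy_counts)
```

A large p-value, as in this balanced toy table, means the edge counts give no evidence that chaperones preferentially bind substrates of their own ancestry.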

Fig. 2.

Prokaryotic origin and connectivity distribution. Asterisks indicate the observed percentage of edges in the network, and bars show the mean expected frequency from randomization simulations. Lines indicate the 1–99 percentile range. Abbreviations: A, archaebacterial; B, eubacterial; uppercase indicates chaperones and lowercase indicates substrates.

The network connectivity pattern is not biased toward a higher number of interactions between chaperones and substrates of the same ancestry (χ² test; P = 0.52, inset in fig. 2). This type of network data may sometimes be biased by nodes having extreme connectivity or gene expression levels. To guard against such a possibility, we classified both chaperones and substrates into high/low categories according to the following properties: network connectivity degree, mRNA expression, and protein expression. We repeated the analysis with subsets of the network defined by these contrasts and observed the same pattern as in the full network, indicating that the result is robust (see supplementary table S3, Supplementary Material online). Moreover, this conclusion still holds when considering only substrates that interact with at least two chaperones (χ² test; P = 0.49). Although the mean connectivity degree of substrates of archaebacterial ancestry (5.33) is higher than that of substrates of eubacterial ancestry (4.81), this difference is not statistically significant (Wilcoxon rank-sum test, P = 0.07). To further test for possible biases in the network connectivity pattern, we examined the ratio of eubacterial to archaebacterial interaction partners for each chaperone and substrate, and tested for differences in the distributions of ratios in the two ancestry groups.
We found no significant difference in the distributions of the chaperone ancestry ratio between archaeal and eubacterial substrates (Wilcoxon rank-sum test, P = 0.62), and no significant difference in the distributions of the substrate ancestry ratio between archaeal and eubacterial chaperones (Wilcoxon rank-sum test, P = 0.18). We further tested whether any of the four chaperone–substrate ancestry combinations is enriched in the network by conducting a network randomization test with 10,000 randomization replicates (fig. 2). This analysis shows that none of the four interaction types is found at a frequency that is significantly different from the random expectation (at a false discovery rate [FDR] of 0.01).
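The comparison of observed edge-class frequencies against the randomized networks can be sketched as below. The text reports only the number of replicates and the FDR level, so the two-sided empirical p-value (with a +1 correction) and the Benjamini-Hochberg step-up procedure used here are assumptions chosen for illustration, as are all function names.

```python
def empirical_p_value(observed, null_values):
    """Two-sided empirical p-value of an observed statistic against
    values obtained from randomized networks (+1 correction)."""
    n = len(null_values)
    at_least = sum(1 for v in null_values if v >= observed)
    at_most = sum(1 for v in null_values if v <= observed)
    return min(1.0, 2.0 * (min(at_least, at_most) + 1) / (n + 1))

def benjamini_hochberg(p_values, fdr):
    """Return a list of booleans marking which hypotheses are rejected
    at the given false discovery rate (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= fdr * rank / m:
            largest_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected

# Toy example: one observed frequency far outside a null distribution of 99 values.
toy_null = list(range(99))
toy_p_extreme = empirical_p_value(100, toy_null)
toy_rejections = benjamini_hochberg([0.001, 0.5, 0.6, 0.9], fdr=0.01)
```

With four interaction classes, one p-value per class would be passed to the correction step; a result with no rejections corresponds to the pattern reported in the text.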

Protein Ancestry and Protein Function

Substrates in the network were further classified into two major functional categories according to their annotation in the Gene Ontology database (GO, Ashburner et al. 2000). Substrates whose annotation includes the terms “translation,” “transcription,” “DNA-dependent DNA replication,” or their subterms were classified as proteins performing an informational function. The remaining substrates were classified as operational proteins (Rivera et al. 1998; Cotton and McInerney 2010). Combining the functional classification with prokaryotic ancestry reconstruction revealed that 59% of the 216 archaebacterial substrates and 15% of the 528 eubacterial substrates found in GO perform informational functions. Hence, substrates of archaebacterial origin are enriched for informational functions (P < 10^−16, using χ² test), confirming the known correlations between prokaryotic ancestry and protein function (Esser et al. 2004; Cotton and McInerney 2010; Alvarez-Ponce and McInerney 2011; Alvarez-Ponce et al. 2013). In addition, we found that informational substrates interact with a larger number of different chaperones than operational substrates (Wilcoxon rank-sum test, P < 10^−16).
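The informational/operational split described above amounts to checking whether any annotated term is one of the three root terms or descends from one. A minimal sketch follows; the parent map is a toy stand-in for the real GO hierarchy, and the function names are hypothetical.

```python
INFORMATIONAL_ROOTS = {"translation", "transcription", "DNA-dependent DNA replication"}

def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def functional_class(annotations, parents):
    """'informational' if any annotation is a root term or one of its
    subterms; otherwise 'operational'."""
    for term in annotations:
        if term in INFORMATIONAL_ROOTS or ancestors(term, parents) & INFORMATIONAL_ROOTS:
            return "informational"
    return "operational"

# Toy hierarchy (assumed for illustration only): child term -> parent terms.
toy_parents = {
    "translational elongation": ["translation"],
    "glycolysis": ["carbohydrate metabolism"],
}
class_a = functional_class(["translational elongation"], toy_parents)
class_b = functional_class(["glycolysis"], toy_parents)
```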

Prokaryotic Ancestry and Protein Physicochemical Properties

A comparison of protein physicochemical properties between the two ancestry groups revealed several significant differences. The differences are manifest in proteins that interact with chaperones as well as in proteins that are chaperone-independent. Interestingly, the differences observed in chaperone-independent proteins are significantly larger than those in the chaperone substrates (fig. 3).

Fig. 3.

Differences in protein physicochemical properties between proteins of eubacterial and archaebacterial origin. Enrichment in proteins of eubacterial origin is on the left and shown as blue shades and that of proteins of archaebacterial origin on the right and shown as red shades. Chaperone substrates are in dark shades and proteins not connected to chaperones are in light shades. Asterisks denote statistical significance (Kolmogorov–Smirnov tests); * denotes 5% FDR and ** 1% FDR; asterisks to the left of the slash refer to tests contrasting protein ancestries and asterisks to the right of the slash refer to tests contrasting substrates with chaperone-independent proteins. Bar lengths indicate the enrichment ratio in log10 scale.

Eubacterial substrates were found to be longer on average, in agreement with previous studies (Alvarez-Ponce and McInerney 2011). In addition, eubacterial substrates are also enriched in hydrophobic and aromatic amino acids in comparison to archaebacterial substrates. Archaebacterial substrates are more conserved, more highly expressed, and are encoded by higher proportions of preferred codons than eubacterial substrates (fig. 3). Biases in the three latter properties fit well with the known correlation among evolutionary rates, expression level, and codon usage bias (Grantham et al. 1981; Sharp and Li 1987; Pál et al. 2001; Drummond et al. 2005; Pál et al. 2006). In addition, substrates of archaebacterial origin were enriched for positively charged amino acids as well as arginine, lysine, and valine. On the other hand, substrates of eubacterial origin are significantly enriched in cysteine, histidine, isoleucine, leucine, phenylalanine, proline, serine, and tryptophan (fig. 3). Most of the above differences in substrate physicochemical properties are observed when contrasting informational and operational proteins, as expected from the congruence between the ancestral and functional classifications.

Discussion

Our evolutionary reconstruction of the ancestry of chaperones involved in the yeast protein folding pathway reveals that chaperones of different descent are used in a coordinated fashion to fold common substrates. For example, the Hsp40/Hsp70 system in yeast comprises a total of 21 Hsp40 and 14 Hsp70 genes from diverse origins including archaebacterial, eubacterial, and eukaryotic-specific proteins (ESPs). Interestingly, the Hsp40 family, with 11 ESPs, has diversified within eukaryotes to a larger extent in comparison to the Hsp70 family that includes only one ESP. The difference between the two families can be explained by their mode of function. Chaperones of the Hsp40 family are the drivers of Hsp70 substrate activity and specificity (Cyr and Douglas 1994; Kampinga and Craig 2010). Thus, the diversification of the Hsp40 family within eukaryotes probably enabled the whole Hsp40–Hsp70 system to increase its operational potential. A mosaic of ancestries is observed in all chaperone families that are present in both archaebacteria and eubacteria. Yeast chaperones that are localized in the mitochondria are a noteworthy exception, in contrast to cytosolic chaperones. All mitochondrial chaperones that could be classified by their tree topology are inferred to be of eubacterial ancestry, underlining the role of the mitochondrion as a functional eubacterial unit within the eukaryotic cell (Esser et al. 2004). Previous studies showed that there is a significant preference for proteins to interact with partners of the same ancestry rather than across the archaebacterial–eubacterial divide (Alvarez-Ponce and McInerney 2011). Such preference can be expected if the proteins participating in specific cellular pathways are usually of a single ancestry. Because protein connectivity is higher within pathways than across pathways, common ancestry of pathway proteins will result in an overall trend for same ancestry interactions.
Thus, same ancestry preference, while demonstrable on average, may still be violated when considering specific systems. Our results suggest that the general trend does not hold for the CSI network, where no preference for interaction of chaperones and substrates of the same ancestry could be observed. This indicates that the protein folding pathways have been reorganized and integrated to a larger extent in comparison to the overall protein–protein interactions within the cell. Yeast proteins originating from the two endosymbiosis partners are distinct in their physicochemical properties profile (Alvarez-Ponce and McInerney 2011). These differences, while still significant, are much smaller among proteins that utilize chaperones in their folding pathway than among chaperone-independent proteins. The molecular features that enable substrates to interact with chaperones, while not yet well understood, are likely to place constraints on the various physicochemical properties and can thus result in greater similarity of chaperone substrates when compared with other proteins. Moreover, adaptation to chaperone-assisted folding is likely to affect these same features, actively driving substrates away from their ancestral profile and toward a common eukaryotic profile. For example, archaebacterial substrates are expressed at significantly higher levels than eubacterial substrates, yet these differences are still significantly smaller than those observed for proteins that do not interact with chaperones. In the crowded cell environment, successful competition for chaperones is likely to be linked to expression levels, thus putting eubacterial proteins at a disadvantage. Thus, a narrower expression range for substrate proteins may provide a competition field that is more balanced. This can be seen as a homogenizing effect of chaperones on their substrates, and from this perspective, chaperones can be viewed as inducers of the eukaryotic integrated state.
Thus, chaperones have a cumulative impact on eukaryotic genome evolution. What makes molecular chaperones a class of proteins that is more amenable to integration? Chaperones are highly versatile proteins that increase the probability of their substrates to attain a functional conformation and by that can contribute significantly to the organismal fitness. Chaperones are essential in both prokaryotic domains (Hartl and Hayer-Hartl 2002; Calloni et al. 2012); hence, at the very origin of the eukaryotic cytosol, there was an absolute need for chaperones of both ancestries to assist in the folding of their respective substrates. Some molecular chaperones are very versatile, and in vitro they can assist folding of substrates from unrelated organisms, even from another prokaryotic domain (e.g., Yam et al. 2008). Moreover, similar chaperones may have similar substrate specificity and interact with similar sets of proteins. Therefore, eubacterial and archaebacterial chaperones might have had overlapping substrate sets at the initial steps of eukaryogenesis. In vivo, however, this capacity may not be sufficient, as the organism must sustain a balanced stoichiometric and energetic profile, which requires chaperones and substrate expression to be coordinated by a common regulatory regime. Thus, a nonspecific interaction pattern allows chaperones to acquire new clients without the need for intensive sequence modification or adaptation, and the evolution of a completely integrated system is expected to also include the regulatory context governing coexpression of chaperones and their substrates as well as optimizing the competitive binding of substrates and their dedicated chaperones. The effects of combining two ancestral chaperone systems may have conferred an even larger fitness benefit than was possible by either of the ancestral systems on its own. 
Moreover, the apparent redundancy in the chaperone repertoire may reflect not only the demands of protein folding pathways but the possibility that some chaperones are involved in other functions. Chaperones have been reported to possess such moonlighting functions (e.g., Wuppermann et al. 2008; see Henderson et al. for a review). Moonlighting may also explain the expansion and diversification observed in several of the larger eukaryotic chaperone families. Nonetheless, retaining two chaperone systems would have entailed an additional energetic cost for the cell as chaperone synthesis and operation is expensive in terms of ATP usage. In the context of eukaryogenesis, this would not have posed an insurmountable problem, since the formation of mitochondria as an intracellular organelle resulted in a dramatic increase in the available energy for all cellular processes (Lane and Martin 2010). Nevertheless, energetic considerations might still play a role in the evolution of CSIs (Bogumil and Dagan 2012). In summary, in contrast with other proteins that still show a tendency to form network communities reflecting their ancestries, molecular chaperones have been able to cross the divide between the ancestral prokaryotic domains. The central role of chaperone-assisted folding in maintaining cellular fitness is reflected in the high degree of integration of an archaebacterial and a eubacterial chaperone system into one at the origin of eukaryotes.

Materials and Methods

Data

Yeast protein sequences, amino acid usage data, functional assignments, chromosomal locations, frequencies of optimal codons, codon adaptation index, GRAVY scores (hydropathy index), and aromaticity scores were downloaded from the Saccharomyces Genome Database (Cherry et al. 1998). Chaperone–protein interaction data were obtained from Gong et al. (2009). The secondary structure of all proteins was inferred using PsiPred (Jones 1999), applying a threshold of 70% for the calculation of secondary structure probability. Quantitative protein expression data were obtained from Ghaemmaghami et al. (2003). The mRNA level data were obtained from Wang et al. (2002). For the statistical analysis of protein expression levels, natural log transformation was applied. Proteins for which expression levels were not available (107 in total) or with zero expression level (1,665 proteins) were excluded from the analysis. All statistical analyses were performed using the MATLAB Statistics toolbox.

Evolutionary Rate

Positional orthology assignments among 20 fungal genomes were obtained from Wapinski et al. (2007). Proteins lacking orthologs in any genome (282 in total) were excluded from the analysis. Multiple sequence alignments of all yeast open reading frames with orthologous sequences were generated with MAFFT (Katoh et al. 2005). Phylogenetic trees were reconstructed with PhyML v3.0_360-500M (Guindon and Gascuel 2003) using the best-fit model as inferred by ProtTest 3 (Darriba et al. 2011) according to the Akaike information criterion (Akaike 1974). Distances from the S. cerevisiae proteins to their orthologs were calculated as the sum of branch lengths. To calculate the relative amino acid substitution rates of substrates, the distances to the 20 proteomes were first Z-transformed separately and then averaged over all orthologs (Bogumil and Dagan 2010).
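The final step, Z-transforming distances within each genome and then averaging per protein, can be sketched as follows. The input layout and function name are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

def relative_rates(distances):
    """distances: {genome: {protein: distance to the ortholog in that genome}}.
    Returns {protein: mean Z-score across the genomes where it has an ortholog}."""
    z_scores = {}
    for genome, per_protein in distances.items():
        values = list(per_protein.values())
        mu = mean(values)
        sigma = pstdev(values)  # population SD; an assumption of this sketch
        if sigma == 0:
            continue  # no variation in this genome, Z-scores undefined
        for protein, distance in per_protein.items():
            z_scores.setdefault(protein, []).append((distance - mu) / sigma)
    return {protein: mean(zs) for protein, zs in z_scores.items()}

# Toy data: two genomes, two proteins (hypothetical identifiers).
toy_distances = {
    "genome1": {"p1": 1.0, "p2": 3.0},
    "genome2": {"p1": 2.0, "p2": 6.0},
}
toy_rates = relative_rates(toy_distances)
```

Standardizing within each genome first keeps distantly related genomes, which have larger raw distances, from dominating the average.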

Reconstruction of Prokaryotic Ancestries

We classified each of the 5,880 yeast protein-coding genes into archaebacterial, eubacterial, ambiguous prokaryotic ancestry, or eukaryote-specific, based on its phylogenetic affinities. Each yeast protein sequence was used as a query in a homology search against a database containing the proteomes of 82 archaebacteria and 1,074 eubacteria (3,792,506 proteins in total). Homology searches were carried out using position specific iterated–basic local alignment search tool (PSI-BLAST) (Altschul et al. 1997) without filtering. Global pairwise alignments of BLAST hits were calculated using the EMBOSS package (Needleman and Wunsch 1970; Rice et al. 2000). Prokaryotic sequences with less than 25% identity were considered as having no significant similarity to the particular yeast query. Of the yeast genes, 161 had significant similarity to archaebacterial sequences exclusively (and were thus classified as being of archaebacterial ancestry), 383 had significant similarity to eubacterial sequences only (and were thus deemed as eubacterium-derived), and 686 had homologs in both prokaryotic domains. The remaining genes had no detectable prokaryotic homologs at the specified thresholds and were thus considered eukaryote-specific. To ascertain the ancestry of the 686 yeast genes with both archaebacterial and eubacterial homologs, we conducted a phylogenetic analysis. For each of these genes, a multiple sequence alignment including the 15 best BLAST hits from each prokaryotic domain was generated using MAFFT v6.843b (Katoh and Toh 2008), and the quality of the alignment was tested with GUIDANCE (Penn et al. 2010). To be conservative in our analysis, columns with a confidence score <0.93 were removed. Phylogenetic trees were reconstructed with PhyML v3.0_360-500M (Guindon and Gascuel 2003) using the best-fit model as inferred by ProtTest 3 (Darriba et al. 2011) according to the Akaike information criterion (Akaike 1974).
We next rooted each tree on the branch that maximized the separation of archaebacterial and eubacterial sequences. The internal branch yielding the maximum ratio of archaebacteria to eubacteria content in the resulting clades was determined with the MRP function implemented in CLANN 3.2.2 (Creevey and McInerney 2005) using Spearman’s rank correlation coefficient. The yeast gene was classified as of eubacterial or archaebacterial ancestry depending on the clade within which it branched (see supplementary fig. S1, Supplementary Material online, for illustrative trees). Yeast genes were considered of ambiguous ancestry if no branch yielded a clear separation into archaebacterial and eubacterial clades, if multiple branches separated the archaebacterial and eubacterial sequences equally well and resulted in conflicting ancestry assignments, or if the yeast gene branched between the archaebacterial and eubacterial clades. In such ambiguous cases, we repeated the analysis with a larger sample of homologous sequences, first with the 30 best BLAST hits from each domain, and if still ambiguous, with the 45 best BLAST hits from each domain. This analysis shifted 125 genes from the ambiguous to the unambiguous class. Of the 686 yeast genes with both archaebacterial and eubacterial homologs, 128 were classified as of archaebacterial ancestry, 420 as of eubacterial ancestry, and 138 as ambiguous. All phylogenetic trees are provided in supplementary tables S1 and S2, Supplementary Material online. In total, we inferred 289 proteins to be of archaebacterial ancestry, 803 of eubacterial ancestry, and 138 proteins with an unresolvable prokaryotic ancestry. The remaining yeast proteins did not show significant similarity with any prokaryotic protein.
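The first, homology-based pass of this classification can be sketched as a simple decision on which prokaryotic domains contribute hits above the identity threshold. The data layout, labels, and function name are hypothetical, and the phylogenetic step for genes with hits in both domains is not reproduced here. The closing lines only re-add the counts reported in the text.

```python
IDENTITY_THRESHOLD = 25.0  # percent; hits below this are treated as non-significant

def first_pass_class(hits):
    """hits: list of (domain, percent_identity) for one yeast query, where
    domain is 'archaebacteria' or 'eubacteria'. Returns the homology-based
    class; 'needs_tree' marks genes that go on to phylogenetic analysis."""
    domains = {domain for domain, identity in hits if identity >= IDENTITY_THRESHOLD}
    if domains == {"archaebacteria"}:
        return "archaebacterial"
    if domains == {"eubacteria"}:
        return "eubacterial"
    if domains == {"archaebacteria", "eubacteria"}:
        return "needs_tree"
    return "eukaryote-specific"

# Counts reported in the text: (exclusive homology, resolved by phylogeny).
reported = {"archaebacterial": (161, 128), "eubacterial": (383, 420)}
totals = {ancestry: a + b for ancestry, (a, b) in reported.items()}
with_prokaryotic_homologs = 161 + 383 + 686
```

The sums reproduce the totals given in the text: 289 archaebacterial, 803 eubacterial, and 1,230 yeast proteins with prokaryotic homologs.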

Network Randomization

Randomization of the chaperone-substrate interaction (CSI) network was carried out using the switching methodology (Stone and Roberts 1990; Artzy-Randrup and Stone 2005) implemented in an in-house MATLAB script.
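The in-house script is not reproduced here. As a rough illustration of the switching idea, the Python sketch below repeatedly picks two chaperone-substrate edges and swaps their substrates, rejecting any swap that would create a duplicate edge, so that every chaperone and every substrate keeps its original number of interactions. The function name `switch_randomize`, the edge-list input format, the number of attempted switches, and the seed are assumptions for illustration.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # (chaperone, substrate)


def switch_randomize(
    edges: Sequence[Edge], n_switches: int, seed: Optional[int] = None
) -> List[Edge]:
    """Degree-preserving randomization of a bipartite network by edge switching."""
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))  # drop duplicate edges, keep order
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list

    for _ in range(n_switches):
        i, j = rng.sample(range(len(edge_list)), 2)
        (c1, s1), (c2, s2) = edge_list[i], edge_list[j]
        if c1 == c2 or s1 == s2:
            continue  # swapping would reproduce the same two edges
        new1, new2 = (c1, s2), (c2, s1)
        if new1 in edge_set or new2 in edge_set:
            continue  # reject swaps that would create a duplicate edge
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


if __name__ == "__main__":
    # Toy network, not data from the study.
    toy = [("c1", "s1"), ("c1", "s2"), ("c2", "s2"), ("c2", "s3"), ("c3", "s1")]
    print(switch_randomize(toy, n_switches=100, seed=0))
```

Because each accepted switch exchanges partners between two existing edges, the degree of every node is unchanged; only the pairing of chaperones with substrates is shuffled. In practice the number of attempted switches is usually chosen to be large relative to the number of edges.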

Supplementary Material

Supplementary figure S1 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

1. Reconstruction of the proto-mitochondrial metabolism.

Authors: Toni Gabaldón; Martijn A Huynen
Journal: Science Date: 2003-08-01 Impact factor: 47.728

2. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Authors: Stéphane Guindon; Olivier Gascuel
Journal: Syst Biol Date: 2003-10 Impact factor: 15.683

Review 3. Converging concepts of protein folding in vitro and in vivo.

Authors: F Ulrich Hartl; Manajit Hayer-Hartl
Journal: Nat Struct Mol Biol Date: 2009-06 Impact factor: 15.369

4. The hydrogen hypothesis for the first eukaryote.

Authors: W Martin; M Müller
Journal: Nature Date: 1998-03-05 Impact factor: 49.962

5. Why highly expressed proteins evolve slowly.

Authors: D Allan Drummond; Jesse D Bloom; Christoph Adami; Claus O Wilke; Frances H Arnold
Journal: Proc Natl Acad Sci U S A Date: 2005-09-21 Impact factor: 11.205

6. SGD: Saccharomyces Genome Database.

Authors: J M Cherry; C Adler; C Ball; S A Chervitz; S S Dwight; E T Hester; Y Jia; G Juvik; T Roe; M Schroeder; S Weng; D Botstein
Journal: Nucleic Acids Res Date: 1998-01-01 Impact factor: 16.971

7. Hsp90 as a capacitor of phenotypic variation.

Authors: Christine Queitsch; Todd A Sangster; Susan Lindquist
Journal: Nature Date: 2002-05-12 Impact factor: 49.962

Review 8. Archaeal chaperonins.

Authors: Andrew T Large; Peter A Lund
Journal: Front Biosci (Landmark Ed) Date: 2009-01-01

9. An evolutionary network of genes present in the eukaryote common ancestor polls genomes on eukaryotic and mitochondrial origin.

Authors: Thorsten Thiergart; Giddy Landan; Marc Schenk; Tal Dagan; William F Martin
Journal: Genome Biol Evol Date: 2012-02-21 Impact factor: 3.416

10. Defining the TRiC/CCT interactome links chaperonin function to stabilization of newly made proteins with complex topologies.

Authors: Alice Y Yam; Yu Xia; Hen-Tzu Jill Lin; Alma Burlingame; Mark Gerstein; Judith Frydman
Journal: Nat Struct Mol Biol Date: 2008-11-16 Impact factor: 15.369
