Abstract
Although functionally related proteins can be reliably predicted from phylogenetic profiles, case studies and systematic analyses in prokaryotes suggest that many functional modules do not evolve cohesively. In this study we quantify the extent of evolutionary cohesiveness of functional modules in eukaryotes and probe the biological and methodological factors influencing our estimates. We have collected various datasets of protein complexes and pathways in Saccharomyces cerevisiae. We define orthologous groups across 34 eukaryotic genomes and measure the extent of cohesive evolution of sets of orthologous groups whose members constitute a known complex or pathway. Within this framework, most functional modules appear to evolve flexibly rather than cohesively. Even after correcting for uncertain module definitions and potentially problematic orthologous groups, only 46% of pathways and complexes evolve more cohesively than random modules. This flexibility seems partly coupled to the nature of the functional module, because biochemical pathways generally evolve more cohesively than complexes.
Year: 2009 PMID: 19180181 PMCID: PMC2615111 DOI: 10.1371/journal.pcbi.1000276
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Datasets used in this study.
| Dataset | Number of Modules | Average Module Size (subunits) |
| --- | --- | --- |
| SGD | 106 | 4.56 |
| KEGG | 92 | 14.89 |
| MIPS | 199 | 5.91 |
| Aloy | 87 | 6.95 |
| PE | 433 | 4.37 |
| Socio-affinity | 461 | 11.15 |
| All | 1285 | 8.02 |
| All curated | 447 | 7.51 |
The number of modules and the average number of subunits per module are listed for each dataset, as well as for the nonredundant combination of all datasets (‘All’) and of all curated datasets (‘All curated’). The SGD pathways and KEGG datasets are curated and consist mainly of metabolic pathways. The PE and socio-affinity datasets both result from clustering Tandem Affinity Purification (TAP) data. They differ in the underlying raw data (the PE clusters are based on data from both Krogan et al. [30] and Gavin et al. [14]), in the similarity score (Purification Enrichment versus socio-affinity) and in the algorithm used to cluster the proteins. MIPS and Aloy are two curated complex datasets; the Aloy dataset is a manual selection based on extensive literature curation, information on protein structures and previously published TAP-derived protein complexes [26]. Curated datasets comprise approximately one third of all modules (447 of 1285).
Figure 1. Example of a flexibly evolving complex: the Nup84 subcomplex of the nuclear pore complex.
(A) The profile of the Nup84 complex; red indicates absence and green presence (the number of paralogs is shown in dark green). The raw score of this complex is (5,0): the complex is completely present in 5 species and completely absent in none. The cohesiveness score, defined as the fraction of random modules of the same size that do not score better both in the number of species in which the module is present and in the number of species in which it is absent, is 0.48. This complex from the Aloy dataset also occurs in the MIPS dataset and, with some additional subunits, in the PE and socio-affinity clusters, so it passes the cross-comparison filter without losing any subunits. (B) The profile after cross-comparison with TAP data. SEC13, which is also part of the COPII complex, has the lowest PE score with the other subunits and a higher propensity to interact with a protein outside the module (namely SEC31, another member of the COPII complex) than with any member of this module. Removing this protein from the module yields a subcomplex that does not evolve more cohesively than the original module. (C) Apart from improving the module definition, we attempt to filter noise that may originate from using orthologous groups to describe a module's evolutionary dynamics. KOG0845, KOG1964, KOG2271 and KOG8539 are considered unreliable because they have less than 90% overlap with an orthoMCL-derived orthologous group. Removing those orthologous groups yields a more cohesively evolving module, with a raw score of (24,2) and a cohesiveness score of 0.87. (D) Removing orthologous groups that are likely to have functionally differentiated (groups containing many inparalogs, in this example KOG0845 and KOG1332) yields a submodule that we consider evolutionarily cohesive: it has a raw score of (5,8) and a cohesiveness score of 0.996.
More details on this module and some additional examples can be found in Text S2.
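The raw score and cohesiveness score described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a module's profile is assumed to be a list of presence/absence rows (one row per orthologous group, one column per species); random modules are assumed to be drawn uniformly, without replacement, from a hypothetical pool `all_groups` of orthologous-group profiles; a random module is assumed to "score better" only if it strictly exceeds the module in both counts; and the score is taken as the fraction of random modules that do not, so that higher values mean more cohesive, consistent with the example values above. The names `raw_score`, `cohesiveness_score`, `all_groups` and `n_random` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# One row per orthologous group, one boolean per species (True = present).
Profile = Sequence[Sequence[bool]]


def raw_score(profile: Profile) -> Tuple[int, int]:
    """Return (species where the module is completely present,
    species where it is completely absent)."""
    n_species = len(profile[0])
    present = absent = 0
    for s in range(n_species):
        column = [row[s] for row in profile]
        if all(column):
            present += 1
        elif not any(column):
            absent += 1
    return present, absent


def cohesiveness_score(module: Profile, all_groups: List[Sequence[bool]],
                       n_random: int = 1000, seed: int = 0) -> float:
    """Fraction of random modules of the same size that do NOT beat the
    module in both counts (assumed definition; higher = more cohesive)."""
    rng = random.Random(seed)
    present, absent = raw_score(module)
    not_better = 0
    for _ in range(n_random):
        # Hypothetical null model: draw len(module) groups uniformly without replacement.
        rand_present, rand_absent = raw_score(rng.sample(all_groups, len(module)))
        if not (rand_present > present and rand_absent > absent):
            not_better += 1
    return not_better / n_random


# Toy module of two orthologous groups across four species.
module = [[True, True, False, True],
          [True, True, False, False]]
print(raw_score(module))  # (2, 1)
```

In the toy module both groups are present in the first two species and both are absent in the third, giving a raw score of (2, 1). The choice of null model (uniform sampling here) strongly affects the cohesiveness score, so a real analysis would need to match whatever randomization the study used.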
Fraction of cohesive modules for different datasets and different scoring schemes.
| Dataset | Average Cooccurrence | Average Deviation from Modular | Homogeneous Columns | Species Absent | Species Present | Species Absent, Species Present |
| --- | --- | --- | --- | --- | --- | --- |
| SGD | 0.14 | 0.15 | 0.09 | 0.06 | 0.03 | 0.44 |
| KEGG | 0.24 | 0.24 | 0.17 | 0.08 | 0.16 | 0.38 |
| MIPS | 0.17 | 0.17 | 0.15 | 0.05 | 0.1 | 0.33 |
| Aloy | 0.21 | 0.23 | 0.16 | 0.02 | 0.1 | 0.31 |
| PE | 0.08 | 0.08 | 0.06 | 0.03 | 0.05 | 0.21 |
| Socio-affinity | 0.27 | 0.3 | 0.2 | 0.01 | 0.19 | 0.24 |
| All | 0.18 | 0.2 | 0.14 | 0.03 | 0.12 | 0.27 |
| All curated | 0.19 | 0.19 | 0.15 | 0.06 | 0.1 | 0.37 |
Average Cooccurrence: for each pair of module subunits, we calculate the fraction of species in which both subunits are present or both are absent, and average over all subunit pairs to obtain a score per module. Average Deviation from Modular: the deviation of the number of module components present in each genome from the average number of module components per genome, summed over genomes; adopted from Snel et al. [9]. Homogeneous Columns: the number of species in which a module is either completely present or completely absent, adopted from Gavin et al. [14]. Species Present, Species Absent: the number of species in which a module is completely present and the number of species in which it is completely absent. Together, these two values make up the raw score used throughout the article.
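Two of the alternative scoring schemes in this legend, Average Cooccurrence and Homogeneous Columns, can be sketched in Python as follows, using a hypothetical boolean profile (rows are module subunits, columns are species). This is an illustrative reading of the legend, not the authors' code; the deviation-from-modular score is left out because the legend does not pin down its exact form (for example, whether deviations are taken as absolute values). The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

# One row per module subunit, one boolean per species (True = present).
Profile = Sequence[Sequence[bool]]


def average_cooccurrence(profile: Profile) -> float:
    """Mean, over all subunit pairs, of the fraction of species in which
    both subunits are present or both are absent."""
    if len(profile) < 2:
        raise ValueError("need at least two subunits to form a pair")
    n_species = len(profile[0])
    pair_scores = [
        sum(a == b for a, b in zip(x, y)) / n_species
        for x, y in combinations(profile, 2)
    ]
    return sum(pair_scores) / len(pair_scores)


def homogeneous_columns(profile: Profile) -> int:
    """Number of species in which the module is completely present or completely absent."""
    n_species = len(profile[0])
    count = 0
    for s in range(n_species):
        column = [row[s] for row in profile]
        if all(column) or not any(column):
            count += 1
    return count


module = [[True, True, False, True],
          [True, True, False, False]]
print(average_cooccurrence(module))  # 0.75
print(homogeneous_columns(module))   # 3
```

Note that Homogeneous Columns equals the sum of the two raw-score components (Species Present plus Species Absent), whereas the raw score keeps them separate.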
Figure 2. Scores and random background.
Raw scores of all six-subunit modules from all datasets, with the Nup84 complex from Figure 1 highlighted in green. The density of random modules in each score bin is shown in shades of blue, with darker shades indicating more random modules in that bin.
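The random background in Figure 2 amounts to a two-dimensional histogram of raw scores (species present, species absent) for random modules of a fixed size. A minimal Python sketch, assuming random modules are sampled uniformly from a hypothetical pool `all_groups` of orthologous-group profiles and binned with a hypothetical `bin_width`; the sampling and binning actually used for the figure may differ.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# One row per orthologous group, one boolean per species (True = present).
Profile = Sequence[Sequence[bool]]


def raw_score(profile: Profile) -> Tuple[int, int]:
    """(species with the module completely present, species with it completely absent)."""
    present = absent = 0
    for column in zip(*profile):  # iterate over species
        if all(column):
            present += 1
        elif not any(column):
            absent += 1
    return present, absent


def random_background(all_groups: List[Sequence[bool]], size: int,
                      n_random: int = 1000, bin_width: int = 2,
                      seed: int = 0) -> Counter:
    """Count random modules of `size` groups per (present, absent) score bin."""
    rng = random.Random(seed)
    density: Counter = Counter()
    for _ in range(n_random):
        present, absent = raw_score(rng.sample(all_groups, size))
        # Bin index along each axis; a higher count corresponds to darker shading.
        density[(present // bin_width, absent // bin_width)] += 1
    return density
```

For a figure like this one would call `random_background(all_groups, size=6)`; dividing each count by `n_random` turns the counts into a normalized density.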
Figure 3. (Combined) effect of different filters on the fraction of cohesive modules.
On top of each bar we show the number of (sub)modules passing the filter.