Juan F Poyatos, Laurence D Hurst.
Abstract
By applying a graph-based algorithm to yeast protein-interaction networks, we have extracted modular structures and show that they can be validated using information from the phylogenetic conservation of the network components. We show that the module cores, the parts with the highest intramodular connectivity, are biologically relevant components of the networks. These constituents correlate only weakly with other levels of organization. We also discuss how such structures could be used for finding targets for antimicrobial drugs.
Year: 2004 PMID: 15535869 PMCID: PMC545784 DOI: 10.1186/gb-2004-5-11-r93
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Overlap algorithm and multi-response randomization test method. (a) Overlap algorithm. C-based and L-based matrices are obtained from the interaction matrix. These matrices are the input to a standard hierarchical agglomerative average-linkage clustering algorithm [20], which extracts modules according to a given number of branches in the clustering tree (see text). Finally, in the C-based modular structure, we kept in each module only those components that also appeared in the L-based module with which that C-module had the greatest overlap. The organization thus obtained is the putative modular organization of the network under consideration. (b) Multi-response permutation procedure. We validate this modular organization using the phylogenetic conservation of module protein constituents across species. We calculate a matrix of mean pairwise similarities (or distances) among the phylogenetic profiles [18] of proteins belonging to the same module (within-module W) or to each pair of modules (between-module W), and compute a representative statistic ξ. P-values are obtained by randomly permuting the data and recomputing the statistic. This step is repeated a large number of times, 10,000 in our case, and the resulting values form a randomized distribution. The observed value from the original data is then compared with this distribution to compute the P-value.
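The final trimming step of the overlap algorithm in (a) can be illustrated with a minimal Python sketch. It assumes the two clusterings have already been computed and are given as lists of sets of protein identifiers, and it assumes that "greatest overlap" means the largest number of shared members; neither the function name `overlap_modules` nor the protein labels come from the original work.

```python
def overlap_modules(c_modules, l_modules):
    """Trim each C-based module to the members it shares with its
    best-matching L-based module (illustrative sketch only)."""
    trimmed = []
    for c_mod in c_modules:
        # Assumption: "greatest overlap" = largest number of shared members.
        best_l = max(l_modules, key=lambda l_mod: len(c_mod & l_mod))
        shared = c_mod & best_l
        if shared:  # skip modules that end up empty after trimming
            trimmed.append(shared)
    return trimmed


# Hypothetical protein labels, for illustration only.
c_modules = [{"A", "B", "C"}, {"D", "E"}]
l_modules = [{"A", "B"}, {"C", "D", "E"}]
putative = overlap_modules(c_modules, l_modules)
```

In this toy case the first C-based module loses protein C, because its best-matching L-based module contains only A and B, while the second module is kept whole.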
Table 1. Global and follow-up analysis of the network modular organizations
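| | | | | Full | | | | Core | | | |
| Function | Branches | n | M | ξ | P | Significant / branch length > 0.1 | Mean module size | ξ | P | Significant / branch length > 0.1 | Mean module size |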
| Cellular fate | 34 | 323 | 14 | 0.012 | <0.001 | 2/5 | 16.7 | 0.035 | <0.001 | 3/6 | 6.5 |
| Energy | 25 | 84 | 5 | 0.066 | <0.001 | 1/1 | 12.4 | 0.156 | <0.001 | 1/4 | 4.4 |
| Metabolism | 102 | 420 | 15 | 0.067 | <0.001 | 2/8 | 15.7 | 0.177 | <0.001 | 4/9 | 4.7 |
| Cellular transport | 32 | 336 | 15 | 0.014 | <0.001 | 2/5 | 18.7 | 0.021 | <0.001 | -/2 | 10.8 |
| Cell cycle | 26 | 514 | 13 | 0.012 | <0.001 | 2/3 | 26.6 | 0.05 | <0.001 | 2/7 | 8.5 |
| Protein fate | 48 | 352 | 18 | 0.014 | 0.004 | -/9 | 15.3 | 0.03 | 0.001 | -/10 | 8.7 |
| Transport facilitation | 20 | 63 | 4 | 0.034 | 0.047 | 1/1 | 10.7 | 0.372 | 0.097 | 1/1 | 6.5 |
| Cellular environment | 18 | 87 | 8 | 0.037 | 0.007 | 2/3 | 8.5 | 0.072 | 0.002 | 3/4 | 5.6 |
| Protein synthesis | 16 | 137 | 7 | 0.038 | 0.002 | 1/1 | 17.3 | 0.194 | <0.001 | 2/5 | 4.8 |
| Cell rescue | 26 | 88 | 8 | 0.08 | <0.001 | 1/2 | 7.7 | 0.108 | <0.001 | 1/3 | 4.2 |
| Signaling | 14 | 67 | 6 | 0.017 | 0.082 | -/2 | 9.3 | 0.018 | 0.157 | -/2 | 6.2 |
| Cellular organization | 36 | 258 | 15 | 0.032 | <0.001 | 1/7 | 12.3 | 0.097 | <0.001 | 3/9 | 5.3 |
| Transcription | 40 | 654 | 21 | 0.019 | <0.001 | 2/7 | 25.1 | 0.037 | <0.001 | 4/9 | 12.3 |
For every functional network of size n, we applied the network clustering algorithm with a given number of branches in the clustering tree. These values were chosen to be among those with significantly high average maximal overlap (equal to or greater than 0.8), low overlap ratios, and meso-scale average module size (~5-25). The outcome of this algorithm is a modular organization with M modules. For the follow-up analysis of both full and core components of the modules (third and fourth column groups), the following quantities are shown: ξ, the overall statistic; P, the statistical significance of the global test; P†, the number of modules whose branch length in the similarity dendrogram (see text for details) is greater than 0.1 in similarity units; and the number of modules whose within-similarity is statistically significant (P < 0.05) in the modular test. All P-values were obtained by means of an approximate permutation test with 10,000 randomizations, using binary phylogenetic profiles with a BLAST E-value threshold of E = 1e-6 [35].
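The permutation test on binary phylogenetic profiles can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the original work: a homolog counts as present when its E-value is at or below the threshold, profile similarity is the fraction of species on which two profiles agree, the statistic is the mean within-module similarity averaged over modules (a stand-in for ξ), and the P-value uses the common (hits + 1)/(permutations + 1) estimate.

```python
import random
from itertools import combinations

E_THRESHOLD = 1e-6  # BLAST E-value threshold quoted in the caption


def binarize(evalues, threshold=E_THRESHOLD):
    """Binary profile: 1 where a homolog is detected (assumed: E <= threshold)."""
    return [1 if e <= threshold else 0 for e in evalues]


def similarity(p, q):
    """Assumed similarity: fraction of species on which two profiles agree."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def mean_within_similarity(profiles, labels):
    """Mean pairwise similarity inside each module, averaged over modules."""
    modules = {}
    for idx, label in enumerate(labels):
        modules.setdefault(label, []).append(idx)
    per_module = []
    for members in modules.values():
        pairs = list(combinations(members, 2))
        if pairs:  # singleton modules have no pairs and are skipped
            total = sum(similarity(profiles[i], profiles[j]) for i, j in pairs)
            per_module.append(total / len(pairs))
    if not per_module:
        raise ValueError("need at least one module with two or more proteins")
    return sum(per_module) / len(per_module)


def permutation_p_value(profiles, labels, n_perm=10_000, seed=0):
    """Shuffle module labels among proteins and compare with the observed value."""
    rng = random.Random(seed)
    observed = mean_within_similarity(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_within_similarity(profiles, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: four hypothetical proteins, four species, two modules.
profiles = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
labels = [0, 0, 1, 1]
observed, p_value = permutation_p_value(profiles, labels, n_perm=1000)
```

With only four proteins the test cannot reach significance, since one in three random pairings reproduces the observed grouping; the toy data merely show the mechanics.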
Figure 2. Modular organization, mean similarity dendrogram and phylogenetic profile of (a-c) cellular rescue and (d-f) cellular environment functional networks. (a,d) Modular organization extracted with the network clustering algorithm. Protein interactions are plotted in brown. Modules are highlighted in white. Proteins within each module have been reorganized to show those with the greatest intra-modular connectivity, the core proteins, in the center of the module. (b,e) Mean similarity dendrograms. Branches for the modules in (a) and (d) are joined at a common node. Each branch terminates at the mean similarity of its module, W, so that its length is the difference between W and the value at the node, in similarity units. Dendrograms related to full modules are in black and those corresponding to the core components are in red. Statistically significant branches (P < 0.05) end in a circle. (c,f) Continuous phylogenetic profiles, color-coded from dark blue (maximal homology) to brown (no homology). Columns show the presence or absence of network nodes in a given organism, and rows show the presence or absence of a given node across the whole set of organisms. Species are arranged in taxonomic groups separated by white dashed vertical lines: Bacteria (left), Archaea (center), and Eukarya (right) (see Additional data file 1). The horizontal white dashed lines mark the boundaries of modules. A quick look at these figures suggests that proteins that are part of the same module exhibit a loosely correlated degree of conservation, as should be the case if modules represent some sort of discrete functional unit. This argument is quantified by the branch length in the mean similarity dendrogram and the corresponding statistical significance.
Table 2. Conservation properties of module core components for those functional networks with more than one statistically significant module core
| Conservation | ||
| Function | (B,A,E) | (-,-,E) |
| Cell fate | 0(0) | 6(3) |
| Metabolism | 3(1) | 6(3) |
| Cellular organization | 3(0) | 6(3) |
| Cellular environment | 3(2) | 1(1) |
| Protein synthesis | 3(0) | 2(2) |
| Transcription | 1(1) | 8(3) |
| Cell cycle | 0(0) | 7(2) |
Conservation of components follows two distinct patterns: module core components are either conserved in all three kingdoms, Bacteria, Archaea and Eukarya (B,A,E), or present only in eukaryotes (-,-,E). The table shows the number of module cores, with branch length ξ ≥ 0.1, whose components have a representative phylogenetic profile of either type. Counts for statistically significant core components are shown in parentheses. See also Table 1.
Table 3. List of complexes significantly represented in the phylogenetically distinct module cores
| Function | Cores (rcc ≥ 5) | Complexes |
| Cell fate | 6 (2) | Actin-associated motor protein, 431 |
| Energy | 4 (2) | 47, 346, Serine/threonine phosphoprotein phosphatase |
| Metabolism | 9 (3) | 521, GGTase II, OT |
| Cellular transport | 2 (2) | Class C Vps, 239, 77, AP-3, AP-2 |
| Cell cycle | 7 (4) | Tubulins, CA, AP, 3, OR, SCF-GRR1, SCF-CDC4, RI |
| Protein fate | 10 (5) | Vps, Class C Vps, 71, 77, FT, GGTase I, 168, 651, OT, AP, 23 |
| Transport facilitation | 1 (1) | TOM |
| Cell environment | 4 (3) | STE5-MAPK, Kel1p/Kel2p, 521 |
| Protein synthesis | 5 (2) | eIF3, eIF2B, eIF2, 340, 339, 613 |
| Cell rescue | 3 (3) | No complexes |
| Signaling | 2 (1) | 167, 308, 521 |
| Cell organization | 9 (6) | 272, 5, 71, 289, casein kinase II, 181, 167, Gim |
| Transcription | 9 (6) | 154, RM, RP, Ma, Cbf, Mb, 126, NSP1, TF, 178, CPK, 634, 160, CF |
Numbers correspond to those complexes found by systematic analysis as described in MIPS [23]. Abbreviations: AP, anaphase-promoting complex; CA, chromatin-assembly complex; Cbf, Cbf1/Met4/Met28; CF, core factor; CPK, cAMP-dependent protein kinase; FT, farnesyltransferase; GGTase I, geranylgeranyltransferase I; GGTase II, geranylgeranyltransferase II; Ma, Met4/Met28/Met32; Mb, Met4/Met28/Met31; OR, origin-recognition complex; OT, oligosaccharyltransferase; RI, replication initiation complex; RM, RNase MRP; RP, RNase P; TF, TFIIIC; TOM, transport across the outer membrane complex; Vps, Vps35/Vps29/Vps2. Here, rcc is the ratio between the number of complex components that are part of a core and the total number of complex constituents.
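As a worked illustration of the ratio defined above, the short Python sketch below computes the fraction of a complex's constituents that fall inside a module core. The function name `complex_core_ratio` and the protein labels are hypothetical and serve only to show the arithmetic.

```python
def complex_core_ratio(complex_members, core_members):
    """Fraction of a complex's constituents that are also part of a module core."""
    complex_set = set(complex_members)
    if not complex_set:
        raise ValueError("complex has no constituents")
    return len(complex_set & set(core_members)) / len(complex_set)


# Hypothetical labels: a four-protein complex, three of which sit in the core.
ratio = complex_core_ratio(["P1", "P2", "P3", "P4"], ["P1", "P2", "P3", "P9"])
```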
Table 4. Statistical significance of the overall analysis of coexpression, common 5' regulatory motifs, homogeneity in dispensability and lethality for the phylogenetically distinct module cores
| Function | P-exp | P-mot | P-hom | P-let | p-core | p-net |
| Cell fate | <0.05 | - | - | <0.05 | 0.28 | 0.08 |
| Energy | - | <0.005 | - | - | 0 | 0.05 |
| Metabolism | <0.0005 | <0.05 | - | <0.01 | 0.14 | 0.08 |
| Cellular transport | - | - | <0.01 | - | None | 0.28 |
| Cell cycle | <0.05 | - | <0.05 | 0.0001 | 0.35 | 0.29 |
| Protein fate | <0.0005 | - | - | - | 0.41 | 0.16 |
| Transport facilitation | - | - | - | - | 0.5 | 0.15 |
| Cell environment | - | - | - | <0.05 | 0 | 0.06 |
| Protein synthesis | <0.05 | - | <0.0005 | 0.0001 | 0.2 | 0.06 |
| Cell rescue | <0.05 | - | - | - | 0 | 0.12 |
| Signaling | - | - | - | - | 0 | 0.12 |
| Cell organization | <0.01 | <0.05 | - | - | 0.08 | 0.12 |
| Transcription | <0.05 | <0.01 | <0.01 | <0.001 | 0.68 | 0.3 |
Statistical significance (P-values) of the overall analysis of coexpression (P-exp), common 5' regulatory motifs (P-mot), homogeneity in dispensability (P-hom) and lethality (P-let) for the phylogenetically distinct module cores (see text and Materials and methods for details). Non-significant results are denoted by -. p-core is the probability of finding lethal genes among the proteins without a human homolog that belong to the significant cores. p-net is the probability of finding lethal genes among the proteins not found in humans that are part of each full network.
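The quantities p-core and p-net can be read as simple fractions, as in the Python sketch below. It assumes that each probability is estimated as the share of lethal genes among the proteins lacking a human homolog, and that an empty candidate set is reported as None (as for cellular transport); the function name `lethal_fraction` and the protein labels are hypothetical.

```python
def lethal_fraction(proteins, lethal, has_human_homolog):
    """Share of lethal genes among proteins that lack a human homolog.

    Returns None when no protein in the set lacks a human homolog
    (assumed to correspond to the "None" entry in the table).
    """
    candidates = [p for p in proteins if p not in has_human_homolog]
    if not candidates:
        return None
    return sum(p in lethal for p in candidates) / len(candidates)


# Hypothetical labels, for illustration only.
core_proteins = ["P1", "P2", "P3", "P4"]
lethal = {"P1", "P3"}
has_human_homolog = {"P3", "P4"}
p_core = lethal_fraction(core_proteins, lethal, has_human_homolog)
```

Applying the same function to all proteins of a functional network instead of the significant cores would give the corresponding p-net estimate.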