Elena Zotenko, Julian Mestre, Dianne P O'Leary, Teresa M Przytycka.
Abstract
The centrality-lethality rule, which notes that high-degree nodes in a protein interaction network tend to correspond to proteins that are essential, suggests that the topological prominence of a protein in a protein interaction network may be a good predictor of its biological importance. Even though the correlation between degree and essentiality was confirmed by many independent studies, the reason for this correlation remains elusive. Several hypotheses about putative connections between essentiality of hubs and the topology of protein-protein interaction networks have been proposed, but as we demonstrate, these explanations are not supported by the properties of protein interaction networks. To identify the main topological determinant of essentiality and to provide a biological explanation for the connection between the network topology and essentiality, we performed a rigorous analysis of six variants of the genome-wide protein interaction network for Saccharomyces cerevisiae obtained using different techniques. We demonstrated that the majority of hubs are essential due to their involvement in Essential Complex Biological Modules (ECOBIMs), groups of densely connected proteins with shared biological function that are enriched in essential proteins. Moreover, we rejected two previously proposed explanations for the centrality-lethality rule, one relating the essentiality of hubs to their role in the overall network connectivity and another relying on the recently published essential protein interactions model.
Year: 2008 PMID: 18670624 PMCID: PMC2467474 DOI: 10.1371/journal.pcbi.1000140
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Structural properties of the tested protein interaction networks.
| Network | Number of nodes | Number of edges | Average degree | Average clustering coefficient |
| DIP CORE | 2,316 | 5,569 | 4.81 | 0.30 |
| LC | 3,224 | 11,291 | 7.00 | 0.36 |
| HC | 2,752 | 9,097 | 6.61 | 0.37 |
| TAP-MS | 1,994 | 15,819 | 15.87 | 0.60 |
| BAYESIAN | 4,135 | 20,984 | 10.15 | 0.26 |
| Y2H | 400 | 491 | 2.45 | 0.09 |
Amount of overlap between tested networks.
| | DIP CORE | LC | HC | TAP-MS | BAYESIAN | Y2H |
| DIP CORE | - | 0.58 | 0.62 | 0.25 | 0.61 | 0.02 |
| LC | 0.28 | - | 0.53 | 0.26 | 0.39 | 0.01 |
| HC | 0.38 | 0.65 | - | 0.47 | 0.47 | 0.02 |
| TAP-MS | 0.09 | 0.18 | 0.27 | - | 0.36 | 0.00 |
| BAYESIAN | 0.16 | 0.21 | 0.20 | 0.27 | - | 0.02 |
| Y2H | 0.26 | 0.18 | 0.31 | 0.10 | 0.97 | - |
Each row of the table corresponds to a single network and shows a fraction of its edges contained in other tested networks. Thus, for example, 58% of the edges in the DIP CORE network are also present in the LC network.
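The overlap fractions are directional, which is why the table is not symmetric. A minimal sketch of the computation follows, assuming each network is available as a list of undirected protein pairs; the function name `edge_overlap` and the toy protein names are hypothetical and do not come from the paper.

```python
def edge_overlap(edges_a, edges_b):
    """Fraction of the undirected edges of network A that also occur in network B."""
    # frozenset makes ("P1", "P2") and ("P2", "P1") the same undirected edge.
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return len(a & b) / len(a) if a else 0.0


# Toy networks (hypothetical protein names).
net_a = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]
net_b = [("P2", "P1"), ("P3", "P4"), ("P5", "P6")]

overlap_ab = edge_overlap(net_a, net_b)  # shared edges divided by |A|
overlap_ba = edge_overlap(net_b, net_a)  # shared edges divided by |B|
```

Because the denominator is the edge count of the row network, a small network such as Y2H can be almost fully contained in a large one while contributing only a tiny fraction of the large network's edges.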
Figure 1. Relationship between degree and essentiality in the tested networks.
(A) For each tested network the fraction of essential nodes among nodes with highest degree (hubs) is shown. The horizontal axis shows the fraction of the total network nodes that were designated as hubs. (B) Correlation between degree and essentiality is assessed by Kendall's tau and Spearman's rho rank correlation coefficients.
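The quantity plotted in panel (A) can be sketched as follows. This is an illustration only: it assumes hubs are the top fraction of nodes ranked by degree, with ties broken by node name (the caption does not say how ties are handled), and the function name and toy data are hypothetical.

```python
def essential_fraction_among_hubs(degree, essential, hub_fraction):
    """Fraction of essential nodes among the top `hub_fraction` of nodes by degree."""
    # Rank by decreasing degree; break ties by name (an assumption).
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    k = max(1, round(hub_fraction * len(ranked)))
    hubs = ranked[:k]
    return sum(1 for n in hubs if n in essential) / k


# Toy data: node -> degree, plus a set of essential nodes (hypothetical).
degree = {"A": 9, "B": 7, "C": 5, "D": 4, "E": 3,
          "F": 2, "G": 2, "H": 1, "I": 1, "J": 1}
essential = {"A", "B", "D", "H"}

top20 = essential_fraction_among_hubs(degree, essential, 0.2)
top50 = essential_fraction_among_hubs(degree, essential, 0.5)
```

Sweeping `hub_fraction` over a range of values gives one curve per network, comparable in spirit to the horizontal axis described in the caption.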
Figure 2. Centrality measures demonstrated on a toy network.
Here we demonstrate the difference in the five centrality measures on a toy network. (A) The toy network consists of two cliques: K50 with nodes A1–A50 and K10 with nodes B1–B10. The two cliques are interconnected by an edge (A1, B1) and through an additional vertex D. Additional node C attaches to the network through A2. (B) As the measures assign centrality values based on different network properties they will rank nodes differently. Briefly, the eigenvector centrality measure (EC) will assign high-centrality values to nodes that are close to many other central nodes in the network. The subgraph centrality measure (SC) assigns centrality values to a node based on the number of closed walks that originate at the node. The shortest path betweenness centrality measure (SPBC) assigns the node centrality value based on the fraction of shortest paths that pass through the node averaged over all pairs of nodes in the network. The current-flow betweenness centrality measure (CFC) generalizes the SPBC measure by including additional paths, not just the shortest paths, in the computation. Here, the difference between the measures is exemplified by the rankings that they produce for the toy network nodes.
Figure 3. Vulnerability to attack against most central proteins.
(A–F) The impact of node removal is quantified by the fraction of nodes in the largest connected component. There is one curve for each centrality measure that shows the fraction of nodes in the largest connected component as a function of the fraction of the most central nodes removed. We also show the impact of node removal in a random order and the size of the largest connected component when all essential proteins are removed.
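The impact measure used in these panels can be sketched with a breadth-first search. The sketch assumes the network is stored as an adjacency dictionary and that the fraction is taken relative to the original node count; the caption does not state the denominator, so that choice is an assumption, as are the function name and the toy graph.

```python
from collections import deque


def largest_component_fraction(adj, removed):
    """Size of the largest connected component after deleting `removed`,
    as a fraction of the ORIGINAL node count (an assumption)."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Breadth-first search over the surviving nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in removed and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


# Toy graph: path a-b-c-d-e with an extra node f attached to c.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "f"},
    "d": {"c", "e"}, "e": {"d"}, "f": {"c"},
}
intact = largest_component_fraction(adj, [])
after_hub = largest_component_fraction(adj, ["c"])  # leaves {a,b}, {d,e}, {f}
```

Calling the function on growing prefixes of a centrality ranking would give one point per prefix of a removal curve.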
Impact of the removal of essential proteins as compared to the removal of an equivalent number of random nonessential proteins with the same degree distribution.
| Network | Essential | Random nonessential |
| DIP CORE | 0.519 | 0.504±0.007 |
| LC | 0.578 | 0.551±0.010 |
| HC | 0.521 | 0.525±0.005 |
| TAP-MS | 0.512 | 0.512±0.011 |
| BAYESIAN | 0.685 | 0.625±0.006 |
| Y2H | 0.410 | 0.397±0.046 |
The impact of removal of a set of proteins is measured by the fraction of nodes in the largest connected component. For each network the effect of the removal of essential proteins and the removal of an equivalent number of random nonessential proteins with the same degree is shown.
Figure 4. Enrichment of hubs and an equivalent number of most central nodes according to other centrality measures in essential proteins.
Fraction of essential proteins among hubs and an equivalent number of most central nodes according to four other centrality measures. The fraction of essential proteins among the nodes of the network is shown as ntwk.avg.
Correlation between centrality indices and essentiality.
| | Eigenvector centrality | | Subgraph centrality | |
| Network | τ ess | τ ess.dc | τ ess | τ ess.dc |
| DIP CORE | 0.15 (3.5e-19) | 0.064 (8.6e-05) | 0.17 (1.2e-24) | 0.059 (2.5e-04) |
| LC | 0.23 (7.9e-56) | 0.094 (3.6e-11) | 0.23 (1.2e-55) | 0.093 (4.9e-11) |
| HC | 0.24 (1.8e-54) | 0.107 (2.9e-12) | 0.24 (7.9e-55) | 0.102 (3.4e-11) |
| TAP-MS | 0.12 (8.42e-11) | −0.007 (6.5e-01) | 0.12 (8.42e-11) | −0.007 (6.5e-01) |
| BAYESIAN | 0.17 (5.7e-39) | 0.046 (1.5e-04) | 0.17 (5.1e-41) | 0.051 (3.1e-05) |
| Y2H | 0.05 (1.1e-01) | 0.027 (2.5e-01) | 0.03 (2.0e-01) | −0.024 (7.2e-01) |
| | Shortest-path betweenness centrality | | Current-flow betweenness | |
| Network | τ ess | τ ess.dc | τ ess | τ ess.dc |
| DIP CORE | 0.15 (3.2e-18) | −0.002 (5.5e-01) | 0.19 (2.7e-27) | 0.012 (2.5e-01) |
| LC | 0.21 (1.4e-46) | 0.003 (4.25e-01) | 0.26 (3.7e-70) | −0.007 (6.8e-01) |
| HC | 0.20 (1.9e-36) | 0.005 (3.7e-01) | 0.24 (2.6e-53) | −0.005 (6.2e-01) |
| TAP-MS | 0.12 (3.5e-11) | 0.018 (1.8e-01) | 0.16 (3.3e-18) | 0.017 (1.8e-01) |
| BAYESIAN | 0.18 (2.4e-41) | 0.005 (3.43e-01) | 0.23 (2.7e-69) | 0.018 (8.1e-02) |
| Y2H | 0.10 (1.2e-02) | 0.048 (1.4e-01) | 0.10 (1.4e-02) | 0.041 (1.8e-01) |
The correlation of centrality measures with essentiality (τ ess) is measured by Kendall's tau rank correlation coefficient. The correlation with essentiality, after controlling for correlation with degree centrality, is measured using the partial Kendall's tau rank correlation coefficient (τ ess.dc). The p-values are derived from the Kendall's tau z-scores and are shown in parentheses.
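A partial rank correlation of this kind can be sketched in plain Python. The sketch assumes the tie-corrected tau-b variant (essentiality is binary, so ties are unavoidable) and the standard first-order partial correlation formula, τ_xy.z = (τ_xy − τ_xz·τ_yz) / sqrt((1 − τ_xz²)(1 − τ_yz²)); the caption does not say which variant the authors used, and the function names and toy vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected), computed by brute force in O(n^2)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def partial_kendall_tau(x, y, z):
    """Correlation of x and y after controlling for z (first-order partial)."""
    t_xy = kendall_tau_b(x, y)
    t_xz = kendall_tau_b(x, z)
    t_yz = kendall_tau_b(y, z)
    denom = sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))
    return (t_xy - t_xz * t_yz) / denom if denom else float("nan")


# Toy vectors: a centrality score, a 0/1 essentiality flag, and node degree.
centrality = [1, 2, 3, 4]
is_essential = [0, 0, 1, 1]
node_degree = [2, 1, 4, 3]

tau_ess = kendall_tau_b(centrality, is_essential)
tau_ess_dc = partial_kendall_tau(centrality, is_essential, node_degree)
```

The brute-force pair loop is fine for a toy example; for networks with thousands of nodes a library routine would be the practical choice.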
Difference between the observed and expected number of pairs where both proteins are either essential or nonessential.
| | | | Expected number of pairs of the same type | | |
| Network | Total number of pairs | Number of pairs of the same type | Simulation | Line fitting | Weighted line fitting |
| DIP CORE | 1,849 | 1,135 | 945 (3.6e-10) | 928 (8.6e-12) | 938 (8.0e-11) |
| LC | 10,777 | 6,143 | 5,691 (6.6e-10) | 5,556 (1.1e-15) | 5,589 (3.9e-14) |
| HC | 5,907 | 3,516 | 3,213 (2.0e-08) | 2,997 (2.2e-16) | 2,994 (2.2e-16) |
| Y2H | 3,254 | 2,167 | 1,976 (9.6e-07) | 2,025 (2.6e-04) | 2,052 (3.3e-03) |
The total number of pairs refers to the number of nonadjacent protein pairs with three or more common neighbors in the network. (Due to the sparsity of the Y2H network, the statistics are calculated for nonadjacent pairs having one or more neighbors in common.) The nodes in the pair are of “the same type” if they are both essential or both nonessential.
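The observed counts in the first two columns can be sketched as below, assuming an adjacency dictionary of neighbor sets. The function name and toy graph are hypothetical, and the sketch covers only the observed counts, not the expected counts obtained by simulation or line fitting.

```python
from itertools import combinations


def same_type_pairs(adj, essential, min_common=3):
    """Count nonadjacent pairs with at least `min_common` shared neighbors,
    and how many of those pairs are both essential or both nonessential."""
    total = same = 0
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # adjacent pairs are excluded
        if len(adj[u] & adj[v]) >= min_common:
            total += 1
            if (u in essential) == (v in essential):
                same += 1
    return total, same


# Toy bipartite-like graph: u, v, w each interact with h1, h2, h3.
adj = {
    "u": {"h1", "h2", "h3"}, "v": {"h1", "h2", "h3"}, "w": {"h1", "h2", "h3"},
    "h1": {"u", "v", "w"}, "h2": {"u", "v", "w"}, "h3": {"u", "v", "w"},
}
essential = {"u", "v", "h1"}

total_pairs, same_pairs = same_type_pairs(adj, essential, min_common=3)
```

Setting `min_common=1` corresponds to the relaxed threshold the note describes for the sparse Y2H network.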
Figure 5. The automatic method for extraction of ECOBIMs.
Here we demonstrate the major steps of the method on the HC network. The input to the method is a protein interaction network, GO annotation, and the set of essential nodes, which are shown in red. The method considers subnetworks induced by proteins annotated with the same GO biological process term, one subnetwork at a time, to identify densely connected regions or COBIMs. The COBIMs are shown by a COBIM intersection graph, where nodes correspond to COBIMs (the size of the node is proportional to the number of genes in the corresponding COBIM) and there is an edge between a pair of COBIMs if they have at least two proteins in common. The COBIMs that are enriched in essential proteins are selected as ECOBIMs, shown in green.
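Only the COBIM intersection graph is specified precisely enough in this caption to illustrate: two COBIMs are joined when they share at least two proteins. A minimal sketch follows, assuming COBIMs are given as named sets of proteins; the names are hypothetical, and the steps that find dense regions or test enrichment are not reproduced here.

```python
from itertools import combinations


def cobim_intersection_graph(cobims, min_shared=2):
    """Edges between COBIMs that have at least `min_shared` proteins in common."""
    edges = []
    for a, b in combinations(sorted(cobims), 2):
        if len(cobims[a] & cobims[b]) >= min_shared:
            edges.append((a, b))
    return edges


# Toy COBIMs (hypothetical names and members).
cobims = {
    "C1": {"p1", "p2", "p3"},
    "C2": {"p2", "p3", "p4"},
    "C3": {"p4", "p5"},
}
cobim_edges = cobim_intersection_graph(cobims)
```

In a drawing of the resulting graph, node size could be set from `len(cobims[name])`, mirroring the caption's description.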
Figure 6. Enrichment of ECOBIM and non-ECOBIM hubs in essential proteins.
Fraction of essential proteins among various types of hubs: all hubs, hubs that are members of ECOBIMs (ECOBIM hubs), and hubs that are not members of ECOBIMs (non-ECOBIM hubs). The fraction of essential proteins among all proteins in the network is also shown (ntwk.avg.). The numbers above the bars show the number of essential hubs out of the total number of hubs of this type for ECOBIM and non-ECOBIM hubs.
Membership in ECOBIMs and the centrality-lethality rule.
| | Enrichment of ECOBIM hubs | | | Enrichment of non-ECOBIM hubs | | | Corr. degree vs. essentiality for non-ECOBIM hubs | | |
| Network | Obs. | Rand. | p-value | Obs. | Rand. | p-value | Obs. | Rand. | p-value |
| DIP CORE | 0.80 | 0.67 | 1.98e-03 | 0.26 | 0.43 | <1.00e-05 | 0.08 | 0.18 | <1.00e-05 |
| LC | 0.80 | 0.69 | 1.88e-03 | 0.32 | 0.48 | <1.00e-05 | 0.17 | 0.27 | <1.00e-05 |
| HC | 0.83 | 0.70 | 4.00e-05 | 0.35 | 0.51 | <1.00e-05 | 0.17 | 0.27 | <1.00e-05 |
| TAP-MS | 0.76 | 0.62 | 1.00e-05 | 0.24 | 0.40 | <1.00e-05 | 0.12 | 0.20 | <1.00e-05 |
| BAYESIAN | 0.77 | 0.65 | <1.00e-05 | 0.18 | 0.36 | <1.00e-05 | 0.09 | 0.20 | <1.00e-05 |
| Y2H | 0.85 | 0.66 | 5.81e-02 | 0.13 | 0.25 | 2.00e-05 | −0.04 | 0.05 | 2.00e-04 |
For every quantity three values are shown: the value under the true assignment of essential proteins (Obs.), the mean value under the randomized assignment of essential proteins (Rand.), and the fraction of the randomized assignments that resulted in values stronger (either smaller or larger depending on the context) than those obtained with the true assignment of essential proteins (p-value).
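The three reported values (Obs., Rand., p-value) can be sketched with a simple randomization test. The sketch assumes that a randomized assignment draws the same number of essential proteins uniformly at random from all network nodes; the note does not describe the authors' randomization scheme, so this is an assumption, as are the function name, the seed, and the toy data.

```python
import random


def permutation_test(nodes, essential, group, n_perm=1000,
                     larger_is_stronger=True, seed=0):
    """Return (observed, mean randomized, empirical p-value) for the
    fraction of essential proteins within `group`."""
    rng = random.Random(seed)
    nodes = list(nodes)
    group = list(group)
    observed = sum(n in essential for n in group) / len(group)
    stronger = 0
    total = 0.0
    for _ in range(n_perm):
        # Uniform random reassignment of the essential label (an assumption).
        fake = set(rng.sample(nodes, len(essential)))
        value = sum(n in fake for n in group) / len(group)
        total += value
        if (value > observed) if larger_is_stronger else (value < observed):
            stronger += 1
    return observed, total / n_perm, stronger / n_perm


# Toy data: 20 proteins, 5 essential, and a group of 4 "hubs".
nodes = [f"p{i}" for i in range(20)]
essential = set(nodes[:5])
hubs = nodes[:4]

obs, rand_mean, p_value = permutation_test(nodes, essential, hubs)
```

With `n_perm` randomizations the smallest nonzero p-value is `1 / n_perm`, which is one way an entry such as `<1.00e-05` can arise when no randomized assignment beats the observed value.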
Largest ECOBIMs extracted from the tested networks.
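| GO biological process term | Number of essential genes | Number of genes | Fraction of essential genes |
| DIP CORE | | | |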
| GO:0006508 proteolysis | 27 | 35 | 0.77 |
| GO:0042254 ribosome biogenesis and assembly | 27 | 32 | 0.84 |
| GO:0016192 vesicle mediated transport | 21 | 30 | 0.70 |
| GO:0016071 mRNA metabolic process | 18 | 28 | 0.64 |
| GO:0015931 nucleobase, nucleoside, nucleotide and nucleic acid transport GO:0051236 establishment of RNA localization | 15 | 24 | 0.62 |
| GO:0016072 rRNA metabolic process | 18 | 21 | 0.86 |
| GO:0008380 RNA splicing | 16 | 21 | 0.76 |
| LC | | | |
| GO:0042254 ribosome biogenesis and assembly | 88 | 107 | 0.82 |
| GO:0016071 mRNA metabolic process | 37 | 58 | 0.64 |
| GO:0008380 RNA splicing | 35 | 52 | 0.67 |
| GO:0015931 nucleobase, nucleoside, nucleotide and nucleic acid transport GO:0051236 establishment of RNA localization | 16 | 26 | 0.62 |
| GO:0006508 proteolysis | 17 | 24 | 0.71 |
| HC | | | |
| GO:0042254 ribosome biogenesis and assembly | 84 | 100 | 0.84 |
| GO:0016071 mRNA metabolic process | 49 | 71 | 0.69 |
| GO:0016072 rRNA metabolic process | 63 | 71 | 0.89 |
| GO:0008380 RNA splicing | 46 | 63 | 0.73 |
| GO:0006508 proteolysis | 28 | 35 | 0.80 |
| TAP-MS | | | |
| GO:0042254 ribosome biogenesis and assembly | 90 | 120 | 0.75 |
| GO:0016071 mRNA metabolic process | 46 | 66 | 0.70 |
| GO:0008380 RNA splicing | 45 | 62 | 0.73 |
| GO:0016072 rRNA metabolic process | 37 | 41 | 0.90 |
| GO:0016072 rRNA metabolic process | 30 | 32 | 0.94 |
| GO:0006508 proteolysis | 17 | 22 | 0.77 |
| BAYESIAN | | | |
| GO:0042254 ribosome biogenesis and assembly | 119 | 152 | 0.78 |
| GO:0016072 rRNA metabolic process | 93 | 106 | 0.88 |
| GO:0008380 RNA splicing GO:0016071 mRNA metabolic process | 40 | 50 | 0.80 |
| GO:0006366 transcription from RNA polymerase II promoter | 23 | 42 | 0.55 |
| GO:0006508 proteolysis | 28 | 37 | 0.76 |
| GO:0006913 nucleocytoplasmic transport | 17 | 31 | 0.55 |
| GO:0006412 translation | 18 | 27 | 0.67 |
| GO:0051169 nuclear transport | 15 | 27 | 0.55 |
| GO:0045184 establishment of protein localization | 15 | 27 | 0.55 |
| Y2H | | | |
| GO:0007010 cytoskeleton organization and biogenesis | 9 | 11 | 0.82 |
| GO:0006366 transcription from RNA polymerase II promoter | 7 | 11 | 0.64 |
| GO:0045184 establishment of protein localization | 6 | 10 | 0.60 |
| GO:0006913 nucleocytoplasmic transport GO:0051169 nuclear transport | 6 | 10 | 0.60 |
For every tested protein interaction network we list the ECOBIMs with at least 20 members; for the Y2H network, the ECOBIMs with at least 10 members are listed. For each ECOBIM the following information is shown: the corresponding GO biological process term, number of essential genes, number of genes, and fraction of essential genes. For a list of all ECOBIMs and their member genes see Table S2.
ECOBIMs contain a large fraction of essential COBIM proteins.
| | Enrich. ECOBIM proteins | | | Enrich. non-ECOBIM COBIM proteins | | |
| Network | Obs. | Rand. | p-value | Obs. | Rand. | p-value |
| DIP CORE | 0.77 | 0.65 | <1.0e-05 | 0.06 | 0.21 | <1.0e-05 |
| LC | 0.77 | 0.65 | 1.00e-05 | 0.10 | 0.17 | 1.56e-03 |
| HC | 0.81 | 0.68 | <1.00e-05 | 0.12 | 0.18 | 2.31e-02 |
| TAP-MS | 0.74 | 0.64 | <1.00e-05 | 0.09 | 0.17 | 1.87e-03 |
| BAYESIAN | 0.76 | 0.65 | <1.00e-05 | 0.08 | 0.18 | <1.00e-05 |
| Y2H | 0.79 | 0.63 | 9.93e-03 | 0.06 | 0.17 | 3.00e-05 |
For each network the enrichment in essential proteins of ECOBIM nodes and enrichment of COBIM nodes that are not members of one or more ECOBIMs is shown. For each group three values are listed: the fraction under the true assignment of essential proteins (Obs.), the mean fraction under the randomized assignment of essential proteins (Rand.), and p-value of the difference.
Figure 7. GO terms that are overrepresented among ECOBIM nodes.
For every network the GO terms that are overrepresented among ECOBIM nodes are shown. The overrepresentation of a GO term is quantified by the natural logarithm of a p-value, where the p-value is the probability that at least this number of ECOBIM genes would belong to the GO term had the ECOBIM genes been selected uniformly at random from the network genes.
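The p-value described here is an upper-tail hypergeometric probability. A minimal sketch follows, assuming sampling without replacement from the network genes; the function name, argument names, and toy numbers are hypothetical.

```python
from math import comb, log


def hypergeom_tail(n_network, n_term, n_ecobim, n_hits):
    """P(X >= n_hits) when n_ecobim genes are drawn uniformly at random,
    without replacement, from n_network genes of which n_term carry the GO term."""
    upper = min(n_term, n_ecobim)
    favorable = sum(
        comb(n_term, i) * comb(n_network - n_term, n_ecobim - i)
        for i in range(n_hits, upper + 1)
    )
    return favorable / comb(n_network, n_ecobim)


# Toy numbers: 10 network genes, 4 annotated with the term,
# 3 ECOBIM genes, 2 of which carry the term.
p = hypergeom_tail(10, 4, 3, 2)
log_p = log(p)  # natural logarithm, the quantity the caption says is plotted
```

More negative values of `log_p` correspond to stronger overrepresentation.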
Enrichment of ECOBIM and non-ECOBIM COBIM nodes for GO subnetworks in the DIP CORE network.
| GO term | Subnetwork nodes | ECOBIM nodes | Non-ECOBIM COBIM nodes |
| GO:0016072 rRNA metabolic process | 0.83 | 0.91 | n/a |
| GO:0006352 transcription initiation | 0.82 | 1.00 | n/a |
| GO:0006383 transcription from RNA polymerase III pro | 0.77 | 1.00 | 0.00 |
| GO:0042254 ribosome biogenesis and assembly | 0.72 | 0.87 | n/a |
| GO:0008380 RNA splicing | 0.71 | 0.79 | 0.50 |
| GO:0006839 mitochondrial transport | 0.64 | 0.80 | n/a |
| GO:0006360 transcription from RNA polymerase I pro | 0.64 | 0.80 | 0.00 |
| GO:0016071 mRNA metabolic process | 0.63 | 0.75 | 0.40 |
| GO:0006260 DNA replication | 0.61 | 0.93 | n/a |
| GO:0031123 RNA 3′-end processing | 0.59 | 0.93 | 0.29 |
| GO:0006399 tRNA metabolic process | 0.50 | 1.00 | 0.00 |
| GO:0007059 chromosome segregation | 0.49 | 0.76 | n/a |
| GO:0006944 membrane fusion | 0.48 | 0.75 | 0.22 |
| GO:0006508 proteolysis | 0.46 | 0.77 | n/a |
| GO:0051169 nuclear transport | 0.44 | 0.80 | 0.47 |
| GO:0006997 nuclear organization and biogenesis | 0.43 | 1.00 | 0.33 |
| GO:0000278 mitotic cell cycle | 0.43 | 0.81 | 0.19 |
| GO:0015931 nucleobase, nucleoside, nucleotide and n | 0.42 | 0.63 | n/a |
| GO:0006913 nucleocytoplasmic transport | 0.42 | 0.80 | 0.41 |
| GO:0051236 establishment of RNA localization | 0.42 | 0.63 | n/a |
| GO:0006366 transcription from RNA polymerase II pro | 0.40 | 0.75 | 0.29 |
| GO:0007010 cytoskeleton organization and biogenesis | 0.40 | 0.78 | 0.00 |
| GO:0048308 organelle inheritance | 0.39 | 0.86 | n/a |
| GO:0006401 RNA catabolic process | 0.38 | 0.83 | 0.41 |
| GO:0006461 protein complex assembly | 0.38 | 1.00 | n/a |
| GO:0045184 establishment of protein localization | 0.37 | 0.89 | 0.38 |
| GO:0009100 glycoprotein metabolic process | 0.37 | 0.63 | n/a |
| GO:0006412 translation | 0.36 | 0.85 | 0.00 |
| GO:0007005 mitochondrion organization and biogenes | 0.35 | 0.91 | n/a |
| GO:0006512 ubiquitin cycle | 0.34 | 0.82 | n/a |
| GO:0051325 interphase | 0.33 | 0.83 | 0.00 |
| GO:0016192 vesicle-mediated transport | 0.31 | 0.71 | 0.18 |
| GO:0000074 regulation of progression through cell cycl | 0.31 | 0.73 | 0.18 |
| GO:0000279 M phase | 0.30 | 0.80 | 0.17 |
| GO:0006974 response to DNA damage stimulus | 0.28 | 0.67 | 0.11 |
| GO:0006323 DNA packaging | 0.26 | 1.00 | 0.16 |
| GO:0006417 regulation of translation | 0.26 | 0.80 | n/a |
| GO:0016481 negative regulation of transcription | 0.25 | 1.00 | 0.13 |
| GO:0007001 chromosome organization and biogenesi | 0.22 | 0.79 | 0.16 |
| GO:0016458 gene silencing | 0.22 | 1.00 | 0.00 |
| GO:0040029 regulation of gene expression, epigenet | 0.21 | 1.00 | 0.00 |
| GO:0007047 cell wall organization and biogenesis | 0.17 | 0.75 | n/a |
For each GO subnetwork that contributed at least one ECOBIM, the fractions of essential proteins among the subnetwork nodes, subnetwork ECOBIM nodes, and subnetwork non-ECOBIM COBIM nodes are shown.