
Bacterial Protein Interaction Networks: Connectivity is Ruled by Gene Conservation, Essentiality and Function.

Maddalena Dilucca¹, Giulio Cimini², Andrea Giansanti³.

Abstract

BACKGROUND: Protein-protein interaction (PPI) networks are the backbone of all processes in living cells. In this work, we relate conservation, essentiality and functional repertoire of a gene to the connectivity k (i.e. the number of interactions, links) of the corresponding protein in the PPI network.
METHODS: On a set of 42 bacterial genomes of different sizes, and with reasonably separated evolutionary trajectories, we investigate three issues: i) whether the distribution of connectivities changes between PPI subnetworks of essential and nonessential genes; ii) how gene conservation, measured both by the evolutionary retention index (ERI) and by evolutionary pressures, is related to the connectivity of the corresponding protein; iii) how PPI connectivities are modulated by evolutionary and functional relationships, as represented by the Clusters of Orthologous Genes (COGs).
RESULTS: We show that conservation, essentiality and functional specialisation of genes constrain the connectivity of the corresponding proteins in bacterial PPI networks. In particular, we isolated a core of highly connected proteins (connectivities k≥40), which is ubiquitous among the species considered here, though mostly visible in the degree distributions of bacteria with small genomes (less than 1000 genes).
CONCLUSION: The genes that support this highly connected core are conserved, essential and, in most cases, belong to the COG cluster J, related to ribosomal functions and the processing of genetic information.


Keywords: Protein-protein interactions; bacterial genomes; cellular processes; clusters of orthologous genes; evolutionary retention index; gene essentiality

Year: 2021 PMID: 34220298 PMCID: PMC8188579 DOI: 10.2174/1389202922666210219110831

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

INTRODUCTION

To carry out biological activities in living cells, proteins work in association with other proteins, often assembled in large complexes. Hence, knowing the interactions of a protein is important for understanding its cellular functions. Moreover, a comprehensive description of the stable and transient protein-protein interactions (PPIs) within a cell would facilitate the functional annotation of all gene products and provide insight into the higher-order organization of the proteome [1, 2]. Several methodologies have been developed to detect PPIs and have been adapted to chart interactions at the proteome-wide scale. These methods, combining different technologies, experiments and computational analyses, generate PPI networks of sufficient reliability to enable the assignment of several proteins to functional categories [3, 4]. Moreover, the statistical study of bacterial PPIs over several species (meta-interactomes) has yielded important knowledge about protein functions and cellular processes [5, 6]. Our aim here is to shed some light on the relationships between conservation, essentiality and functional annotation at the gene level and connectivity in PPI networks at the protein level. We extend our previous observations on the PPI network of E. coli, which suggested a strong correlation between the connectivity of PPI networks on the one hand, and codon bias, gene conservation and essentiality on the other [7, 8]. Before proceeding, it is worth making more precise what is usually meant by gene essentiality and gene conservation. Individual genes contribute differently to the survival of an organism. According to their known functional profiles and based on experimental evidence, genes can be divided into two categories: essential and nonessential [9, 10]. Essential genes are indispensable for the survival of an organism in the environment it lives in [10, 11].
Nonessential genes are instead those that are dispensable [12], being related to functions that can be silenced without compromising the survival of the organism. Naturally, each species has adapted to one or more evolving environments and, plausibly, genes that are essential for one species may not be essential for another. It has been argued many times that essential genes are more conserved than nonessential ones [13-17]. The term "conservation" has, however, at least two meanings. On the one hand, a gene is conserved if orthologous copies of it are found in the genomes of many species, as measured by the Evolutionary Retention Index (ERI) [9, 18]. On the other hand, a gene is (evolutionarily) conserved when it is subject to a purifying selective pressure, which disfavors mutations. A common measure of evolutionary pressure is Ka/Ks, the ratio of the number of non-synonymous substitutions per non-synonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks). In this second meaning, a conserved gene is, in a nutshell, a slowly evolving gene, a gene that hardly incorporates mutations [13, 19]. To measure the evolutionary pressures exerted on the genes of low, intermediate and high connectivity bacterial proteins, we use Ka/Ks, and to measure evolutionary patterns of codon bias, we use Effective Number of Codons (ENC) plots. The main finding of this work is the presence of a functional transition in bacterial PPI networks, ruled by the connectivity (degree) k. The genes of proteins with high connectivities are under selective pressure, conserved, and essential.
Below the transition (k<50), the functional repertoire of low connectivity proteins is heterogeneous, whereas the genes of proteins with k>50 mainly belong to the Cluster of Orthologous Genes (COG) J (related to translation, ribosomal structure and biogenesis), with just a few interesting hubs belonging to COGs I (Lipid transport and metabolism), K (Transcription) and L (Replication, recombination and repair). Moreover, we show here that in the degree distribution of each bacterial PPI network, there is a ubiquitous trace of an almost-invariant structure of conserved hubs, essentially due to the ribosomal protein complexes, mostly visible in the networks of bacteria with small genomes.

MATERIALS AND METHODS

Bacterial Dataset and Protein-protein Interaction Networks

We consider a set of 42 bacterial genomes (previously investigated in [8]), listed in Table 1. Nucleotide sequences were downloaded from the FTP server of the National Center for Biotechnology Information [20]. These genomes were chosen in order to have a reasonably broad coverage of data concerning conservation, essentiality and selective pressure. PPIs are obtained from the STRING database (Known and Predicted Protein-Protein Interactions, https://string-db.org/) [21]. We chose STRING because of its broad coverage of different bacterial species, which makes it possible to extend to multiple species the analysis performed in [7]. In STRING, each interaction is assigned a confidence level or probability w, evaluated by comparing predictions obtained by different techniques [22-24] with a set of reference associations, namely the functional groups of KEGG (Kyoto Encyclopedia of Genes and Genomes) [25]. In this way, interactions with high w are likely to be true positives, whereas a low w possibly corresponds to a false positive. As is usually done in the literature, we consider only interactions with w ≥ 0.9, a threshold that provides a fair balance between coverage and interaction reliability (see, for instance, the case study on E. coli reported in reference [7]). We denote by k the degree (number of connections) of each protein in each PPI network after the thresholding procedure. Note also that after applying the cut-off we are left, for each network, with a set of isolated proteins (singletons, with no connections) whose number grows with n, the number of proteins in the genome. These isolated proteins are not considered in the network analysis: they are regarded as stemming from statistical noise, or as appearing isolated only because the PPI data are incomplete. It is known that the PPIs of some species in our dataset might be much better characterised than others (e.g. E. coli). To take into account a potential bias in the dataset, we checked in Fig. S1 of the Supplementary Information (bottom panel) that the densities of PPIs are high for small genomes and tend to be constant, and not so different from that of E. coli, in bacteria with bigger genomes, among which we include highly investigated pathogens. The distinction between small and big genomes is a key emergent point in this work. We divided the set of 42 bacterial genomes into three groups according to the number n of their genes: a) n < 1000, b) 1000 < n < 3000 and c) n > 3000. In several figures in the Supplementary Information, we address the dependence of various network properties on the size of the genome.
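The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `(protein_a, protein_b, w)` edge format are hypothetical, w is taken on the 0-1 scale used in the text, and the treatment of the boundary values n = 1000 and n = 3000 (left unspecified in the text) is an arbitrary choice.

```python
from collections import defaultdict


def degrees_after_threshold(edges, w_min=0.9):
    """Return {protein: degree k}, keeping only links with confidence >= w_min.

    `edges` is an iterable of (protein_a, protein_b, w) with w in [0, 1]
    (hypothetical input format). Proteins left without any link (singletons)
    never appear in the result, mirroring their exclusion from the analysis.
    """
    neighbours = defaultdict(set)
    for a, b, w in edges:
        if a != b and w >= w_min:  # drop self-links and low-confidence links
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {protein: len(nb) for protein, nb in neighbours.items()}


def genome_size_group(n):
    """Size classes from the text: a) n < 1000, b) 1000 < n < 3000, c) n > 3000.

    Assumption: n == 1000 is placed in "b" and n == 3000 in "c".
    """
    if n < 1000:
        return "a"
    if n < 3000:
        return "b"
    return "c"
```

Using sets of neighbours rather than counters means that a link listed twice (once in each direction, as some interaction files do) is counted only once.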

Gene Conservation

The Evolutionary Retention Index (ERI) [9] is a way of measuring the degree of conservation of a gene. In the present study, the ERI of a gene is the fraction of genomes, among those reported in Table 1, that contain at least one ortholog (same COG label) of the given gene. Thus, as recalled in the Introduction, a low ERI value indicates a gene that is rather specific, common to a small number of genomes, whereas a high ERI is characteristic of highly shared, putatively universal and essential genes. We also refer to a second notion of gene conservation: conserved genes are those subject to a purifying, conservative evolutionary pressure. To discriminate between genes subject to purifying selection and genes subject to positive Darwinian selection, we use a classic but still widely used indicator, the ratio Ka/Ks between the number of nonsynonymous substitutions per nonsynonymous site (Ka) and the number of synonymous substitutions per synonymous site (Ks) [19]. Conserved genes are characterized by Ka/Ks < 1. We used the Ka/Ks estimates by Luo et al. [15], which are based on the method of Nei and Gojobori [26].
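As a minimal sketch of the ERI definition given above (fraction of genomes containing at least one gene with the same COG label), one could write the following. The function name and the input layout, a mapping from genome name to the set of COG labels found in it, are assumptions made for illustration.

```python
def evolutionary_retention_index(cogs_by_genome):
    """Map each COG label to the fraction of genomes in which it occurs.

    `cogs_by_genome` maps a genome name to the collection of COG labels
    annotated in that genome (hypothetical input format). Every gene then
    inherits the ERI of its own COG label.
    """
    n_genomes = len(cogs_by_genome)
    if n_genomes == 0:
        return {}
    counts = {}
    for cogs in cogs_by_genome.values():
        for cog in set(cogs):  # count each label at most once per genome
            counts[cog] = counts.get(cog, 0) + 1
    return {cog: c / n_genomes for cog, c in counts.items()}
```

With this definition, ERI = 1 corresponds to a label present in every genome of the pool, which is the case singled out later for the highly connected COG J genes.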

Gene Essentiality

We used the Database of Essential Genes (DEG) [15], which classifies a gene as either essential or nonessential on the basis of a combination of experimental evidence (null mutations or transposons) and general functional considerations. DEG collects genomes from Bacteria, Archaea and Eukarya, with different degrees of coverage [27, 28]. Of the 42 bacterial genomes we considered, only 23 are covered, in total or partially, by DEG, as indicated in Table 1.

Ka/Ks

Ka/Ks is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) [19]. This parameter is widely accepted as a straightforward and effective way of separating genes subject to purifying selection (Ka/Ks < 1) from genes subject to positive Darwinian selection (Ka/Ks > 1). There are different methods to evaluate this ratio, though the alternative approaches are quite consistent among themselves. For the sake of comparison, we have used here the Ka/Ks estimates by Luo et al. [15], which are based on the Nei and Gojobori method [26]. It must be noted that each genome has a specific average level of Ka/Ks [7]. Average values of Ka/Ks are shown for low, intermediate, and high connectivity bins of genes.
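The binned averages mentioned above can be illustrated with a short sketch. The bin boundaries used here (k < 10, 10 ≤ k < 40, k ≥ 40) are placeholders chosen for the example, not values taken from the paper, and the `(k, ka, ks)` record format is likewise hypothetical.

```python
from statistics import mean


def mean_ka_ks_by_bin(records, edges=(10, 40)):
    """Average Ka/Ks in low, intermediate and high connectivity bins.

    `records` is an iterable of (k, ka, ks) tuples, one per gene.
    `edges` gives the two degree thresholds separating the three bins;
    the defaults are illustrative assumptions only.
    """
    low_edge, high_edge = edges
    bins = {"low": [], "intermediate": [], "high": []}
    for k, ka, ks in records:
        if ks <= 0:
            continue  # the ratio is undefined without synonymous substitutions
        if k < low_edge:
            label = "low"
        elif k < high_edge:
            label = "intermediate"
        else:
            label = "high"
        bins[label].append(ka / ks)
    return {label: (mean(vals) if vals else None) for label, vals in bins.items()}
```

A bin average below 1 would be read, as in the text, as a sign of purifying selection acting on the genes of that bin.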

ENC Plot

The ENC plot is a well-known tool for investigating patterns of synonymous codon usage, in which ENC (Effective Number of Codons) values are plotted against GC3, the guanine and cytosine content at the third codon position. The ENC value expected under the hypothesis of pure mutational bias (no selection) is given by:

ENC = 2 + s + 29 / [s^2 + (1 − s)^2]    (1)

where s represents the value of GC3 [29]. When the points corresponding to a set of genes fall near the expected neutral curve, mutations that enforce the typical mutational bias of the species are the main factor affecting the observed codon diversity. When the points instead fall considerably below the expected curve, the observed codon usage bias is mainly affected by natural selection. To quantitatively represent the balance between mutational bias and natural selection, we use a parametrised form of the ENC formula, Eq. (2), in non-linear fits to the experimental data. ENC plots of genes corresponding to low, intermediate and high connectivity proteins are shown in Fig. () of the Supplementary Information. The best-fit parameters for the three groups of genes are collected in Table .
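For concreteness, the neutral expectation for ENC as a function of GC3 can be coded directly from Wright's standard formula, ENC = 2 + s + 29 / [s^2 + (1 − s)^2]. The second function below measures how far a set of genes falls below that curve; taking the vertical gap at fixed GC3 as the "distance" is an assumption made here for simplicity, and both function names are hypothetical.

```python
def wright_enc(s):
    """Expected ENC under pure mutational bias for a GC3 fraction s in [0, 1]."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def mean_gap_below_curve(points):
    """Mean of (expected ENC - observed ENC) over (gc3, enc) pairs.

    Positive values mean the genes lie below the neutral curve on average.
    Assumption: distance is measured vertically, at the gene's own GC3.
    """
    gaps = [wright_enc(s) - enc for s, enc in points]
    if not gaps:
        raise ValueError("no points given")
    return sum(gaps) / len(gaps)
```

The curve peaks at s = 0.5, where the expected value is 60.5, and drops towards roughly 31 to 32 at the two extremes of GC3.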

Clusters of Orthologous Proteins

We use the functional annotation given in the database of Clusters of Orthologous Groups of proteins (COGs) from Koonin's group [30, 31]. We consider 15 functional COG categories (Table 2), excluding the generic categories R and S, for which the functional annotation is either too general or missing.

RESULTS AND DISCUSSION

Degree Distribution of PPI Networks

We start by studying the degree distributions P(k) observed in bacterial PPIs. We first recall that this distribution was found to be scale-free in E. coli [7, 32-34], meaning that the corresponding PPI network features a large number of poorly connected proteins and a relatively small number of highly connected hubs. In order to assess the generality of this observation, we compute P(k) for each genome in Table 1 (plots are reported in Figs. ( and ) of the Supplementary Information). Note that, although the PPI networks of different bacteria have different sizes and densities, their average connectivity and the support of their P(k) are very similar. Thus, we can superpose all the bacterial degree distributions considered without the need to normalise the support of each P(k). When doing so, we observe two distinct regimes (Fig. ). For low values of k (k < 40), the distribution is approximately scale-free: P(k) ∼ k^(−γ) (γ = 2.48). This scaling behaviour is consistent with previous studies on the genomes of yeast, worms and flies [35] and on co-conserved PPIs in some bacteria [36]. The scale-free nature of bacterial PPIs is still a matter of debate, and a detailed discussion of the origin of this feature is beyond the scope of this paper. Here we simply confirm that, as expected, there is a large number of poorly connected proteins and a small number of hubs. Remarkably, for higher values of k the distribution deviates from a power law, and a bump with a Gaussian-like shape emerges. This feature, visible for k ≥ 40, may be due to the contribution of proteins belonging to large complexes [37]. The whole set of observations presented in this paper indicates that the bump in P(k) is due to the complexity of ribosomal interactions. Indeed, if one recalculates the degree distribution for a dataset from which the ribosomal proteins are removed, the bump is not present (Fig. (), empty dots).
Moreover, if we consider the separate contributions of essential and nonessential genes to P(k) (for DEG-annotated genomes), we see that the bump is present only in the degree distribution of essential genes. Note also that the degree distributions for essential and nonessential genes are well separated, and that the average degree is systematically higher for essential genes than for nonessential ones, consistent with previous findings [35]. Remarkably, we have shown in a previous paper [8] that the number of essential genes in bacteria is close to 500 and does not depend on the size of the genome. To correctly interpret the emergence of the bump in the average P(k) in Fig. (), it is worth pointing out the distinction between small and not so small genomes. In small genomes almost all the genes are essential, and among the essential genes those belonging to COG J (functions related to translation, ribosomal structure and biogenesis) play a major and ubiquitous role. In Fig. (), we have checked that the bump that emerges in Fig. () as a feature of essential and conserved genes is clearly visible in the P(k) of small genomes, whereas it is blurred in the case of bigger genomes. This might be interpreted as a dilution effect: in the networks of bigger genomes, there are many specific interactions besides the essential ones. Then, averaging P(k) over small, intermediate and big genomes (Fig. in the Supplementary Information), we can safely interpret the bump as an emerging feature due to a core of highly connected proteins (connectivities k ≥ 40), which is contributed to the average mostly by the degree distributions of PPIs of bacteria with small genomes (Figs. S2-S4). From all the considerations above, we exclude the possibility that this bump, observed here for the first time, emerges just because that part of the PPI is much more investigated than other subnetworks. It is there because the ribosome is there, in all bacteria (Table ).
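The two-regime picture described above starts from an empirical P(k) and a power-law fit restricted to k < 40. The sketch below estimates the exponent γ with an ordinary least-squares line in log-log space; this is a deliberately simple estimator chosen for illustration, not necessarily the fitting procedure used in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of proteins having each degree k."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in sorted(counts.items())}


def fit_power_law_exponent(pk, k_max=40):
    """Estimate gamma in P(k) ~ k^(-gamma) from points with 1 <= k < k_max.

    Uses a least-squares fit of log P(k) against log k (an illustrative
    choice; more robust estimators exist for heavy-tailed data).
    """
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if 1 <= k < k_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points below k_max")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all points share the same k")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # the slope is -gamma
```

Restricting the fit to k < 40 keeps the bump at high connectivity from distorting the estimated slope.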

PPI Connectivity and Gene Conservation

We now investigate whether the connectivity k of a protein in a PPI network drives a transition in the degree of conservation (as measured by ERI) of the corresponding genes. Fig. () displays the average value and the spread of ERI for genes grouped into bins of proteins with the same connectivity in the PPIs of different species. As a general feature, we observe that, on average, the genes of highly connected proteins are highly conserved among the bacterial species we consider, which constitute a reasonably wide sample of different evolutionary adaptations. The same Fig. () shows that for k < 50 the ERI fluctuates strongly between different samples of proteins with the same k in different species. For high connectivities (above k = 50), the ERI is close to 1, with a drastic drop in the fluctuations (as shown in the inset). This observation points to the existence of an almost-invariant structure of conserved hubs in each bacterial PPI, sustained by highly conserved genes. We can conclude, as a rule of thumb, that a protein with a connectivity of 40 or more is likely to be encoded by a gene shared by at least 80% of the species in a generic pool of bacteria. At the moment, we do not have a general explanation for this apparent threshold. We just propose, as a heuristic observation, the existence of an almost-critical value of connectivity, lying between 40 and 50, that corresponds to the connectivity of the core of proteins specifically involved, as alluded to in the previous section, in the ubiquitous ribosomal functions (Tables and ).

Evolutionary Pressure and PPI Connectivity

We then look at the evolutionary pressure exerted on genes whose proteins have different connectivities. The graph in Fig. () shows the ratio Ka/Ks for groups of genes binned by the connectivity k of the corresponding proteins, for all the 42 bacterial species in Table 1. As is well known, the ratio Ka/Ks provides a straightforward indication of the balance between positive Darwinian selection (when the numerator prevails) and purifying, stabilising selection (acting against change in genes for which the denominator prevails). We see that the more connected proteins correspond to genes that are subject to an increasing purifying evolutionary pressure. Indeed, the ratio Ka/Ks is less than 1 in all bins of connectivity and systematically decreases as a function of k. A decreasing ratio generally indicates an increasing role of purifying, conservative evolutionary pressure on the corresponding set of genes. This is a reasonable result, indicating that the groups of genes that support conserved structures of connectivity in the PPIs are more constrained in evolution than the genes of less interacting proteins. To add evidence to this observation, we have also considered ENC plots for sets of genes binned by the connectivities of the corresponding proteins. Interestingly, the ENC data in Fig. () of the Supplementary Information are fully consistent with those in Fig. (). In the ENC plots, the points associated with low connectivity proteins (red) are closer to the so-called Wright's profile (represented there as solid black lines) than those associated with proteins of intermediate and high connectivity (green and blue). Fig. () stresses this observation in a more quantitative way by showing that, in the ENC plots, the average distance from Wright's profile increases monotonically with k.
Overall, the above results clearly indicate that the codon bias and GC content of high connectivity genes are more strongly under selective Darwinian pressure than those of genes coding for low-connectivity proteins, in which the rate of accepted mutations is mainly ruled by neutral mutational bias. These observations indicate that the almost-invariant structure of protein hubs alluded to in the previous section is supported by an underlying set of genes that are under strong mutational control; an expected result, perhaps, but clearly seen here as a general feature associated with ubiquitous and conserved ribosomal functions.

PPI and Essentiality

To further investigate the relationship between gene essentiality and protein connectivities, we consider DEG-annotated genomes and classify interactions between proteins (links) according to the essentiality of the corresponding genes. We distinguish three sets of links: |ee| (linking proteins from two essential genes), |ēē| (from two nonessential genes) and |eē| (from an essential gene and a nonessential one). We then compute the densities of these sets of links respectively as:

ρ_ee = |ee| / [E(E − 1)/2],   ρ_ēē = |ēē| / [NE(NE − 1)/2],   ρ_eē = |eē| / (E · NE)    (3)

where E and NE denote the number of essential and nonessential genes, respectively (self-connections are excluded from our analysis). Each denominator is the maximum possible value of the numerator, corresponding to the fully connected graph. These densities are then compared with the overall density of the network, restricted to genes classified as either essential or nonessential:

ρ = (|ee| + |ēē| + |eē|) / [(E + NE)(E + NE − 1)/2]    (4)

We use the ratios ρ_ee/ρ, ρ_ēē/ρ and ρ_eē/ρ to assess the level of connectivity of the subnetworks with respect to the overall connectivity. Table shows that subnetworks of essential genes are far denser than the overall networks and that, in general, essential and nonessential genes tend to form network components that are weakly interconnected.
This happens because many essential genes encode ribosomal proteins, which in turn are localised in the ribosomal complex, where they have a high probability of interacting [39]; see also the table in [8] showing that approximately 25% of essential genes fall into COG J. Figs. ( and ) of the Supplementary Information collect the superposed adjacency matrices of the |ee| (red dots), |eē| (violet dots) and |ēē| (blue dots) subnetworks, which display such network features for each individual species. These graphs confirm the dominance of the interactions between the proteins of essential genes (red dots) in the small genomes. The adjacency matrices of bacteria with intermediate and big genomes are dominated by interactions involving proteins supported by nonessential genes (blue dots).

PPI Connectivity and Functional Specialisation

For each PPI network, we define the conditional probability (Bayes' theorem) that a protein with degree k belongs to a given COG as:

P(COG|k) = P(k|COG) P(COG) / P(k)    (5)

where P(k) is the degree distribution in the PPI network, P(COG) is the frequency of that COG in the proteome, and P(k|COG) is the degree distribution restricted to that COG. Fig. () shows the COG spectrum as a function of k over all the bacteria considered here. Interestingly, we again note a marked transition. Below k = 40, the COG spectrum is quite heterogeneous: genes corresponding to proteins with low connectivity are spread over several COGs, which correspond to different functions (Table 2). The transition shows that proteins with more than 40 interactions are likely to be encoded by genes belonging to COG J. There are, however, a handful of outliers, hubs with connectivities between 57 and 62, that belong to COG I (related to lipid transport and metabolism) and to COGs K and L (which, together with J, define the functional class of information storage and processing). The list of these outliers is reported in Table . Interestingly, they correspond to RNA polymerases and to enzymes involved in acetate metabolism.
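The COG spectrum as a function of k follows from Bayes' theorem, P(COG|k) = P(k|COG) P(COG) / P(k), with all three terms estimated as frequencies. A minimal sketch, assuming each protein is given as a hypothetical `(k, cog)` pair with a single COG label:

```python
from collections import Counter


def cog_given_degree(proteins):
    """Return {k: {cog: P(COG | k)}} from (k, cog) pairs via Bayes' theorem.

    P(k), P(COG) and P(k | COG) are all estimated as simple frequencies
    over the proteins supplied (assumption: one COG label per protein).
    """
    proteins = list(proteins)
    n = len(proteins)
    n_k = Counter(k for k, _ in proteins)
    n_cog = Counter(cog for _, cog in proteins)
    n_joint = Counter(proteins)
    spectrum = {}
    for (k, cog), joint in n_joint.items():
        p_k = n_k[k] / n                 # degree distribution P(k)
        p_cog = n_cog[cog] / n           # frequency of the COG in the proteome
        p_k_given_cog = joint / n_cog[cog]  # degree distribution within the COG
        spectrum.setdefault(k, {})[cog] = p_k_given_cog * p_cog / p_k
    return spectrum
```

For each k the returned probabilities sum to 1, so a column dominated by a single label, as reported for COG J above k = 40, is immediately visible.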
But which are the genes of COG J that drive the transition? Fig. () shows which genes play the main role in the transition. We investigate the connectivities of the highly conserved genes (ERI = 1, shared by all the species in Table 1) that belong to COG J and whose proteins have connectivities greater than 40. These highly shared genes, corresponding to cores of highly connected ribosomal proteins, are listed in Table . In the heat map of Fig. (), we sort the genes in COG J in order of descending degree, species by species, and we see that there is a core of genes (in red, lower left sector) that correspond to highly connected proteins and are also highly shared (ERI = 1, see Table ) among all the species we considered. It is quite clear from the heat map of Fig. () that the 42 species in this study can be split into at least two groups (see the cladogram on the left). In one group (the group of species at the bottom in Fig. ()), there is a shared set of genes (the red band at the bottom-left side of the heat map) corresponding to a common core of highly connected ribosomal proteins. This remarkable observation suggests that the species in this group (namely, Synechocystis sp. PCC 6803, Escherichia coli K-12 MG1655, Clostridium acetobutylicum ATCC 824, Mycobacterium tuberculosis H37Rv, Sphingomonas wittichii RW1, Vibrio cholerae N16961, Burkholderia thailandensis E264, Rickettsia prowazekii str. Madrid E, Agrobacterium tumefaciens (fabrum), Ralstonia solanacearum GMI1000, Xylella fastidiosa 9a5c) should have a common structural and functional organisation of their ribosomes, an interesting point to be investigated further. In the rest of the species, the connectivity of the proteins with k > 40 corresponding to the highly shared COG J genes is more heterogeneous. We can conclude that the abrupt transition shown in Fig. () is driven by a subset of COG J genes that are highly conserved among a subset of species and are listed in Table .
As one can see, these genes correspond to a specific subset of ribosomal proteins in the small and large subunits, whose functional and structural role deserves further investigation.
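The link densities used in the PPI and Essentiality subsection can be sketched as follows. The sketch assumes that each density is the number of observed links divided by the number of possible pairs, namely E(E − 1)/2 for essential-essential links, NE(NE − 1)/2 for nonessential-nonessential links and E · NE for mixed links, which is how the text describes the denominators. The function name, the dictionary keys and the input format are hypothetical, and the two gene sets are assumed to be disjoint.

```python
def link_densities(links, essential, nonessential):
    """Densities of the ee, nonessential-nonessential (nn) and mixed (en) link sets.

    `links` is an iterable of unordered (a, b) pairs; `essential` and
    `nonessential` are disjoint collections of gene identifiers.
    Returns a dict with keys "ee", "nn", "en" and "all" (overall density).
    """
    essential, nonessential = set(essential), set(nonessential)
    counts = {"ee": 0, "nn": 0, "en": 0}
    seen = set()
    for a, b in links:
        pair = frozenset((a, b))
        if len(pair) < 2 or pair in seen:
            continue  # skip self-connections and duplicated links
        seen.add(pair)
        n_ess = len(pair & essential)
        n_non = len(pair & nonessential)
        if n_ess == 2:
            counts["ee"] += 1
        elif n_non == 2:
            counts["nn"] += 1
        elif n_ess == 1 and n_non == 1:
            counts["en"] += 1
        # links touching an unclassified gene are ignored
    E, NE = len(essential), len(nonessential)
    max_pairs = {"ee": E * (E - 1) / 2, "nn": NE * (NE - 1) / 2, "en": E * NE}
    dens = {key: (counts[key] / max_pairs[key] if max_pairs[key] else 0.0) for key in counts}
    n = E + NE
    total_pairs = n * (n - 1) / 2
    dens["all"] = sum(counts.values()) / total_pairs if total_pairs else 0.0
    return dens
```

Dividing each of the first three values by `dens["all"]` gives the ratios that compare every subnetwork with the overall connectivity.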

CONCLUSION

Connectivity analysis of biological networks, such as protein-protein interaction or metabolic networks, has demonstrated that structural features of network subgraphs are correlated with biological functions [40, 41]. For instance, it was shown that highly connected patterns of proteins in a PPI are fundamental to cell viability [42]. In this work, we have shown the existence of a functional transition in bacterial species, ruled by the connectivity of proteins in the PPI networks (Fig. ). The critical threshold of the transition is located between k = 40 and k = 50. Proteins with connectivities above the threshold are mostly encoded by genes that are conserved and under selective pressure (as measured by ERI and Ka/Ks, respectively) and essential. Moreover, the functional repertoire above the threshold is mainly concentrated in COG J (translation, ribosomal structure and biogenesis), with just a few interesting hubs belonging to COGs I (Lipid transport and metabolism), K (Transcription) and L (Replication, recombination and repair). Indeed, the PPI network of each bacterial species is characterised by a highly connected core of conserved ribosomal proteins, the components of multi-subunit complexes whose corresponding genes are mostly essential [32, 36] and code for supra-molecular complexes that pile up in the bump we have observed in the degree distribution (Fig. ). Hence, what we see here is essentially the ribosome and related protein complexes such as RNA polymerase. Indeed, the ribosome is the only molecular machine in bacteria in which a given protein could legitimately have 40 or more protein binding partners, with the help of rRNA mediating the interactions [43]. It is reasonable to admit that, since some bacterial species are much more investigated than others, comparative statistical studies of bacterial PPIs might be particularly biased by the choice of the sample of genomes included in the study. Our dataset is no exception.
In order to address this hard-to-settle problem, we have checked (Fig. S1) that our study includes small genomes (i.e. fewer than 1000 genes) whose PPIs have densities (a rough proxy for the coverage of the interactions in the network) that are higher than those of bigger genomes. The group of small genomes comprises Buchnera, Chlamydia, and Mycoplasmas, whereas the bigger genomes mostly belong to well-known pathogens that are surely among the most investigated bacterial species. The densities of the networks of these species are quite similar and comparable with that of E. coli. As a general rule, the networks of small genomes are better covered in the STRING database (after the application of the conservative cutoff w ≥ 0.9, i.e. a STRING score of 900) than those of bigger genomes. Interestingly, we have shown (Figs. ( and ) in the Supplementary Information) that the PPI adjacency matrices of bacteria with small genomes are indeed dominated by the interactions constituting the ribosomal complex. In the adjacency matrices of the PPIs of bacteria with bigger genomes, the cloud of interactions between the proteins of nonessential genes tends to be superposed on the ever-present ribosomal core. In conclusion, we believe we have convincingly shown that bacterial PPIs are characterised by the presence of a highly connected structure, associated with ribosomal functions and particularly visible in bacteria with small genomes. We believe that the observations presented here could be of some utility for the prediction of gene essentiality based on the knowledge of PPI networks, and for the prediction of interactions between proteins based on genetic information [44, 45]. It is interesting to note that our results are consistent with a previous study of inferred bacterial co-conserved networks based on phylogenetic profiles [36].
This work suggests that the correlation between the structure of PPI networks and multiple networks at the genetic level should be investigated further and systematically, at least in unicellular organisms. In particular, we believe that a recent approach based on multiple-layer networks could be of great potential interest (e.g. to search for a general scheme behind antimicrobial resistance [46-50]).

Table 1

Summary of the selected bacterial dataset. Organism name, abbreviation, class, RefSeq, STRING code, size of genome (number of genes n). Genomes annotated in the Database of Essential Genes (DEG) are highlighted with bold fonts. Classes are: Alphaproteobacteria (1), Betaproteobacteria (2), Gammaproteobacteria (3), Epsilonproteobacteria (4), Actinobacteria (5), Bacilli (6), Bacteroidetes (7), Clostridia (8), Deinococci (9), Mollicutes (10), Spirochaetales (11), Aquificae (12), Cyanobacteria (13), Chlamydiae (14), Fusobacteria (15), Thermotoga (16).

Organisms	Abbr.	Class	Ref Seq	STRING	n
Mycoplasma genitalium G37	myge	10	NC 000908	243273	475
Buchnera aphidicola Sg uid57913	busg	2	NC 004061	198804	546
Mycoplasma pneumoniae M129	mypn	10	NC 000912.1	272634	648
Mycoplasma pulmonis UAB CTIP	mypu	10	NC 002771	272635	782
Chlamydia trachomatis D/UW-3/CX	chtr	14	NC 000117.1	272561	894
Treponema pallidum Nichols	trpa	11	NC 000919.1	243276	1036
Helicobacter pylori 26695	hepy	4	NC 000915	85962	1469
Aquifex aeolicus VF5	aqae	12	NC 000918	224324	1497
Campylobacter jejuni	caje	4	NC 002163	192222	1572
Haemophilus influenzae Rd KW20	hain	3	NC 000907.1	71421	1610
Streptococcus pyogenes NZ131	stpy	6	NC 011375	471876	1700
Francisella novicida U112	frno	3	NC 008601	401614	1719
Thermotoga maritima MSB8	thma	16	NC 000853.1	243274	1858
Neisseria gonorrhoeae FA 1090 uid57611	nego	2	NC 002946	242231	1894
Fusobacterium nucleatum ATCC 25586	funu	15	NC 003454.1	190304	1983
Brucella melitensis bv. 1 str. 16M	brme	1	NC 003317.1	224914	2059
Porphyromonas gingivalis ATCC 33277	pogi	7	NC 010729	431947	2089
Streptococcus sanguinis	stsa	6	NC 009009	388919	2270
Vibrio cholerae N16961	vich	3	NC 002505	243277	2534
Staphylococcus aureus N315	stau	6	NC 002745.2	158879	2582
Deinococcus radiodurans R1	dera	9	NC 001263.1	243230	2629
Agrobacterium tumefaciens (fabrum)	agtu	1	NC 003062	176299	2765
Xylella fastidiosa 9a5c	xyfa	3	NC 002488	160492	2766
Staphylococcus aureus NCTC 8325	stau	6	NC 007795	93061	2767
Listeria monocytogenes EGD-e	limo	6	NC 003210.1	169963	2867
Synechocystis sp. PCC 6803	sysp	13	NC 000911.1	1148	3179
Burkholderia thailandensis E264	buth	2	NC 007651	271848	3276
Sinorhizobium meliloti 1021	sime	1	NC 003047.1	266834	3359
Burkholderia pseudomallei K96243	bups	3	NC 006350	272560	3398
Ralstonia solanacearum GMI1000	raso	2	NC 003295.1	267608	3436
Clostridium acetobutylicum ATCC 824	clac	8	NC 003030.1	272562	3602
Caulobacter crescentus	cacr	1	NC 011916	565050	3885
Mycobacterium tuberculosis H37Rv	mytu	5	NC 000962.3	83332	3936
Escherichia coli K-12 MG1655	esco	3	NC 000913.3	511145	4004
Shewanella oneidensis MR-1	shon	3	NC 004347	211586	4065
Bacillus subtilis 168	basu	6	NC 000964	224308	4175
Salmonella enterica serovar Typhi	saen	3	NC 004631	209261	4352
Bacteroides thetaiotaomicron VPI-5482	bath	7	NC 004663	226186	4778
Sphingomonas wittichii RW1	spwi	1	NC 009511	392499	4850
Pseudomonas aeruginosa UCBPP-PA14	psae	3	NC 008463	208963	5892
Mesorhizobium loti MAFF303099	melo	1	NC 002678.2	266835	6743
Rickettsia prowazekii str. Madrid E	ripr	1	NC 000963.1	272947	8433

Table 2

Functional classification of COG clusters.

COG ID	Functional Classification
Information Storage and Processing
J	Translation, ribosomal structure and biogenesis
K	Transcription
L	Replication, recombination and repair
Cellular Processes and Signaling
D	Cell cycle control, cell division, chromosome partitioning
T	Signal transduction mechanisms
M	Cell wall/membrane/envelope biogenesis
N	Cell motility
O	Post-translational modification, protein turnover, chaperones
Metabolism
C	Energy production and conversion
G	Carbohydrate transport and metabolism
E	Amino acid transport and metabolism
F	Nucleotide transport and metabolism
H	Coenzyme transport and metabolism
I	Lipid transport and metabolism
P	Inorganic ion transport and metabolism

Table 3

Relative density values r for PPI subnetworks between essential genes (r_ee), between essential and nonessential genes (r_eē) and between nonessential genes (r_ēē), for each DEG-annotated bacterial genome.

Organisms	r_ee	r_eē	r_ēē
basu	44.46	0.80	0.11
bath	20.07	0.76	0.25
bups	6.21	0.83	0.27
buth	18.69	0.70	0.22
cacr	18.40	0.70	0.15
caje	3.65	0.82	0.32
esco	2.91	0.88	0.31
frno	9.84	0.52	0.18
hain	1.65	1.15	0.27
hepy	2.91	0.78	0.38
myge	1.42	0.29	0.08
mypu	3.42	0.22	0.12
mytu	8.09	0.78	0.23
pogi	11.03	0.41	0.21
psae	9.85	0.92	0.16
saen	28.80	0.81	0.12
shon	6.50	0.64	0.16
spwi	15.47	0.74	0.22
stau	23.05	0.58	0.23
stau	21.89	0.64	0.16
stpy	9.30	0.73	0.23
stsa	30.65	0.61	0.22
vich	8.37	0.81	0.19
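
A relative density of this kind can be sketched as follows. The definition used here is an assumption, since this section does not restate it: each r is taken as the density of links within (or between) the given gene classes divided by the density of the whole network. The function name, node labels and essential set are hypothetical toy inputs.

```python
def relative_densities(nodes, edges, essential):
    """Relative densities of the ee, en and nn subnetworks.

    ASSUMPTION: r_xy = (links between classes x and y / possible such links)
    divided by the density of the whole network. 'e' = essential,
    'n' = nonessential. Edges are unique undirected pairs of nodes.
    """
    n = len(nodes)
    n_e = sum(1 for v in nodes if v in essential)
    n_n = n - n_e

    counts = {"ee": 0, "en": 0, "nn": 0}
    for a, b in edges:
        ea, eb = a in essential, b in essential
        if ea and eb:
            counts["ee"] += 1
        elif ea or eb:
            counts["en"] += 1
        else:
            counts["nn"] += 1

    # Number of possible links in each block of the adjacency matrix.
    possible = {
        "ee": n_e * (n_e - 1) / 2,
        "en": n_e * n_n,
        "nn": n_n * (n_n - 1) / 2,
    }
    total_pairs = n * (n - 1) / 2
    total_density = len(edges) / total_pairs if total_pairs else 0.0

    result = {}
    for key in counts:
        if possible[key] and total_density:
            result[key] = (counts[key] / possible[key]) / total_density
        else:
            result[key] = float("nan")  # undefined for empty blocks or networks
    return result

# Toy network: a, b, c essential; d, e, f nonessential.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_links = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("e", "f")]
r = relative_densities(toy_nodes, toy_links, essential={"a", "b", "c"})
print(r)
```

Under this assumed normalisation, a value above 1 means the block is denser than the network as a whole, which matches the qualitative pattern in the table (r_ee well above 1, r_ēē well below 1).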

Table 4

Specific hubs. In this table we detail which proteins populate the few bins of connectivity around k = 60 in Fig. ().

k	COG	Gene	Protein
57	1250I	paaH	3-hydroxyadipyl-CoA dehydrogenase, NAD-dependent
	0365I	acs	acetyl-CoA synthetase
58	0222J	rplL	50S ribosomal subunit protein L7/L12
	0335J	rplS	50S ribosomal subunit protein L19
	0267J	rpmG	50S ribosomal subunit protein L33
	0365I	acs	acetyl-CoA synthetase
59	0183I	paaJ	3-oxoadipyl-CoA/3-oxo-5,6-dehydrosuberyl-CoA thiolase
	1960I	ydiO	putative acyl-CoA dehydrogenase
	0183I	atoB	acetyl-CoA acetyltransferase
60	0197J	rplP	50S ribosomal subunit protein L16
	0088J	rplD	50S ribosomal subunit protein L4
	0197J	rplP	50S ribosomal subunit protein L16
	0087J	rplC	50S ribosomal subunit protein L3
	1960I	aidB	putative acyl-CoA dehydrogenase
61	0085K	rpoB	RNA polymerase, beta subunit
	0202K	rpoA	RNA polymerase, alpha subunit
62	0087J	rplC	50S ribosomal subunit protein L3
	0052J	rpsB	30S ribosomal subunit protein S2
	2965L	priB	primosomal replication protein

Table 5

Genes belonging to COG J with average degree greater than 40, Fig. (). All these genes are conserved, common to all species (ERI=1), and drive the transition shown in Fig. ().

COG	Gene name	<k>
COG0097J	50S ribosomal protein L6	60.24
COG0087J	50S ribosomal protein L3	60.19
COG0197J	50S ribosomal protein L16	60.19
COG0090J	50S ribosomal protein L2	60.14
COG0080J	50S ribosomal protein L11	60.12
COG0088J	50S ribosomal protein L4	60.12
COG0081J	50S ribosomal protein L1	58.19
COG0089J	50S ribosomal protein L23	57.88
COG0102J	50S ribosomal protein L13	57.45
COG0094J	50S ribosomal protein L5	57.21
COG0092J	30S ribosomal protein S3	57.12
COG0098J	30S ribosomal protein S5	57.10
COG0093J	50S ribosomal protein L14	57.00
COG0091J	50S ribosomal protein L22	56.24
COG0049J	30S ribosomal protein S7	55.31
COG0051J	30S ribosomal protein S10	55.24
COG0200J	50S ribosomal protein L15	55.12
COG0256J	50S ribosomal protein L18	54.86
COG0203J	50S ribosomal protein L17	54.43
COG0244J	50S ribosomal protein L10	54.19
COG0100J	30S ribosomal protein S11	53.76
COG0522J	30S ribosomal protein S4	53.43
COG0096J	30S ribosomal protein S8	53.10
COG0099J	30S ribosomal protein S13	52.88
COG0048J	30S ribosomal protein S12	52.14
COG0198J	50S ribosomal protein L24	50.83
COG0185J	30S ribosomal protein S19	50.52
COG0199J	30S ribosomal protein S14	50.45
COG0103J	30S ribosomal protein S9	49.45
COG0480J	tetracycline resistance protein, tetM	47.90
COG0052J	30S ribosomal protein S2	47.69
COG0184J	30S ribosomal protein S15	45.95
COG0186J	30S ribosomal protein S17	44.60
COG0255J	50S ribosomal protein L29	43.95
COG0222J	50S ribosomal protein L7/L12	42.43
COG1841J	50S ribosomal protein L30	40.71
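
The selection behind a table of this kind can be sketched as follows, under two stated assumptions: the ERI of a COG is taken as the fraction of genomes in the dataset in which it is present, and the average degree is taken over the genomes where it is present. The species labels and degree values are toy inputs, and the function names are hypothetical.

```python
def eri_and_mean_degree(degree_by_species):
    """Map each COG to (ERI, mean degree).

    ASSUMPTIONS: ERI = fraction of genomes containing the COG;
    mean degree = average of k over the genomes where the COG is present.
    """
    n_species = len(degree_by_species)
    per_cog = {}
    for degrees in degree_by_species.values():
        for cog, k in degrees.items():
            per_cog.setdefault(cog, []).append(k)
    return {
        cog: (len(ks) / n_species, sum(ks) / len(ks))
        for cog, ks in per_cog.items()
    }

def conserved_core(stats, k_min=40.0):
    """COGs present in every genome (ERI = 1) with mean degree above k_min."""
    return sorted(
        cog for cog, (eri, mean_k) in stats.items()
        if eri == 1.0 and mean_k > k_min
    )

# Toy input: species -> {COG: connectivity k}. Values are illustrative only.
toy_degrees = {
    "sp1": {"COG0087": 60, "COG0197": 62, "COG0365": 57},
    "sp2": {"COG0087": 58, "COG0197": 60},
    "sp3": {"COG0087": 62, "COG0197": 58, "COG0365": 3},
}
stats = eri_and_mean_degree(toy_degrees)
core = conserved_core(stats)
print(stats)
print(core)
```

In the toy input, COG0365 is highly connected in one species but is missing from another, so it is excluded from the conserved core despite its local hub status.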

47 in total

Review 1. The tandem affinity purification (TAP) method: a general procedure of protein complex purification.

Authors: O Puig; F Caspary; G Rigaut; B Rutz; E Bouveret; E Bragado-Nilsson; M Wilm; B Séraphin
Journal: Methods Date: 2001-07 Impact factor: 3.608

2. Lethality and centrality in protein networks.

Authors: H Jeong; S P Mason; A L Barabási; Z N Oltvai
Journal: Nature Date: 2001-05-03 Impact factor: 49.962

3. Comparative assessment of large-scale data sets of protein-protein interactions.

Authors: Christian von Mering; Roland Krause; Berend Snel; Michael Cornell; Stephen G Oliver; Stanley Fields; Peer Bork
Journal: Nature Date: 2002-05-08 Impact factor: 49.962

4. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.

Authors: I King Jordan; Igor B Rogozin; Yuri I Wolf; Eugene V Koonin
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

Review 5. Global approaches to protein-protein interactions.

Authors: Gerard Drewes; Tewis Bouwmeester
Journal: Curr Opin Cell Biol Date: 2003-04 Impact factor: 8.382

6. Expanded microbial genome coverage and improved protein family annotation in the COG database.

Authors: Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
Journal: Nucleic Acids Res Date: 2014-11-26 Impact factor: 16.971

7. Interaction network containing conserved and essential protein complexes in Escherichia coli.

Authors: Gareth Butland; José Manuel Peregrín-Alvarez; Joyce Li; Wehong Yang; Xiaochun Yang; Veronica Canadien; Andrei Starostine; Dawn Richards; Bryan Beattie; Nevan Krogan; Michael Davey; John Parkinson; Jack Greenblatt; Andrew Emili
Journal: Nature Date: 2005-02-03 Impact factor: 49.962