Bacterial Protein Interaction Networks: Connectivity is Ruled by Gene Conservation, Essentiality and Function.

Maddalena Dilucca, Giulio Cimini, Andrea Giansanti.

Abstract

BACKGROUND: Protein-protein interaction (PPI) networks are the backbone of all processes in living cells. In this work, we relate conservation, essentiality and functional repertoire of a gene to the connectivity k (i.e. the number of interactions, links) of the corresponding protein in the PPI network.
METHODS: On a set of 42 bacterial genomes of different sizes, and with reasonably separated evolutionary trajectories, we investigate three issues: i) whether the distribution of connectivities changes between PPI subnetworks of essential and nonessential genes; ii) how gene conservation, measured both by the evolutionary retention index (ERI) and by evolutionary pressures, is related to the connectivity of the corresponding protein; iii) how PPI connectivities are modulated by evolutionary and functional relationships, as represented by the Clusters of Orthologous Genes (COGs).
RESULTS: We show that conservation, essentiality and functional specialisation of genes constrain the connectivity of the corresponding proteins in bacterial PPI networks. In particular, we isolated a core of highly connected proteins (connectivities k≥40), which is ubiquitous among the species considered here, though mostly visible in the degree distributions of bacteria with small genomes (less than 1000 genes).
CONCLUSION: The genes that support this highly connected core are conserved, essential and, in most cases, belong to the COG cluster J, related to ribosomal functions and the processing of genetic information.
© 2021 Bentham Science Publishers.

Keywords:  Protein-protein interactions; bacterial genomes; cellular processes; clusters of orthologous genes; evolutionary retention index; gene essentiality

Year: 2021; PMID: 34220298; PMCID: PMC8188579; DOI: 10.2174/1389202922666210219110831

Source DB: PubMed; Journal: Curr Genomics; ISSN: 1389-2029; Impact factor: 2.236


INTRODUCTION

To carry out their biological functions in living cells, proteins work in association with other proteins, often assembled in large complexes. Hence, knowing the interactions of a protein is important for understanding its cellular functions. Moreover, a comprehensive description of the stable and transient protein-protein interactions (PPIs) within a cell would facilitate the functional annotation of all gene products and provide insight into the higher-order organization of the proteome [1, 2]. Several methodologies have been developed to detect PPIs and have been adapted to chart interactions at the proteome-wide scale. These methods, combining different technologies, experiments and computational analyses, generate PPI networks of sufficient reliability to enable the assignment of several proteins to functional categories [3, 4]. Moreover, the statistical study of bacterial PPIs over several species (meta-interactomes) has brought important knowledge about protein functions and cellular processes [5, 6]. Our aim here is to shed some light on the relationships between conservation, essentiality and functional annotation at the gene level and connectivity in PPI networks at the protein level. We extend our previous observations on the PPI network of E. coli, which suggested a strong correlation between the connectivity of PPI networks on the one hand, and codon bias, gene conservation and essentiality on the other [7, 8]. Before proceeding, it is worth making precise what is usually meant by gene essentiality and gene conservation. Individual genes in a genome contribute differently to the survival of an organism. According to their known functional profiles and based on experimental evidence, genes can be divided into two categories: essential and nonessential [9, 10]. Essential genes are indispensable for the survival of an organism in the environment it lives in [10, 11].
Nonessential genes are instead those which are dispensable [12], being related to functions that can be silenced without compromising the survival of the organism. Naturally, each species has adapted to one or more evolving environments and, plausibly, genes that are essential for one species may not be essential for another. It has been argued many times that essential genes are more conserved than nonessential ones [13-17]. The term "conservation" has, however, at least two meanings. On the one hand, a gene is conserved if orthologous copies of it are found in the genomes of many species, as measured by the Evolutionary Retention Index (ERI) [9, 18]. On the other hand, a gene is (evolutionarily) conserved when it is subject to a purifying selective pressure, which disfavors mutations. A common measure of evolutionary pressure is Ka/Ks, the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks). In this second meaning, a conserved gene is, in a nutshell, a slowly evolving gene, one that hardly incorporates mutations [13, 19]. To measure the evolutionary pressures exerted on the genes of low, intermediate and high connectivity bacterial proteins, we use Ka/Ks, and to measure evolutionary patterns of codon bias, we use Effective Number of Codons (ENC) plots. The main finding of this work is the presence of a functional transition in bacterial PPI networks, ruled by the connectivity (degree) k. The genes of proteins with high connectivities are under selective pressure, conserved, and essential.
Below the transition (k<50), the functional repertoire of low connectivity proteins is heterogeneous, whereas the genes of proteins with k>50 mainly belong to the Cluster of Orthologous Genes (COG) J (related to translation, ribosomal structure and biogenesis), with just a few interesting hubs belonging to COGs I (Lipid transport and metabolism), K (Transcription) and L (Replication, recombination and repair). Moreover, we show here that in the degree distribution of each bacterial PPI network, there is a ubiquitous trace of an almost-invariant structure of conserved hubs, essentially due to the ribosomal protein complexes, mostly visible in the networks of bacteria with small genomes.

MATERIALS AND METHODS

Bacterial Dataset and Protein-protein Interaction Networks

We consider a set of 42 bacterial genomes (previously investigated in [8]), shown in Table 1. Nucleotide sequences were downloaded from the FTP server of the National Center for Biotechnology Information [20]. These genomes were chosen in order to have a reasonably broad coverage of data concerning conservation, essentiality and selective pressure. PPIs are obtained from the STRING database (Known and Predicted Protein-Protein Interactions, https://string-db.org/) [21]. We have chosen STRING because of its broad coverage of different bacterial species, which allows us to extend to multiple species the analysis we carried out in [7]. In STRING, each interaction is assigned a confidence level or probability w, evaluated by comparing predictions obtained by different techniques [22-24] with a set of reference associations, namely the functional groups of KEGG (Kyoto Encyclopedia of Genes and Genomes) [25]. In this way, interactions with high w are likely to be true positives, whereas a low w possibly corresponds to a false positive. As usually done in the literature, we consider only interactions with w ≥ 0.9, a threshold that provides a fair balance between coverage and interaction reliability (see, for instance, the case study on E. coli reported in reference [7]). We denote by k the degree (number of connections) of each protein in each PPI network after the thresholding procedure. Note also that after applying the cut-off we are left, for each network, with a number of isolated proteins (singletons, with no connections) that grows as (where n is the number of proteins in the genome). These isolated proteins are not considered in the network analysis: they are regarded as stemming from statistical noise, or as appearing isolated only because the PPI data are incomplete. The PPIs of some species in our dataset (e.g. E. coli) are much better characterised than those of others. To take this potential bias into account, we checked in Fig. (S1) of the Supplementary Information (bottom panel) that the densities of PPIs are high for small genomes and tend to be constant, and not so different from that of E. coli, in bacteria with bigger genomes, among which we collect here highly investigated pathogens. The distinction between small and big genomes is a key emergent point in this work. We divided the set of 42 bacterial genomes into three groups, according to the number n of their genes: a) n<1000, b) 1000<n<3000 and c) n>3000. In several figures in the Supplementary Information, we address the dependence of various network properties on the size of the genome.
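
The thresholding step described above can be illustrated with a minimal sketch. It assumes the interactions are already available as (protein_a, protein_b, w) tuples with w rescaled to [0, 1]; this input format and the function names `degrees_after_threshold` and `genome_size_group` are hypothetical and are not part of STRING or of the original analysis.

```python
from collections import Counter


def degrees_after_threshold(edges, w_min=0.9):
    """Return {protein: k} using only links with confidence w >= w_min.

    `edges` is an iterable of (protein_a, protein_b, w) tuples, w in [0, 1].
    This input format is an assumption, not the STRING file layout.
    """
    kept = set()
    for a, b, w in edges:
        if a != b and w >= w_min:        # drop self-links and weak links
            kept.add(frozenset((a, b)))  # undirected, de-duplicated link
    k = Counter()
    for link in kept:
        for protein in link:
            k[protein] += 1
    # Proteins left without links never enter `k`, so singletons are excluded.
    return dict(k)


def genome_size_group(n):
    """Assign a genome with n genes to one of the three size groups.

    The text does not say where n = 1000 or n = 3000 fall; placing them in
    the larger group is an arbitrary choice made for this sketch.
    """
    if n < 1000:
        return "small"
    if n < 3000:
        return "intermediate"
    return "large"
```

Because singletons never acquire a degree, they drop out of the returned dictionary, which mirrors their exclusion from the network analysis.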

Gene Conservation

The Evolutionary Retention Index (ERI) [9] is a way of measuring the degree of conservation of a gene. In the present study, the ERI of a gene is the fraction of genomes, among those reported in Table 1, with at least one ortholog (same COG label) of the given gene. As recalled in the Introduction, a low ERI value corresponds to a gene that is rather specific, common to a small number of genomes, whereas a high ERI is characteristic of highly shared, putatively universal and essential genes. We also make reference to another notion of gene conservation: conserved genes are those which are subject to a purifying, conservative evolutionary pressure. To discriminate between genes subject to purifying selection and genes subject to positive Darwinian selection, we use a classic but still widely used indicator, the ratio Ka/Ks between the number of nonsynonymous substitutions per nonsynonymous site (Ka) and the number of synonymous substitutions per synonymous site (Ks) [19]. Conserved genes are characterized by Ka/Ks < 1. We used the Ka/Ks estimates by Luo [15], which are based on the method by Nei and Gojobori [26].
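
As a small illustration of the ERI definition given above, the following sketch computes the fraction of genomes that contain at least one gene carrying a given COG label. The data structure (a dictionary mapping each genome to the set of COG labels it contains) and the function name are assumptions made here for illustration only.

```python
def evolutionary_retention_index(cog_label, genomes):
    """Fraction of genomes with at least one gene carrying `cog_label`.

    `genomes` maps a genome name to the set of COG labels found in it
    (a hypothetical input format chosen for this sketch).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    hits = sum(1 for labels in genomes.values() if cog_label in labels)
    return hits / len(genomes)


# Toy example with four made-up genomes and placeholder labels.
toy = {
    "g1": {"COG_A", "COG_B"},
    "g2": {"COG_A"},
    "g3": {"COG_C"},
    "g4": {"COG_A"},
}
eri_a = evolutionary_retention_index("COG_A", toy)  # shared by 3 of 4 genomes
```

An ERI of 1 then means that the label is found in every genome of the pool, which is the sense in which highly shared genes are discussed in the text.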

Gene Essentiality

We used the Database of Essential Genes (DEG) [15], which classifies a gene as either essential or nonessential on the basis of a combination of experimental evidence (null mutations or transposons) and general functional considerations. DEG collects genomes from Bacteria, Archaea and Eukarya, with different degrees of coverage [27, 28]. Of the 42 bacterial genomes we considered, only 23 are covered, in total or partially, by DEG, as indicated in Table 1.

Ka / Ks

Ka/Ks is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) [19]. This parameter is widely accepted as a straightforward and effective way of separating genes subject to purifying evolutionary selection (Ka/Ks < 1) from genes subject to positive Darwinian selection (Ka/Ks > 1). There are different methods to evaluate this ratio, though the alternative approaches are quite consistent among themselves. For the sake of comparison, we have used here the Ka/Ks estimates by Luo et al. [15], which are based on the Nei and Gojobori method [26]. It must be noted that each genome has a specific average level of Ka/Ks [7]. Average values of Ka/Ks are shown for low, intermediate, and high connectivity bins of genes.

ENC Plot

The ENC-plot is a well-known tool to investigate the patterns of synonymous codon usage, in which the ENC (Effective Number of Codons) values are plotted against GC3, the guanine and cytosine content at the third codon position. The ENC values expected under the hypothesis of pure mutational bias (no selection) are given by:

$$\mathrm{ENC}_{\mathrm{exp}}(s) = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}} \qquad (1)$$

where s represents the value of GC3 [29]. When the points of a set of genes fall near the expected neutral curve, mutations that enforce the typical mutational bias of the species are the main factor affecting the observed codon diversity. When the points fall considerably below the expected curve, the observed codon usage bias is instead mainly affected by natural selection. To quantitatively represent the balance between mutational bias and selective pressure, we parametrise the ENC formula to be used in non-linear fits to the experimental data [Eq. (2)]. ENC plots of genes corresponding to low, intermediate and high connectivity proteins are shown in Fig. () of the Supplementary Information. The best-fit parameters for the three groups of genes are collected in Table .
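
To make the neutral expectation concrete, the sketch below evaluates the standard Wright curve, ENC = 2 + s + 29/(s^2 + (1 - s)^2), and the average vertical gap between that curve and a set of observed (GC3, ENC) points. Measuring the distance as a vertical gap is an assumption made here; the text does not specify how the distance from the curve is defined, and the function names are hypothetical.

```python
def expected_enc(gc3):
    """ENC expected under pure mutational bias for a given GC3 fraction."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def mean_gap_below_curve(points):
    """Average of (expected ENC - observed ENC) over (gc3, enc) pairs.

    A positive value means the genes lie, on average, below the neutral
    curve. Using the vertical gap as the distance is an assumption.
    """
    points = list(points)
    if not points:
        raise ValueError("no points given")
    gaps = [expected_enc(s) - enc for s, enc in points]
    return sum(gaps) / len(gaps)
```

The curve peaks at GC3 = 0.5, where all third-position bases are equally likely, and falls off symmetrically-ish towards the extremes, so a gene with strongly skewed GC3 has a low ENC even without selection.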

Clusters of Orthologous Proteins

We use the functional annotation given in the database of orthologous groups of proteins (COGs) from Koonin's group [30, 31]. We consider 15 functional COG categories (Table 2), excluding the generic categories R and S, for which functional annotation is either too general or missing.

RESULTS AND DISCUSSION

Degree Distribution of PPI Networks

We start by studying the degree distributions P(k) observed in bacterial PPIs. We first recall that this distribution was found to be scale-free in E. coli [7, 32-34], meaning that the corresponding PPI network features a large number of poorly connected proteins and a relatively small number of highly connected hubs. In order to assess the generality of this observation, we compute P(k) for each genome in Table 1 (plots are reported in Figs. ( and ) of the Supplementary Information). Note that, although PPI networks of different bacteria have different sizes and densities, their average connectivity and the support of their P(k) are very similar. Thus, we can superpose all the bacterial degree distributions without the need to normalise the support of each P(k). When doing so, we observe two distinct regimes (Fig. ). For k < 40, the distribution is approximately scale-free: $P(k) \sim k^{-\gamma}$ with γ = 2.48. This scaling behaviour is consistent with previous studies on the genomes of yeast, worms and flies [35] and on co-conserved PPIs in some bacteria [36]. The scale-free nature of bacterial PPIs is still a matter of debate, and a thorough discussion of the origin of this feature is beyond the scope of this paper. Here we simply confirm that there is, as expected, a large number of poorly connected proteins and a small number of hubs. Remarkably, for higher values of k, the distribution deviates from a power law, and a bump with a Gaussian-like shape emerges. This feature, visible for k ≥ 40, may be due to the contribution of proteins belonging to large complexes [37]. From the whole set of observations presented in this paper, we conclude that the bump in P(k) is due to the complexity of ribosomal interactions. Indeed, if one recalculates the degree distribution on a dataset from which the ribosomal proteins are removed, the bump is not present (Fig. (), empty dots).
Moreover, if we consider the separate contributions of essential and nonessential genes to P(k) (for DEG-annotated genomes), we see that the bump is present only in the degree distribution of essential genes. Note also that the degree distributions for essential and nonessential genes are well separated, and that the average degree is systematically higher for essential genes than for nonessential ones, consistently with previous findings [35]. Remarkably, we have shown in a previous paper [8] that the number of essential genes in bacteria is close to 500 and does not depend on the size of the genome. To correctly interpret the emergence of the bump in the average P(k) in Fig. (), it is worth pointing out the distinction between small and not so small genomes. In small genomes, almost all the genes are essential, and among the essential genes, those belonging to COG J (functions related to translation, ribosomal structure and biogenesis) play a major and ubiquitous role. In Fig. (), we have checked that the bump that emerges in Fig. () as a feature of essential and conserved genes is clearly visible in the P(k) of small genomes, whereas the picture is blurred in the case of bigger genomes. This might be interpreted as a dilution effect: in the networks of bigger genomes, there are many specific interactions besides the essential ones. Then, averaging P(k) over small, intermediate and big genomes (Fig. in Supplementary Information), we can safely interpret the bump as an emerging feature due to a core of highly connected proteins (connectivities k≥40), which is contributed, in the average, mostly by the degree distributions of PPIs of bacteria with small genomes (Figs. S2-S4). From all the considerations above, we exclude that this bump, observed here for the first time, could emerge just because that part of the PPI is much more investigated than other subnetworks. It is there because the ribosome is there, in all bacteria (Table ).
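
The empirical degree distribution and a rough estimate of the exponent γ in the low-k regime can be sketched as follows. The estimator used here, an ordinary least-squares fit of log P(k) against log k restricted to k < 40, is an assumption made for illustration; the text does not state how γ = 2.48 was obtained, and log-log regression is known to be a crude estimator for power laws.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of proteins having each degree k."""
    degrees = list(degrees)
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in sorted(counts.items())}


def fit_power_law_exponent(pk, k_max=40):
    """Estimate gamma in P(k) ~ k**(-gamma) for 1 <= k < k_max.

    Ordinary least squares on log P(k) versus log k; the returned gamma is
    minus the fitted slope. This estimator is an illustrative assumption.
    """
    pts = [(math.log(k), math.log(p)) for k, p in pk.items()
           if 1 <= k < k_max and p > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx
```

Restricting the fit to k < 40 keeps the high-connectivity bump out of the regression, so the estimated slope describes only the approximately scale-free part of the distribution.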

PPI Connectivity and Gene Conservation

We now investigate whether the connectivity k of a protein in a PPI network drives a transition in the degree of conservation (as measured by ERI) of the corresponding gene. Fig. () displays the average value and the spread of ERI for genes grouped into bins of proteins that have the same connectivity in the PPIs of different species. As a general feature, we observe that, on average, the genes of highly connected proteins are highly conserved among the bacterial species we consider, which constitute a reasonably wide sample of different evolutionary adaptations. The same Fig. () shows that if k < 50, the ERI fluctuates strongly between different samples of proteins with the same k in different species. For high connectivities (above k = 50), the ERI is close to 1, with a drastic drop in the fluctuations (as shown in the inset). This observation points to the existence of an almost-invariant structure of conserved hubs in each bacterial PPI, sustained by highly conserved genes. We can conclude, as a rule of thumb, that a protein with a connectivity of 40 or more is likely to be coded by a gene shared by at least 80% of the species in a generic pool of bacteria. At the moment, we do not have a general explanation for this apparent threshold. We just propose, as a heuristic observation, the existence of an almost-critical value of connectivity, to be set between 40 and 50, that corresponds to the connectivity of the core of proteins specifically involved, as alluded to in the previous section, in the ubiquitous ribosomal functions (Tables and ).

Evolutionary Pressure and PPI Connectivity

We then look at the evolutionary pressure exerted on genes whose proteins have different connectivities. The graph in Fig. () shows the ratio Ka/Ks for groups of genes binned by the connectivity k of the corresponding proteins, for all the 42 bacterial species in Table 1. As is well known, the ratio Ka/Ks provides a straightforward indication of the balance between a positive, driving Darwinian selection (when the numerator prevails) and a purifying, stabilising selection (acting against change in genes for which the denominator prevails). We see that the more connected proteins correspond to genes that are subject to an increasing purifying evolutionary pressure. Indeed, the ratio Ka/Ks is less than 1 in all bins of connectivity and systematically decreases as a function of k. A decreasing ratio generally indicates an increasing role of purifying, conservative evolutionary pressure on the corresponding set of genes. This is a reasonable result, pointing out that the groups of genes that support conserved structures of connectivity in the PPIs are more constrained in evolution than the genes of less interacting proteins. To add evidence to this observation, we have also considered ENC plots for sets of genes binned by the connectivities of the corresponding proteins. Interestingly, the ENC data in Fig. () of the Supplementary Information are fully consistent with those in Fig. (). In the ENC plots, the points associated with low connectivity proteins (red) are closer to the so-called Wright's profile (represented there as solid black lines) than those associated with proteins of intermediate and high connectivity (green and blue). Fig. () stresses this observation in a more quantitative way by showing that, in the ENC plots, the average distance from Wright's profile increases monotonically with k.
Overall, the above results clearly indicate that codon bias and GC content of high connectivity genes are more under selective Darwinian pressure than those of genes coding for low-connectivity proteins, in which the rate of accepted mutations is mainly ruled by neutral mutational bias. These observations point out that the almost-invariant structure of protein hubs alluded to in the previous section is supported by an underlying set of genes that are under strong mutational control; an expected result, perhaps, but clearly seen here as a general feature associated with ubiquitous and conserved ribosomal functions.

PPI and Essentiality

To further investigate the relationship between gene essentiality and protein connectivities, we consider DEG-annotated genomes and classify interactions between proteins (links) with reference to the essentiality of the corresponding genes. We distinguish three sets of links: |ee| (linking proteins from two essential genes), |ēē| (from two nonessential genes) and |eē| (from an essential gene and a nonessential one). We then compute the densities of these sets of links respectively as:

$$\rho_{ee} = \frac{|ee|}{E(E-1)/2}, \qquad \rho_{\bar{e}\bar{e}} = \frac{|\bar{e}\bar{e}|}{\mathit{NE}(\mathit{NE}-1)/2}, \qquad \rho_{e\bar{e}} = \frac{|e\bar{e}|}{E \cdot \mathit{NE}} \qquad (3)$$

where E and NE denote the number of essential and nonessential genes, respectively (self-connections are excluded in our analysis). Each denominator is the maximum possible value of the numerator, corresponding to the fully connected graph. Such densities are then compared with the overall density of the network, restricted to genes classified as either essential or nonessential:

$$\rho = \frac{|ee| + |\bar{e}\bar{e}| + |e\bar{e}|}{(E+\mathit{NE})(E+\mathit{NE}-1)/2} \qquad (4)$$

We use the ratios $\rho_{ee}/\rho$, $\rho_{\bar{e}\bar{e}}/\rho$ and $\rho_{e\bar{e}}/\rho$ to assess the level of connectivity of the subnetworks with respect to the overall connectivity. Table shows that subnetworks of essential genes are far denser than the overall networks, and that, in general, essential and nonessential genes tend to form network components that are weakly interconnected.
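
A minimal sketch of the density comparison described above is given here. It assumes an undirected network without self-links, so that the maximum number of links among m proteins is m(m-1)/2 and the maximum number of essential-nonessential links is E·NE; these forms follow from the statement that each denominator is the link count of the fully connected graph. The function name and input format are hypothetical.

```python
def link_densities(links, essential, nonessential):
    """Densities of essential-essential, nonessential-nonessential and mixed
    links, plus the overall density, for an undirected PPI network.

    `links` is an iterable of (a, b) pairs; `essential` and `nonessential`
    are disjoint sets of protein identifiers (assumed input format).
    """
    e, ne = len(essential), len(nonessential)
    if e < 2 or ne < 2:
        raise ValueError("need at least two genes in each class")
    classified = essential | nonessential
    seen = set()
    n_ee = n_nn = n_en = 0
    for a, b in links:
        pair = frozenset((a, b))
        # Skip self-links, duplicates and links to unclassified proteins.
        if len(pair) != 2 or pair in seen or not pair <= classified:
            continue
        seen.add(pair)
        in_essential = len(pair & essential)
        if in_essential == 2:
            n_ee += 1
        elif in_essential == 0:
            n_nn += 1
        else:
            n_en += 1
    n = e + ne
    return {
        "ee": n_ee / (e * (e - 1) / 2),
        "nn": n_nn / (ne * (ne - 1) / 2),
        "en": n_en / (e * ne),
        "all": (n_ee + n_nn + n_en) / (n * (n - 1) / 2),
    }
```

Dividing each of the first three entries by the "all" entry gives the ratios used to judge whether a subnetwork is denser or sparser than the network as a whole.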
Essential genes form dense subnetworks because many of them encode ribosomal proteins, which are localised in the ribosomal complex, where they have a high probability of interacting [39] (see also the table in [8] showing that approximately 25% of essential genes fall into COG J). Figs. ( and ) of the Supplementary Information collect the superposed adjacency matrices of the |ee| (red dots), |eē| (violet dots) and |ēē| (blue dots) subnetworks, which display such network features for each individual species. These graphs confirm the dominance of the interactions between the proteins of essential genes (red dots) in the small genomes. The adjacency matrices of bacteria with intermediate and big genomes are dominated by interactions involving proteins supported by nonessential genes (blue dots).

PPI Connectivity and Functional Specialisation

For each PPI network, we define the conditional probability (Bayes' theorem) that a protein with degree k belongs to a given COG as:

$$P(\mathrm{COG} \mid k) = \frac{P(k \mid \mathrm{COG})\, P(\mathrm{COG})}{P(k)} \qquad (5)$$

where P(k) is the degree distribution in the PPI network, P(COG) is the frequency of that COG in the proteome, and P(k | COG) is the degree distribution restricted to that COG. Fig. () shows the COG spectrum as a function of k over all the bacteria considered here. Interestingly, we again note a marked transition. Below k = 40, the COG spectrum is quite heterogeneous: genes corresponding to proteins with low connectivity are spread over several COGs, which correspond to different functions (Table 2). The transition shows that proteins with more than 40 interactions are likely to be coded by genes belonging to COG J. There are, however, a handful of outliers, hubs with connectivities between 57 and 62, that belong to COG I (related to lipid transport and metabolism) and to COGs K and L (which, together with J, define the functional class of information storage and processing). The list of these outliers is reported in Table . Interestingly, they correspond to RNA polymerases and to enzymes involved in acetate metabolism.
But which are the genes of COG J that drive the transition? Fig. () shows which genes are the main players in the transition. We investigate the connectivities of the highly conserved genes (ERI = 1, shared by all the species in Table 1) that belong to COG J and whose proteins have connectivities bigger than 40. These highly shared genes, corresponding to cores of highly connected ribosomal proteins, are listed in Table . In the heat map of Fig. (), we sort the genes of COG J in order of descending degree, species by species, and we see that there is a core of genes (in red, lower left sector) that correspond to highly connected proteins and are also highly shared (ERI = 1, see Table ) among all the species we considered. It is quite clear that in the heat map of Fig. () the 42 species in this study can be split into at least two groups (see the cladogram on the left). In one group (the species at the bottom of Fig. ()), there is a shared set of genes (the red band at the bottom-left side of the heat map) corresponding to a common core of highly connected ribosomal proteins. This remarkable observation suggests that the species in this group (namely, Synechocystis sp. PCC 6803, Escherichia coli K-12 MG1655, Clostridium acetobutylicum ATCC 824, Mycobacterium tuberculosis H37Rv, Sphingomonas wittichii RW1, Vibrio cholerae N16961, Burkholderia thailandensis E264, Rickettsia prowazekii str. Madrid E, Agrobacterium tumefaciens (fabrum), Ralstonia solanacearum GMI1000, Xylella fastidiosa 9a5c) should have a common structural and functional organisation of their ribosomes, an interesting point to be further investigated. In the rest of the species, the connectivity of the proteins with k > 40 that correspond to the highly shared COG J genes is more heterogeneous. We can conclude that the abrupt transition shown in Fig. () is driven by a subset of COG J genes which are highly conserved among a subset of species and are listed in Table .
As one can see, these genes correspond to a specific subset of ribosomal proteins in the small and large subunits, whose functional and structural roles deserve further investigation.
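
The COG spectrum of Eq. (5) can be illustrated with a short sketch that applies Bayes' theorem to a list of (degree, COG) pairs, one per protein. The input format and the function name `cog_spectrum` are assumptions introduced here for illustration; the toy data are made up.

```python
from collections import Counter, defaultdict


def cog_spectrum(proteins):
    """Return {k: {cog: P(cog | k)}} from (degree k, COG letter) tuples.

    Each probability is computed as P(k | COG) * P(COG) / P(k), with all
    three terms estimated as frequencies over the given proteins.
    """
    proteins = [tuple(p) for p in proteins]
    n = len(proteins)
    n_k = Counter(k for k, _ in proteins)
    n_cog = Counter(cog for _, cog in proteins)
    n_joint = Counter(proteins)
    spectrum = defaultdict(dict)
    for (k, cog), joint in n_joint.items():
        p_k = n_k[k] / n                    # degree distribution P(k)
        p_cog = n_cog[cog] / n              # COG frequency P(COG)
        p_k_given_cog = joint / n_cog[cog]  # P(k | COG)
        spectrum[k][cog] = p_k_given_cog * p_cog / p_k
    return dict(spectrum)


# Made-up toy proteome: three hubs with k = 45 and four proteins with k = 2.
toy_proteins = [(45, "J"), (45, "J"), (45, "K"),
                (2, "C"), (2, "J"), (2, "E"), (2, "C")]
toy_spectrum = cog_spectrum(toy_proteins)
```

In this toy example two of the three hubs belong to COG J, so the spectrum at k = 45 is dominated by J, while the spectrum at k = 2 is spread over several COGs.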

CONCLUSION

Connectivity analysis of biological networks, such as protein-protein interaction or metabolic networks, has demonstrated that structural features of network subgraphs are correlated with biological functions [40, 41]. For instance, it was shown that highly connected patterns of proteins in a PPI are fundamental to cell viability [42]. In this work, we have shown the existence of a functional transition in bacterial species, ruled by the connectivity of proteins in the PPI networks (Fig. ). The critical threshold of the transition is located between k=40 and k=50. Proteins that have connectivities above the threshold are mostly encoded by genes that are conserved, under selective pressure (as measured both by ERI and Ka/Ks) and essential. Moreover, the functional repertoire above the threshold mainly focuses on COG J (translation, ribosomal structure and biogenesis), with just a few interesting hubs belonging to COGs I (Lipid transport and metabolism), K (Transcription) and L (Replication, recombination and repair). Indeed, the PPI network of each bacterial species is characterised by a highly connected core of conserved ribosomal proteins, the components of multi-subunit complexes whose corresponding genes are mostly essential [32, 36] and code for supra-molecular complexes that pile up in the bump we have observed in the degree distribution (Fig. ). Hence, what we see here is essentially the ribosome and related protein complexes such as RNA polymerase. Indeed, the ribosome is the only molecular machine in bacteria in which a given protein could legitimately have 40 or more protein binding partners, with the help of rRNA mediating interactions [43]. It is reasonable to admit that, since some bacterial species are much more investigated than others, comparative statistical studies of bacterial PPIs might be biased by the choice of the sample of genomes included in the study. Our dataset is no exception.
In order to address this hard-to-settle problem, we have checked (Fig. S1) that the small genomes included in our study (i.e. those with less than 1000 genes) have PPIs whose densities (a rough proxy for the coverage of the interactions in the network) are higher than those of bigger genomes. The group of small genomes comprises Buchnera, Chlamydia, and Mycoplasmas, whereas the bigger genomes mostly belong to well-known pathogens that are surely among the most investigated bacterial species. The densities of the networks of these species are quite similar and comparable with that of E. coli. As a general rule, the networks of small genomes are better covered in the STRING database (after the application of a conservative cutoff w = 900, i.e. w ≥ 0.9 on a 0 to 1 scale) than those of bigger genomes. Interestingly, we have shown (Figs. ( and ) in Supplementary Information) that the PPI adjacency matrices of bacteria with small genomes are indeed dominated by the interactions constituting the ribosomal complex. In the adjacency matrices of the PPIs of bacteria with bigger genomes, the cloud of interactions between the proteins of nonessential genes tends to be superposed on the ever-present ribosomal core. In conclusion, we believe we have convincingly shown that bacterial PPIs are characterised by the presence of a highly connected structure, associated with ribosomal functions and particularly visible in bacteria with small genomes. We believe that the observations presented here could be of some utility for the prediction of gene essentiality, based on the knowledge of PPI networks, and for the prediction of interactions between proteins, based on genetic information [44, 45]. It is interesting to note that our results are consistent with a previous study of inferred bacterial co-conserved networks based on phylogenetic profiles [36].
This work suggests further, systematic investigation of how the structure of PPI networks is correlated with multiple networks at the genetic level, at least in unicellular organisms. In particular, we believe that a recent approach based on multiple-layer networks could be of great potential interest (e.g. to search for a general scheme behind antimicrobial resistance [46-50]).

Table 1

Summary of the selected bacterial dataset. Organism name, abbreviation, class, RefSeq, STRING code, size of genome (number of genes n). Genomes annotated in the Database of Essential Genes (DEG) are highlighted with bold fonts. Classes are: Alphaproteobacteria (1), Betaproteobacteria (2), Gammaproteobacteria (3), Epsilonproteobacteria (4), Actinobacteria (5), Bacilli (6), Bacteroidetes (7), Clostridia (8), Deinococci (9), Mollicutes (10), Spirochaetales (11), Aquificae (12), Cyanobacteria (13), Chlamydiae (14), Fusobacteria (15), Thermotoga (16).

Organism | Abbr. | Class | RefSeq | STRING | n
Mycoplasma genitalium G37 | myge | 10 | NC_000908 | 243273 | 475
Buchnera aphidicola Sg uid57913 | busg | 2 | NC_004061 | 198804 | 546
Mycoplasma pneumoniae M129 | mypn | 10 | NC_000912.1 | 272634 | 648
Mycoplasma pulmonis UAB CTIP | mypu | 10 | NC_002771 | 272635 | 782
Chlamydia trachomatis D/UW-3/CX | chtr | 14 | NC_000117.1 | 272561 | 894
Treponema pallidum Nichols | trpa | 11 | NC_000919.1 | 243276 | 1036
Helicobacter pylori 26695 | hepy | 4 | NC_000915 | 85962 | 1469
Aquifex aeolicus VF5 | aqae | 12 | NC_000918 | 224324 | 1497
Campylobacter jejuni | caje | 4 | NC_002163 | 192222 | 1572
Haemophilus influenzae Rd KW20 | hain | 3 | NC_000907.1 | 71421 | 1610
Streptococcus pyogenes NZ131 | stpy | 6 | NC_011375 | 471876 | 1700
Francisella novicida U112 | frno | 3 | NC_008601 | 401614 | 1719
Thermotoga maritima MSB8 | thma | 16 | NC_000853.1 | 243274 | 1858
Neisseria gonorrhoeae FA 1090 uid57611 | nego | 2 | NC_002946 | 242231 | 1894
Fusobacterium nucleatum ATCC 25586 | funu | 15 | NC_003454.1 | 190304 | 1983
Brucella melitensis bv. 1 str. 16M | brme | 1 | NC_003317.1 | 224914 | 2059
Porphyromonas gingivalis ATCC 33277 | pogi | 7 | NC_010729 | 431947 | 2089
Streptococcus sanguinis | stsa | 6 | NC_009009 | 388919 | 2270
Vibrio cholerae N16961 | vich | 3 | NC_002505 | 243277 | 2534
Staphylococcus aureus N315 | stau | 6 | NC_002745.2 | 158879 | 2582
Deinococcus radiodurans R1 | dera | 9 | NC_001263.1 | 243230 | 2629
Agrobacterium tumefaciens (fabrum) | agtu | 1 | NC_003062 | 176299 | 2765
Xylella fastidiosa 9a5c | xyfa | 3 | NC_002488 | 160492 | 2766
Staphylococcus aureus NCTC 8325 | stau | 6 | NC_007795 | 93061 | 2767
Listeria monocytogenes EGD-e | limo | 6 | NC_003210.1 | 169963 | 2867
Synechocystis sp. PCC 6803 | sysp | 13 | NC_000911.1 | 1148 | 3179
Burkholderia thailandensis E264 | buth | 2 | NC_007651 | 271848 | 3276
Sinorhizobium meliloti 1021 | sime | 1 | NC_003047.1 | 266834 | 3359
Burkholderia pseudomallei K96243 | bups | 3 | NC_006350 | 272560 | 3398
Ralstonia solanacearum GMI1000 | raso | 2 | NC_003295.1 | 267608 | 3436
Clostridium acetobutylicum ATCC 824 | clac | 8 | NC_003030.1 | 272562 | 3602
Caulobacter crescentus | cacr | 1 | NC_011916 | 565050 | 3885
Mycobacterium tuberculosis H37Rv | mytu | 5 | NC_000962.3 | 83332 | 3936
Escherichia coli K-12 MG1655 | esco | 3 | NC_000913.3 | 511145 | 4004
Shewanella oneidensis MR-1 | shon | 3 | NC_004347 | 211586 | 4065
Bacillus subtilis 168 | basu | 6 | NC_000964 | 224308 | 4175
Salmonella enterica serovar Typhi | saen | 3 | NC_004631 | 209261 | 4352
Bacteroides thetaiotaomicron VPI-5482 | bath | 7 | NC_004663 | 226186 | 4778
Sphingomonas wittichii RW1 | spwi | 1 | NC_009511 | 392499 | 4850
Pseudomonas aeruginosa UCBPP-PA14 | psae | 3 | NC_008463 | 208963 | 5892
Mesorhizobium loti MAFF303099 | melo | 1 | NC_002678.2 | 266835 | 6743
Rickettsia prowazekii str. Madrid E | ripr | 1 | NC_000963.1 | 272947 | 8433

Table 2

Functional classification of COG clusters.

COG ID | Functional classification
Information Storage and Processing
J | Translation, ribosomal structure and biogenesis
K | Transcription
L | Replication, recombination and repair
Cellular Processes and Signaling
D | Cell cycle control, cell division, chromosome partitioning
T | Signal transduction mechanisms
M | Cell wall/membrane/envelope biogenesis
N | Cell motility
O | Post-translational modification, protein turnover, chaperones
Metabolism
C | Energy production and conversion
G | Carbohydrate transport and metabolism
E | Amino acid transport and metabolism
F | Nucleotide transport and metabolism
H | Coenzyme transport and metabolism
I | Lipid transport and metabolism
P | Inorganic ion transport and metabolism
Table 3

Relative density values r for PPI subnetworks between essential genes (r_ee), between essential and nonessential genes (r_eē) and between nonessential genes (r_ēē), for each DEG-annotated bacterial genome.

Organism | r_ee | r_eē | r_ēē
basu | 44.46 | 0.80 | 0.11
bath | 20.07 | 0.76 | 0.25
bups | 6.21 | 0.83 | 0.27
buth | 18.69 | 0.70 | 0.22
cacr | 18.40 | 0.70 | 0.15
caje | 3.65 | 0.82 | 0.32
esco | 2.91 | 0.88 | 0.31
frno | 9.84 | 0.52 | 0.18
hain | 1.65 | 1.15 | 0.27
hepy | 2.91 | 0.78 | 0.38
myge | 1.42 | 0.29 | 0.08
mypu | 3.42 | 0.22 | 0.12
mytu | 8.09 | 0.78 | 0.23
pogi | 11.03 | 0.41 | 0.21
psae | 9.85 | 0.92 | 0.16
saen | 28.80 | 0.81 | 0.12
shon | 6.50 | 0.64 | 0.16
spwi | 15.47 | 0.74 | 0.22
stau | 23.05 | 0.58 | 0.23
stau | 21.89 | 0.64 | 0.16
stpy | 9.30 | 0.73 | 0.23
stsa | 30.65 | 0.61 | 0.22
vich | 8.37 | 0.81 | 0.19
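One way such relative densities could be computed is sketched below. The definition used here is an assumption, since this section does not state it: the density of each subnetwork (links observed over links possible) is divided by the density of the whole network, so that values above 1 mark a block that is denser than average. The function name and the toy network are hypothetical.

```python
def relative_densities(proteins, essential, links):
    """Density of the essential-essential ("ee"), essential-nonessential
    ("en") and nonessential-nonessential ("nn") blocks, each divided by the
    density of the whole network (assumed definition of relative density)."""
    proteins = set(proteins)
    essential = set(essential) & proteins
    n_e = len(essential)
    n_n = len(proteins) - n_e
    n = n_e + n_n
    if n < 2 or not links:
        raise ValueError("need at least two proteins and one link")

    # Classify every link by how many of its endpoints are essential.
    counts = {"ee": 0, "en": 0, "nn": 0}
    for a, b in links:
        n_essential_ends = (a in essential) + (b in essential)
        counts[("nn", "en", "ee")[n_essential_ends]] += 1

    # Number of possible pairs inside each block.
    pairs = {
        "ee": n_e * (n_e - 1) / 2,
        "en": n_e * n_n,
        "nn": n_n * (n_n - 1) / 2,
    }
    overall = len(links) / (n * (n - 1) / 2)
    return {
        block: (counts[block] / pairs[block]) / overall
        if pairs[block] else float("nan")
        for block in counts
    }


# Toy network, invented for illustration: a, b, c play the essential genes.
toy_proteins = ["a", "b", "c", "d", "e", "f"]
toy_essential = ["a", "b", "c"]
toy_links = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e")]

r = relative_densities(toy_proteins, toy_essential, toy_links)
print({block: round(value, 2) for block, value in r.items()})
```

With this normalisation the fully connected essential triangle of the toy network scores well above 1, while the mixed block falls below it, qualitatively mirroring the ordering of the three columns in the table.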
Table 4

Specific hubs. In this table we detail which proteins populate the few bins of connectivity around k = 60 in Fig. ().

k | COG | Gene | Protein
57 | 1250 I | paaH | 3-hydroxyadipyl-CoA dehydrogenase, NAD+-dependent
57 | 0365 I | acs | acetyl-CoA synthetase
58 | 0222 J | rplL | 50S ribosomal subunit protein L7/L12
58 | 0335 J | rplS | 50S ribosomal subunit protein L19
58 | 0267 J | rpmG | 50S ribosomal subunit protein L33
58 | 0365 I | acs | acetyl-CoA synthetase
59 | 0183 I | paaJ | 3-oxoadipyl-CoA/3-oxo-5,6-dehydrosuberyl-CoA thiolase
59 | 1960 I | ydiO | putative acyl-CoA dehydrogenase
59 | 0183 I | atoB | acetyl-CoA acetyltransferase
60 | 0197 J | rplP | 50S ribosomal subunit protein L16
60 | 0088 J | rplD | 50S ribosomal subunit protein L4
60 | 0197 J | rplP | 50S ribosomal subunit protein L16
60 | 0087 J | rplC | 50S ribosomal subunit protein L3
60 | 1960 I | aidB | putative acyl-CoA dehydrogenase
61 | 0085 K | rpoB | RNA polymerase, beta subunit
61 | 0202 K | rpoA | RNA polymerase, alpha subunit
62 | 0087 J | rplC | 50S ribosomal subunit protein L3
62 | 0052 J | rpsB | 30S ribosomal subunit protein S2
62 | 2965 L | PriB | ribosomal replication protein
Table 5

Genes belonging to COG J with average degree greater than 40 (Fig. ()). All these genes are conserved, common to all species (ERI = 1), and drive the transition shown in Fig. ().

COG | Gene name | <k>
COG0097 J | 50S ribosomal protein L6 | 60.24
COG0087 J | 50S ribosomal protein L3 | 60.19
COG0197 J | 50S ribosomal protein L16 | 60.19
COG0090 J | 50S ribosomal protein L2 | 60.14
COG0080 J | 50S ribosomal protein L11 | 60.12
COG0088 J | 50S ribosomal protein L4 | 60.12
COG0081 J | 50S ribosomal protein L1 | 58.19
COG0089 J | 50S ribosomal protein L23 | 57.88
COG0102 J | 50S ribosomal protein L13 | 57.45
COG0094 J | 50S ribosomal protein L5 | 57.21
COG0092 J | 30S ribosomal protein S3 | 57.12
COG0098 J | 30S ribosomal protein S5 | 57.10
COG0093 J | 50S ribosomal protein L14 | 57.00
COG0091 J | 50S ribosomal protein L22 | 56.24
COG0049 J | 30S ribosomal protein S7 | 55.31
COG0051 J | 30S ribosomal protein S10 | 55.24
COG0200 J | 50S ribosomal protein L15 | 55.12
COG0256 J | 50S ribosomal protein L18 | 54.86
COG0203 J | 50S ribosomal protein L17 | 54.43
COG0244 J | 50S ribosomal protein L10 | 54.19
COG0100 J | 30S ribosomal protein S11 | 53.76
COG0522 J | 30S ribosomal protein S4 | 53.43
COG0096 J | 30S ribosomal protein S8 | 53.10
COG0099 J | 30S ribosomal protein S13 | 52.88
COG0048 J | 30S ribosomal protein S12 | 52.14
COG0198 J | 50S ribosomal protein L24 | 50.83
COG0185 J | 30S ribosomal protein S19 | 50.52
COG0199 J | 30S ribosomal protein S14 | 50.45
COG0103 J | 30S ribosomal protein S9 | 49.45
COG0480 J | tetracycline resistance protein, tetM | 47.90
COG0052 J | 30S ribosomal protein S2 | 47.69
COG0184 J | 30S ribosomal protein S15 | 45.95
COG0186 J | 30S ribosomal protein S17 | 44.60
COG0255 J | 50S ribosomal protein L29 | 43.95
COG0222 J | 50S ribosomal protein L7/L12 | 42.43
COG1841 J | 50S ribosomal protein L30 | 40.71
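The selection behind a table of this kind can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it assumes one connectivity value per COG per species, takes <k> as the mean over the species in which the COG occurs, and takes the ERI of a COG as the fraction of species in which it occurs (consistent with ERI = 1 meaning "common to all species"). Species labels and degree values in the toy input are invented.

```python
def cog_summary(degree_by_species):
    """Return {cog: (mean connectivity, ERI)} from {species: {cog: k}}.

    ERI is taken here as the fraction of species in which the COG occurs
    (an assumed definition)."""
    n_species = len(degree_by_species)
    totals, occurrences = {}, {}
    for cog_degrees in degree_by_species.values():
        for cog, k in cog_degrees.items():
            totals[cog] = totals.get(cog, 0) + k
            occurrences[cog] = occurrences.get(cog, 0) + 1
    return {
        cog: (totals[cog] / occurrences[cog], occurrences[cog] / n_species)
        for cog in totals
    }


def conserved_hubs(summary, k_min=40.0, eri_min=1.0):
    """COGs with mean connectivity above k_min and ERI of at least eri_min,
    sorted by decreasing mean connectivity."""
    selected = [
        cog for cog, (mean_k, eri) in summary.items()
        if mean_k > k_min and eri >= eri_min
    ]
    return sorted(selected, key=lambda cog: -summary[cog][0])


# Toy input: three hypothetical species with invented connectivities.
toy_degrees = {
    "sp1": {"COG0097": 62, "COG0087": 58, "COG0365": 57},
    "sp2": {"COG0097": 60, "COG0087": 60},
    "sp3": {"COG0097": 58, "COG0087": 62, "COG1960": 10},
}

summary = cog_summary(toy_degrees)
hubs = conserved_hubs(summary)
print(hubs)
```

In the toy input COG0365 is highly connected in one species but absent from the others, so the ERI filter excludes it even though its connectivity exceeds the threshold.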
References (10 of 47 shown)

Review 1.  The tandem affinity purification (TAP) method: a general procedure of protein complex purification.

Authors:  O Puig; F Caspary; G Rigaut; B Rutz; E Bouveret; E Bragado-Nilsson; M Wilm; B Séraphin
Journal:  Methods       Date:  2001-07       Impact factor: 3.608

2.  Lethality and centrality in protein networks.

Authors:  H Jeong; S P Mason; A L Barabási; Z N Oltvai
Journal:  Nature       Date:  2001-05-03       Impact factor: 49.962

3.  Comparative assessment of large-scale data sets of protein-protein interactions.

Authors:  Christian von Mering; Roland Krause; Berend Snel; Michael Cornell; Stephen G Oliver; Stanley Fields; Peer Bork
Journal:  Nature       Date:  2002-05-08       Impact factor: 49.962

4.  Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.

Authors:  I King Jordan; Igor B Rogozin; Yuri I Wolf; Eugene V Koonin
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

Review 5.  Global approaches to protein-protein interactions.

Authors:  Gerard Drewes; Tewis Bouwmeester
Journal:  Curr Opin Cell Biol       Date:  2003-04       Impact factor: 8.382

6.  Expanded microbial genome coverage and improved protein family annotation in the COG database.

Authors:  Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 16.971

7.  Interaction network containing conserved and essential protein complexes in Escherichia coli.

Authors:  Gareth Butland; José Manuel Peregrín-Alvarez; Joyce Li; Wehong Yang; Xiaochun Yang; Veronica Canadien; Andrei Starostine; Dawn Richards; Bryan Beattie; Nevan Krogan; Michael Davey; John Parkinson; Jack Greenblatt; Andrew Emili
Journal:  Nature       Date:  2005-02-03       Impact factor: 49.962

8.  The topology of the bacterial co-conserved protein network and its implications for predicting protein function.

Authors:  Anis Karimpour-Fard; Sonia M Leach; Lawrence E Hunter; Ryan T Gill
Journal:  BMC Genomics       Date:  2008-06-30       Impact factor: 3.969

9.  DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes.

Authors:  Ren Zhang; Yan Lin
Journal:  Nucleic Acids Res       Date:  2008-10-30       Impact factor: 16.971

Review 10.  Protein-protein interaction detection: methods and analysis.

Authors:  V Srinivasa Rao; K Srinivas; G N Sujini; G N Sunand Kumar
Journal:  Int J Proteomics       Date:  2014-02-17