Santiago Treviño, Yudong Sun, Tim F Cooper, Kevin E Bassler.
Abstract
Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information about the relationships between communities. Here, we present methods to robustly detect co-regulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities significantly enrich for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.
Year: 2012 PMID: 22383870 PMCID: PMC3285575 DOI: 10.1371/journal.pcbi.1002391
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Distribution of gene relatedness and network size in the E. coli CLR network.
(A) Probability distribution of relatedness values between pairs of genes in E. coli, calculated using the CLR algorithm and the full dataset. (B) Size of the largest connected component as a function of the relatedness threshold. At small threshold values the network is fully connected, but it begins to break up into multiple disconnected components at a critical threshold value.
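The measurement in panel (B) can be illustrated with a minimal sketch: keep only links whose relatedness meets a threshold, then report the size of the largest connected component. The function name, the `>=` comparison, and the toy relatedness values below are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def largest_component_size(n_genes, edges, threshold):
    """Size of the largest connected component when only links with
    relatedness >= threshold are kept (union-find over gene indices)."""
    parent = list(range(n_genes))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, relatedness in edges:
        if relatedness >= threshold:
            parent[find(i)] = find(j)

    sizes = Counter(find(g) for g in range(n_genes))
    return max(sizes.values())


# Hypothetical toy data: (gene_i, gene_j, relatedness). Not from the paper.
toy_edges = [(0, 1, 6.5), (1, 2, 4.2), (2, 3, 2.1), (3, 4, 5.0), (4, 5, 1.0)]
sizes_by_threshold = {
    t: largest_component_size(6, toy_edges, t) for t in (0, 2, 4, 6)
}
print(sizes_by_threshold)
```

In this toy network the largest component shrinks as the threshold rises, which is the qualitative behavior the caption describes.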
Figure 2. Correlation matrix showing community structure found in the E. coli network with relatedness threshold values 2, 4, and 6.
Genes are ordered in the same sequence along the x and y axes beginning in the upper left corner, and this ordering is the same for all three relatedness values (gene order is given in SI). The matrix element in position (i, j) is colored blue, red, or green if genes i and j are in the same community at threshold values 2, 4, or 6, respectively. The density of the color indicates the strength of the correlation in the partitionings of the pair of genes. For example, considering the correlation between a pair of genes in the 10 replicate partitionings performed on the network, dark and light red indicate that the pair of genes is always and rarely found in the same community, respectively. The blue, red, and green colors, corresponding to thresholds 2, 4, and 6, respectively, are combined to indicate the correlations of each pair of genes at all three threshold values. Thus, the color of the matrix element in position (i, j) is white if genes i and j are in the same community at all three threshold values. It is purple (yellow) if the two genes are in the same community at thresholds 2 and 4 (4 and 6) but not at threshold 6 (2), and it is black if the two genes are not in the same community at any of the three threshold values. A list of the order of genes is given in Dataset S2. A full size version with each pixel representing a distinct pair of genes is given in Figure S1.
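The color scheme described in this caption amounts to additive RGB mixing, with one channel per threshold. The sketch below is a hypothetical helper: the name `pair_color` and the linear mapping from co-occurrence fraction to an 8-bit channel value are assumptions, and only the channel assignment (blue for threshold 2, red for 4, green for 6) comes from the caption.

```python
def pair_color(frac_t2, frac_t4, frac_t6):
    """Return an (R, G, B) tuple in 0..255 for one gene pair.

    Each argument is the fraction of replicate partitionings in which the
    pair shares a community at threshold 2, 4, or 6. Following the caption,
    threshold 2 drives blue, threshold 4 drives red, threshold 6 drives green.
    """
    for frac in (frac_t2, frac_t4, frac_t6):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")

    def to_byte(frac):
        # Assumed linear intensity scale; the paper's exact scale is not given.
        return round(255 * frac)

    return (to_byte(frac_t4), to_byte(frac_t6), to_byte(frac_t2))


print(pair_color(1, 1, 1))  # together at all three thresholds
print(pair_color(1, 1, 0))  # together at thresholds 2 and 4 only
print(pair_color(0, 1, 1))  # together at thresholds 4 and 6 only
print(pair_color(0, 0, 0))  # never together
```

The four calls correspond to the white, purple, yellow, and black cases named in the caption.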
Figure 3. Change in core community structure as noise is increased.
The grey scale value of each element indicates the fraction of times the two genes occurred in the same community over replicate community partitionings. If the element is white (black), the two genes were always (never) found in the same community. At each noise value there are clearly white diagonal blocks indicating sets of genes that are always found in the same community, which we refer to as core communities. Note that the five core communities in Figure 3A appear in the same order in Figures 3B, C, D, and E. Within each of the five core communities of Figure 3A, the node order is allowed to change in Figures 3B, C, D, and E in order to display the largest subcommunity first. For each panel, the list of the order of genes and the core community they belong to are given in Dataset S5 and Dataset S6, respectively. A full size version with each pixel representing a distinct pair of genes is included in Figure S3.
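The co-occurrence matrix and the notion of a core community can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names, the toy partitions, and the choice to drop single-gene groups are assumptions.

```python
def co_occurrence(partitions):
    """Fraction of replicate partitionings in which each gene pair shares a
    community. `partitions` is a list of label lists, one label per gene."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    replicates = len(partitions)
    return [[c / replicates for c in row] for row in counts]


def core_communities(partitions):
    """Groups of two or more genes that share a community in every replicate.

    Two genes are always together exactly when their label sequences across
    all replicates are identical, so genes are grouped by that sequence.
    """
    n = len(partitions[0])
    groups = {}
    for gene in range(n):
        signature = tuple(labels[gene] for labels in partitions)
        groups.setdefault(signature, []).append(gene)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical toy data: three replicate partitionings of five genes.
toy_partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [5, 5, 5, 7, 7],
]
matrix = co_occurrence(toy_partitions)
cores = core_communities(toy_partitions)
print(cores)
```

Here gene 2 switches sides between replicates, so it appears as a grey element in the matrix and belongs to no core community.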
Figure 4. The effect of noise on core community structure and GO term enrichment.
(A) Proportion of core community nodes that remain in a core community. (B) The number of significant GO term enrichments as a function of noise level. If a GO term is enriched by more than one community, each enrichment is counted separately.
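One plausible way to compute the quantity in panel (A) is sketched below. It assumes that a node "remains in a core community" if it belongs to any core community at the higher noise level, whether or not it is the same one; the function name and toy data are hypothetical.

```python
def fraction_retained(baseline_cores, noisy_cores):
    """Fraction of nodes in baseline core communities that still belong to
    some core community after noise is added."""
    baseline_nodes = set().union(*baseline_cores)
    noisy_nodes = set().union(*noisy_cores)
    if not baseline_nodes:
        raise ValueError("baseline has no core community nodes")
    return len(baseline_nodes & noisy_nodes) / len(baseline_nodes)


# Hypothetical toy data: gene 2 drops out of its core community under noise.
retained = fraction_retained([[0, 1, 2], [3, 4]], [[0, 1], [3, 4]])
print(retained)
```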
The 25 most relevant relationships found without noise.
| P value | GO term num | Com size | GO size | In common | Description |
| 8.41e-42 | 9288 | 72 | 24 | 24 | bacterial-type flagellum |
| 9.57e-39 | 6826 | 53 | 37 | 25 | iron ion transport |
| 8.22e-38 | 1539 | 72 | 28 | 24 | ciliary or flagellar motility |
| 3.67e-35 | 6412 | 826 | 101 | 79 | translation |
| 6.51e-34 | 3735 | 826 | 56 | 54 | structural constituent of ribosome |
| 3.08e-31 | 3723 | 826 | 105 | 77 | RNA binding |
| 1.73e-29 | 6935 | 72 | 22 | 19 | chemotaxis |
| 4.30e-29 | 3774 | 72 | 17 | 17 | motor activity |
| 5.38e-29 | 9425 | 72 | 17 | 17 | bacterial-type flagellum basal body |
| 2.06e-25 | 19861 | 72 | 15 | 15 | flagellum |
| 5.61e-25 | 5506 | 53 | 210 | 31 | iron ion binding |
| 3.72e-24 | 19843 | 826 | 42 | 40 | rRNA binding |
| 6.98e-23 | 6811 | 53 | 79 | 22 | ion transport |
| 6.99e-22 | 30529 | 826 | 36 | 35 | ribonucleoprotein complex |
| 1.72e-21 | 5840 | 826 | 38 | 36 | ribosome |
| 6.62e-21 | 8652 | 247 | 62 | 32 | cellular amino acid biosynthetic process |
| 4.11e-17 | 5506 | 139 | 210 | 39 | iron ion binding |
| 6.66e-16 | 9055 | 139 | 116 | 29 | electron carrier activity |
| 7.30e-15 | 51539 | 139 | 98 | 26 | 4 iron, 4 sulfur cluster binding |
| 8.22e-15 | 15453 | 300 | 15 | 15 | oxidoreduction-driven active transmembrane transporter activity; light-driven active transmembrane transporter activity |
| 1.85e-13 | 6865 | 247 | 70 | 27 | amino acid transport |
| 6.13e-13 | 45272 | 300 | 13 | 13 | plasma membrane respiratory chain complex I |
| 9.19e-13 | 30964 | 300 | 13 | 13 | NADH dehydrogenase complex |
| 1.97e-12 | 9060 | 300 | 21 | 16 | aerobic respiration |
| 2.15e-12 | 5515 | 826 | 875 | 251 | protein binding; calmodulin binding |
For the 25 most statistically relevant relationships, the table lists the “P value”, or random probability, of the overlap of genes in an inferred community and in a GO term, calculated with a hypergeometric test with Benjamini-Hochberg correction. Also listed are the “GO term num” that distinguishes the GO term and its “Description” in the GO database, the number of genes in the GO term “GO size”, the number of genes in the inferred community “Com size”, and the number of genes they have in common “In common.” The complete set of the 239 relevant relationships for this network, as well as the relevant relationships found for the other networks analyzed, is given in Dataset S7.
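The statistical procedure named in this note can be sketched with the standard library alone. This is a generic illustration of a one-sided hypergeometric enrichment test followed by Benjamini-Hochberg adjustment, not the authors' implementation: the function names and toy numbers are hypothetical, and the background population size the authors used is not stated here, so the table's P values cannot be reproduced from this sketch.

```python
from math import comb


def hypergeom_upper_tail(population, go_size, com_size, overlap):
    """P(X >= overlap) when com_size genes are drawn without replacement from
    a population that contains go_size genes annotated with the GO term."""
    total = comb(population, com_size)
    upper = min(go_size, com_size)
    favorable = sum(
        comb(go_size, k) * comb(population - go_size, com_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy case: 10 genes, 3 annotated, community of 3, all 3 overlap.
p_toy = hypergeom_upper_tail(10, 3, 3, 3)
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_toy, adjusted_toy)
```

Each row of the table would correspond to one call of the tail function, with the adjustment applied across all community and GO term pairs tested.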
Genes in the community that enriches GO:3735 (structural constituent of ribosome).
| Genes in the GO Term | Genes not in GO Term |
| rplA, rplB, rplC, rplD, rplE, rplF, rplI, rplJ, rplK, rplL, rplM, rplN, rplO, rplP, rplQ, rplR, rplS, rplU, rplV, rplW, rplX, rplY, rpmA, rpmB, rpmC, rpmD, rpmE, rpmG, rpmH, rpmJ, rpsA, rpsB, rpsC, rpsD, rpsE, rpsF, rpsG, rpsH, rpsI, rpsJ, rpsK, rpsL, rpsM, rpsN, rpsO, rpsP, rpsQ, rpsR, rpsS, rpsT, rpsU | cdsA, cmk, dnaG, dusB, efp, fis, fusA, gidB, gmk, infB, ispU, lpxB, mnmG, mrdA, murA, nusA, nusG, obgE, parE, ppa, prfC, priB, pyrH, queA, rbfA, rho, rimM, rlmN, rnhB, rnpA, rpoA, rpoZ, secE, secG, secY, speA, speB, tff, tig, trmA, trmD, trmI, truB, truC, tsf, typA, yadB, yggN, ygiQ, yhbC, yhbE, yhbY, yidC, yidD, yqcC |
Figure 5. Links connecting operons in the community that enriches for genes involved in ribosome structure.
CLR links are in light blue; RegulonDB links are in black. Small symbols are genes that are not in the community but are regulators of genes that are in the community, and are therefore candidates for mediating indirect interactions between community genes. Symbol shape and color indicate attributes as follows: red, transcription factors; dark blue, ppGpp-regulated promoter by direct assay [54]; light blue, ppGpp-regulated translation-related promoter by microarray [55]; pink, other; hexagon, promoter; diamond, promoter; square, promoter; circle, unknown sigma factor. Note that very few interactions observed in the CLR network can be explained by the direct interactions annotated in RegulonDB. The high proportion of ppGpp-sensitive promoters among operons contained in the community suggests this molecule as a good candidate for regulating the remaining interactions. The network layout was determined by the circular layout option in Cytoscape 2.8.1; no particular significance should be attached to operons being outside the main circle.