Shigeru Saito, Sachiyo Aburatani, Katsuhisa Horimoto.
Abstract
BACKGROUND: A knowledge-based network, which is constructed by extracting as many experimentally identified relationships as possible and then superimposing them, is one promising approach to investigating the associations between biological molecules. However, molecular relationships change dynamically depending on the conditions in a living cell, which implies that not all of the relationships in a knowledge-based network necessarily hold under a given condition. Here, we propose a novel method to estimate the consistency of a given network with measured data: i) the network is quantified as a log-likelihood of the measured data, based on the Gaussian network, and ii) the probability of this likelihood, named the graph consistency probability (GCP), is estimated based on the generalized extreme value (GEV) distribution.
Year: 2008 PMID: 18828895 PMCID: PMC2566979 DOI: 10.1186/1752-0509-2-84
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Flow of the calculation of graph consistency probability. The calculation is composed of five steps (see details in the text).
Figure 2. Robustness to noise in the measured data. GCP (= P(l(G0))) for the graph in Fig. 1 was calculated from simulated data with different standard deviations, and the frequencies of the GCPs are plotted against the probability degree. The horizontal axis indicates the log(GCP) value and the vertical axis its frequency: black bars, σ = 0.5; gray bars, σ = 1.0; and boxed bars, σ = 2.0.
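Simulated data of this kind can be drawn from a linear structural equation model, in which each variable equals a weighted sum of its parents plus independent Gaussian noise with standard deviation σ. The sketch below assumes that model. Because the graph of Figure 1 is not reproduced here, the usage lines run a hypothetical three-node chain with made-up path coefficients; `simulate_sem` is a hypothetical helper.

```python
import random
import statistics
from typing import Dict, List, Tuple


def simulate_sem(order: List[str], coef: Dict[Tuple[str, str], float],
                 sigma: float, n: int, seed: int = 0) -> Dict[str, List[float]]:
    """Draw n samples from a linear structural equation model.

    order: node names in topological order (every parent precedes its children)
    coef:  path coefficients, coef[(parent, child)] = alpha
    sigma: standard deviation of the independent Gaussian error added to every node
    """
    rng = random.Random(seed)
    # collect the (parent, alpha) pairs of each node once, up front
    parents = {v: [(p, a) for (p, c), a in coef.items() if c == v] for v in order}
    data: Dict[str, List[float]] = {v: [] for v in order}
    for _ in range(n):
        sample: Dict[str, float] = {}
        for v in order:
            mean = sum(a * sample[p] for p, a in parents[v])
            sample[v] = mean + rng.gauss(0.0, sigma)
        for v in order:
            data[v].append(sample[v])
    return data


# Hypothetical chain a -> b -> c with made-up coefficients, at the three noise levels of Figure 2.
for sigma in (0.5, 1.0, 2.0):
    sim = simulate_sem(["a", "b", "c"], {("a", "b"): 0.6, ("b", "c"): 0.3}, sigma, n=100)
    print(sigma, round(statistics.pstdev(sim["a"]), 2))
```

Fixing the seed keeps repeated runs comparable, so differences in the resulting GCP distributions can be attributed to σ rather than to sampling luck.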
Figure 3. Robustness to variation in graph structure. Three networks with structures typical in biology are examined in (A), (B), and (C). To generate the simulation data by structural equations, we set the standard deviation to 0.1 in all three graphs, and the path coefficients between the variables are as follows: (A) α1,2 = 0.6, α2,3 = 0.3, α3,4 = 0.1, α4,5 = 0.7, α5,6 = 0.8, α6,7 = 0.9, α7,8 = 0.2, α8,9 = 0.5, and α9,10 = 0.4; (B) α1,2 = 0.1, α1,3 = 0.2, α1,4 = 0.3, α1,5 = 0.4, α1,6 = 0.5, α1,7 = 0.6, α1,8 = 0.7, α1,9 = 0.8, and α1,10 = 0.9; and (C) α1,2 = 0.5, α1,3 = 0.7, α2,4 = 0.4, α2,5 = 0.8, α3,6 = 0.6, α3,7 = 0.3, α4,8 = 0.2, α5,9 = 0.1, α6,9 = 1.0, and α7,10 = 0.9. The log-likelihood values and the GEV parameters of the respective networks are as follows: (A) l(G0) = 163.4805, μ = 89.8375, σ = 12.9694, and ξ = -0.1743; (B) l(G0) = 61.6096, μ = 3.0217, σ = 12.5220, and ξ = -0.1314; and (C) l(G0) = 124.8894, μ = 46.9002, σ = 12.1395, and ξ = -0.1406. See also the corresponding GEV plots in Additional file 5: Robustness regarding the network structure variation.
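Given fitted GEV parameters, the cumulative probability at l(G0) can be evaluated directly. This sketch assumes the common parameterization F(x) = exp(-(1 + ξ(x - μ)/σ)^(-1/ξ)), under which ξ < 0 gives a distribution bounded above at μ - σ/ξ, and it assumes that GCP is read as F(l(G0)); both are assumptions about the paper's conventions, and `gev_cdf` is a hypothetical helper.

```python
import math


def gev_cdf(x: float, mu: float, sigma: float, xi: float) -> float:
    """GEV cumulative probability F(x) = exp(-(1 + xi*(x - mu)/sigma) ** (-1/xi))."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))  # Gumbel limit of the GEV family
    t = 1.0 + xi * z
    if t <= 0.0:
        # x lies outside the support: above the upper endpoint when xi < 0,
        # below the lower endpoint when xi > 0
        return 1.0 if xi < 0.0 else 0.0
    return math.exp(-t ** (-1.0 / xi))


# (l(G0), mu, sigma, xi) for networks (A)-(C), copied from the Figure 3 caption
FIG3 = {
    "A": (163.4805, 89.8375, 12.9694, -0.1743),
    "B": (61.6096, 3.0217, 12.5220, -0.1314),
    "C": (124.8894, 46.9002, 12.1395, -0.1406),
}

for name, (loglik, mu, sigma, xi) in FIG3.items():
    print(name, round(gev_cdf(loglik, mu, sigma, xi), 4))
```

Under these assumptions all three networks evaluate to above 0.99, because each l(G0) lies far in the upper tail of its fitted distribution, close to the upper endpoint implied by the negative shape parameter.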
Figure 4. Evaluation of the transcriptional network of the SOS DNA repair system in Escherichia coli. The network is shown schematically in (A), and the corresponding GEV plots and box plot are shown in (B). The log-likelihood of the examined network given the measured data is -1168.453, and the parameters of the GEV distribution are μ = -1179.079, σ = 4.957, and ξ = -0.236. The promoter-activity data for the eight genes of the SOS system are taken from [24].
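Inverting the GEV distribution gives the log-likelihood thresholds that correspond to chosen cumulative probabilities, which shows where the SOS network's log-likelihood falls relative to the fitted distribution. This is a sketch assuming the parameterization F(x) = exp(-(1 + ξ(x - μ)/σ)^(-1/ξ)); `gev_quantile` is a hypothetical helper, and reading F(l(G0)) as the GCP is an assumption.

```python
import math


def gev_quantile(p: float, mu: float, sigma: float, xi: float) -> float:
    """Inverse of F(x) = exp(-(1 + xi*(x - mu)/sigma) ** (-1/xi)) for 0 < p < 1."""
    y = -math.log(p)
    if xi == 0.0:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu + sigma / xi * (y ** (-xi) - 1.0)


# GEV parameters and log-likelihood reported in the Figure 4 caption
MU, SIGMA, XI = -1179.079, 4.957, -0.236
L_G0 = -1168.453

for p in (0.05, 0.5, 0.95, 0.99):
    print(p, round(gev_quantile(p, MU, SIGMA, XI), 3))
print("l(G0) above 95% quantile:", L_G0 > gev_quantile(0.95, MU, SIGMA, XI))
```

With the Figure 4 parameters the 95% quantile is about -1168.50, so l(G0) = -1168.453 lies just above it, which corresponds to a cumulative probability of roughly 0.95 under this assumption.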
Consistency of the twenty-nine networks with expression profiles measured under anaerobic conditions in Escherichia coli
| No. | ID | Description | Nodes | Edges | GCP |
|---|---|---|---|---|---|
| 1 | C9333 | detoxification | 6 | 8 | 1.000 |
| 2 | C9448 | amino acids | 6 | 9 | 1.000 |
| 3 | C9449 | carbon compounds | 6 | 9 | 1.000 |
| 4 | C9426 | colanic acid (M antigen) | 6(7) | 9(11) | 1.000 |
| 5 | C9509 | operon | 6(7) | 9(11) | 1.000 |
| 6 | C9448, C9462 | amino acids, formyl-THF biosynthesis | 7 | 10 | 1.000 |
| 7 | C9449 | carbon compounds | 8(9) | 7(8) | 1.000 |
| 8 | C9331 | motility, chemotaxis, energytaxis | 9 | 8 | 0.998 |
| 9 | C9340 | flagella | 9 | 8 | 0.647 |
| 10 | C9362 | nucleoproteins, basic proteins | 9 | 8 | 0.925 |
| 11 | C9401 | tryptophan | 9 | 8 | 1.000 |
| 12 | C9449 | carbon compounds | 9 | 8 | 1.000 |
| 13 | C9376 | cytoplasm | 10 | 9 | 1.000 |
| 14 | C9449 | carbon compounds | 10 | 9 | |
| 15 | C9449 | carbon compounds | 10 | 11 | 0.976 |
| 16 | C9337 | SOS response | 11 | 10 | 0.127 |
| 17 | C9354 | DNA repair | 11 | 10 | 0.068 |
| 18 | C9383 | arginine | 11 | 10 | 1.000 |
| 19 | C9474 | nucleotide and nucleoside conversion | 11 | 15 | 0.378 |
| 20 | C9493 | fermentation | 11 | 10 | 1.000 |
| 21 | C9376 | cytoplasm | 12 | 11 | 0.302 |
| 22 | C9393 | isoleucine/valine | 13 | 12 | 1.000 |
| 23 | C9420 | purine biosynthesis | 13 | 12 | 1.000 |
| 24 | C9394 | leucine | 14 | 17 | 1.000 |
| 25 | C9504 | phosphorous metabolism | 23 | 22 | 1.000 |
| 26 | C9528 | repressor | 52(53) | 77(79) | 1.000 |
| 27 | C9523 | activator | 58(59) | 92(93) | 1.000 |
| 28 | C9490 | anaerobic respiration | 89(91) | 161(162) | |
| 29 | C9372 | Transcription related | 91(93) | 143(146) | 0.772 |
GCP values with a significance probability below 5% are indicated in bold type. The ID in the EcoCyc classification scheme [44] and the corresponding gene function are given in the second and third columns, respectively. The networks for functions C9448 and C9462 consist of the same genes with the same connectivity. The following columns give the numbers of nodes and edges of the analyzed networks: the original network was constructed from the EcoCyc information on the relationships between transcription factors and their regulated genes, and the analyzed network was obtained from the original network by excluding the genes that were not found in the expression profile data from NCBI GEO (accession number: GSE1107) [25]. The numbers of nodes and edges of the original networks are given in parentheses. The graph consistency probability (GCP) is given in the last column.
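Deriving an analyzed network from its original network, as described above, amounts to taking the subgraph induced by the genes present in the expression data. A minimal sketch, assuming each network is stored as an edge list of (transcription factor, regulated gene) pairs; the gene names and the helpers `prune_network` and `format_count` are hypothetical, and the formatter only mirrors the table's "analyzed(original)" notation.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, regulated gene)


def prune_network(edges: List[Edge], measured: Set[str]) -> Tuple[Set[str], List[Edge]]:
    """Keep only the genes found in the expression data and the edges between them."""
    original_nodes = {gene for edge in edges for gene in edge}
    nodes = original_nodes & measured
    kept = [(tf, target) for tf, target in edges if tf in nodes and target in nodes]
    return nodes, kept


def format_count(analyzed: int, original: int) -> str:
    """Render a count as in the table: the original value is shown in parentheses only if it differs."""
    return str(analyzed) if analyzed == original else f"{analyzed}({original})"


# Hypothetical network in which gene g3 is missing from the expression profile.
edges = [("tfX", "g1"), ("tfX", "g2"), ("tfX", "g3"), ("g1", "g2")]
measured = {"tfX", "g1", "g2"}
nodes, kept = prune_network(edges, measured)
original_nodes = {gene for edge in edges for gene in edge}
print(format_count(len(nodes), len(original_nodes)), format_count(len(kept), len(edges)))  # prints: 3(4) 3(4)
```

Measured genes that lose all their edges stay in the node set, so the node and edge counts can shrink by different amounts, as several rows of the table show.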
Figure 5. Networks with 5% significance probability in the graph consistency search. By matching the regulatory relationships to the gene functions in EcoCyc [44], 29 regulatory networks were reconstructed, and their consistency with the expression profiles measured under anaerobic conditions (accession number GSE1107 in the NCBI Gene Expression Omnibus (GEO)) [25] was examined. Of the 29 regulatory networks, two showed 5% significance probability: the network related to carbon compounds (EcoCyc ID: C9449_11) (A) and that related to anaerobic respiration (EcoCyc ID: C9490_1) (B). The details of the structures of the 29 regulatory networks are given in Additional file 6: the 29 network structures analyzed in the present study.