Guanrao Chen, Peter Larsen, Eyad Almasri, Yang Dai.
Abstract
BACKGROUND: The reconstruction of genetic regulatory networks from microarray gene expression data has been a challenging task in bioinformatics. Various approaches to this problem have been proposed; however, they do not take into account the topological characteristics of the targeted networks while reconstructing them.
Year: 2008 PMID: 18237422 PMCID: PMC2275249 DOI: 10.1186/1471-2105-9-75
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Results for the Symmetric-N algorithm with the 100-node simulated network. Panels (a) and (b) show the results when the sample size S is fixed (S = 25) while the number of neighbors N varies. Panels (c) and (d) show the results when N is fixed (N = 5) while S varies. The upper panels (a) and (c) show Recall, Precision and F-Score. The lower panels (b) and (d) show γ and R². The parameter pairs <γ, R²> for the underlying network structure are <-1.27, 0.96> for the in-degree distribution, <-1.61, 0.97> for the out-degree distribution, and <-1.22, 0.92> for the mixed-degree distribution.
Figure 2. Results for the Asymmetric-N algorithm with the 100-node simulated network. Panels (a) and (b) show the results when S and N are fixed (S = 25, N = 2) while N is varying. Panels (c) and (d) show the results when S and N are fixed (S = 25, N = 91) while N is varying. Panels (e) and (f) show the results when N and N are fixed (N = 91, N = 2) while S is varying.
Figure 3. Recall, precision and imprecision curves obtained with the Asymmetric-N algorithm for the 20-node simulated network when N and N are fixed (N = 17, N = 1) while S is varying. The imprecision is defined as 1 - precision.
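The recall, precision, F-score and imprecision quantities named in the captions above can be illustrated with a minimal sketch. It assumes that edges are compared as directed (source, target) pairs and that the F-score is the harmonic mean of precision and recall; neither detail is stated in this excerpt, and the function name `edge_metrics` and the toy edge sets are hypothetical.

```python
def edge_metrics(true_edges, predicted_edges):
    """Return (recall, precision, f_score) for a set of reconstructed edges.

    Assumption: edges are hashable (source, target) pairs and the F-score
    is the harmonic mean of precision and recall.
    """
    true_set = set(true_edges)
    pred_set = set(predicted_edges)
    true_positives = len(true_set & pred_set)

    recall = true_positives / len(true_set) if true_set else 0.0
    precision = true_positives / len(pred_set) if pred_set else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f_score


# Toy example (hypothetical edges, not data from the paper).
true_edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
predicted = {("A", "B"), ("B", "D"), ("D", "A")}

recall, precision, f_score = edge_metrics(true_edges, predicted)
imprecision = 1 - precision  # as defined in the Figure 3 caption
print(recall, precision, f_score, imprecision)
```

Treating the edge lists as sets means duplicate predictions are counted once; a different convention would change the precision denominator.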
Results of Asymmetric-N on the 102-gene dataset
| N | N | #Edges | #Published | %Published | %GO BP | γ | R² |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 101 | 4 | 142 | 20 | 14.08 | 38.03 | -0.69 | 0.45 |
| 101 | 16 | 185 | 31 | 16.76 | 30.81 | -0.96 | 0.79 |
| 91 | 1 | 60 | 17 | 28.33 | 45.83 | -1.27 | 0.65 |
| 91 | 11 | 155 | 29 | 18.71 | 27.74 | -0.89 | 0.70 |
P-P PCC is the point-to-point Pearson correlation coefficient between two time series profiles (18 points in total).
S-S PCC is the segment-to-segment Pearson correlation coefficient (17 segments in total), where segment i takes the value +1 (-1) if the value at point i is less (greater) than that at point i + 1 [53].
Time lag means that, when aligning the two gene profiles, one of them needs to be shifted relative to the other.
#Edges is the total number of reconstructed interactions.
#Published is the number of reconstructed interactions that were previously published.
%Published is the percentage of published interactions among all reconstructed interactions.
%GO BP is the percentage of reconstructed interactions whose gene or gene-product pairs share a common Gene Ontology (GO) Biological Process (BP) annotation from the SGD GO Slim mapper [46].
γ and R² are the exponent in P(k) ~ k^γ and the coefficient of determination returned by the fit() function, respectively (see 'Results' – 'Computation Study' section for more details).
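The P-P PCC, S-S PCC and time-lag notes above can be sketched in a few lines of plain Python. This is only an illustration of the definitions as worded in the notes: the function names are hypothetical, the treatment of equal consecutive values (encoded as 0) and of constant profiles (correlation returned as 0.0) is an assumption, and the lagged variant simply correlates the overlapping parts after shifting the second profile.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return cov / math.sqrt(var_x * var_y)


def segment_signs(profile):
    """Encode each segment i as +1 if point i < point i+1, -1 if greater.

    Assumption: ties are encoded as 0 (the notes do not cover this case).
    """
    return [1 if a < b else -1 if a > b else 0
            for a, b in zip(profile, profile[1:])]


def pp_pcc(x, y):
    """Point-to-point PCC between two profiles."""
    return pearson(x, y)


def ss_pcc(x, y):
    """Segment-to-segment PCC between two profiles."""
    return pearson(segment_signs(x), segment_signs(y))


def lagged_pp_pcc(x, y, lag):
    """P-P PCC after shifting y by `lag` points (lag >= 0), over the overlap."""
    return pearson(x[:len(x) - lag], y[lag:])


# Toy profiles (hypothetical values, not data from the paper).
up_down = [1, 2, 3, 2, 1]
down_up = [3, 2, 1, 2, 3]
print(pp_pcc(up_down, down_up), ss_pcc(up_down, down_up))
```

A profile with 18 points yields 17 segments under `segment_signs`, which matches the counts given in the notes.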
Figure 4. The 100-node simulated network and its node degree distributions. Core nodes are the 10 nodes that form the initial network. Periphery nodes are the remaining nodes that are (preferentially) attached (see 'Methods' – 'Dataset' section for more details).
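The core/periphery construction described in this caption can be illustrated with a generic preferential-attachment sketch. The exact procedure is given in the paper's Methods section, which is not part of this excerpt, so everything below is an assumption: the core nodes are wired as a ring, each periphery node adds one edge, the edge is directed from the new node to its target, and the function name `grow_network` is hypothetical.

```python
import random


def grow_network(n_total=100, n_core=10, edges_per_node=1, seed=0):
    """Grow a network by preferential attachment (illustrative sketch only).

    Assumptions: the core is a directed ring, and each new node attaches to
    `edges_per_node` distinct existing nodes chosen with probability
    proportional to their current degree.
    """
    if n_core < 2 or edges_per_node > n_core:
        raise ValueError("need n_core >= 2 and edges_per_node <= n_core")
    rng = random.Random(seed)

    # Core: nodes 0..n_core-1 connected in a ring.
    edges = [(i, (i + 1) % n_core) for i in range(n_core)]

    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]

    for new_node in range(n_core, n_total):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


edges = grow_network()
print(len(edges), "edges over", len({n for e in edges for n in e}), "nodes")
```

Because highly connected nodes appear more often in the endpoint list, early nodes tend to accumulate edges, which is what produces a heavy-tailed degree distribution in this kind of model.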
Figure 5. The 102-gene network and its node degree distributions. Core nodes are the 9 transcription factors. Periphery nodes are the remaining non-transcription factors. The edges are obtained from Pathway Studio [45] (see 'Methods' – 'Dataset' section for more details).
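The degree distributions in these figures are summarized elsewhere in the record by a power-law exponent γ and a coefficient of determination R². One common way to obtain such a pair is an ordinary least-squares line fit of log P(k) against log k, sketched below. This is an assumption for illustration: the excerpt does not say how the paper's fit() function works, and the names `degree_distribution` and `fit_power_law` are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)} for the positive degrees in `degrees`."""
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def fit_power_law(pk):
    """Fit log P(k) = intercept + gamma * log k; return (gamma, r_squared).

    Assumption: plain least squares on the log-log points, which is only one
    of several ways to estimate a power-law exponent.
    """
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct degree values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    gamma = sxy / sxx
    intercept = mean_y - gamma * mean_x
    ss_res = sum((y - (intercept + gamma * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return gamma, r_squared


# Toy degree sequence (hypothetical, not data from the paper).
degrees = [1] * 16 + [2] * 4 + [4] * 1
gamma, r_squared = fit_power_law(degree_distribution(degrees))
print(gamma, r_squared)
```

With this convention a decaying distribution gives a negative γ, consistent with the sign of the values reported in the captions and table.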