Gabriel del Rio, Dirk Koschützki, Gerardo Coello.
Abstract
BACKGROUND: Predicting essential genes from molecular networks is a way to test our understanding of essentiality in the context of what is known about the network. However, current knowledge of molecular network structures is still incomplete, and consequently strategies that aim to predict essential genes are prone to uncertain predictions. We propose that simultaneously evaluating different network structures and different algorithms representing gene essentiality (centrality measures) may identify essential genes in networks in a reliable fashion.
Year: 2009 PMID: 19822021 PMCID: PMC2765966 DOI: 10.1186/1752-0509-3-102
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Reliable prediction of essential genes. Prediction of essential genes depends on both the quality of the network and the ability of the prediction algorithm to reproduce the features of essentiality. To account for these two factors, our method takes a set of n networks modelling a biological process and applies m different centrality measures to them, in order to identify both the network(s) with the largest predictable set of essential genes and the centrality measure(s) most effective at identifying the essential genes. Determining the reliability of these predictions requires access to the set of known essential genes.
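The n-networks-by-m-centralities evaluation described in the caption can be sketched as a simple grid search. This is only an illustrative sketch: the function names, the toy networks and the `mean_gap` score are hypothetical and are not taken from the paper, which scores combinations with an AUC-based statistic.

```python
def rank_combinations(networks, centralities, essential, score):
    """Score every (network, centrality) pair and sort best-first.

    networks:     dict name -> adjacency dict {gene: [neighbour genes]}
    centralities: dict name -> function(adj, gene) -> float
    essential:    set of known essential genes
    score:        function(values, essential) -> float, higher is better
    """
    results = []
    for net_name, adj in networks.items():
        for cen_name, centrality in centralities.items():
            values = {gene: centrality(adj, gene) for gene in adj}
            results.append((score(values, essential), net_name, cen_name))
    return sorted(results, reverse=True)


def mean_gap(values, essential):
    """Toy score (hypothetical): mean centrality of essential genes minus the rest."""
    ess = [v for g, v in values.items() if g in essential]
    rest = [v for g, v in values.items() if g not in essential]
    return sum(ess) / len(ess) - sum(rest) / len(rest)


# Two toy networks over the same genes; "h" plays the known essential gene.
networks = {
    "star": {"h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h"]},
    "path": {"h": ["a"], "a": ["h", "b"], "b": ["a", "c"], "c": ["b"]},
}
centralities = {
    "degree": lambda adj, g: len(adj[g]),
    "neg_degree": lambda adj, g: -len(adj[g]),
}
ranking = rank_combinations(networks, centralities, {"h"}, mean_gap)
best_score, best_network, best_centrality = ranking[0]
```

In this toy case the star network combined with plain degree ranks first, because the essential gene is the hub. Passing the score in as a function keeps the grid search independent of the statistic used to compare essential and non-essential genes.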
Genetic metabolic networks used in this study
| Over-linked metabolites | Vertices:Edges | Overlap (%) |
|---|---|---|
| H2O, ATP, ADP, NAD+, NADH, NADP+, NADPH, Oxygen | 636:10038 | 33.85 |
| Ibid | 629:6590 | 26.81 |
| Ibid | 634:7752 | 31.57 |
| Ibid | 621:5223 | 25.11 |
| Ibid | 609:10518 | 34.24 |
| Ibid | 602:7099 | 27.44 |
| Ibid | 608:8130 | 31.89 |
| Ibid | 595:5691 | 25.65 |
| H2O, H+ | 990:8427 | 19.19 |
| H2O, H+, Pi | 976:6995 | 18.27 |
| H2O, H+, Pi, ATP | 976:6278 | 17.85 |
| H2O, H+, Pi, ATP, Glu-L | 974:5742 | 17.60 |
| H2O, H+, Pi, ATP, Glu-L, ADP | 969:5186 | 17.18 |
| H2O, H+ | 634:4761 | 19.19 |
| H2O, H+, Pi | 619:3963 | 18.27 |
| H2O, H+, Pi, ATP | 618:3387 | 17.85 |
| H2O, H+, Pi, ATP, Glu-L | 617:3122 | 17.60 |
| H2O, H+, Pi, ATP, Glu-L, ADP | 613:2720 | 17.18 |
The names of the KEGG and the iND750-derived networks are indicated under the "Network name" column. Note that the last five networks end with an "nh", indicating that hypothetical reactions were not included in these networks. The column labeled "Over-linked metabolites" lists the metabolites removed while reconstructing the network: H2O: water; H+: proton; Pi: inorganic phosphate; ATP: adenosine triphosphate; Glu-L: L-glutamate; ADP: adenosine diphosphate; NADP+/NADPH: nicotinamide adenine dinucleotide phosphate; NAD+/NADH: nicotinamide adenine dinucleotide. The number of genes (values in bold) and of gene-to-gene relationships (values in italics) of each graph are given, separated by a colon, in the Vertices:Edges column. The column labeled "Overlap (%)" gives the percentage of known gene-to-gene relationships (network links) between two metabolic genes, as reported in the probabilistic functional network of yeast genes [4], that are found in each reconstructed network. Because of the procedure used to build them, the KEGG, KEGGpath, KEGG2 and KEGG2path networks are undirected.
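To illustrate why removing over-linked metabolites shrinks the edge count so strongly, here is a minimal sketch of a gene-to-gene graph reconstruction. It assumes, purely for illustration, that two genes are linked whenever their reactions share a metabolite that has not been removed; the actual reconstruction procedure is described in the paper's Methods and may differ. The function name and the toy reactions are hypothetical.

```python
from itertools import combinations


def build_gene_graph(reactions, over_linked):
    """Build an undirected gene graph from (genes, metabolites) reaction records.

    Assumption: genes are linked if they share a metabolite that is not in
    `over_linked`. Returns (vertices, edges) with edges as sorted gene pairs.
    """
    genes_by_metabolite = {}
    for genes, metabolites in reactions:
        for metabolite in metabolites:
            if metabolite in over_linked:
                continue  # skip currency metabolites such as ATP or H2O
            genes_by_metabolite.setdefault(metabolite, set()).update(genes)

    vertices, edges = set(), set()
    for genes in genes_by_metabolite.values():
        vertices.update(genes)
        for a, b in combinations(sorted(genes), 2):
            edges.add((a, b))
    return vertices, edges


# Toy reactions (hypothetical): each record is (catalysing genes, metabolites).
toy_reactions = [
    ({"g1"}, {"glc", "ATP", "g6p", "ADP"}),
    ({"g2"}, {"g6p", "f6p"}),
    ({"g3"}, {"f6p", "ATP", "fbp", "ADP"}),
]

all_vertices, all_edges = build_gene_graph(toy_reactions, set())
kept_vertices, kept_edges = build_gene_graph(toy_reactions, {"ATP", "ADP"})
print(f"{len(all_vertices)}:{len(all_edges)}")    # Vertices:Edges, nothing removed
print(f"{len(kept_vertices)}:{len(kept_edges)}")  # Vertices:Edges, ATP and ADP removed
```

In the toy data, dropping ATP and ADP removes the g1-g3 link that existed only through those shared cofactors, while the vertex set is unchanged.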
Centrality measures used in this study
| Definition | Description |
|---|---|
| $C_u = k_{in}(u)$ | Number of connections into node u |
| $C_u = k_{out}(u)$ | Number of connections out from node u |
| $C_u = k_{out}(u) + \sum_{w \cap u} k_{out}(w)$; w is any neighbor of u. | Number of nodes at 1 or 2 connections from node u |
| | The fraction of connections between the neighbors of node u |
| $C_u = \max\{dist(u, w) : w \in V\}$ | The distance between node u and the most distant node in the network |
| $C_u = 1/\max\{dist(u, w) : w \in V\}$ | |
| | Average distance from node u to the rest of the nodes in the network |
| | Inverse of the average distance |
| | A node has a larger $c_{katz}$ value the more paths reach it |
| | A node has a larger $c_{katz}$ value the more paths leave it |
| $c_{PR} = d P c_{PR} + (1 - d)\mathbf{1}$ | The centrality of a node depends on its incoming connections and the relative connectivity of these connections |
| $c_{PR} = d P^{T} c_{PR} + (1 - d)\mathbf{1}$ | The centrality of a node depends on its outgoing connections and the relative connectivity of these connections |
| | The ease of reaching node u from any other node |
| | The ease of reaching any node from node u |
| | The fraction of shortest paths inside the network that pass through node u |
In the table, $k_{in}(u)$, $k_{out}(u)$ and $k_{tot}(u)$ refer to the incoming, outgoing and total number of edges of node u. $diam_G$ refers to the diameter of the graph and dist(u, v) stands for the distance between nodes u and v. In the clustering coefficient, |e| stands for the observed paths between the neighbours of a node. In Katz, A is the adjacency matrix and α a damping factor. In PageRank, d is a damping factor and P the transition matrix. In the formula for shortest-path (SP) betweenness, $\sigma_G$ denotes the number of shortest paths from s to t. For a more detailed description of these centralities, see [17].
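A few of the simpler measures in the table (in- and out-degree, the maximum distance from a node, and the inverse of the average distance) can be sketched with a breadth-first search. This is a minimal sketch, not the implementation used in the study: function names are hypothetical, edges are unweighted, and distances are taken only over nodes reachable from u, which is an assumption for disconnected graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def out_degree(adj, u):
    """k_out(u): number of connections out from node u."""
    return len(adj.get(u, ()))


def in_degree(adj, u):
    """k_in(u): number of connections into node u."""
    return sum(1 for neighbours in adj.values() if u in neighbours)


def eccentricity(adj, u):
    """max{dist(u, w)}: distance from u to the most distant reachable node."""
    return max(bfs_distances(adj, u).values())


def closeness(adj, u):
    """Inverse of the average distance from u to the other reachable nodes."""
    dist = bfs_distances(adj, u)
    total = sum(dist.values())  # the source itself contributes 0
    return (len(dist) - 1) / total if total else 0.0


# Toy undirected path a - b - c - d, stored as a symmetric adjacency dict.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
for node in path:
    print(node, in_degree(path, node), out_degree(path, node),
          eccentricity(path, node), closeness(path, node))
```

On the toy path, the inner nodes b and c have the smaller maximum distance and the larger inverse average distance, which is the sense in which these measures rank them as more central.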
Figure 2. Efficacy of centrality measures in predicting essential metabolic genes. The maximum area under the ROC curve (max AUC) values obtained from the Mann-Whitney test (see Methods) applied to the 18 different metabolic networks used in this study (see Table 1) are shown in the plot as black squares. Each square represents the max AUC obtained from groups of 1, 2, 3 or 4 centrality measures used to differentiate essential from non-essential genes. The vertical line crossing each square represents the 99% confidence interval.
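The link between the Mann-Whitney statistic and the AUC can be made concrete: the AUC equals the U statistic divided by the number of (essential, non-essential) pairs, that is, the fraction of pairs in which the essential gene has the higher score, with ties counted as one half. The sketch below computes this directly for a single centrality; the function name and the scores are made up for illustration, and how the study combines groups of centralities is not covered here.

```python
def auc_mann_whitney(essential_scores, other_scores):
    """AUC as U / (n1 * n2), counting ties as half a win."""
    wins = 0.0
    for e in essential_scores:
        for o in other_scores:
            if e > o:
                wins += 1.0
            elif e == o:
                wins += 0.5
    return wins / (len(essential_scores) * len(other_scores))


# Hypothetical centrality values for essential and non-essential genes.
essential = [5, 3, 4]
non_essential = [1, 2, 3, 0]
auc = auc_mann_whitney(essential, non_essential)
print(round(auc, 4))
```

An AUC of 0.5 means the centrality does not separate the two groups at all, while 1.0 means every essential gene outranks every non-essential one. The pairwise loop is quadratic; a rank-based computation gives the same value more efficiently on large gene sets.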
Comparison of statistical parameters used to estimate the efficacy to identify essential metabolic genes in yeast
| Method | | Sensitivity | Specificity | Error | Accuracy | Reference |
|---|---|---|---|---|---|---|
| FBA | 23/90 | 0.13 | 1 | 0.87 | 82% | 22 |
| FBA | 91/508 | 0.31 | 0.95 | 0.69 | 85% | 6 |
| FBA | 79-146/562-629 | 0.40-0.53 | 90-96% | NR | NR | 33 |
| FBA | 118/NR | 0.31 | NR | NR | NR | 10 |
| FBA | NR/NR | 0.68-0.80 | 96-98% | NR | NR | 33 |
| MOMA | 46/302 | 0.60 | 0.92 | 0.41 | 88% | 34 |
| MOMA | NR/NR | 0.73-0.80% | NR | NR | NR | 33 |
| SA | 100/NR | 0.14 | NR | NR | NR | 7 |
Previous studies on the efficacy of identifying essential metabolic genes in yeast report different statistical parameters of confidence (Sensitivity, Specificity, Error and Accuracy columns) and commonly use a single network (Model column) and a single method (Method column) to identify essential genes. The data were obtained from the works cited in the "Reference" column. FBA: Flux-Balance Analysis; MOMA: Method of Minimization of Metabolic Adjustment; SA: Synthetic accessibility; NR: a value not reported in the cited reference.
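For reference, the standard definitions of sensitivity, specificity and accuracy can be sketched from confusion-matrix counts. The counts below are invented for illustration and do not correspond to any row of the table; the "Error" column is left out because its exact definition in the cited studies is not stated here.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard classification metrics from confusion-matrix counts.

    tp: essential genes predicted essential
    fn: essential genes predicted non-essential
    tn: non-essential genes predicted non-essential
    fp: non-essential genes predicted essential
    """
    return {
        "sensitivity": tp / (tp + fn),          # fraction of essentials recovered
        "specificity": tn / (tn + fp),          # fraction of non-essentials recovered
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 10 essential and 90 non-essential genes.
metrics = confusion_metrics(tp=8, fn=2, tn=85, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because essential genes are usually a small minority, accuracy can stay high even when sensitivity is low, which is one reason the three values are reported side by side.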