Pourya Naderi Yeganeh1,2, Christine Richardson3, Erik Saule2, Ann Loraine4, M Taghi Mostafavi2.
Abstract
Graph-theoretic models are widespread in biological pathway analysis, where the goal is often to evaluate the position of genes and proteins within the interaction networks of a biological system. In this article, we argue that the standard graph centrality measures do not sufficiently capture the informative topological organization of pathways and thus limit biological inference. While key pathway elements may appear both upstream and downstream in pathways, standard directed graph centralities attribute significant topological importance to the upstream elements and evaluate the downstream elements as having no importance. We present a directed graph framework, Source/Sink Centrality (SSC), to address the limitations of standard models. SSC separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms to evaluate centrality. To validate SSC, we evaluate the topological position of known human cancer genes and mouse lethal genes in their respective KEGG-annotated pathways and show that SSC-derived centralities provide an effective framework for associating higher positional importance with genes of higher importance according to a priori knowledge. While the presented work challenges some of the modeling assumptions in common pathway analyses, it provides a straightforward methodology for extending existing models. The SSC extensions can yield a more informative topological description of pathways and thus more informative biological inference.
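The idea of scoring a node separately as a receiver and as a sender and then combining the two can be illustrated with PageRank. The sketch below is only an illustration under stated assumptions: the function names, the damping value `alpha=0.85`, the use of a plain sum to combine the two terms, and the four-node toy pathway are all hypothetical choices, not the definitions used in the article.

```python
def pagerank(nodes, edges, alpha=0.85, iters=200):
    """Plain power-iteration PageRank on a directed graph (no external libraries)."""
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += alpha * rank[u] / len(out[u])
        rank = new
    return rank


def source_sink_pagerank(nodes, edges, alpha=0.85):
    """Assumed combination: receiver score (original edges) plus sender score (reversed edges)."""
    receiver = pagerank(nodes, edges, alpha)
    sender = pagerank(nodes, [(v, u) for u, v in edges], alpha)
    return {u: sender[u] + receiver[u] for u in nodes}


# Hypothetical linear cascade: receptor -> kinase1 -> kinase2 -> transcription factor.
nodes = ["receptor", "kinase1", "kinase2", "tf"]
edges = [("receptor", "kinase1"), ("kinase1", "kinase2"), ("kinase2", "tf")]
receiver_only = pagerank(nodes, edges)
ssc = source_sink_pagerank(nodes, edges)
```

On this toy chain the receiver-only score ranks the transcription factor above the receptor, whereas the combined score gives the two ends of the cascade equal weight.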
Keywords: Biological networks; Network analysis; Pathway analysis
Year: 2020 PMID: 32549913 PMCID: PMC7296696 DOI: 10.1186/s13040-020-00214-x
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Linear regression fit of the quantile-normalized centrality scores (Eq. 20) and the percentage of human pathway genes that are cancer-related (Eq. 21). The Source/Sink extensions of the centrality models show a higher slope and adjusted coefficient of determination (adjusted r-squared) than the standard variations of the centrality models (Table 1)
Linear regression fit of the quantile-normalized centrality scores (Eq. 20) and the percentage of human pathway genes that are cancer-related
| Centrality | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| Degree | (Intercept) | 1.50e+01 | 1.30e+00 | 1.15e+01 | 4.98e-20 |
| Degree | Coefficient | 1.37e-01 | 2.24e-02 | 6.11e+00 | 2.01e-08 |
| Katz-Sink | (Intercept) | 2.55e+01 | 2.31e+00 | 1.11e+01 | 8.26e-19 |
| Katz-Sink | Coefficient | 6.23e-03 | 3.93e-02 | 1.58e-01 | 8.74e-01 |
| Katz-Source | (Intercept) | 2.84e+01 | 2.57e+00 | 1.10e+01 | 1.49e-18 |
| Katz-Source | Coefficient | -5.97e-02 | 4.28e-02 | -1.39e+00 | 1.67e-01 |
| Katz-SSC | (Intercept) | 1.45e+01 | 1.12e+00 | 1.30e+01 | 4.23e-23 |
| Katz-SSC | Coefficient | 1.38e-01 | 1.93e-02 | 7.13e+00 | 1.64e-10 |
| Lap-Sink | (Intercept) | 2.49e+01 | 1.89e+00 | 1.32e+01 | 1.92e-23 |
| Lap-Sink | Coefficient | 1.65e-02 | 3.23e-02 | 5.12e-01 | 6.09e-01 |
| Lap-Source | (Intercept) | 2.91e+01 | 2.50e+00 | 1.17e+01 | 8.94e-20 |
| Lap-Source | Coefficient | -7.29e-02 | 4.14e-02 | -1.76e+00 | 8.18e-02 |
| Lap-SSC | (Intercept) | 1.27e+01 | 1.03e+00 | 1.24e+01 | 8.89e-22 |
| Lap-SSC | Coefficient | 1.78e-01 | 1.78e-02 | 9.99e+00 | 1.12e-16 |
| PageRank-Sink | (Intercept) | 1.33e+01 | 2.28e+00 | 5.85e+00 | 7.09e-08 |
| PageRank-Sink | Coefficient | 1.91e-01 | 3.86e-02 | 4.93e+00 | 3.43e-06 |
| PageRank-Source | (Intercept) | 1.27e+01 | 1.54e+00 | 8.23e+00 | 1.37e-12 |
| PageRank-Source | Coefficient | 1.77e-01 | 2.55e-02 | 6.94e+00 | 5.88e-10 |
| PageRank-SSC | (Intercept) | 7.58e+00 | 9.78e-01 | 7.76e+00 | 7.88e-12 |
| PageRank-SSC | Coefficient | 2.67e-01 | 1.69e-02 | 1.58e+01 | 8.22e-29 |
| PageRank-Und | (Intercept) | 9.06e+00 | 9.39e-01 | 9.65e+00 | 6.36e-16 |
| PageRank-Und | Coefficient | 2.33e-01 | 1.62e-02 | 1.43e+01 | 6.49e-26 |
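The estimate, std.error, and statistic columns are what an ordinary least-squares fit of one response on one predictor reports. As a minimal sketch of how such a row and the adjusted r-squared could be computed, the following uses pure Python; the function name `linear_fit` and the five data points are hypothetical and synthetic, not values from the study, and the p-value is omitted because it needs a t-distribution routine.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted r-squared penalizes the single fitted predictor.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    # Standard error of the slope from the residual variance on n - 2 degrees of freedom.
    std_error = (ss_res / (n - 2) / sxx) ** 0.5
    return {
        "intercept": intercept,
        "slope": slope,
        "std_error": std_error,
        "statistic": slope / std_error,
        "adj_r2": adj_r2,
    }


# Synthetic example: centrality quantile score (x) vs. percentage of flagged genes (y).
quantile_score = [10, 20, 30, 40, 50]
percent_flagged = [20, 40, 50, 40, 50]
fit = linear_fit(quantile_score, percent_flagged)
```

A larger slope together with a larger adjusted r-squared is the pattern the caption describes for the Source/Sink variants.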
Fig. 2 Comparison of the cumulative distributions of cancer-related genes and normal genes. The data points represent the quantile scores calculated from normalized centrality (Formula 23) across all pathways. Each panel includes the p-value of the Kolmogorov-Smirnov test for the hypothesis that the CDF of cancer genes lies below that of the normal genes. The panels show that cancer genes tend to have higher centrality values according to all of the models. This indicates that the source and sink components each carry value for capturing the topological importance of cancer genes. Asterisks denote the p-values generated by the KS test in R
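A one-sided two-sample Kolmogorov-Smirnov test of this kind asks whether one empirical CDF lies below another, which happens when the first group tends to take larger values. The sketch below computes the one-sided statistic and the large-sample approximation of its p-value, exp(-2 D^2 mn/(m+n)); the function names and the two toy samples are hypothetical, and exact small-sample p-values are not computed.

```python
import math


def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def ks_one_sided(high, low):
    """Test whether the CDF of `high` lies below the CDF of `low`.

    Returns the one-sided statistic D and its asymptotic p-value.
    """
    points = sorted(set(high) | set(low))
    # Largest amount by which the CDF of `low` exceeds the CDF of `high`.
    d = max(ecdf(low, x) - ecdf(high, x) for x in points)
    m, n = len(high), len(low)
    p_value = math.exp(-2.0 * d * d * m * n / (m + n))
    return d, p_value


# Synthetic quantile scores, for illustration only.
cancer_scores = [60, 70, 80, 90, 95]
normal_scores = [10, 20, 30, 40, 50, 65]
d_stat, p_value = ks_one_sided(cancer_scores, normal_scores)
```

A small p-value here supports the reading that the first group has systematically higher scores.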
Pathways identified with higher mean centrality for cancer genes by t-test
| | Degree | Katz-Sink | Katz-Source | Katz-SSC | Lap-Sink | Lap-Source | Lap-SSC | Pgr-Sink | Pgr-Source | Pgr-SSC | Pgr-Und |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 5 | 0 | 2 | 4 | 0 | 2 | 1 | 0 | 1 | 0 | 2 |
| Katz-Sink | | 6 | 0 | 1 | 5 | 0 | 1 | 2 | 0 | 2 | 1 |
| Katz-Source | | | 2 | 2 | 0 | 2 | 1 | 0 | 1 | 0 | 0 |
| Katz-SSC | | | | 5 | 1 | 2 | 1 | 0 | 1 | 0 | 1 |
| Lap-Sink | | | | | 17 | 0 | 4 | 3 | 0 | 3 | 1 |
| Lap-Source | | | | | | 8 | 2 | 0 | 1 | 0 | 0 |
| Lap-SSC | | | | | | | 15 | 2 | 1 | 2 | 1 |
| Pgr-Sink | | | | | | | | 5 | 1 | 3 | 1 |
| Pgr-Source | | | | | | | | | 2 | 1 | 1 |
| Pgr-SSC | | | | | | | | | | 5 | 2 |
| Pgr-Und | | | | | | | | | | | 7 |
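One way to read the matrix above, which the caption does not spell out, is that each row's first entry counts the pathways flagged by that centrality model and the remaining entries count pathways flagged by both models in the pair. Under that assumption, the counts could be produced as in this sketch; the function name, the model labels, and the pathway identifiers are placeholders, not results from the study.

```python
def overlap_matrix(flagged):
    """Count shared items for every unordered pair of methods (upper triangle).

    `flagged` maps a method name to the set of pathways it flagged.
    The entry for (a, a) is simply the number of pathways flagged by a.
    """
    names = list(flagged)
    counts = {}
    for i, a in enumerate(names):
        for b in names[i:]:
            counts[(a, b)] = len(flagged[a] & flagged[b])
    return counts


# Placeholder inputs for illustration only.
flagged = {
    "Lap-Sink": {"pathway_A", "pathway_B", "pathway_C"},
    "Lap-SSC": {"pathway_B", "pathway_C", "pathway_D"},
    "Pgr-SSC": {"pathway_D"},
}
counts = overlap_matrix(flagged)
```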
Pathways identified with higher mean centrality for cancer genes by Wilcoxon test
| | Degree | Katz-Sink | Katz-Source | Katz-SSC | Lap-Sink | Lap-Source | Lap-SSC | Pgr-Sink | Pgr-Source | Pgr-SSC | Pgr-Und |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 9 | 1 | 4 | 9 | 1 | 2 | 2 | 1 | 3 | 3 | 9 |
| Katz-Sink | | 13 | 1 | 2 | 10 | 1 | 3 | 12 | 2 | 10 | 7 |
| Katz-Source | | | 9 | 5 | 1 | 5 | 2 | 4 | 6 | 4 | 8 |
| Katz-SSC | | | | 13 | 2 | 2 | 2 | 3 | 4 | 5 | 11 |
| Lap-Sink | | | | | 20 | 2 | 6 | 13 | 2 | 11 | 6 |
| Lap-Source | | | | | | 8 | 3 | 4 | 7 | 4 | 5 |
| Lap-SSC | | | | | | | 15 | 7 | 5 | 6 | 3 |
| Pgr-Sink | | | | | | | | 25 | 8 | 20 | 13 |
| Pgr-Source | | | | | | | | | 16 | 12 | 9 |
| Pgr-SSC | | | | | | | | | | 31 | 15 |
| Pgr-Und | | | | | | | | | | | 29 |
Fig. 3 Sensitivity analysis of PageRank variations in the linear regression analysis with respect to the α parameter. Panel A shows the adjusted r-squared of the linear fit per α (Formula 22). Panel B displays the negative log p-value of the difference between the correlation coefficients of SSC PageRank and undirected PageRank. The red line in panel B denotes the significance threshold of p-value = 0.05
Fig. 5 Linear regression fit of the quantile-normalized centrality scores (Eq. 20) and the percentage of mouse pathway genes that are lethal (Eq. 21). The Source/Sink extensions of the centrality models show a higher slope and adjusted coefficient of determination (adjusted r-squared) than the standard variations of the centrality models (Table 4)
Linear regression fit of the quantile-normalized centrality scores (Eq. 20) and the percentage of mouse pathway genes that are lethal
| Centrality | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| Degree | (Intercept) | 8.08e+00 | 5.86e-01 | 1.38e+01 | 1.19e-24 |
| Degree | Coefficient | 1.14e-02 | 1.01e-02 | 1.13e+00 | 2.62e-01 |
| Katz-Sink | (Intercept) | 1.08e+01 | 1.07e+00 | 1.01e+01 | 9.91e-17 |
| Katz-Sink | Coefficient | -2.27e-02 | 1.79e-02 | -1.27e+00 | 2.08e-01 |
| Katz-Source | (Intercept) | 1.32e+01 | 1.09e+00 | 1.21e+01 | 6.70e-21 |
| Katz-Source | Coefficient | -5.63e-02 | 1.83e-02 | -3.08e+00 | 2.69e-03 |
| Katz-SSC | (Intercept) | 8.47e+00 | 6.78e-01 | 1.25e+01 | 4.66e-22 |
| Katz-SSC | Coefficient | 6.40e-03 | 1.17e-02 | 5.46e-01 | 5.86e-01 |
| Lap-Sink | (Intercept) | 1.03e+01 | 1.15e+00 | 8.89e+00 | 4.62e-14 |
| Lap-Sink | Coefficient | -1.06e-02 | 1.94e-02 | -5.48e-01 | 5.85e-01 |
| Lap-Source | (Intercept) | 1.30e+01 | 1.15e+00 | 1.13e+01 | 3.82e-19 |
| Lap-Source | Coefficient | -4.50e-02 | 1.93e-02 | -2.33e+00 | 2.20e-02 |
| Lap-SSC | (Intercept) | 5.51e+00 | 7.05e-01 | 7.82e+00 | 5.70e-12 |
| Lap-SSC | Coefficient | 6.65e-02 | 1.22e-02 | 5.46e+00 | 3.54e-07 |
| PageRank-Sink | (Intercept) | 8.72e+00 | 1.05e+00 | 8.28e+00 | 1.17e-12 |
| PageRank-Sink | Coefficient | 1.04e-02 | 1.74e-02 | 6.01e-01 | 5.50e-01 |
| PageRank-Source | (Intercept) | 9.15e+00 | 1.15e+00 | 7.93e+00 | 4.85e-12 |
| PageRank-Source | Coefficient | 8.83e-03 | 1.94e-02 | 4.56e-01 | 6.49e-01 |
| PageRank-SSC | (Intercept) | 5.92e+00 | 5.84e-01 | 1.01e+01 | 5.26e-17 |
| PageRank-SSC | Coefficient | 5.24e-02 | 1.01e-02 | 5.20e+00 | 1.09e-06 |
| PageRank-Und | (Intercept) | 7.12e+00 | 5.26e-01 | 1.35e+01 | 2.81e-24 |
| PageRank-Und | Coefficient | 2.67e-02 | 9.08e-03 | 2.94e+00 | 4.03e-03 |
Fig. 6 Comparison of the cumulative distributions of lethal genes and normal (non-lethal) genes. The data points represent the quantile scores calculated from normalized centrality (Formula 23) across all pathways. Each panel includes the p-value of the Kolmogorov-Smirnov test for the hypothesis that the CDF of lethal genes lies below that of the normal genes. The panels show that lethal genes tend to have higher centrality values according to some of the models, including Source/Sink PageRank and Source/Sink Laplacian
Pathways identified with higher mean centrality for mouse lethal genes by Wilcoxon test (FDR<0.25)
| | Lap-Sink | Lap-Source | Lap-SSC | Pgr-Sink | Pgr-Source | Pgr-SSC |
|---|---|---|---|---|---|---|
| Lap-Sink | 1 | 0 | 1 | 0 | 1 | 0 |
| Lap-Source | | 1 | 0 | 0 | 1 | 1 |
| Lap-SSC | | | 4 | 0 | 2 | 0 |
| Pgr-Sink | | | | 2 | 0 | 0 |
| Pgr-Source | | | | | 6 | 2 |
| Pgr-SSC | | | | | | 4 |
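The FDR<0.25 cutoff in the caption implies that per-pathway p-values were corrected for multiple testing before pathways were counted. The correction procedure is not stated here; the sketch below assumes the common Benjamini-Hochberg adjustment, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Synthetic per-pathway p-values, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.60]
adj_p = benjamini_hochberg(raw_p)
selected = [i for i, p in enumerate(adj_p) if p < 0.25]
```

Pathways whose adjusted value falls below 0.25 would then be the ones counted for each centrality model.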
Pathways identified with higher mean centrality for mouse lethal genes by t-test (FDR<0.25)
| | Lap-Sink | Lap-SSC | Pgr-Sink |
|---|---|---|---|
| Lap-Sink | 1 | 1 | 0 |
| Lap-SSC | | 6 | 0 |
| Pgr-Sink | | | 1 |