Chaoxing Li¹, Li Liu², Valentin Dinu².
Abstract
Complex diseases such as cancer usually result from a combination of environmental factors and one or more biological pathways, each consisting of a set of genes. Each biological pathway exerts its function by transmitting signals through a gene network. Theoretically, a pathway should have a robust topological structure under normal physiological conditions; however, this structure can be altered under pathological conditions. It is well known that a normal biological network includes a small number of well-connected hub nodes and a large number of non-hub nodes. In addition, loss of connectivity has been reported to be a common topological trait of cancer networks, and this is an assumption of our method. Hence, the loss of connectivity from normal to cancer may reflect a disruption of the network's structure: the number of hub genes, or the distribution of the genes' topological ranks, may be altered in cancer compared with normal tissue. Based on this, we propose a new PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) to detect pathways involved in cancer. We use PageRank to measure the relative topological ranks of genes in each biological pathway, select hub genes for each pathway, and use Fisher's exact test to test whether the number of hub genes in each pathway is altered from normal to cancer. Alternatively, if the distribution of topological ranks of genes in a pathway is altered between normal and cancer, the pathway might also be involved in cancer; we therefore use the Kolmogorov–Smirnov test to detect pathways with an altered distribution of topological ranks between the two phenotypes. We apply PoTRA to study hepatocellular carcinoma (HCC) and several subtypes of HCC.
Notably, all significant pathways we discover in HCC are cancer-associated in general, while several significant pathways in HCC subtypes are specifically associated with those subtypes. In conclusion, PoTRA is a new approach to explore and discover pathways involved in cancer. PoTRA can complement existing methods to broaden our understanding of the biological mechanisms behind cancer at the system level.
Keywords: Biological pathways; Fisher’s exact test; Hepatocellular carcinoma; Kolmogorov–Smirnov test; PageRank; PoTRA
Year: 2018 PMID: 29666752 PMCID: PMC5896492 DOI: 10.7717/peerj.4571
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overview of the PoTRA method.
Figure 2. The topological rank analysis for each gene within a pathway.
For the genes within a specified pathway, we construct separate gene co-expression networks for normal and cancer samples, as described in Step 2. We then apply PageRank to each network to obtain the topological importance of every gene. PR_normal(gene i) denotes the PageRank score of gene i for normal samples, and PR_cancer(gene i) denotes its score for cancer samples.
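The per-pathway ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the PoTRA implementation: Step 2 is not described in this section, so the sketch assumes an edge wherever the absolute Pearson correlation between two genes' expression profiles reaches a hypothetical threshold of 0.8, and it uses PageRank with the common damping factor 0.85. The names `pearson`, `coexpression_network` and `pagerank` are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat genes as unconnected
    return sxy / math.sqrt(sxx * syy)

def coexpression_network(expr, threshold=0.8):
    """Undirected adjacency matrix; expr[g] holds gene g's values across samples.
    The |r| >= threshold rule is an assumed stand-in for the paper's Step 2."""
    n = len(expr)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(expr[i], expr[j])) >= threshold:
                adj[i][j] = adj[j][i] = True
    return adj

def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on an undirected graph; the scores sum to 1."""
    n = len(adj)
    neighbors = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        # Genes without edges (dangling nodes) spread their score over all genes.
        dangling = sum(pr[i] for i in range(n) if not neighbors[i])
        new = [(1 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            if neighbors[i]:
                share = damping * pr[i] / len(neighbors[i])
                for j in neighbors[i]:
                    new[j] += share
        if sum(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr
```

Running `pagerank(coexpression_network(expr))` once on the normal expression matrix and once on the cancer matrix yields PR_normal(gene i) and PR_cancer(gene i) for the pathway's genes.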
Contingency table for Fisher’s exact test.
We use the one-tailed 95th percentile of the PageRank score distribution in normal samples as the hub-gene cutoff for both normal and cancer samples. The value “a” is the number of genes whose PageRank scores are below the cutoff in normal samples, and “b” is the number of genes whose scores are above it. The values “c” and “d” are the corresponding counts for cancer samples. We use Fisher’s exact test to assess whether the number of hub genes differs significantly between normal and cancer.
| | Number of non-hub genes | Number of hub genes | Total |
|---|---|---|---|
| Normal | a | b | a+b |
| Cancer | c | d | c+d |
| Total | a+c | b+d | a+b+c+d |
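Building the table and running the test can be sketched with the standard library alone. This is a minimal sketch, assuming a gene is a hub when its score is strictly above the cutoff (the text does not say how ties at the cutoff are handled) and using the usual two-sided rule that sums the probabilities of all tables no more likely than the observed one. `hub_contingency` and `fisher_exact_two_sided` are hypothetical names.

```python
import math

def hub_contingency(normal_scores, cancer_scores, cutoff):
    """Return (a, b, c, d): non-hub and hub counts for normal, then for cancer.
    Assumption: a hub is a gene whose PageRank score is strictly above the cutoff."""
    b = sum(s > cutoff for s in normal_scores)
    d = sum(s > cutoff for s in cancer_scores)
    return len(normal_scores) - b, b, len(cancer_scores) - d, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {x: math.comb(row1, x) * math.comb(row2, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more likely than the observed one (small tolerance for rounding).
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, the “MAPK signaling pathway” counts reported in the HCC tables (13 hub genes in normal and 40 in cancer, out of 250 genes) correspond to `fisher_exact_two_sided(237, 13, 210, 40)`; this gives a raw P value, which the tables then report after FDR adjustment.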
Figure 3. The kernel density distribution of PageRank scores of genes in the “MAPK signaling pathway”.
The red line shows the kernel density distribution of PageRank scores for cancer samples, and the black line shows it for normal samples. Because the PageRank scores within each network sum to 1, the two distributions have the same mean, 1/N = 1/250 = 0.004, where N = 250 is the number of genes in the “MAPK signaling pathway”. We use the 95th percentile (0.006035) of the kernel density distribution for normal samples as the hub-gene cutoff for both normal and cancer samples.
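The cutoff is a percentile of a kernel density estimate rather than of the raw scores. The section does not state the kernel or bandwidth, so this sketch assumes a Gaussian kernel with the rule-of-thumb bandwidth 0.9 · min(sd, IQR/1.34) · n^(−1/5) and finds the 95th percentile by bisection on the estimated CDF. `rule_of_thumb_bandwidth`, `kde_cdf` and `kde_quantile` are hypothetical names, and the degenerate-sample fallback is also an assumption.

```python
import math
import statistics

def rule_of_thumb_bandwidth(x):
    """Assumed bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:
        spread = sd if sd > 0 else 1.0  # assumed fallback for degenerate samples
    return 0.9 * spread * len(x) ** -0.2

def kde_cdf(t, x, h):
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each data point."""
    return sum(0.5 * (1.0 + math.erf((t - xi) / (h * math.sqrt(2.0)))) for xi in x) / len(x)

def kde_quantile(x, q=0.95, iterations=200):
    """Value t with kde_cdf(t) == q, located by bisection."""
    h = rule_of_thumb_bandwidth(x)
    lo, hi = min(x) - 10 * h, max(x) + 10 * h  # CDF is ~0 at lo and ~1 at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if kde_cdf(mid, x, h) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With the PR_normal scores of one pathway as `x`, `kde_quantile(x)` gives a cutoff in the spirit of the 0.006035 value above; genes whose PR_normal or PR_cancer exceeds it are counted as hubs. A fixed number of bisection steps is used instead of an absolute tolerance so the loop always terminates regardless of the scale of the scores.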
The “MAPK signaling pathway” identified by PoTRA for HCC using Fisher’s exact test.
The P value is adjusted for the false discovery rate (FDR).
| Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | # of hub genes_normal | # of hub genes_cancer | Adjusted P value |
|---|---|---|---|---|---|---|
| MAPK signaling pathway | 250 | 14,005 | 5,170 | 13 | 40 | 0.0158 |
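The captions state only that P values are FDR-adjusted. Assuming the Benjamini–Hochberg procedure applied across all tested KEGG pathways (the specific procedure is an assumption here), the adjustment can be sketched as follows; `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be reported when its adjusted value is below 0.05, as in the table notes.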
The significant KEGG pathways identified by PoTRA for HCC using Fisher’s exact test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | # of hub genes_normal | # of hub genes_cancer | Adjusted P value |
|---|---|---|---|---|---|---|---|
| 1 | Pathways in cancer | 310 | 24,924 | 9,136 | 16 | 48 | 0.0081 |
| 2 | MAPK signaling pathway | 252 | 14,005 | 5,170 | 13 | 40 | 0.0158 |
| 3 | Breast cancer | 143 | 3,589 | 1,175 | 8 | 29 | 0.0278 |
The “MAPK signaling pathway” identified by PoTRA for HCC using the Kolmogorov–Smirnov test.
The P value is adjusted for the false discovery rate (FDR).
| Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | Adjusted P value |
|---|---|---|---|---|
| MAPK signaling pathway | 250 | 14,005 | 5,170 | 0.0278423 |
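The Kolmogorov–Smirnov comparison of PR_normal and PR_cancer scores can be sketched as below. The statistic D is computed exactly; the P value uses the asymptotic Kolmogorov distribution with a commonly used small-sample correction, which is an assumption (statistical packages may compute exact P values for small samples, so results can differ). `ks_statistic` and `ks_pvalue_asymptotic` are hypothetical names.

```python
import math

def ks_statistic(x, y):
    """Two-sample KS statistic D = max over t of |F_x(t) - F_y(t)|."""
    x, y = sorted(x), sorted(y)
    n1, n2 = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(x[i], y[j])
        # Step both empirical CDFs past every value equal to v (handles ties).
        while i < n1 and x[i] <= v:
            i += 1
        while j < n2 and y[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d

def ks_pvalue_asymptotic(d, n1, n2):
    """Asymptotic two-sided P value from the Kolmogorov series."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    total, sign, prev = 0.0, 1.0, 0.0
    for k in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return min(1.0, max(0.0, total))
        sign = -sign
        prev = abs(term)
    return 1.0  # series did not converge: lam is tiny, distributions indistinguishable
```

Within one pathway both score vectors have the same length N, so the call would be `ks_pvalue_asymptotic(ks_statistic(pr_normal, pr_cancer), N, N)`, followed by FDR adjustment across pathways.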
The significant KEGG pathways identified by PoTRA for HCC using the Kolmogorov–Smirnov test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | Adjusted P value |
|---|---|---|---|---|---|
| 1 | RNA transport | 133 | 7,168 | 5,343 | 6.72E−08 |
| 2 | mRNA surveillance pathway | 70 | 1,867 | 1,252 | 0.023328272 |
| 3 | MAPK signaling pathway | 250 | 14,005 | 5,170 | 0.027842298 |
The significant KEGG pathways identified by PoTRA for hepatitis B-induced HCC using Fisher’s exact test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | # of hub genes_normal | # of hub genes_cancer | Adjusted P value |
|---|---|---|---|---|---|---|---|
| 1 | Insulin signaling pathway | 139 | 2,692 | 958 | 7 | 34 | 0.0007 |
| 2 | Pathways in cancer | 310 | 11,194 | 3,792 | 16 | 52 | 0.0007 |
| 3 | Hippo signaling pathway | 151 | 2,836 | 970 | 8 | 31 | 0.0072 |
| 4 | HTLV-I infection | 194 | 5,518 | 2,080 | 10 | 35 | 0.0072 |
| 5 | Neurotrophin signaling pathway | 117 | 2,441 | 895 | 6 | 25 | 0.0195 |
| 6 | mTOR signaling pathway | 144 | 3,410 | 832 | 8 | 28 | 0.0240 |
| 7 | Epstein-Barr virus infection | 85 | 1,524 | 435 | 5 | 21 | 0.0353 |
| 8 | Hepatitis B | 134 | 2,708 | 828 | 7 | 25 | 0.0353 |
The significant KEGG pathways identified by PoTRA for hepatitis C-induced HCC using Fisher’s exact test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | # of hub genes_normal | # of hub genes_cancer | Adjusted P value |
|---|---|---|---|---|---|---|---|
| 1 | Pathways in cancer | 310 | 22,253 | 7,791 | 16 | 62 | 2.89E−06 |
| 2 | PI3K-Akt signaling pathway | 340 | 19,901 | 6,594 | 17 | 65 | 2.89E−06 |
| 3 | MAPK signaling pathway | 252 | 11,986 | 4,168 | 13 | 47 | 0.0003 |
| 4 | Proteoglycans in cancer | 204 | 9,815 | 3,642 | 11 | 38 | 0.0033 |
| 5 | Rap1 signaling pathway | 208 | 8,294 | 3,587 | 11 | 34 | 0.0215 |
| 6 | Adrenergic signaling in cardiomyocytes | 149 | 3,594 | 1,355 | 8 | 27 | 0.0372 |
| 7 | cAMP signaling pathway | 196 | 5,106 | 2,493 | 10 | 30 | 0.0372 |
| 8 | Focal adhesion | 203 | 10,225 | 4,656 | 11 | 32 | 0.0372 |
| 9 | HTLV-I infection | 194 | 9,843 | 4,030 | 10 | 30 | 0.0372 |
| 10 | Ras signaling pathway | 226 | 10,098 | 3,931 | 12 | 33 | 0.0376 |
| 11 | FoxO signaling pathway | 126 | 3,391 | 1,222 | 7 | 24 | 0.0380 |
| 12 | Osteoclast differentiation | 123 | 4,418 | 1,452 | 7 | 24 | 0.0380 |
| 13 | ErbB signaling pathway | 88 | 2,128 | 814 | 5 | 20 | 0.0400 |
| 14 | Axon guidance | 167 | 6,203 | 2,705 | 9 | 27 | 0.0433 |
The significant KEGG pathways identified by PoTRA for alcohol-induced HCC using Fisher’s exact test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | # of hub genes_normal | # of hub genes_cancer | Adjusted P value |
|---|---|---|---|---|---|---|---|
| 1 | PI3K-Akt signaling pathway | 340 | 23,928 | 8,733 | 17 | 55 | 0.0006 |
| 2 | MAPK signaling pathway | 252 | 14,005 | 5,767 | 13 | 46 | 0.0007 |
| 3 | Pathways in cancer | 310 | 24,924 | 10,191 | 16 | 47 | 0.0043 |
The significant KEGG pathways identified by PoTRA for hepatitis C-induced HCC using the Kolmogorov–Smirnov test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | Adjusted P value |
|---|---|---|---|---|---|
| 1 | RNA transport | 133 | 6,877 | 4,001 | 3.33E−06 |
| 2 | Pathways in cancer | 310 | 22,253 | 7,791 | 0.0055638 |
| 3 | Proteoglycans in cancer | 204 | 9,815 | 3,642 | 0.0055638 |
| 4 | MAPK signaling pathway | 250 | 11,986 | 4,168 | 0.01535456 |
| 5 | PI3K-Akt signaling pathway | 340 | 19,901 | 6,594 | 0.01975723 |
| 6 | HTLV-I infection | 194 | 9,843 | 4,030 | 0.01975723 |
The significant KEGG pathways identified by PoTRA for alcohol-induced HCC using the Kolmogorov–Smirnov test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | Adjusted P value |
|---|---|---|---|---|---|
| 1 | RNA transport | 133 | 7,168 | 4,816 | 2.94E−08 |
| 2 | Pathways in cancer | 310 | 24,924 | 10,191 | 0.0062832 |
| 3 | PI3K-Akt signaling pathway | 340 | 23,928 | 8,733 | 0.0062832 |
| 4 | MAPK signaling pathway | 250 | 14,005 | 5,767 | 0.01050039 |
| 5 | mRNA surveillance pathway | 70 | 1,867 | 1,096 | 0.01166414 |
The significant KEGG pathways identified by PoTRA for hepatitis B-induced HCC using the Kolmogorov–Smirnov test.
All FDR-adjusted P values are below 0.05.
| No. | Pathway | Gene count (L) | # of edges_normal | # of edges_cancer | Adjusted P value |
|---|---|---|---|---|---|
| 1 | Arginine and proline metabolism | 50 | 163 | 26 | 0.005412 |
| 2 | Glyoxylate and dicarboxylate metabolism | 26 | 216 | 19 | 0.02765948 |
| 3 | Primary bile acid biosynthesis | 17 | 62 | 6 | 0.02765948 |
| 4 | Insulin signaling pathway | 139 | 2,692 | 958 | 0.04057992 |
| 5 | Vasopressin-regulated water reabsorption | 22 | 58 | 8 | 0.04057992 |
The significant KEGG pathways identified by PoTRA for hepatitis C-induced HCC using Fisher’s exact test based on combined networks.
All FDR-adjusted P values are below 0.05. E.comb.normal is the number of edges in the combined network for normal samples, and E.comb.case is the number of edges in the combined network for cancer samples.
| No. | Pathway | Gene count | E.comb.normal | E.comb.case | Adjusted P value |
|---|---|---|---|---|---|
| 1 | Epstein-Barr virus infection | 85 | 131 | 43 | 0.0103488 |
| 2 | p53 signaling pathway | 68 | 57 | 20 | 0.0103488 |
The significant KEGG pathways identified by PoTRA for hepatitis B-induced HCC using Fisher’s exact test based on combined networks.
All FDR-adjusted P values are below 0.05. E.comb.normal is the number of edges in the combined network for normal samples, and E.comb.case is the number of edges in the combined network for cancer samples.
| No. | Pathway | Gene count | E.comb.normal | E.comb.case | Adjusted P value |
|---|---|---|---|---|---|
| 1 | Epstein-Barr virus infection | 85 | 84 | 22 | 0.0103488 |