
Pan-cancer analysis of TCGA data reveals notable signaling pathways.

Richard Neapolitan1, Curt M Horvath2, Xia Jiang3.   

Abstract

BACKGROUND: A signal transduction pathway (STP) is a network of intercellular information flow initiated when extracellular signaling molecules bind to cell-surface receptors. Many aberrant STPs have been associated with various cancers. To develop optimal treatments for cancer patients, it is important to discover which STPs are implicated in a cancer or cancer-subtype. The Cancer Genome Atlas (TCGA) makes available gene expression level data on cases and controls in ten different types of cancer including breast cancer, colon adenocarcinoma, glioblastoma, kidney renal papillary cell carcinoma, low grade glioma, lung adenocarcinoma, lung squamous cell carcinoma, ovarian carcinoma, rectum adenocarcinoma, and uterine corpus endometrioid carcinoma. Signaling Pathway Impact Analysis (SPIA) is a software package that analyzes gene expression data to identify whether a pathway is relevant in a given condition.
METHODS: We present the results of a study that uses SPIA to investigate all 157 signaling pathways in the KEGG PATHWAY database. We analyzed each of the ten cancer types mentioned above separately, and we performed a pan-cancer analysis by grouping the data for all the cancer types.
RESULTS: In each analysis several pathways were found to be markedly more significant than all the other pathways. We call them notable. Research has already established a connection between many of these pathways and the corresponding cancer type. However, some of our discovered pathways appear to be new findings. Altogether there were 37 notable findings in the separate analyses, and 26 of them occurred in 7 pathways. These 7 pathways included the 4 notable pathways discovered in the pan-cancer analysis. So, our results suggest that these 7 pathways account for many of the mechanisms of cancer. Furthermore, by looking at the overlap among pathways, we identified possible regions on the pathways where the aberrant activity is occurring.
CONCLUSIONS: We obtained 37 notable findings concerning 18 pathways. Some of them appear to be new discoveries. Furthermore, we identified regions on pathways where the aberrant activity might be occurring. We conclude that our results will prove to be valuable to cancer researchers because they provide many opportunities for laboratory and clinical follow-up studies.

Year:  2015        PMID: 26169172      PMCID: PMC4501083          DOI: 10.1186/s12885-015-1484-6

Source DB:  PubMed          Journal:  BMC Cancer        ISSN: 1471-2407            Impact factor:   4.430


Background

A signal transduction pathway (STP) is a network of intercellular information flow initiated when extracellular signaling molecules bind to cell-surface receptors. The signaling molecules become modified, causing a change in their functional capability and effecting a change in the subsequent molecules in the network. This cascading process culminates in a cellular response. Consensus pathways have been developed based on the composite of studies concerning individual pathway components. KEGG PATHWAY [1] is a collection of manually drawn pathways representing our knowledge of the molecular interactions and reactions for about 157 signaling pathways. Signaling pathways are not stand-alone; rather, it is believed that there is inter-pathway communication [2]. Many aberrant STPs have been associated with various cancers [3-9]. To develop optimal treatments for cancer patients, it is important to discover which STPs are implicated in a cancer or cancer-subtype. Microarray technology is providing us with increasingly abundant gene expression level datasets. For example, The Cancer Genome Atlas (TCGA) makes available gene expression level data on tumors and normal tissue in ten different types of cancer including breast cancer, colon adenocarcinoma, glioblastoma, kidney renal papillary cell carcinoma, low grade glioma, lung adenocarcinoma, lung squamous cell carcinoma, ovarian carcinoma, rectum adenocarcinoma, and uterine corpus endometrioid carcinoma. Translating the information in these data into a better understanding of underlying biological mechanisms is of paramount importance to identifying therapeutic targets for cancer. In particular, if the data can inform us as to whether and how a signal transduction pathway is altered in the cancer, we can investigate targets on that pathway. In an effort to reveal implicated pathways using gene expression data from tumors and normal tissue, researchers initially developed techniques such as over-representation analysis [10-12].
However, these techniques analyze each gene separately rather than perform an analysis of the pathway at a systems level. By ignoring the topology of the network, they do not account for key biological information. That is, if a pathway is activated through a single receptor and that protein is not produced, the pathway will be severely impacted. However, a protein that appears downstream may have a limited effect on the pathway. Recently, researchers have developed methods that account for the topology. Signaling Pathway Impact Analysis (SPIA) [13] is a software package (http://www.bioconductor.org/packages/release/bioc/html/SPIA.html) that analyzes gene expression data to identify whether a signaling network is relevant in a given condition by combining over-representation analysis with a measurement of the perturbation in the pathway. Neapolitan et al. [14] developed a method called Causal Analysis of STP Aberrations (CASA) for analyzing signaling pathways, which represents the pathways as causal Bayesian networks [15] and also accounts for the topology of the network. Even though much effort has been put into the development of these techniques for analyzing signaling pathways using gene expression data, it was not clear that we could get reliable results concerning signaling pathways by analyzing such data. That is, the phosphorylation activity state of each protein in a signaling pathway corresponds to the information flow on the pathway. Protein expression level (abundance) is correlated with activity, and gene expression level (mRNA abundance) is associated with protein abundance (correlation coefficient of 0.4 to 0.6). So, it seems gene expression data would be only loosely correlated with activity. To investigate this question of whether we could obtain meaningful results using large-scale gene expression data, Neapolitan et al. [14] analyzed the ovarian cancer TCGA data using both SPIA and CASA.
In their analysis, they investigated 20 signaling pathways believed to be implicated in cancer and 6 randomly chosen pathways. They obtained significant results indicating that the pathways believed to be implicated in cancer are the ones most likely to be implicated in ovarian carcinoma. The study in [14] was only a proof of principle study. In this paper we present the results of a study that uses SPIA to investigate all 157 signaling pathways in the KEGG PATHWAY database.

Results and discussion

We analyzed all 157 signaling pathways in the KEGG PATHWAY database using SPIA. We performed a pan-cancer analysis that had all 2100 tumors, a breast cancer analysis that had 466 tumors, a colon adenocarcinoma analysis that had 143 tumors, a glioblastoma analysis that had 567 tumors, a kidney renal papillary cell carcinoma analysis that had 16 tumors, a low grade glioma analysis that had 27 tumors, a lung adenocarcinoma analysis that had 32 tumors, a lung squamous cancer analysis that had 154 tumors, an ovarian cancer analysis that had 572 tumors, a rectum adenocarcinoma analysis that had 69 tumors, and a uterine corpus endometrioid carcinoma analysis that had 54 tumors. For all the analyses, we grouped the normal tissue samples from all the datasets, making a total of 101 normal tissue samples. In all our analyses several pathways were found to be markedly more significant than the others, and also to have very small FDRs. We call a pathway notable if the p-value is less than 0.0001 and the FDR is less than 0.01. We call a pathway significant if the p-value is less than 0.05. Table 1 shows the pathways found to be notable in all 11 of our analyses, and the most significant pathway that was not notable. Additional file 1: Tables S1-S11 show all pathways found to be significant (p-value < 0.05) in each of the analyses.
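The two thresholds defined above can be written as a small decision rule. The following is a minimal sketch, not the SPIA implementation: the function name `classify_pathway` and its parameter names are illustrative assumptions, and the input rows are the pan-cancer p-values and FDRs reported in Table 1.

```python
def classify_pathway(p_value, fdr, notable_p=1e-4, notable_fdr=0.01, significant_p=0.05):
    """Label one pathway using the thresholds stated in the text.

    notable:      p-value < 0.0001 and FDR < 0.01
    significant:  p-value < 0.05 (but not notable)
    """
    if p_value < notable_p and fdr < notable_fdr:
        return "notable"
    if p_value < significant_p:
        return "significant"
    return "not significant"


# (pathway, p-value, FDR) rows taken from the pan-cancer block of Table 1.
pan_cancer_rows = [
    ("Focal adhesion", 5.99e-06, 0.000789),
    ("PI3K-Akt signaling pathway", 1.01e-05, 0.000789),
    ("Rap1 signaling pathway", 3.71e-05, 0.001939),
    ("Calcium signaling pathway", 4.95e-05, 0.001942),
    ("Systemic lupus erythematosus", 0.001966, 0.05302),
]

labels = {name: classify_pathway(p, fdr) for name, p, fdr in pan_cancer_rows}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Applied to these five rows, the rule marks the first four pathways as notable and the last one as significant only, which matches how they are listed in Table 1.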
Table 1

The pathways found to be notable in the various analyses, and the most significant pathway that was not notable (listed last). A pathway is notable if the p-value is less than 0.0001 and the FDR is less than 0.01. A pathway is significant if the p-value is less than 0.05. The Status column gives the direction in which the pathway is found to be perturbed (activated or inhibited). The Signfct column contains an entry if the pathway is significant in the pan-cancer analysis. The entry is “N” if it is one of the notable pathways. Otherwise, it is “S”. A pathway has an asterisk if it is not notable in the pan-cancer analysis and previous studies have not linked it to the particular cancer

| Analysis | Pathway | p-value | FDR | Status | Signfct |
|---|---|---|---|---|---|
| pan-cancer | Focal adhesion | 5.99E-06 | 0.000789 | Activated | N |
| | PI3K-Akt signaling pathway | 1.01E-05 | 0.000789 | Activated | N |
| | Rap1 signaling pathway | 3.71E-05 | 0.001939 | Activated | N |
| | Calcium signaling pathway | 4.95E-05 | 0.001942 | Activated | N |
| | Systemic lupus erythematosus | 0.001966 | 0.05302 | Activated | S |
| breast | ECM-receptor interaction | 5.71E-05 | 0.008967 | Activated | |
| | Complement and coagulation cascades | 0.003855 | 0.218606 | Activated | S |
| colon | Adrenergic signaling in cardiomyocytes* | 3.35E-05 | 0.001709 | Inhibited | S |
| | Melanoma | 3.68E-05 | 0.001709 | Inhibited | S |
| | Focal adhesion | 4.73E-05 | 0.001709 | Inhibited | N |
| | Cytokine-cytokine receptor interaction | 5.84E-05 | 0.001709 | Activated | S |
| | Pathways in cancer* | 6.21E-05 | 0.001709 | Inhibited | S |
| | PI3K-Akt signaling pathway | 6.53E-05 | 0.001709 | Inhibited | N |
| | Rap1 signaling pathway | 0.002919 | 0.065477 | Inhibited | N |
| glioblastoma | Cytokine-cytokine receptor interaction | 5.12E-07 | 8.04E-05 | Inhibited | S |
| | Complement and coagulation cascades* | 1.33E-05 | 0.000798 | Inhibited | S |
| | Systemic lupus erythematosus | 1.94E-05 | 0.000798 | Inhibited | S |
| | PI3K-Akt signaling pathway | 2.31E-05 | 0.000798 | Inhibited | N |
| | Chemokine signaling pathway | 2.54E-05 | 0.000798 | Inhibited | S |
| | Vascular smooth muscle contraction | 0.003076 | 0.069809 | Inhibited | |
| kidney | Rap1 signaling pathway | 3.30E-06 | 0.000518 | Inhibited | N |
| | ECM-receptor interaction* | 8.13E-06 | 0.000638 | Inhibited | |
| | Colorectal cancer* | 2.79E-05 | 0.001459 | Inhibited | |
| | Focal adhesion | 8.66E-05 | 0.0034 | Inhibited | N |
| | Insulin signaling pathway | 0.000557 | 0.015232 | Inhibited | |
| glioma | Focal adhesion | 4.94E-06 | 0.000674 | Inhibited | N |
| | ECM-receptor interaction* | 8.59E-06 | 0.000674 | Inhibited | |
| | Chemokine signaling pathway | 1.74E-05 | 0.00091 | Inhibited | S |
| | Small cell lung cancer* | 4.27E-05 | 0.001482 | Inhibited | S |
| | Cytokine-cytokine receptor interaction | 4.72E-05 | 0.001482 | Inhibited | S |
| | Retrograde endocannabinoid signaling | 0.000478 | 0.01252 | Activated | |
| lung adeno. | Chemokine signaling pathway | 1.82E-08 | 2.86E-06 | Activated | S |
| | Cytokine-cytokine receptor interaction | 1.51E-05 | 0.001187 | Activated | S |
| | Systemic lupus erythematosus | 0.000108 | 0.005654 | Activated | S |
| lung squamous | Chemokine signaling pathway | 1.43E-05 | 0.002204 | Activated | S |
| | Cytokine-cytokine receptor interaction | 4.14E-05 | 0.002204 | Activated | S |
| | Endocrine and other factor-reg. calcium reab.* | 4.21E-05 | 0.002204 | Inhibited | |
| | Amoebiasis | 0.005649 | 0.221723 | Inhibited | S |
| ovarian | Rap1 signaling pathway | 4.02E-05 | 0.002785 | Inhibited | N |
| | PI3K-Akt signaling pathway | 5.03E-05 | 0.002785 | Inhibited | N |
| | Calcium signaling pathway | 5.32E-05 | 0.002785 | Inhibited | N |
| | Focal adhesion | 0.000366 | 0.014354 | Inhibited | N |
| rectum | Focal adhesion | 3.63E-06 | 0.000342 | Inhibited | N |
| | Rap1 signaling pathway | 4.36E-06 | 0.000342 | Inhibited | N |
| | Ras signaling pathway* | 1.32E-05 | 0.000689 | Inhibited | S |
| | PI3K-Akt signaling pathway | 4.96E-05 | 0.001727 | Inhibited | N |
| | Prostate cancer* | 5.50E-05 | 0.001727 | Inhibited | S |
| | Melanoma | 0.001514 | 0.039609 | Inhibited | S |
| uterine | Focal adhesion | 7.50E-07 | 0.000118 | Inhibited | N |
| | Maturity onset diabetes of the young | 4.69E-05 | 0.003144 | Activated | S |
| | Calcium signaling pathway | 6.01E-05 | 0.003144 | Inhibited | N |
| | Rap1 signaling pathway | 0.005318 | 0.208728 | Inhibited | N |

Pan-cancer results

Table 1 reveals that the notable pathways in the pan-cancer analysis are the focal adhesion pathway, PI3K-Akt pathway, Rap1 pathway, and calcium signaling pathway. This result verifies previous research showing that three of these four pathways are major players in cancer. The focal adhesion pathway has been shown to be involved in invasion, metastasis, angiogenesis, epithelial-mesenchymal transition (EMT), maintenance of cancer stem cells, and globally promoting tumor cell survival [16]. Furthermore, the Focal Adhesion Kinase (FAK) gene is a non-receptor tyrosine kinase that controls cellular processes such as proliferation, adhesion, spreading, motility, and survival [17-22]. FAK has been shown to be over-expressed in many types of tumors [23-26]. Disruption of FAK and p53 interaction with small molecule compound R2 reactivated p53 and blocked tumor growth [27]. The PI3K-Akt signaling pathway has been shown to be the most frequently altered pathway in human tumors. It controls most hallmarks of cancer, including cell cycle, survival, metabolism, motility, and genomic instability, as well as angiogenesis and inflammatory cell recruitment [28]. The calcium signaling pathway has diverse functions in cellular regulation, and it was found previously (with cell adhesion) by pathway analysis in breast cancer [29]. Yang et al. [30] discuss regulation of calcium signaling in lung cancer. On the other hand, much less is known about the Rap1 signaling pathway and cancer. There are only 6 PubMed citations concerning Rap1 and cancer. In particular, Bailey et al. [31] provide evidence to support a role for aberrant Rap1 activation in prostate cancer progression. Our results indicate that Rap1 might be as big a player in all cancers as the other three pathways just discussed.

Individual cancer results

Next we discuss the individual cancer results. Each of these discussions refers to information provided in Table 1.
The only notable pathway in the breast cancer analysis is the ECM-receptor interaction pathway. This pathway was not found to be significant in the pan-cancer analysis, much less notable. However, previous research links changes in the extracellular matrix (ECM) to breast cancer. Lu et al. [32] discuss how the ECM’s biomechanical properties change under disease conditions. In particular, tumor stroma is typically stiffer than normal stroma, and in the case of breast cancer, diseased tissue can be 10 times stiffer than normal breast tissue.
There are 7 notable pathways in the case of colon adenocarcinoma, and all of them were found to be significant in the pan-cancer analysis. The PI3K-Akt signaling pathway and focal adhesion pathway were both found to be notable in the pan-cancer analysis and were discussed above. There are only 7 PubMed citations linking the highest ranking pathway, adrenergic signaling in cardiomyocytes, to cancer. The second pathway, namely the melanoma pathway, is of course linked to cancer. Furthermore, there is research substantiating that the BRAF mutation is prominent in melanoma and colorectal cancer [33]. BRAF is on the melanoma pathway. As to the cytokine-cytokine receptor interaction pathway, there has been research linking cytokine receptors to colorectal cancer [34]. The pathways in cancer pathway is of course linked to cancer. Our result substantiates its role in colon cancer in particular.
The top ranking pathway in the case of glioblastoma is the cytokine-cytokine receptor interaction pathway, whose relevance to cancer we just discussed. The second pathway is complement and coagulation cascades. Recent research has suggested an essential role of this pathway in multiple cancers [35], but not glioblastoma in particular. Our results support that it also has a role in glioblastoma. The third pathway, namely systemic lupus erythematosus, has been linked to glioblastoma [36]. We have already discussed the PI3K-Akt signaling pathway, as it was one of the notable pathways in the pan-cancer analysis. Finally, chemokine signaling has been associated with a number of cancers including glioma [37].
The first and fourth pathways for kidney renal papillary cell carcinoma are two of the notable pathways in the pan-cancer analysis, and have already been discussed. The second pathway, namely the ECM-receptor interaction pathway, was also discussed because it was the most significant pathway in breast cancer. Finally, the colorectal cancer pathway is of course linked to cancer, but we know of no specific study implicating it in kidney renal papillary cell carcinoma.
The chemokine signaling pathway and the cytokine-cytokine receptor interaction pathway are both notable in low grade glioma. These same two pathways were found to be significant in glioblastoma and were discussed above. The first pathway, namely focal adhesion, is one of the notable pathways in our pan-cancer analysis. The second pathway, ECM-receptor interaction, was previously discussed because it was the most notable pathway in breast cancer. Finally, the small cell lung cancer pathway is concerned with cancer, but a literature search did not reveal any study linking it specifically to glioma.
The two notable pathways in the case of lung adenocarcinoma are also notable in glioblastoma, and were discussed when we discussed that cancer. The cytokine-cytokine receptor interaction pathway has been implicated specifically with lung cancer [38], as has chemokine signaling [39].
The top two pathways in the case of lung squamous cell carcinoma are the same as the top two in the case of lung adenocarcinoma. Their relevance to lung cancer was just discussed. A PubMed search does not show any papers linking cancer with the third pathway, endocrine and other factor-regulated calcium reabsorption.
The notable pathways in ovarian cancer are all notable pathways in the pan-cancer analysis, and were previously discussed.
Three of the notable pathways in the rectum adenocarcinoma analysis are notable pathways in the pan-cancer analysis. The third ranked pathway, Ras signaling, has been associated with renal carcinoma [40]. As to the prostate cancer pathway, prostate cancer and renal cell cancer have been shown to have some commonality [41].
Two of the three notable pathways for uterine corpus endometrioid carcinoma are notable pathways in the pan-cancer analysis. As to the third pathway, the connection between maturity onset diabetes of the young and endometrial cancer has been well-established [42].

Summary results

Out of 157 signaling pathways analyzed, only 18 were found to be notable in at least one cancer. Table 2 lists those pathways. Out of a total of 37 notable findings, 26 occurred for the top 7 pathways. So, our results indicate that relatively few pathways are responsible for much of the aberrant activity in cancer. Of those 7 pathways, 4 were found to be notable in the pan-cancer analysis, and 2 others were fairly significant (p-values of 0.006 and 0.007). So these pathways may play roles in many different cancers. However, the ECM-receptor interaction pathway was not significant in the pan-cancer analysis (p-value of 0.472), indicating that perhaps this pathway is relevant only to the 3 cancers in which it was found to be notable, namely breast cancer, kidney renal papillary cell carcinoma, and low grade glioma.
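The tallies in this paragraph can be checked directly from the per-pathway counts reported in Table 2. The sketch below only re-adds those published counts; the dictionary name `notable_counts` is an illustrative choice, and no new data is introduced.

```python
# Number of cancer types in which each pathway was notable (second column of Table 2).
notable_counts = {
    "Focal adhesion": 5,
    "Cytokine-cytokine receptor interaction": 5,
    "PI3K-Akt signaling pathway": 4,
    "Chemokine signaling pathway": 4,
    "Rap1 signaling pathway": 3,
    "ECM-receptor interaction": 3,
    "Calcium signaling pathway": 2,
    "Adrenergic signaling in cardiomyocytes": 1,
    "Melanoma": 1,
    "Pathways in cancer": 1,
    "Complement and coagulation cascades": 1,
    "Systemic lupus erythematosus": 1,
    "Colorectal cancer": 1,
    "Small cell lung cancer": 1,
    "Endocrine and other factor-regulated calcium reabsorption": 1,
    "Ras signaling pathway": 1,
    "Prostate cancer": 1,
    "Maturity onset diabetes of the young": 1,
}

ranked = sorted(notable_counts.values(), reverse=True)
n_pathways = len(ranked)          # pathways notable in at least one cancer
total_findings = sum(ranked)      # all notable findings
top7_findings = sum(ranked[:7])   # findings contributed by the 7 top-ranked pathways

print(n_pathways, total_findings, top7_findings)
```

The three printed values correspond to the 18 pathways, 37 notable findings, and 26 findings in the top 7 pathways quoted in the text.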
Table 2

The pathways that were found to be notable in at least one cancer analysis. The second column shows the number of cancer types in which the pathway was found to be notable. The pathways are ranked by that column. The third column contains an “N” if the pathway was found to be notable in the pan-cancer analysis and it contains an “S” if it was only found to be significant in the pan-cancer analysis. The fourth column shows the p-value in the pan-cancer analysis

| Rank | Pathway | # cancers | Pan_cancer | p-value |
|---|---|---|---|---|
| 1 | Focal Adhesion | 5 | N | 5.99E-06 |
| 2 | Cytokine-cytokine receptor interaction | 5 | S | 0.006 |
| 3 | PI3K-Akt signaling pathway | 4 | N | 1.01E-05 |
| 4 | Chemokine signaling pathway | 4 | S | 0.007 |
| 5 | Rap1 signaling pathway | 3 | N | 3.71E-05 |
| 6 | ECM-receptor interaction | 3 | | 0.472 |
| 7 | Calcium signaling pathway | 2 | N | 4.95E-05 |
| 8 | Adrenergic signaling in cardiomyocytes | 1 | S | 0.014 |
| 9 | Melanoma | 1 | S | 3.00E-03 |
| 10 | Pathways in Cancer | 1 | S | 0.002 |
| 11 | Complement and coagulation cascades | 1 | S | 0.005 |
| 12 | Systemic lupus erythematosus | 1 | S | 0.002 |
| 13 | Colorectal cancer | 1 | | 0.531 |
| 14 | Small cell lung cancer | 1 | S | 0.015 |
| 15 | Endocrine and other factor-regulated calcium reabsorption | 1 | | 0.183 |
| 16 | Ras signaling pathway | 1 | S | 0.038 |
| 17 | Prostate cancer | 1 | S | 0.004 |
| 18 | Maturity onset diabetes of the young | 1 | S | 0.047 |

To gain insight as to how much each particular cancer has in common with all cancers, we computed the Jaccard Index comparing the notable pathways in each cancer type to the notable pathways in the pan-cancer analysis. If A and B are the two sets, the Jaccard Index of A and B is given by $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where |A| is the number of items in A. The value of J(A, B) is 0 if A and B have no items in common, and is 1 if A and B are the same set. Table 3 shows the Jaccard Indices. Ovarian carcinoma is at the top with an index of 0.75. The index would have been even higher, namely 1.0, if we had included the fourth most significant pathway for ovarian cancer, which is focal adhesion and has a p-value of 0.000366. At the bottom we have breast cancer and the two lung cancers with Jaccard Indices equal to 0.
Table 3

The Jaccard Index for each cancer type. The index is based on the number of notable pathways the cancer analysis has in common with the pan-cancer analysis

| Cancer type | Jaccard index |
|---|---|
| Ovarian carcinoma | 0.75 |
| Rectum adenocarcinoma | 0.6 |
| Uterine corpus endometrioid carcinoma | 0.4 |
| Kidney renal papillary cell carcinoma | 0.333 |
| Colon adenocarcinoma | 0.222 |
| Glioblastoma | 0.125 |
| Low grade glioma | 0.125 |
| Breast cancer | 0 |
| Lung adenocarcinoma | 0 |
| Lung squamous cell carcinoma | 0 |
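As a worked illustration of the Jaccard Index used here, the sketch below computes it for three of the cancer types, using the notable pathways listed in Table 1. The function name `jaccard_index` and the short pathway labels are illustrative assumptions; returning 0 for two empty sets is a convention chosen for this sketch, not something stated in the text.

```python
def jaccard_index(a, b):
    """Jaccard Index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty sets (an assumption of this sketch)
    return len(a & b) / len(union)


# Notable pathways per analysis, read from Table 1 (labels shortened).
pan_cancer = {"Focal adhesion", "PI3K-Akt", "Rap1", "Calcium"}
ovarian = {"Rap1", "PI3K-Akt", "Calcium"}
uterine = {"Focal adhesion", "Maturity onset diabetes of the young", "Calcium"}
breast = {"ECM-receptor interaction"}

print(jaccard_index(ovarian, pan_cancer))   # 3 shared out of 4 in the union
print(jaccard_index(uterine, pan_cancer))   # 2 shared out of 5 in the union
print(jaccard_index(breast, pan_cancer))    # nothing shared

# Adding focal adhesion (p-value 0.000366, just above the notable cutoff) to the ovarian set
print(jaccard_index(ovarian | {"Focal adhesion"}, pan_cancer))
```

For ovarian carcinoma this gives 3/4 = 0.75, and adding focal adhesion makes the two sets identical, so the index becomes 1.0, as described in the text.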

Pathway intersections

If we look at the pathway diagrams for our seven most significant pathways appearing in Table 2, often different signaling molecules bind to different receptors (integrin, RTK, GPCR), but the responses converge on many of the same proteins. For example, PI3K-Akt, Focal Adhesion, and Rap1 all converge on protein PI3K. To gain insight as to how much overlap there is among the seven most significant pathways, we determined the number of proteins each pathway pair has in common. The results appear in Table 4. Two interesting relationships are discernable in that table, and they are depicted in Fig. 1.
Table 4

The number of proteins that the top 7 pathways have in common with each other. The entry is the number of proteins that are affiliated with both of the two indicated pathways

| | FA | Cyt | PI3K | Chm | Rap | ECM | Cal |
|---|---|---|---|---|---|---|---|
| FA | 207 | 16 | 120 | 44 | 63 | 70 | 11 |
| Cyt | 16 | 265 | 62 | 64 | 21 | 0 | 3 |
| PI3K | 120 | 62 | 347 | 51 | 96 | 70 | 8 |
| Chm | 44 | 64 | 51 | 189 | 51 | 0 | 17 |
| Rap | 63 | 21 | 96 | 51 | 211 | 4 | 31 |
| ECM | 70 | 0 | 70 | 0 | 4 | 87 | 0 |
| Cal | 11 | 3 | 8 | 17 | 31 | 0 | 180 |

Fig. 1

Venn diagrams showing number of proteins pathway pairs have in common. a) Intersection of PI3K-Akt with each of the other top 6 pathways. b) Intersection of calcium signaling pathway with each of the other top 6 pathways

The first relationship is that PI3K-Akt has substantial overlap with five of the other six pathways. This is shown in Fig. 1a. PI3K-Akt is “probably one of the most important pathways in cancer metabolism and growth” [43]. The fact that it overlaps substantially with five other significant pathways indicates that much of the aberrant signaling in many cancers might be located in regions where PI3K-Akt overlaps with other pathways. The second interesting relationship is that the calcium pathway hardly overlaps with the other six pathways. This is shown in Fig. 1b. The calcium pathway was found to be notable in only ovarian and uterine cancer (Table 1). This result indicates that there might be a common region of aberrant signaling in these two cancers, which does not overlap with regions of aberrant signaling in other cancers. To discover possible hotspots where other aberrant signaling might occur, we looked at higher order intersections. We discovered the intersections shown in Fig. 2. In each of the diagrams in that figure, the intersection of the pathways in the diagram includes essentially no proteins from the other significant pathways.
Fig. 2

Venn diagrams showing number of proteins pathway triplets have in common. a) PI3K-Akt, focal adhesion, and Rap1. b) PI3K-Akt, focal adhesion, and Rap1. c) PI3K-Akt, chemokine signaling, and Rap1. d) chemokine signaling, focal adhesion, and Rap1. e) chemokine signaling, and cytokine-cytokine receptor interaction. In each of the diagrams, the intersection of the pathways includes essentially no proteins from the other significant pathways

Perhaps the most interesting relationship appears in Fig. 2a, which shows that the majority of the proteins in the ECM-receptor interaction pathway are located in the intersection of the PI3K-Akt and Focal Adhesion pathways. The ECM-receptor interaction pathway was found to be notable in breast cancer, kidney cancer, and glioma. This result indicates that there may be a region of aberrant signaling, located in the intersection of PI3K-Akt and Focal Adhesion, in these cancers. Figures 2b and c show other possible hot regions in PI3K-Akt, while Fig. 2d and e show possible hot regions not including PI3K-Akt. Of these figures, Fig. 2e is the most compelling. The Cytokine-cytokine receptor interaction and Chemokine signaling pathways have a large intersection that excludes other pathways. Both these pathways were found to be notable in glioblastoma, glioma, lung adenocarcinoma, and lung squamous cancer. Only the Cytokine-cytokine receptor interaction pathway was found to be notable in colon cancer. So there may be a region of aberrant signaling, located in the intersection of these pathways, in these cancers.
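The pairwise overlaps discussed in this section can be expressed as fractions of each pathway's own size. The sketch below transcribes the counts from Table 4 and assumes that each diagonal entry is the total number of proteins on that pathway (consistent with the 87 proteins quoted for ECM-receptor interaction); the helper name `fraction_shared` is illustrative.

```python
# Pathway abbreviations and shared-protein counts transcribed from Table 4.
names = ["FA", "Cyt", "PI3K", "Chm", "Rap", "ECM", "Cal"]
shared = [
    [207, 16, 120, 44, 63, 70, 11],
    [16, 265, 62, 64, 21, 0, 3],
    [120, 62, 347, 51, 96, 70, 8],
    [44, 64, 51, 189, 51, 0, 17],
    [63, 21, 96, 51, 211, 4, 31],
    [70, 0, 70, 0, 4, 87, 0],
    [11, 3, 8, 17, 31, 0, 180],
]


def fraction_shared(pathway, other):
    """Fraction of `pathway`'s proteins that are also on `other`.

    Assumes the diagonal entry is the pathway's total protein count.
    """
    i, j = names.index(pathway), names.index(other)
    return shared[i][j] / shared[i][i]


# Share of ECM-receptor interaction proteins that lie on PI3K-Akt and on focal adhesion.
print(round(fraction_shared("ECM", "PI3K"), 3))
print(round(fraction_shared("ECM", "FA"), 3))

# Largest share of the calcium pathway that overlaps any single other pathway.
largest_cal_overlap = max(fraction_shared("Cal", n) for n in names if n != "Cal")
print(round(largest_cal_overlap, 3))
```

Under that reading of the diagonal, 70 of the 87 ECM-receptor interaction proteins fall on each of PI3K-Akt and focal adhesion, while no single pathway shares more than 31 of the calcium pathway's 180 proteins.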

Cancer clusters

To investigate further how different cancers might share common causal mechanisms, we developed a heat map, based on hierarchical clustering, with cancer type on the horizontal axis, the 18 notable pathways on the vertical axis, and each entry being the p-value. Figure 3 shows the heat map. Ovarian cancer and uterine cancer constitute a primary group. This is consistent with our result mentioned above that the calcium pathway was found to be notable only in these two cancers. Furthermore, these cancers are in close proximity. Rectum cancer and colon cancer also constitute a primary group, which is consistent with their close proximity.
Fig. 3

Heat map showing cancer and pathway clusters. The entries are standardized values of the p-value. The p-values are mapped to [−0.5, 0.5]; then standardization is done along the rows by the hierarchical clustering algorithm in MATLAB so that the mean value is 0 and the standard deviation is 1. Abbreviations: LGG: low grade glioma; BRCA: breast; LUSC: lung squamous; GBM: glioblastoma; LUAD: lung adenocarcinoma; OV: ovarian; UCEC: uterine; READ: rectum; COAD: colon; KIRP: kidney
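The row standardization mentioned in the Fig. 3 caption can be sketched as follows. This covers only the step that rescales one row to mean 0 and standard deviation 1; it does not reproduce the MATLAB clustering or the mapping of p-values to [−0.5, 0.5], whose exact form is not given in the text. The function name `standardize_row`, the use of the sample standard deviation, and the input values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def standardize_row(values):
    """Rescale one row so that its mean is 0 and its sample standard deviation is 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    if s == 0:
        return [0.0 for _ in values]  # a constant row carries no contrast
    return [(v - m) / s for v in values]


# Hypothetical row of already-mapped values in [-0.5, 0.5]; not data from the study.
row = [-0.5, -0.25, 0.0, 0.5]
z = standardize_row(row)
print([round(v, 3) for v in z])
```

Standardizing each pathway's row separately means that the heat map colors compare cancers within a pathway rather than comparing raw p-values across pathways.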

Discussion

We performed a pan-cancer analysis by grouping the TCGA data on 10 different cancer types. We identified 4 signaling pathways to be markedly more significant (which we called notable) than the remaining 153 pathways. We also did a separate analysis for each of the 10 types of cancers individually. In all 10 of the cancers, there were several pathways that were found to be markedly more significant than the others. Altogether there were 37 notable findings in the separate analyses, and 26 of them occurred in 7 pathways. These 7 pathways included the 4 discovered in the pan-cancer analysis. Our results suggest that these 7 pathways account for many of the mechanisms of cancer. As we discussed, research has already established a connection between many of the 18 pathways we discovered and the corresponding cancer type. However, some of them appear to be new discoveries. Furthermore, we have identified regions on the pathways that might account for the aberrant behaviour. So, we have both substantiated previous knowledge, and provided researchers with avenues for future investigations. The PI3K-Akt pathway has long been recognized as an aberrant pathway in breast cancer [43]. However, our breast cancer analysis did not find it to be significant (p = 0.304). On the other hand, the ECM-receptor interaction pathway was the only notable pathway in the breast cancer analysis, and we showed that 70 of its 87 proteins are on the PI3K-Akt pathway. So, our results indicate that the effect of PI3K-Akt on breast cancer might be localized in this region of the PI3K-Akt pathway. It is likely that there are other known pathways that affect various cancers, which we did not discover. The analysis of gene expression alone may not account for pathways that are activated by post-translational modification (such as phosphorylation/dephosphorylation) that could change the pathway activation profile without altering mRNA abundance.
So, we should interpret our results only as suggesting avenues of investigation, rather than as disconfirming any existing knowledge. This in silico analysis of cancer patient signaling pathways provides many opportunities for laboratory and clinical follow-up studies. We know of no dataset as comprehensive as the TCGA datasets. However, there are individual datasets for specific cancers that could be investigated. For example, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset has data on 1981 breast cancer tumors, and expression levels for 16,384 genes [44].

Conclusions

We presented the results of a study that analyzed all 157 signaling pathways in the KEGG PATHWAY database using TCGA gene expression datasets concerning ten types of cancer. We performed a pan-cancer analysis and analyzed each dataset separately. There were 37 notable findings concerning 18 pathways. Research has already established a connection between many of these pathways and the corresponding cancer type. However, some of them appear to be new discoveries. Furthermore, we identified regions on pathways where the aberrant activity might be occurring. We conclude that our results should prove valuable to cancer researchers because they provide many opportunities for laboratory and clinical follow-up studies.

Method

This research does not involve any human subjects. It utilizes the publicly available de-identified TCGA datasets. The Cancer Genome Atlas (TCGA) makes available datasets concerning breast cancer, colon adenocarcinoma, glioblastoma, kidney renal papillary cell carcinoma, low grade glioma, lung adenocarcinoma, lung squamous cell carcinoma, ovarian carcinoma, rectum adenocarcinoma, and uterine corpus endometrioid carcinoma. Each dataset contains data on the expression levels of 17,814 genes in tumorous tissue and in normal tissue. Table 5 shows the number of tumor samples and non-tumor samples in each of these datasets. Tables 6 through 10 show demographic information concerning the patients from whom the samples were taken.
Table 5

The number of tumor samples and normal samples in the TCGA cancer datasets

| Cancer | # tumors | # normal |
| --- | --- | --- |
| Breast cancer | 466 | 61 |
| Colon adenocarcinoma | 143 | 19 |
| Glioblastoma | 567 | 10 |
| Kidney renal papillary cell carcinoma | 16 | 0 |
| Low grade glioma | 27 | 0 |
| Lung adenocarcinoma | 32 | 0 |
| Lung squamous cell carcinoma | 154 | 0 |
| Ovarian carcinoma | 572 | 8 |
| Rectum adenocarcinoma | 69 | 3 |
| Uterine corpus endometrioid carcinoma | 54 | 0 |
| Pan-cancer (total) | 2100 | 101 |
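As a quick arithmetic check on Table 5, the per-cancer counts can be summed to confirm the pan-cancer totals. The short Python sketch below simply transcribes the table into a dictionary; the variable names are illustrative and are not part of the original study.

```python
# Tumor and normal sample counts transcribed from Table 5
# as (number of tumor samples, number of normal samples).
sample_counts = {
    "Breast cancer": (466, 61),
    "Colon adenocarcinoma": (143, 19),
    "Glioblastoma": (567, 10),
    "Kidney renal papillary cell carcinoma": (16, 0),
    "Low grade glioma": (27, 0),
    "Lung adenocarcinoma": (32, 0),
    "Lung squamous cell carcinoma": (154, 0),
    "Ovarian carcinoma": (572, 8),
    "Rectum adenocarcinoma": (69, 3),
    "Uterine corpus endometrioid carcinoma": (54, 0),
}

# Sum each column over the ten cancer types.
total_tumor = sum(tumor for tumor, _ in sample_counts.values())
total_normal = sum(normal for _, normal in sample_counts.values())

print(total_tumor, total_normal)  # prints "2100 101"
```

The sums match the pan-cancer row, and they also show that only five of the ten datasets contribute normal samples.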
Table 6

Gender distribution of the patients from which the various samples were obtained

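| Cancer | Tumor: Female | Tumor: Male | Non-tumor: Female | Non-tumor: Male |
| --- | --- | --- | --- | --- |
| Breast cancer | 461 | 5 | 60 | 1 |
| Colon adenocarcinoma | 67 | 76 | 14 | 5 |
| Glioblastoma | 219 | 348 | 5 | 5 |
| Kidney renal papillary cell carcinoma | 4 | 12 | 0 | 0 |
| Low grade glioma | 9 | 18 | 0 | 0 |
| Lung adenocarcinoma | 18 | 14 | 0 | 0 |
| Lung squamous cell carcinoma | 44 | 110 | 0 | 0 |
| Ovarian carcinoma | 572 | 0 | 8 | 0 |
| Rectum adenocarcinoma | 31 | 38 | 3 | 0 |
| Uterine corpus endometrioid carcinoma | 54 | 0 | 0 | 0 |
| Pan-cancer (total) | 1479 | 621 | 90 | 11 |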
Table 7

Menopause status distribution of the patients from which the various samples were obtained

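| Cancer | Tumor: Pre | Tumor: Peri | Tumor: Post | Tumor: NA | Non-tumor: Pre | Non-tumor: Peri | Non-tumor: Post | Non-tumor: NA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Breast cancer | 104 | 16 | 297 | 49 | 19 | 2 | 28 | 12 |
| Colon adenocarcinoma | 0 | 0 | 0 | 143 | 0 | 0 | 0 | 19 |
| Glioblastoma | 0 | 0 | 0 | 567 | 0 | 0 | 0 | 10 |
| Kidney renal papillary cell carcinoma | 0 | 0 | 0 | 16 | 0 | 0 | 0 | 0 |
| Low grade glioma | 0 | 0 | 0 | 27 | 0 | 0 | 0 | 0 |
| Lung adenocarcinoma | 0 | 0 | 0 | 32 | 0 | 0 | 0 | 0 |
| Lung squamous cell carcinoma | 0 | 0 | 0 | 154 | 0 | 0 | 0 | 0 |
| Ovarian carcinoma | 0 | 0 | 0 | 572 | 0 | 0 | 0 | 8 |
| Rectum adenocarcinoma | 0 | 0 | 0 | 69 | 0 | 0 | 0 | 3 |
| Uterine corpus endometrioid carcinoma | 5 | 0 | 45 | 4 | 0 | 0 | 0 | 0 |
| Pan-cancer (total) | 109 | 16 | 342 | 1633 | 19 | 2 | 28 | 52 |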
Table 8

Race distribution of the patients from which the various samples were obtained. Ind: American indian or Alaska native; Asn: Asian; Blk: Black or African American; Haw: Native Hawaiian or other Pacific islander; Wht: white; NA: Not available

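| Cancer | Tumor: Ind. | Tumor: Asn. | Tumor: Blk. | Tumor: Haw. | Tumor: Wht. | Tumor: NA | Non-tumor: Ind. | Non-tumor: Asn. | Non-tumor: Blk. | Non-tumor: Haw. | Non-tumor: Wht. | Non-tumor: NA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Breast cancer | 1 | 34 | 39 | 0 | 303 | 89 | 0 | 0 | 1 | 0 | 59 | 1 |
| Colon adenocarcinoma | 0 | 0 | 1 | 0 | 9 | 133 | 0 | 0 | 2 | 0 | 8 | 9 |
| Glioblastoma | 0 | 13 | 34 | 0 | 495 | 25 | 0 | 0 | 0 | 0 | 0 | 10 |
| Kidney renal papillary cell carcinoma | 0 | 0 | 0 | 0 | 9 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| Low grade glioma | 0 | 0 | 2 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Lung adenocarcinoma | 0 | 2 | 1 | 0 | 26 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Lung squamous cell carcinoma | 0 | 3 | 7 | 0 | 91 | 53 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ovarian carcinoma | 3 | 19 | 24 | 1 | 493 | 32 | 0 | 0 | 0 | 0 | 0 | 8 |
| Rectum adenocarcinoma | 0 | 0 | 1 | 0 | 4 | 64 | 0 | 0 | 0 | 0 | 3 | 0 |
| Uterine corpus endometrioid carcinoma | 2 | 4 | 6 | 0 | 40 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Pan-cancer (total) | 6 | 75 | 115 | 1 | 1495 | 408 | 0 | 0 | 3 | 0 | 70 | 28 |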
Table 9

Ethnicity distribution of the patients from which the various samples were obtained

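| Cancer | Tumor: Latino | Tumor: Not Latino | Tumor: NA | Non-tumor: Latino | Non-tumor: Not Latino | Non-tumor: NA |
| --- | --- | --- | --- | --- | --- | --- |
| Breast cancer | 7 | 336 | 123 | 0 | 36 | 25 |
| Colon adenocarcinoma | 0 | 10 | 133 | 0 | 10 | 9 |
| Glioblastoma | 12 | 465 | 90 | 0 | 0 | 10 |
| Kidney renal papillary cell carcinoma | 0 | 16 | 0 | 0 | 0 | 0 |
| Low grade glioma | 1 | 20 | 6 | 0 | 0 | 0 |
| Lung adenocarcinoma | 1 | 28 | 3 | 0 | 0 | 0 |
| Lung squamous cell carcinoma | 4 | 88 | 62 | 0 | 0 | 0 |
| Ovarian carcinoma | 11 | 330 | 231 | 0 | 0 | 8 |
| Rectum adenocarcinoma | 0 | 5 | 64 | 0 | 3 | 0 |
| Uterine corpus endometrioid carcinoma | 2 | 24 | 28 | 0 | 0 | 0 |
| Pan-cancer (total) | 13 | 1322 | 740 | 0 | 49 | 52 |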
Table 10

Age distribution of the patients from which the various samples were obtained

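| Cancer | Tumor: 0-20 | Tumor: 21-40 | Tumor: 41-60 | Tumor: 61-80 | Tumor: 81-100 | Tumor: NA | Non-tumor: 0-20 | Non-tumor: 21-40 | Non-tumor: 41-60 | Non-tumor: 61-80 | Non-tumor: 81-100 | Non-tumor: NA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Breast cancer | 0 | 51 | 198 | 194 | 22 | 1 | 0 | 7 | 26 | 25 | 3 | 0 |
| Colon adenocarcinoma | 0 | 2 | 22 | 90 | 29 | 0 | 0 | 0 | 3 | 12 | 4 | 0 |
| Glioblastoma | 7 | 63 | 238 | 237 | 20 | 2 | 0 | 1 | 4 | 4 | 1 | 0 |
| Kidney renal papillary cell carcinoma | 0 | 0 | 11 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Low grade glioma | 1 | 15 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Lung adenocarcinoma | 0 | 1 | 9 | 20 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Lung squamous cell carcinoma | 0 | 2 | 31 | 112 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ovarian carcinoma | 0 | 23 | 295 | 233 | 20 | 1 | 0 | 4 | 4 | 0 | 0 | 0 |
| Rectum adenocarcinoma | 0 | 1 | 14 | 47 | 7 | 0 | 0 | 0 | 1 | 2 | 0 | 0 |
| Uterine corpus endometrioid carcinoma | 0 | 3 | 23 | 22 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Pan-cancer (total) | 8 | 161 | 851 | 961 | 113 | 6 | 0 | 12 | 38 | 43 | 8 | 0 |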
We did a pan-cancer analysis by grouping the ten different cancer datasets into one dataset, resulting in 2100 tumor samples and 101 normal samples. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that integrates genomic, chemical and systemic functional information. We chose KEGG because it is widely used as a reference knowledge base for integration and interpretation of large-scale datasets generated by genome sequencing and other high-throughput experimental technologies. KEGG PATHWAY [1] is a collection of manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks for the following categories: Metabolism (Global/overview, Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino, Glycan, Cofactor/vitamin, Terpenoid/PK, Other secondary metabolite, Xenobiotics, Chemical structure); Genetic Information Processing; Environmental Information Processing; Cellular Processes; Organismal Systems; and Human Diseases. We investigated all 157 signaling pathways in the KEGG PATHWAY database. For each pathway, we identified all the genes related to the pathway. We extracted gene expression profiles for the 2100 tumor samples and 101 normal samples in the TCGA database.
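The extraction of per-pathway expression profiles can be illustrated with a minimal Python sketch. It assumes the expression data are held as a mapping from gene name to per-sample values and each pathway as a set of gene names. The function name `extract_pathway_profiles`, the placeholder gene and pathway names, and the toy values are all hypothetical and are not taken from the study's own code or data.

```python
from typing import Dict, List, Set


def extract_pathway_profiles(
    expression: Dict[str, List[float]],
    pathway_genes: Dict[str, Set[str]],
) -> Dict[str, Dict[str, List[float]]]:
    """For each pathway, keep only the genes that also appear in the expression data."""
    profiles: Dict[str, Dict[str, List[float]]] = {}
    for pathway, genes in pathway_genes.items():
        # Match on gene name; pathway genes that were not measured are dropped.
        shared = sorted(genes.intersection(expression))
        profiles[pathway] = {gene: expression[gene] for gene in shared}
    return profiles


# Toy inputs (hypothetical): three measured genes, two samples each.
toy_expression = {
    "GENE_A": [1.2, 0.8],
    "GENE_B": [0.1, 0.3],
    "GENE_C": [2.0, 2.5],
}
toy_pathways = {
    "pathway_1": {"GENE_A", "GENE_C", "GENE_X"},  # GENE_X is not measured
    "pathway_2": {"GENE_B"},
}

profiles = extract_pathway_profiles(toy_expression, toy_pathways)
print(sorted(profiles["pathway_1"]))  # prints "['GENE_A', 'GENE_C']"
```

Matching on exact names is the simplest choice; in practice gene aliases may need to be harmonized before the intersection is taken.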
By mapping the gene names in the gene sets identified using KEGG pathways to the gene names in the TCGA data, we were able to extract the gene expression profiles for each of the 157 pathways for the 2100 tumor samples and 101 normal samples. The TCGA gene expression data is already processed and normalized. We repeated this procedure for each of the ten cancer datasets separately. Each dataset has the number of tumor samples shown in Table 5. However, to achieve a larger sample of normal tissue, we grouped the normal samples in the ten datasets, making the number of normal samples equal to 101 in every analysis. Once these datasets were developed, we analyzed each dataset using the software package SPIA [13] (http://www.bioconductor.org/packages/release/bioc/html/SPIA.html), which analyzes gene expression data to identify whether a signaling pathway is relevant in a given cancer by 1) determining the overrepresentation of genes on the pathway that are differentially expressed in tumor samples versus normal samples; and 2) investigating the abnormal perturbation of the pathway, as measured by propagating measured expression changes across the pathway topology. SPIA produces a p-value showing the significance level at which a pathway is found to be perturbed in cancerous tissue and a false discovery rate (FDR). We ran SPIA using the recommended value of 2000 bootstrap iterations, and all parameters set to their default values.
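To make the first of SPIA's two kinds of evidence concrete, the following Python sketch computes an over-representation p-value as a hypergeometric upper tail and then combines it with a perturbation p-value using the formula given in the SPIA paper, P_G = c - c ln(c) with c = P_NDE x P_PERT. This is an illustration only, not the SPIA implementation (SPIA is an R/Bioconductor package). The function names and the toy counts are assumptions, and the perturbation p-value, which SPIA estimates by bootstrap, is simply passed in as a number.

```python
from math import comb, log


def over_representation_p(k: int, K: int, n: int, N: int) -> float:
    """Hypergeometric upper tail P(X >= k).

    N: genes measured; K: genes on the pathway;
    n: differentially expressed (DE) genes; k: DE genes on the pathway.
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def combine_evidence(p_nde: float, p_pert: float) -> float:
    """Combine the two p-values as c - c*ln(c), where c = p_nde * p_pert."""
    c = p_nde * p_pert
    if c <= 0.0:
        return 0.0  # limit of c - c*ln(c) as c approaches 0
    return c - c * log(c)


# Toy counts (hypothetical): 20 genes measured, 5 on the pathway,
# 4 differentially expressed, 2 of those on the pathway.
p_nde = over_representation_p(k=2, K=5, n=4, N=20)
p_global = combine_evidence(p_nde, p_pert=0.5)  # 0.5 is an arbitrary placeholder
print(p_nde, p_global)
```

Exact integer binomials keep the tail probability free of rounding problems for small examples; for genome-scale counts a statistics library's hypergeometric distribution would be the more practical choice.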