Hong Che1, Yi Liu2, Meng Zhang2,3, Jialin Meng2, Xingliang Feng2, Jun Zhou2, Chaozhao Liang2. 1. Department of Cardiac Surgery, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China (mainland). 2. Department of Urology, The First Affiliated Hospital of Anhui Medical University and Institute of Urology and Anhui Province Key Laboratory of Genitourinary Diseases, Hefei, Anhui, China (mainland). 3. Urology Institute of Shenzhen University, The Third Affiliated Hospital of Shenzhen University, Shenzhen University, Shenzhen, Guangdong, China (mainland).
Abstract
BACKGROUND Prostate cancer (PCa) is one of the major causes of cancer-induced death among males. Here, we applied integrated bioinformatics analysis to identify key prognostic factors for PCa patients. MATERIAL AND METHODS The gene expression data were obtained from the UCSC Xena website. We calculated the differentially expressed genes between PCa tissues and normal controls. Pathway enrichment analyses found cell cycle-related pathways might play crucial roles during PCa tumorigenesis. The genes were assigned into 22 modules established via weighted gene co-expression network analysis (WGCNA). RESULTS The results indicated that the purple and red modules were obviously linked to the Gleason score, pathological N, pathological T, recurrence, and recurrence-free survival (RFS). In addition, Kaplan-Meier curve analysis found 8 modules were markedly correlated with RFS, serving as prognostic markers for PCa patients. Then, the hub genes in the most 2 critical modules (purple and red) were visualized by Cytoscape software. Pathway enrichment analyses confirmed the above findings that cell cycle-related pathways might play vital roles during PCa initiation and progression. Lastly, we randomly chose the PILRß (also termed PILRB) in the red module for clinical validation. The immunohistochemistry (IHC) results showed that PILRß was significantly increased in the high-risk PCa population compared with low-/middle-risk patients. CONCLUSIONS We used integrated bioinformatics approaches to identify hub genes that can serve as prognosis markers and potential treatment targets for PCa patients.
BACKGROUND Prostate cancer (PCa) is one of the major causes of cancer-induced death among males. Here, we applied integrated bioinformatics analysis to identify key prognostic factors for PCapatients. MATERIAL AND METHODS The gene expression data were obtained from the UCSC Xena website. We calculated the differentially expressed genes between PCa tissues and normal controls. Pathway enrichment analyses found cell cycle-related pathways might play crucial roles during PCa tumorigenesis. The genes were assigned into 22 modules established via weighted gene co-expression network analysis (WGCNA). RESULTS The results indicated that the purple and red modules were obviously linked to the Gleason score, pathological N, pathological T, recurrence, and recurrence-free survival (RFS). In addition, Kaplan-Meier curve analysis found 8 modules were markedly correlated with RFS, serving as prognostic markers for PCapatients. Then, the hub genes in the most 2 critical modules (purple and red) were visualized by Cytoscape software. Pathway enrichment analyses confirmed the above findings that cell cycle-related pathways might play vital roles during PCa initiation and progression. Lastly, we randomly chose the PILRß (also termed PILRB) in the red module for clinical validation. The immunohistochemistry (IHC) results showed that PILRß was significantly increased in the high-risk PCa population compared with low-/middle-risk patients. CONCLUSIONS We used integrated bioinformatics approaches to identify hub genes that can serve as prognosis markers and potential treatment targets for PCapatients.
Prostate cancer (PCa) is the second leading cause of cancer-associated death for males among America and Europe populations [1]. PCapatients are initially sensitive to androgen deprivation therapy (ADT), which can reduce tumor burden and improve symptoms, but the responses are not durable after several years of treatment, after which, disease progression usually occurs [2,3]. Many factors have been proved to be associated with the risk of PCa, including family histology, heritability, diet, environment, age, and androgens [4-6]. Some studies estimated that genetic components are involved in up to 42–57% of PCa cases [7,8], and understanding the molecular mechanism of tumorigenesis in PCa would help establish better therapeutic strategies.Recently, new insights into the underlying molecular aspects of tumorigenesis have been generated by gene expression analyses [9-11]. Although several cancer-related molecular biomarkers have been discovered, because of tumor heterogeneity, a single gene still cannot precisely represent the genetic features of tumors. Recently, prognostic predicting signatures based on RNA expression profiles are becoming increasingly popular. Peng et al. [12] discovered a cluster of genes and reported a module that displayed better prediction rates than any of the separate genes in glioblastoma. In contrast to differentially expressed gene (DEG) analysis, co-expression network analysis mostly concentrates on gene-to-gene relationships and on these critical genes within each module. The intrinsic modules of coordinately expressed genes were uncovered in an unsupervised manner by co-expression network analysis [12], and pathway enrichment analysis was applied to identify the vital signaling, such as tumor progression and drug resistances related pathways [13]. Co-expression network analysis has also been involved in predicting the survival of breast cancer and gliomapatients [14,15].In the present study, we used a weighted gene co-expression network analysis (WGCNA) along with other analysis methods to assess the connection between TCGA RNA-seq data and clinical parameters to illustrate the key modules or hub genes linked to the clinical features [recurrence-free survival (RFS), or tumor grade]. We also used IHC assay to determine the expression of these critical genes in PCa tissues.
Material and Methods
Sample collection, differentially expressed gene calculation, and functional annotation
We obtained TCGA gene expression in PCa, along with clinical data, from the UCSC Xena website (), which includes 497 tumor and 67 normal samples. The normalized read count matrix for gene expression, produced by RSEM, was selected for differential expression analysis [16], while the RPKM matrix for gene expression was used for gene set enrichment analysis (GSEA) [17,18] and WGCNA. The differentially expressed genes (DEGs) were calculated by using R package DESeq2 [19] (the thresholds of Fold Change >2 or Fold Change <−2, Pαdjust <0.05). We used the R package “clusterProfiler” [20] to perform Gene Ontology (GO) and KEGG enrichment analyses.
Co-expression network construction
The squared coefficient of variance (CV2) of each gene expression (reads per kilobase per million mapped reads [RPKM]) across tumor samples was calculated. To capture the dependence of the CV2 of genes on their average expression level μ, we fit a curve with 95% confidence intervals (CI) to the observed data, using the parameterization:using the R package “statmod.” Genes with a high level of variance were defined as expression exceeding the specified threshold (upper 95% CI in this study). In total, 7005 variably expressed genes across cancerpatients were selected for downstream analysis.The R package “WGCNA” was applied to construct the co-expression network by the prepared gene expression matrix [21]. After the hierarchical clustering of all tumor samples based on gene expression profile, the Pearson’s correlation between each gene pair was calculated to determine the concordance of gene expression and we generated an adjacency matrix, which was then turned into a topological overlap matrix (TOM) [22]. Next, we conducted an average linkage hierarchical clustering, considering the TOM-based dissimilarity. Co-expression gene module sizes were restricted at a minimum of 30, and the parameters of soft-thresholding that could be used to illustrate high-positive connections at the expense of low correlations were set as a cluster of gene modules.
Clinical and survival analyses, and module pathway annotation
We collected the RFS information as the survival endpoints, then the R packages of “survival” [23] and “survminer” [24] were used to determine the connections between module/gene expression and RFS. The module genes were functionally annotated by GO and KEGG analyses, as detailed above.
Hub-gene identification and module network visualization
For a better network visualization, high-level linkages (for purple module: weight >0.02, and for red module: weight >0.1) of the co-expressed genes were retained. Cytoscape software was utilized to provide deeper network insights [25,26]. We found the hub genes by the highest module membership value (kME), which measures the relationship between each gene and ME.After relating each module to clinical features and calculating the related gene significance and module membership, we randomly chose 1 hub-gene for further validation using clinical samples [27].
Immunohistochemistry (IHC) validation
We collected clinical samples from the Department of Urology, the First Affiliated Hospital of Anhui Medical University. The sample slides were obtained from the Department of Pathology of our hospital. This study was approved by the ethics committee of the First Affiliated Hospital of Anhui Medical University. The PILRβ antibody (Product # PA5-55405) was bought from Thermo Fisher Scientific Company (MA, USA, 02451). All the detail IHC processes were as described in our previous publications [28,29].
Results
Differential expression of genes and GSEA analysis
To study the DEGs between PCatumor samples and normal controls, we analyzed the combined TCGA dataset. At first, principal component analysis (PCA) could successfully separate the normal and cancer gene sets (Figure 1A). We found that there were 1664 genes downregulated in cancer tissues compared to that in normal controls, whereas there were 2254 upregulated genes (Fold Change >2 or Fold Change <−2, Pαdjust <0.05; Figure 1B, Table 1). Furthermore, KEGG pathway enrichment analysis was conducted for these DEGs (Figure 1C), and the results indicated the DEGs were more commonly enriched in Alcoholism, Neuroactive ligand-receptor interaction, and cAMP signaling pathway. By comparing the tumor cases with normal controls, GSEA revealed that the top 5 significantly enriched pathways were Myogenesis [30], E2F [31], G2M Checkpoint [32], Coagulation, and MYC [33,34] signaling pathways (Figure 1D–1H), which have been proven to play critique roles during PCa initiation and progression.
Figure 1
KEGG and GSEA pathway enrichment analyses classified the difference between PCa tissues and normal controls. (A) Principal Component Analysis (PCA) plots for the TCGA dataset based on RSEM data. (B) Volcano Plot for DEGs between PCa tissues and normal controls. (C) KEGG pathway enrichment analysis for DEGs between PCa tissues and normal controls. (D–H) Mainly GSEA plots by comparing the PCa tissues and normal controls.
Table 1
Top DEGs generated from PCa tissues and normal controls.
Detection of gene co-expression modules using WGCNA
We applied the WGCNA algorithm to identify the gene co-expression modules from the combined TCGA dataset. Initially, we used sample clustering to detect outliers, and 2 samples were removed by pool analysis (Figure 2A). WGCNA analysis elucidated 9 co-expressed modules, ranging in size from 39 (royal blue) to 2383 (turquoise) (each module had its own specific color assigned) (Figure 2B). These modules were used for further analyses below.
Figure 2
Sample clustering and network Heatmap plot. (A) Sample clustering for PCa samples extracted from the TCGA database. (B) Constructing the gene co-expression network (yellow part) and identifying modules based on co-expression network (up and left).
Correlation between modules and clinicopathological parameters of PCa
The Pearson’s connections between the clinical features and Eigengene module were calculated, and the purple module was obviously and positively connected with age (at initial pathological diagnosis, r2=0.17, P-value blue, Salmon, Magenta, Red, and Grey60 modules were significantly associated with Gleason score. We also analyzed the relationship of each module expression with new tumor events, biochemical recurrence, pathological T, pathological N, and clinical stage, and found that purple, salmon, red and midnight blue modules were positively associated with biochemical recurrence (Figure 3B–3F). Furthermore, royal blue, purple, salmon, pink, red, magenta, grey60, and midnight blue modules were positively associated with new tumor events. The purple, salmon, red, yellow, brown, and magenta modules were remarkably linked to the pathological N, while only purple and green-yellow modules were strongly positively associated with the pathological T.
Figure 3
The correlation between these modules and clinical features. (A) Heatmap correlation analysis between the 22 modules and age (at diagnosis), Gleason score, and PSA levels. (B–F) The correlation between these modules and biochemical recurrence (B), new-tumor event (C), pathological T (D), pathological N (E), and Clinical T (F).
Module clinical significance
The RFS status has a more critical role in reflecting the prognosis of PCa than overall survival (OS). Here, we explored the association of the modules with RFS, considering their biological significance. The Cox regression test calculated the hazard ratios (HRs) and their corresponding P values for each dichotomized module (Figure 4). We found the red (P=0.00018), purple (P=0.00045), salmon (P=0.0011), pink (P=0.0026), magenta (P=0.004), midnight blue (P=0.034), grey60 (P=0.038), and brown (P=0.041) modules were significantly associated with RFS. Interestingly, higher expression of the module genes indicated poor RFS. The results of survival analysis for these modules further suggested their biological significance, particularly as prognostic markers for patients with PCa.
Figure 4
(A–H) The connection between module gene expression and recurrence-free survival (RFS).
Enrichment analysis of the purple module and visual analyses of the genes within purple and red modules
Based on the significance of these modules, we chose purple (150 genes) and red (180 genes) modules as representatives to investigate their functional classification. For purple module, molecular function (MF), biological process (BP), Reactome, and KEGG analyses were applied (Supplementary Figure 1A–1C). The KEGG enrichment analysis showed that genes of the purple module were mostly enriched in Cell Cycle, Valine, Leucine and Isoleucine Degradation, Fanconi Anemia, and Ascorbate and Aldarate Metabolism pathways (Supplementary Figure 1D), which were partly consistent with GSEA enrichment results (Figure 1). For the red module, the Reactome pathway analysis indicated the module genes were mostly enriched in Fatty acid metabolism, Biosynthesis of DHA–derived SPMs, Biosynthesis of specialized pro-resolving mediators, and CYP2E1 reactions pathways.We used Cytoscape software to visualize the critical hub genes within the purple and red modules. Results were consistent with the above findings, showing a high correlation of these genes with others in each module, and that they may play a significant role during PCa imitation or progression (Figure 5).
Figure 5
Visualize the purple and red modules to identify the Hub genes. Network visualization for hub genes in the purple module (A) and in red module (B).
Survival analysis and receiver operating characteristic (ROC)
To further investigate the importance of these hub genes, we tested their role in predicting RFS of PCapatients. Based on the connectivity of the hub genes, we chose the top 5 positive and highest 5 negative hub genes in red and purple modules, and the survival analyses showed that all these hub genes could serve as prognostic markers for PCapatients (Supplementary Figure 2). ROC curve analysis showed that the PILRB, MAPK8IP3, KIF18B, LOC91316, SCNN1D, FAM156A, LY6G5B, TTLL3, SNHG12, C17orf56, and AHSA2 genes have moderate RFS predicting capacity for PCapatients (Supplementary Figure 3).
IHC validations
Considering the significance of purple and red modules, owing to their association with pathological states and RFS, we used the IHC assay to validate the clinical relevance of the top hub genes. Based on hub-gene visualization (Figure 5) and the online RFS calculation website (), PILRβ (also termed PILRB) in the red module was chosen for further clinical validation. We compared the PILRβ expression in different risk subgroups of PCa tissues, and found it was significantly upregulated in high-risk compared with low-/middle-risk PCa tissues (Figure 6, Table 2). All these results confirmed the accuracy and significance of our findings.
Figure 6
Clinical evidence for PILRβ expressions in PCa tissues. Representative images of IHC staining in low-/middle and high-risk PCa tissues are presented on the left (100× above and 400× below), and the static results of immunoscore are presented on the right. *** p<0.01.
Table 2
Clinical data of PCa patients.
Group
Low grade
High grade
Total number
8
15
Age (year)
67
69
tPSA (ng/ml)
7.72
38.07
fPSA/tPSA (ng/ml)
0.12
0.13
Pathological grade (n)
≤T2c
8
≤T2c
4
≥T3a
0
≥T3a
11
Lymphatic metastasis (n)
N0
8
N0
14
N1
0
N1
1
Metastasis (n)
M0
8
M0
14
M1
0
M1
1
Gleason score (n)
≤7
8
≤7
2
>7
0
>7
13
tPSA – total prostate specific antigen; fPSA – free prostate specific antigen; n – number.
Discussion
Gene signatures obtained from the transcriptome-based analyses can assist in the risk classification of PCapatients. Previously, several studies have tried to define gene signatures for predicting the survival rate or risk of recurrence of cancers, including bladder cancer [35], renal disease [36], and liver cancer [37]. Here, we applied the WGCNA algorithm to analyze the mRNA expression profiles containing 497 PCa samples to identify the genes associated with clinicopathological features in PCa and prognosis markers for PCapatients.We obtained the DEGs between PCatumor samples and normal controls and used KEGG pathway analysis to classify these genes. We found these DEGs were commonly enriched in Alcoholism, Neuroactive ligand-receptor interaction, and cAMP signaling pathways. For the GSEA analysis, results showed the top 5 significantly enhanced channels were Myogenesis [30], E2F [31], G2M Checkpoint [32], and Coagulation and MYC [33,34] signaling pathways, which have been proven to play critical roles during PCa initiation and progression. Next, we conducted WGCNA analysis and identified 22 co-expressed modules. We analyzed the Pearson’s correlation between the clinical features with each module and found the purple module to be significantly and positively associated with age, Gleason score, and PSA level.We found some modules were remarkably associated with new tumor events, biochemical recurrence, pathological T, pathological N, and clinical stage, particularly for purple, red, and salmon modules. We found the red, purple, salmon, pink, magenta, midnight blue, grey60, and brown modules were significantly associated with RFS. The results of survival analysis based on these modules further indicated their biological significance, particularly as prognostic markers for PCapatients. Based on the importance of these modules, we chose purple and red modules as representatives to investigate their functional classification. KEGG enrichment analyses showed that these genes were mostly enriched in Cell Cycle and Aldarate Metabolism pathways, supporting their significant role in tumor development, consistent with the GSEA results.Hub-gene analyses found that KIF23, PARM1, DONSON, MCM3, PCCA, and MTL5 played a critical role in connecting with other genes in the purple module, while PILRβ, FAM156A, and AHSA2 genes were strongly connected with other genes in the red module. We also tested the predictive values of these top positive/negative hub genes and found most of them displayed moderate predictive ability. Furthermore, based on the importance of these genes, we validated their expression in our own clinical samples. Before that, we confirmed the prognostic role of these genes as markers for RFS based on an online database and found the PILRβ was consistent with the module analysis, and that it is significantly associated with RFS of PCapatients. Therefore, we validated the PILRβ expression in our own clinical samples and found it was significantly higher in the high-risk tissues compared with the low-/middle-risk PCa tissues. Unfortunately, due to lack of follow-up information (we collected the clinical samples for less than 1 year, and the case number was also limited), we did not perform survival analysis in our cohort.Paired immunoglobulin-like type 2 receptor (PILR) β contributes to the inflammatory process regulation in response to pathogen infection and plays a critical role in host-disease resistance/risk [38]. In addition, we also found that PILRβ is a DAP12-binding partner, which is expressed on both human and mouse myeloid cells. CD99 plays a pivotal role during the migration of immune cells to inflammation sites [39]. Recently, researchers found that activating signals from PILRβ can increase the production of IL-27 and IL-10 in effector T cells and inhibit excessive inflammatory responses [40], which may be related to tumor immune tolerance. Prostate cancer originates from aberrant epithelial cells, and we found that PILRβ was expressed at higher levels in high-risk PCapatients, suggesting that genes aberrantly expressed in epithelial cells could regulate the immune process to influence the disease initiation and progression. Moreover, upregulation of PILRB was also identified in p185 (BCR-ABL)-positive acute lymphoblastic leukemia [41]. One study also found that shRNA knockdown of the upregulated PILRB could improve patients’ leukemia-related survival [42]. In general, few studies have explored the possible regulatory role of PILRB in PCa initiation, progression, and prognosis prediction, and we plan to perform further research to explore the functional role of PILRB in PCa.Our study has certain strengths and weaknesses. Firstly, we performed GSEA analyses for the differentially expressed genes (DEGs) and found some similarities and differences compared with regular KEGG or GO pathway enrichment analyses. Secondly, similar to several previous studies, we analyzed the connection between module expression and clinical parameters. We also added a new part to investigate whether the modules could serve as prognostic factors. Thirdly, for these critical hub genes within the 2 key modules (red and purple), we further investigated their RFS predicting ability by ROC curve analysis, showing that most of these hub genes displayed moderate predictive ability. Lastly, considering the significance of the purple and red modules, owing to their association with pathological states and RFS, we used IHC assay to validate the clinical relevance of the top hub genes. Based on the hub-gene visualization and ROC curve analysis, PILRβ in the red module was chosen for further clinical validation. We compared the PILRβ expression in different risk subgroups of PCa tissues, showing that it was significantly upregulated in high-risk compared with low-/middle-risk PCa tissues. However, the present study has 2 limitations that need further research. Firstly, we are still collecting clinical samples and follow-up information, and did not analyze the role of PILRβ in predicting the RFS of PCapatients in the present study. Secondly, we tried to confirm our data in an external database, Memorial Sloan Kettering Cancer Center (MSKCC), and most of the other results were consistent, but not survival analyses. We intend to address these questions in our future work.
Comclusions
To conclude, a total of 22 co-expression modules were derived from the TCGA-PCa dataset via WGCNA, and we found that the purple and red modules were remarkably associated with clinicopathological features and prognosis of PCa. The key hub genes within purple and red modules were successfully validated in clinical samples, further suggesting the accuracy and significance of our findings. These results will promote better diagnosis and prognostic prediction for patients with PCa.Pathway enrichment for the purple module. (A) GO-biological process. (B) GO-cellular component. (C) GO-molecular function. (D) KEGG pathway enrichment. GO – Gene Ontology.Survival curve showed the prognostic value of the top 10 hub genes in red and purple modules (A, purple module; B, red module).The ROC curve showed the predictive value of the top 10 hub genes in red and purple modules (A, purple module; B, red module).
Authors: Cristina M Tato; Barbara Joyce-Shaikh; Antara Banerjee; Yi Chen; Manjiri Sathe; Sarah E Ewald; Man-Ru Liu; Daniel Gorman; Terrill K McClanahan; Joseph H Phillips; Paul G Heyworth; Daniel J Cua Journal: PLoS One Date: 2012-03-27 Impact factor: 3.240
Authors: Yirong Li; Caihong X Li; Huihui Ye; Fei Chen; Jonathan Melamed; Yi Peng; Jinsong Liu; Zhengxin Wang; Hui C Tsou; Jianjun Wei; Paul Walden; Michael J Garabedian; Peng Lee Journal: J Cell Mol Med Date: 2008-02-08 Impact factor: 5.310