Qiufeng Pan1, Longwang Wang2, Hao Zhang1, Chaoqi Liang1, Bing Li1. 1. Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China (mainland). 2. Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China (mainland).
Abstract
BACKGROUND Although the mortality rates of clear cell renal cell carcinoma (ccRCC) have decreased in recent years, the clinical outcome remains highly dependent on the individual patient. Therefore, identifying novel biomarkers for ccRCC patients is crucial. MATERIAL AND METHODS In this study, we obtained RNA sequencing data and clinical information from the TCGA database. Subsequently, we performed integrated bioinformatic analysis that includes differently expressed genes analysis, gene ontology and KEGG pathway analysis, protein-protein interaction analysis, and survival analysis. Moreover, univariate and multivariate Cox proportional hazards regression models were constructed. RESULTS As a result, we identified a total of 263 dysregulated genes that may participate in the metastasis of ccRCC, and established a predictive signature relying on the expression of OTX1, MATN4, PI3, ERVV-2, and NFE4, which could serve as significant progressive and prognostic biomarkers for ccRCC. CONCLUSIONS We identified differentially expressed genes that may be involved in the metastasis of ccRCC. Moreover, a predictive signature based on the expression of OTX1, MATN4, PI3, ERVV-2, and NFE4 could be an independent prognostic factor for ccRCC.
BACKGROUND Although the mortality rates of clear cell renal cell carcinoma (ccRCC) have decreased in recent years, the clinical outcome remains highly dependent on the individual patient. Therefore, identifying novel biomarkers for ccRCC patients is crucial. MATERIAL AND METHODS In this study, we obtained RNA sequencing data and clinical information from the TCGA database. Subsequently, we performed integrated bioinformatic analysis that includes differently expressed genes analysis, gene ontology and KEGG pathway analysis, protein-protein interaction analysis, and survival analysis. Moreover, univariate and multivariate Cox proportional hazards regression models were constructed. RESULTS As a result, we identified a total of 263 dysregulated genes that may participate in the metastasis of ccRCC, and established a predictive signature relying on the expression of OTX1, MATN4, PI3, ERVV-2, and NFE4, which could serve as significant progressive and prognostic biomarkers for ccRCC. CONCLUSIONS We identified differentially expressed genes that may be involved in the metastasis of ccRCC. Moreover, a predictive signature based on the expression of OTX1, MATN4, PI3, ERVV-2, and NFE4 could be an independent prognostic factor for ccRCC.
Clear cell renal cell carcinoma (ccRCC) is the most common malignancy in the kidneys, which has increasing incidence and mortality rates worldwide [1]. Treatments for localized ccRCC can vary from radio-frequency ablation to partial or radical nephrectomy; however, once RCC progresses to distant metastasis, the curative effect of current targeted drug therapies is limited [2]. Additionally, when first diagnosed, approximately 30% patients already have metastasis [1]. Therefore, it is urgent to understand the underlying mechanism of metastasis and to identify novel biomarkers with greater prognostic values.The TNM staging system has been used for over 80 years and is important for estimating the outcome of various cancers; however, it provides an incomplete prognostic value [3-5]. Clinical outcomes can differ significantly among patients with the same tumor stage [6]. Despite surgical removal of the tumor, a subgroup of patients experience recurrence, indicating that at the time of curative surgery, the metastasis was already present [7]. However, no consensus was reached regarding the surveillance protocols of RCC, and no available tumor-associated biomarkers can predict recurrence in patients who may have benefited from earlier therapy [8]. Previous studies in colorectal cancer proposed several gene signatures and proved to be useful in predicting prognosis [9-11]. In this study, we divided patients from the Cancer Genome Atlas database into a non-metastasis group and a metastasis group in order to screen the differently expressed genes. Furthermore, we constructed a risk scoring system based on upregulated genes involved in metastasis to identify a multi-gene signature for use as an independent predictor for ccRCC.
Material and Methods
Data collection
The TCGA database contains large cohorts of genomic abnormalities and clinical information across the world, and is publicly available. RNA sequencing counts data from the ccRCC cohort, which consists of 539 tumor samples and 72 normal tissues, were obtained from the TCGA data portal (). Clinical data pertaining to patients’ age, gender, grade, stage, survival and recurred/progressed outcome were also acquired from the TCGA data portal. We divided patients based on N stage and M stage into 2 groups. Patients with both M0 and N0 stage were assigned to the non-metastasis group, whereas M1 and/or N1 patients were assigned to the metastasis group.
Identification of differentially expressed genes (DEGs)
We identified the DEGs using the edgeR package, with a cutoff of adj.p-value <0.05 and a |logFC| >2 [12]. DEGs were visualized with volcano plot through the gplots package in R (version 3.5.2).
Enrichment analysis of DEGs
We performed a functional enrichment analysis of the DEGs using DAVID (Database for Annotation, Visualization, and Integrated Discovery) to determine the gene ontology (GO) categories by using cellular component (CC), molecular function (MF), or biological processes (BP), as well as KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway [13]. P<0.05 was defined as significant enrichment. An online web tool was used to visualized these processes ().
Construction of PPI network
We used the STRING database to retrieve the protein-protein interaction (PPI) network of DEGs, and we used Cytoscape software to reconstruct and visualize the network [14,15]. Individual network modules with 10 or more nodes were shown.
Univariate and multivariate Cox analysis to screen the candidate genes
CcRCC samples were separated into 2 groups according to the median gene expression. Then, age (≤60/>60), sex (male/female), grade (G1–G2/G3–G4), stage (I–II/III–IV), T stage (T1–T2/T3–T4), N stage (N0/N1), M stage (M0/M1), specific gene, and survival data (time and state) were all included into the Cox regression model to preform univariate and multivariate Cox analysis using SPSS 22.0 (IBM Corporation, Armonk, NY, USA).
Establishment of a prognostic signature based on candidate genes
The stepwise multivariate Cox regression analysis model was constructed based on the candidate genes to extract the mRNA-based model with the best predictive ability. The criteria for inclusion and exclusion was set as P<0.05. Subsequently, the risk score for each patient was computed using the mRNA-based prognostic model as follows: Risk score=expRNA1*βRNA1+ expRNA2*βRNA2+expRNA3*βRNA3+…expRNAn*βRNAn, where expRNA was the mRNA expression level and βRNA referred to the regression coefficient derived from the multivariate Cox hazards regression analysis. Based on the risk score for each patient, patients from the TCGA database were separated into 2 groups: a low-risk group and a high-risk group. Kaplan-Meier survival analysis was performed to assess differences in overall survival and disease-free time of patients using a log-rank test in GraphPad Prism 7.0. In addition, the receiver operating characteristic (ROC) curve was utilized to evaluate the specificity and sensitivity of the survival and disease-free prediction by the area under the curve using the R package “survivalROC” [16]. Heatmaps and clustering were generated based on the ClustVis open web tool [17].
Predictive value assessment
To evaluate the clinical value of our risk scoring system, we analyzed the clinical characteristics and risk scores in univariate Cox regression. We included factors with P<0.05 into the multivariate Cox regression analysis model. Then, a P<0.05 was treated as an independent prognostic factor. Moreover, to assess the relationship between risk level and clinical characteristics, we regrouped the patients based on age, sex, grade, stage, T stage, M stage, N stage, vital status, and risk level. A P<0.05 was considered as statistically significant using the chi-square test.
Results
Differentially expressed genes related to the metastasis of ccRCC
In this study, we defined M0 and N0 patients as the non-metastasis group (198 cases), while M1 and/or N1 patients (89 cases) were defined as the metastasis group. Altogether, 263 genes were found to be dysregulated according to the cutoff criteria, among which, 101 genes were upregulated and 162 gene were downregulated (Figure 1, Supplementary Table 1). Functional enrichment analysis of gene ontology revealed that dysregulated genes were mainly enriched in sequence-specific DNA binding, receptor binding, the extracellular region, the integral component of the plasma membrane, ion transmembrane transport, and insulin receptor signaling pathway. KEGG pathway analysis indicated that genes were primarily enriched in neuroactive ligand-receptor interaction and synaptic vesicle cycle (Figure 2). Moreover, the PPI network consisted of 10 modules, which included 255 nodes and 316 edges. The most significant module is shown in Figure 3.
Figure 1
The volcano plot for DEGs related to metastasis. The x-axis is -log10(FDR) and the y-axis is logFC. The red dots represent upregulated genes and green dots represent downregulated genes.
Figure 2
Go term and KEGG pathway analysis for DEGs. (A) Top 10 molecular function (MF) processes. (B) Cellular component (CC). (C) Top 10 biological processes (BP). (D) KEGG pathway analysis.
Figure 3
The most significant module. The color and the size of a node indicates the number of proteins interacting with the designated protein.
Survival-related genes by Cox regression analysis
To identify key genes that may affect overall survival of patients, we performed Cox proportional hazard regressions analysis on upregulated genes. Twenty key genes were demonstrated to influence overall survival: OTX1, FOXE1, FAM83A, HMGA2, KRT6A, DPYSL5, ANXA8, MATN4, ROS1, CSMD3, MAGEC3, AMER2, CPLX2, PI3, KRT13, ERVV-2, ANKFN1, VTN, NFE4, and ZNF114 (Figure 4).
Figure 4
(A–T) Survival-related upregulated genes. Kaplan-Meier survival curves were generated for genes with P<0.05 in multivariate Cox regression analysis.
Construction of a risk scoring system based on candidate genes
For the purpose of extracting a signature that possesses the best predictive efficacy, 20 key genes were subjected to the stepwise multivariate Cox regression model. Results from the model revealed a total of 5 genes that proved to be significant survival predictors. The related information of these 5 genes is shown in Table 1. Subsequently, the risk score for each patient was computed as follows: expOTX1*0.725+expMATN4*0.473+expPI3*0.548+expERVV-2*0.458+expNFE4*0.410. According to the median risk score, we assigned these scores to the low- or high-risk group. Overall survival analysis showed that the low-risk group had better prognoses compared with the high-risk group (Figure 5A). The prognostic ability of the 5-gene signature was assessed by the AUC value of the ROC curve. The AUC was 0.687 for 3-year and 0.695 for 5-year overall survival, indicating a good performance of the 5-gene signature (Figure 5C, 5E). Risk scores in the low-risk group ranged from 0 to 0.215232506445 and ranged from 0.215490360701 to 360.615372760823 in the high-risk group (Figure 5G). Disease-free survival analysis revealed a significant difference between the 2 groups, with the low-risk group having a longer disease-free time (Figure 5B). In the ROC curve, the AUC for 3-year disease-free survival was 0.674 and 0.681 for 5-year disease-free survival (Figure 5D, 5F). Risk scores in the low-risk group ranged from 0 to 0.187870897657 and from 0.194574172497 to 360.615372760823 in the high-risk group (Figure 5H). Figure 6 shows the expression patterns of all the 5 genes in the 2 groups. The expression of OTX1, MATN4, and PI3 were significantly higher in the high-risk group in the 2 cohorts (Figure 7).
Table 1
Overall information of 5 genes constructing the prognostic signature.
Gene
Gene name
Gene type
Hazard ratio
Coefficient
P value
OTX1
Orthodenticle Homeobox 1
Protein-coding
2.064
0.725
<0.0001
MATN4
Matrilin 4
Protein-coding
1.605
0.473
0.007
PI3
Peptidase Inhibitor 3
Protein-coding
1.73
0.548
0.002
ERVV-2
Endogenous Retrovirus Group V Member 2
Protein-coding
1.581
0.458
0.009
NFE4
Nuclear Factor, Erythroid 4
Protein-coding
1.506
0.41
0.015
Figure 5
The 5-gene predictive signature in ccRCC. (A) Kaplan-Meier curve of OS in the low- and high-risk groups. (B) Kaplan-Meier curve of DFS in the low- and high-risk groups. (C) ROC curve for the 3-year survival prediction by the 5-gene signature. (D) ROC curve for the 3-year disease-free survival prediction by the 5-gene signature. (E) ROC curve for the 5-year survival prediction. (F) ROC curve for the 5-year disease-free survival prediction. (G) Risk scores distribution among OS cohort. (H) Risk scores distribution among DFS cohort.
Figure 6
Expression pattern of the 5-gene signature in OS and DFS cohort. (A–C). In the OS cohort, the expression levels of OTX1, MATN4, and PI3 were significantly higher in the high-risk group. (D–F). In the DFS cohort, the expression levels of OTX1, MATN4, and PI3 were significantly higher in the high-risk group.
Figure 7
Heatmap of the 5 genes. (A) Heatmap of the OS cohort. (B) Heatmap of the DFS cohort. Red indicates the high-risk group, while blue indicates the low-risk group.
Assessment of gene signature prognostic value
Univariate Cox regression analysis of the prognostic power of our risk scoring system showed that age, grade, stage, T stage, N stage, M stage, and risk level were all indicators of poor outcome. Then, these 7 indexes were entered into the multivariate Cox regression model, showing that risk level could be treated as an independent prognostic factor (Table 2). Furthermore, as is shown in Table 3, based on the chi-square test, risk level was significantly correlated with sex, grade, tumor stage, T stage, N stage, M stage, and vital status. Collectively, our results demonstrate that our 5-gene signature is a robust tool for use in predicting prognosis and recurrence.
Table 2
Univariate and multivariate analysis of risk level and patient survival.
Variables
Univariate analysis
Multivariate analysis
HR*
95% CI
P value
HR
95% CI
P value
Overall survival
Age (years)
≤60 (257)
1.683
1.228–2.306
0.001
1.540
1.122–2.114
0.007
>60 (246)
Sex
Male (325)
0.791
Female (178)
Stage
I+II (303)
4.313
3.092–6.015
<0.0001
2.511
1.256–5.019
0.009
III+IV (200)
T stage
T1–T2 (321)
3.482
2.534–4.785
<0.0001
0.681
T3–T4 (182)
N stage
N0 (487)
3.925
2.124–7.255
<0.0001
0.093
N1 (16)
M stage
M0 (425)
4.572
3.21–6.294
<0.0001
2.202
1.500–3.232
0.0001
M1 (78)
Grade
G1–G2 (232)
2.644
1.860–3.759
<0.0001
0.051
G3–G4 (271)
Risk level
Low risk (251)
2.592
1.859–3.612
<0.0001
1.779
1.251–2.530
0.001
High risk (252)
HR estimated from Cox proportional hazard regression model; multivariate models were adjusted for age, grade, T, N, M, and stage.
HR – hazard ratio; CI – confidence interval.
Table 3
Relationship between clinical parameters and risk level.
Subgroup
High risk
Low risk
Total
P value*
Age
0.503
≤60
125 (24.85%)
132 (26.24%)
257
>60
127 (25.25%)
119 (23.66%)
246
Sex
0.008
Male
180 (35.79%)
151 (30.01%)
331
Female
72 (14.31%)
100 (19.88%)
172
Grade
<0.0001
G1–G2
86 (17.10%)
146 (29.03%)
232
G3–G4
166 (33.00%)
105 (20.87%)
271
Stage
<0.0001
I+II
119 (23.66%)
184 (36.58%)
303
III+IV
133 (26.44%)
67 (13.32%)
200
T stage
<0.0001
T1–T2
132 (26.24%)
189 (37.57%)
321
T3–T4
120 (23.86%)
62 (12.33%)
182
N stage
<0.0001
N0
111 (45.87%)
114 (47.11%)
225
N1
16 (6.61%)
1 (0.41%)
17
M stage
<0.0001
M0
178 (37.47%)
219 (46.11%)
397
M1
57 (12.00%)
21 (4.42%)
78
Vital status
<0.0001
Alive
141 (28.03%)
200 (39.76%)
341
Dead
111 (22.07%)
51 (10.14%)
162
Chi-square test was used.
Discussion
CcRCC has been shown to display distinct variability in clinical outcome, possibly due to the intrinsic molecular heterogeneity, which remains unclear, especially with regard to the mechanism of distant metastasis [18]. Moreover, the clinically available parameters, such as TNM stage and Fuhrman grade, are indispensable for prognostic prediction [19]. Nevertheless, there remains an urgent need to detect prognostic biomarkers due to the high heterogeneity in ccRCC.In the current study, we performed bioinformatic analysis between the non-metastasis and metastasis ccRCC group to identify genes involved in metastasis. As a result, we found that 263 genes were dysregulated; functional enrichment analysis of these genes revealed that dysregulated genes were primarily enriched in sequence-specific DNA binding, receptor binding, extracellular region, integral component of plasma membrane, ion transmembrane transport, insulin receptor signaling pathway, neuroactive ligand-receptor interaction, and synaptic vesicle cycle. Most importantly, we identified a 5-gene panel signature (OTX1, MATN4, PI3, ERVV-2, and NFE4) after the Cox proportional hazards regression analysis. Then, a risk score was acquired by combing the 5 genes. Recently, Wei et al. also identified key genes involved in the metastasis of ccRCC using similar bioinformatics methods [20]. However, in our study, we make our inclusion criteria clear with regard to the metastasis and non-metastasis groups. Moreover, we calculated each patient’s risk score based on the 5-gene signature. The 5-gene signature could independently predict overall survival for ccRCC patients, demonstrating that this signature might be useful in clinical practice.OTX1 encodes a member of the Bicoid sub-family of homeodomain-containing transcription factor, which may play a role in sensory and brain organ development. It has been described as a vital molecule for axon refinement [21]. Terrinoni et al. demonstrated that the p53 protein can directly induce OTX1 expression by acting on its promoter in breast cancer, and Figueira-Muoio et al. revealed that the OTX pathway is important in medulloblastomas development [22,23]. OTX1 was also found to promote colorectal cancer progression in vitro through epithelial-mesenchymal transition and hepatocellular carcinoma progression by regulation of the ERK/MAPK pathway [24,25]. In bladder cancer, OTX1 combined with FGFR3 and TERT can function as a surveillance biomarker [26]. However, the role of OTX1 in ccRCC is still unknown. PI3, also called elafin, encodes an elastase-specific inhibitor that functions as an antimicrobial peptide [27,28]. Caruso et al. demonstrated that elafin predicts poor outcome in ovarian and breast cancerpatients, and it may play a role in tumor dormancy; moreover, it has been shown that elafin is an important therapeutic target for breast and ovarian carcinoma [29-31]. MATN4, a member of the von Willebrand factor A domain-containing protein family, has not been widely studied in cancer to date [32]. A study showed that under acute stress, CXCR4 and MATN4 are involved in the regulation of hematopoietic stem cells proliferation and expansion [33]. ERVV-2 is functionally important in reproduction, and NFE4 is involved in preferential expression of the gamma-globin genes in fetal erythroid cells [34,35]. These 2 genes have not been well defined in cancer biology, particularly in ccRCC.In summary, our study used an integrated analysis to identify differentially expressed genes that participate in metastasis of ccRCC. Furthermore, we constructed a 5-gene signature with a quantitative index that exhibited an independent prognostic value. In the future, this 5-gene signature may be used to identify patients who need regional lymph node dissection during radical nephrectomy [36]. Since these 5 genes are correlated with poor outcome, they might be therapeutic targets for ccRCC. However, in vivo and in vitro studies are still needed to reveal the biological functions of these predictive mRNAs in ccRCC.
Conclusions
We identified differentially expressed genes that may participate in the metastasis of ccRCC. More importantly, we established a predictive signature based on the expression of OTX1, MATN4, PI3, ERVV-2, and NFE4, which could serve as significant progressive and prognostic biomarkers for ccRCC.Differentially expressed genes involved in metastasis in ccRCC.
Supplementary Table 1.
Differentially expressed genes involved in metastasis in ccRCC.
Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker Journal: Genome Res Date: 2003-11 Impact factor: 9.043
Authors: Jean-Jacques Patard; Emmanuelle Leray; Nathalie Rioux-Leclercq; Luca Cindolo; Vincenzo Ficarra; Amnon Zisman; Alexandre De La Taille; Jacques Tostain; Walter Artibani; Claude C Abbou; Bernard Lobel; François Guillé; Dominique K Chopin; Peter F A Mulders; Christopher G Wood; David A Swanson; Robert A Figlin; Arie S Belldegrun; Allan J Pantuck Journal: J Clin Oncol Date: 2005-04-20 Impact factor: 44.544
Authors: Quan Zhao; Wenlai Zhou; Gerhard Rank; Rosemary Sutton; Xi Wang; Helen Cumming; Loretta Cerruti; John M Cunningham; Stephen M Jane Journal: Blood Date: 2005-11-01 Impact factor: 22.113
Authors: Da Wei Huang; Brad T Sherman; Qina Tan; Jack R Collins; W Gregory Alvord; Jean Roayaei; Robert Stephens; Michael W Baseler; H Clifford Lane; Richard A Lempicki Journal: Genome Biol Date: 2007 Impact factor: 13.583