A long non-coding RNA signature for predicting survival in patients with colorectal cancer.

Yi-Lin Wang1, Jun Shao2, Xiaohong Wu3, Tong Li4, Ming Xu2, Debing Shi5.   

Abstract

Dysregulation of long non-coding RNAs (lncRNAs) plays an important role in cancer development and progression. In this work, we attempted to develop a lncRNA signature to improve prognosis prediction of colorectal cancer. A comprehensive analysis of lncRNA expression and the corresponding clinical information of 344 colorectal cancer patients was performed based on data from The Cancer Genome Atlas (TCGA). We randomly divided the TCGA data into a training set (n = 172) and a testing set (n = 172). A four-lncRNA signature was established that was significantly associated with the overall survival of colorectal cancer patients. Based on the four-lncRNA signature, the training set could be classified into high-risk and low-risk groups with significantly different survival. The result was further validated in the testing dataset and in another independent dataset. Further analyses suggested that the prognostic power of the four-lncRNA signature was independent of other clinical variables. The identification of this lncRNA signature indicates that lncRNAs could be novel independent biomarkers for predicting survival in patients with colorectal cancer.

Keywords:  colorectal cancer; long non-coding RNA; prognosis

Year:  2017        PMID: 29774095      PMCID: PMC5955136          DOI: 10.18632/oncotarget.23431

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Colorectal cancer (CRC) is the third most common malignancy and a major cause of cancer-related death worldwide [1, 2]. The incidence of CRC is gradually increasing in developed areas. To date, surgery followed by adjuvant therapy remains the most common treatment option for CRC patients. Despite an improved understanding of the molecular mechanisms of CRC, the overall survival (OS) of CRC patients has not improved dramatically, and the four-year survival rate remains very low [3]. There is an urgent need to identify novel independent biomarkers for the diagnosis and prognosis of CRC.

With advances in transcriptome profiling, the roles of long non-coding RNAs (lncRNAs) have received great attention in human cancer research. LncRNAs are an important category of non-coding RNAs with little or no protein-coding capacity [4, 5]. It has been documented that lncRNAs play important roles in regulating gene expression at the transcriptional, posttranscriptional and epigenetic levels [4, 6–8]. Moreover, lncRNAs participate in various biological processes and pathways, such as cell growth and immune response [7, 9, 10]. Recently, many lncRNAs have been shown to play critical oncogenic or tumor-suppressive roles in various types of cancer [11–14]. Furthermore, several lncRNAs have been identified as novel independent biomarkers for cancer prognosis [15–20]. In colorectal cancer, recent studies have also revealed that some lncRNAs, such as PANDAR, AFAP1-AS1 and TUG1, are dysregulated in CRC patients and play important roles in tumorigenesis [21–25].

Here we attempted to develop a lncRNA signature to improve prognosis prediction of CRC. We identified a four-lncRNA signature using the sample-splitting method. Our results suggest that the four-lncRNA signature can provide novel insight into the molecular mechanisms underlying CRC.

RESULTS

Identification of prognostic lncRNAs from the training dataset

The 344 CRC patients were randomly divided into a training dataset (n = 172) and a testing dataset (n = 172). We first identified prognostic lncRNAs in the training set. A univariate Cox regression analysis was performed to evaluate the association between lncRNA expression and overall survival of CRC patients. At a threshold of P-value < 0.01, four lncRNAs were significantly correlated with overall survival of CRC patients. Detailed information on these four lncRNAs is shown in Table 1. A positive coefficient indicates that higher expression was associated with shorter overall survival (SPRY4-IT1), whereas a negative coefficient indicates that higher expression was associated with longer survival (LINC01133, Loc554202 and RP11-727F15.13).
Table 1

The detailed information of four prognostic lncRNAs significantly associated with overall survival in patients with CRC

| Gene symbol | P value^a | Hazard ratio^a | Coefficient^b |
|---|---|---|---|
| SPRY4-IT1 | 3.71E-04 | 1.637 | 0.322 |
| RP11-727F15.13 | 2.88E-03 | 0.746 | -0.231 |
| Loc554202 | 6.38E-03 | 0.560 | -0.134 |
| LINC01133 | 1.39E-04 | 0.751 | -0.336 |

^a, ^b Derived from the univariate and multivariate Cox regression analyses in CRC patients of the training dataset.
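
The univariate screening step can be illustrated with a minimal single-covariate Cox fit. This is only a sketch under stated assumptions: the paper reports using R but does not name the routine, so the function `cox_univariate`, the Newton-Raphson solver, the Breslow-style risk sets, the Wald P-value and the toy data are all illustrative choices, not the authors' implementation.

```python
import math

def cox_univariate(times, events, x, iters=25, tol=1e-9):
    """Fit a one-covariate Cox model by Newton-Raphson.

    Returns (coefficient, hazard ratio, two-sided Wald P-value).
    events: 1 = death observed, 0 = censored.
    """
    beta, info = 0.0, 0.0
    n = len(times)
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: everyone still under observation at times[i].
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean ** 2    # information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)             # information from the last iteration
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, math.exp(beta), p_value

# Hypothetical toy data (months, event flags, log2 expression); not from the paper.
times = [5, 8, 12, 15, 20, 24, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [2.1, 1.0, 1.8, 0.9, 1.2, 0.4, 1.1, 0.6]
beta, hazard_ratio, p_value = cox_univariate(times, events, expression)
```

In this toy example higher expression tends to accompany earlier deaths, so the fitted coefficient is positive and the hazard ratio exceeds 1, mirroring how the sign of a coefficient in Table 1 is read. A screening run would repeat the fit for every lncRNA and keep those with P-value < 0.01.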

A four-lncRNA signature for predicting overall survival of CRC patients

These four lncRNAs were analyzed using a multivariate Cox regression analysis to establish a lncRNA signature for predicting patients’ overall survival. We constructed a risk-score formula by combining the lncRNA expression values with the corresponding regression coefficients estimated in the multivariate Cox regression analysis, as follows: Risk score = (0.322 × expression value of SPRY4-IT1) + (-0.134 × expression value of Loc554202) + (-0.336 × expression value of LINC01133) + (-0.231 × expression value of RP11-727F15.13). We calculated the four-lncRNA signature risk score for each CRC patient and ranked the patients by risk score. The 172 CRC patients in the training set were divided into a high-risk group (n = 90) and a low-risk group (n = 82) using the median risk score as the threshold. A significant difference in overall survival between the high-risk group and the low-risk group was observed (P-value = 1.74E-06; Figure 1A). CRC patients in the high-risk group had significantly shorter survival (median 18 months) than those in the low-risk group (median 24.5 months). The time-dependent ROC curve analysis achieved an AUC of 0.727 at the overall survival of five years (Figure 1B), suggesting competitive performance of the four-lncRNA signature for survival prediction. The lncRNA risk score was significantly associated with overall survival of CRC patients in the univariate Cox regression analysis (Table 2).
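
The risk-score formula and the median split can be sketched in a few lines. The coefficients are those reported above; the patient expression values are hypothetical, and sending a score exactly equal to the cutoff to the low-risk group is an assumption, since the text does not say how such ties were handled.

```python
from statistics import median

# Coefficients as reported in the risk-score formula.
COEFFICIENTS = {
    "SPRY4-IT1": 0.322,
    "Loc554202": -0.134,
    "LINC01133": -0.336,
    "RP11-727F15.13": -0.231,
}

def risk_score(expression):
    """Weighted sum of the four lncRNA expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

def split_by_cutoff(scores, cutoff):
    """Label a patient 'high' if the score exceeds the cutoff, else 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]

# Hypothetical expression values for three patients (not from the paper).
patients = [
    {"SPRY4-IT1": 3.0, "Loc554202": 1.0, "LINC01133": 0.5, "RP11-727F15.13": 0.2},
    {"SPRY4-IT1": 1.0, "Loc554202": 2.0, "LINC01133": 3.0, "RP11-727F15.13": 1.5},
    {"SPRY4-IT1": 2.0, "Loc554202": 1.5, "LINC01133": 1.0, "RP11-727F15.13": 1.0},
]
scores = [risk_score(p) for p in patients]
cutoff = median(scores)                    # threshold derived from this cohort
groups = split_by_cutoff(scores, cutoff)   # ['high', 'low', 'low']
```

For validation, the same `cutoff` value computed on the training cohort would be passed to `split_by_cutoff` together with the testing cohort's scores, rather than recomputing a median.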
Figure 1

The four-lncRNA signature in prognosis of survival of CRC patients in the training dataset

(A) The Kaplan-Meier curves of overall survival between high-risk and low-risk patients in the training dataset. (B) The ROC curve for survival prediction by the four-lncRNA signature within four years as the defining point.

Table 2

Univariate and multivariate Cox regression analyses in each dataset

| Variables | Univariate HR | Univariate 95% CI of HR | Univariate P value | Multivariate HR | Multivariate 95% CI of HR | Multivariate P value |
|---|---|---|---|---|---|---|
| Training dataset (n = 172) | | | | | | |
| Four-lncRNA risk score, low risk/high risk | 17.25 | 1.28–29.40 | 1.37E-05 | 21.62 | 1.34–32.28 | 6.83E-05 |
| Age, ≤65/>65 | 1.74 | 0.54–4.91 | 0.006 | 1.53 | 0.37–3.70 | 0.0042 |
| Gender, female/male | 0.826 | 0.38–4.61 | 0.62 | 0.91 | 0.51–2.78 | 0.73 |
| Stage II | 1 (reference) | | | 1 (reference) | | |
| Stage III/IV | 2.18 | 0.58–4.71 | 0.012 | 2.24 | 0.36–4.95 | 0.01 |
| Testing dataset (n = 172) | | | | | | |
| Four-lncRNA risk score, low risk/high risk | 3.33 | 1.42–5.71 | 3.01E-03 | 2.59 | 1.07–5.89 | 6.48E-03 |
| Age, ≤65/>65 | 1.20 | 0.48–5.74 | 0.006 | 1.83 | 0.45–3.24 | 0.008 |
| Gender, female/male | 1.49 | 0.43–2.95 | 0.67 | 1.23 | 0.44–3.16 | 0.62 |
| Stage II | 1 (reference) | | | 1 (reference) | | |
| Stage III/IV | 1.65 | 0.55–5.44 | 0.021 | 1.49 | 0.42–5.11 | 0.033 |
| Entire dataset (n = 344) | | | | | | |
| Four-lncRNA risk score, low risk/high risk | 5.56 | 2.81–11.72 | 6.8E-04 | 4.98 | 2.54–9.88 | 8.42E-04 |
| Age, ≤65/>65 | 1.12 | 0.603–4.03 | 0.01 | 1.29 | 0.63–5.48 | 0.01 |
| Gender, female/male | 0.74 | 0.51–1.95 | 0.35 | 1.45 | 0.56–2.9 | 0.54 |
| Stage II | 1 (reference) | | | 1 (reference) | | |
| Stage III/IV | 2.92 | 1.26–5.85 | 0.01 | 1.66 | 0.74–5.42 | 0.01 |

Validation of the four-lncRNA signature for survival prediction in the testing dataset and another independent dataset

We confirmed our results using the testing set. Using the same risk-score formula, the 172 CRC patients were classified into a high-risk group (n = 77) and a low-risk group (n = 95) with the cutoff point derived from the training dataset. Overall survival differed significantly between the high-risk group and the low-risk group (P-value = 0.00439, median 17.5 months vs. 23 months; Figure 2A). The AUC value in the testing set was 0.712 at the overall survival of four years, and the lncRNA risk score was significantly associated with patients’ overall survival (Table 2). Next, we performed the same analysis in the entire TCGA CRC dataset, and similar results were obtained. The lncRNA signature classified the 344 CRC patients into a high-risk group (n = 166) and a low-risk group (n = 178) with a significant difference in overall survival (P-value = 6.9E-05, median 16 months vs. 23 months; Figure 2B). The AUC value in the entire set was 0.721 at the overall survival of four years. Further analysis indicated that the lncRNA risk score was significantly associated with CRC patients’ overall survival in the entire TCGA CRC dataset (Table 2). We further validated our lncRNA signature in an independent CRC dataset (GSE14333). As shown in Figure 2C, the lncRNA signature can effectively predict overall survival in CRC patients. A significant difference in overall survival between the high-risk group (n = 125) and the low-risk group (n = 72) was observed (P-value = 0.0183, median 38.3 months vs. 58.3 months).
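
The median survival times quoted for each risk group are the kind of quantity read off a Kaplan-Meier curve. The sketch below shows the standard product-limit estimator; the function names and the follow-up data are hypothetical and only illustrate the calculation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def median_survival(curve):
    """First time at which the survival estimate is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up times in months for one risk group (not from the paper).
times = [6, 10, 14, 18, 18, 25, 30, 42]
events = [1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
median_months = median_survival(curve)  # 18
```

Censored patients stay in the risk set until their last follow-up, which is why the estimate drops only at observed deaths.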
Figure 2

The Kaplan-Meier curves of overall survival between high-risk and low-risk patients in the testing, entire dataset and another independent dataset

(A) The Kaplan-Meier curves for the testing dataset. (B) The Kaplan-Meier curves for the entire dataset. (C) The Kaplan-Meier curves for the dataset from Gene Expression Omnibus database.

Independence of the lncRNA signature for survival prediction from other clinical variables

We examined whether the prognostic power of the lncRNA signature was independent of other clinical variables, such as age, gender, subtype and tumor stage. Multivariate Cox regression analyses showed that the lncRNA risk score remained significantly associated with overall survival after adjustment for the other clinical variables (Table 2). Patient age and tumor stage were also significantly associated with overall survival, so a series of stratified analyses was performed according to age and tumor stage.

First, all CRC patients were stratified into a younger group (n = 132, age < 65) and an elder group (n = 212, age ≥ 65). The lncRNA signature divided the younger group into a high-risk subgroup (n = 85) and a low-risk subgroup (n = 47) with a significant difference in survival (P-value = 0.00416, median 23 months vs. 50.85 months; Figure 3A). In the elder group, the four-lncRNA signature was also able to classify patients into a high-risk subgroup (n = 147) and a low-risk subgroup (n = 65) with significantly different survival (P-value = 0.00742, median 13.3 months vs. 20.1 months; Figure 3B).

Next, all CRC patients were stratified by tumor stage into an early subgroup (stage I and II, n = 196) and a late subgroup (stage III and IV, n = 148). The stratified analysis showed effective prognostic power in both subgroups. As shown in Figure 4A, patients in the early subgroup were divided into a high-risk group (n = 92) with shorter survival and a low-risk group (n = 104) with longer survival (P-value = 0.00189, median 26 months vs. 51.05 months). Similar results were obtained in the late subgroup (P-value = 2.48E-04, median 16 months vs. 24.5 months; Figure 4B). These results demonstrate that the prognostic ability of the lncRNA signature is independent of other clinical variables for the prediction of survival in CRC patients.
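
Within each stratum, the comparison between high-risk and low-risk subgroups amounts to a two-group survival test. The text does not name the test behind the reported P-values; the sketch below assumes the log-rank test, the usual companion to Kaplan-Meier curves, and uses hypothetical data.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                       # at risk overall
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)  # at risk in A
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # deaths at t
        d_a = sum(1 for u, e, a in zip(times, events, in_a)
                  if u == t and e and a)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return chi2, p_value

# Hypothetical stratum: high-risk patients die early, low-risk patients late.
high_times, high_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
low_times, low_events = [20, 22, 24, 26, 28], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
```

A stratified analysis would call `logrank_test` once per stratum (for example, once for the younger group and once for the elder group), each time comparing that stratum's high-risk and low-risk patients.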
Figure 3

Survival analyses of all CRC patients stratified by age with the four-lncRNA signature

(A) The Kaplan-Meier curves for the younger dataset. (B) The Kaplan-Meier curves for the elder dataset.

Figure 4

Survival analyses of all CRC patients stratified by tumor stage with the four-lncRNA signature

(A) The Kaplan-Meier curves for the early stage dataset. (B) The Kaplan-Meier curves for the late stage dataset.

Functional implications of the prognostic lncRNAs

We investigated the potential functional roles of the four prognostic lncRNAs in CRC. Spearman correlation coefficients were calculated between lncRNAs and protein-coding genes using the expression profiles of the 344 CRC patients. A total of 732 protein-coding genes were positively correlated with at least one of the four lncRNAs (Spearman correlation coefficient > 0.6). Functional enrichment analyses indicated that these protein-coding genes were significantly enriched in 20 GO categories (P-value < 0.01; Figure 5). The enriched GO categories included assembly and disassembly of proteins and macromolecules, transcription, signal transduction and response to stimulus, apoptosis and cell death, metabolic and catabolic processes, cell cycle, DNA replication and DNA repair. The result suggests that the four prognostic lncRNAs may participate in CRC tumorigenesis by regulating protein-coding genes that influence CRC-related biological pathways.
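
The co-expression filter can be sketched as a rank correlation followed by a threshold. The 0.6 cutoff is the one stated above; the helper names, gene names and expression values are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpressed_genes(lncrna, genes, threshold=0.6):
    """Names of genes whose Spearman correlation with the lncRNA exceeds threshold."""
    return [name for name, expr in genes.items()
            if spearman(lncrna, expr) > threshold]

# Hypothetical expression profiles across five patients (not from the paper).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "GENE_A": [2.0, 4.1, 6.3, 8.0, 9.9],   # rises with the lncRNA
    "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],   # falls as the lncRNA rises
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],   # mostly rises
}
selected = coexpressed_genes(lncrna, genes)  # ['GENE_A', 'GENE_C']
```

Only positive correlations pass the filter, matching the statement that the 732 genes were positively correlated with the lncRNAs.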
Figure 5

Functional enrichment analyses of the protein-coding genes co-expressed with the four prognostic lncRNAs

(A) The functional enrichment map of GO terms. Each node represents a GO category. An edge represents the overlap of shared genes between connected terms. Node size represents the number of genes in the GO term. Color intensity is proportional to enrichment significance.

DISCUSSION

Great efforts have been devoted to detecting prognostic biomarkers for CRC among protein-coding and non-coding genes [21, 22, 26, 27]. Mounting evidence suggests that expression changes of lncRNAs are implicated in tumorigenesis, with lncRNAs acting as oncogenes or tumor suppressors [8, 28]. Moreover, dysregulation of lncRNAs has been observed in various cancer types, highlighting their potential roles as novel independent biomarkers for cancer prognosis [10, 29–32]. Some studies have identified potential prognostic lncRNA signatures to predict overall survival in cancer types such as glioblastoma and lung cancer [15, 18]. However, the prognostic power of a lncRNA signature for predicting survival in patients with CRC has still not been investigated.

To date, many lncRNAs have been discovered in humans over the past decades [33]. However, only a few of them are well characterized in human cancers. Among the four lncRNAs identified here, SPRY4-IT1 and LINC01133 have been reported to be prognostic factors in patients with CRC [34, 35]. In this work, we found that four lncRNAs are significantly associated with CRC patients’ survival and established a four-lncRNA signature for the prediction of survival. The result suggested competitive performance of the four-lncRNA signature for predicting survival. This finding was validated using the TCGA testing set and another independent dataset, which supports the reliability and reproducibility of the four-lncRNA signature for predicting CRC patients’ survival. Further analyses stratified by age and tumor stage showed that the prognostic power of the four-lncRNA signature was independent of other clinical variables for survival prediction of patients with CRC.

Previous studies documented that lncRNAs participate in biological processes by positively regulating protein-coding genes involved in the same processes. It is therefore possible to predict lncRNA biological functions based on their co-expressed protein-coding genes [36–38]. Here, we performed GO enrichment analyses for the protein-coding genes co-expressed with the lncRNAs. The results suggest important functional roles of the four prognostic lncRNAs in CRC tumorigenesis.

Taken together, we performed a comprehensive analysis of lncRNA expression profiles and the corresponding clinical information in CRC patients. Our work identified four prognostic lncRNAs that were significantly associated with CRC patients’ survival. A four-lncRNA signature was established to effectively predict patients’ survival. The four-lncRNA signature might function as a novel independent biomarker for CRC prognosis. Our work provides insight into the molecular mechanism of CRC.

MATERIALS AND METHODS

CRC datasets and clinical information

CRC lncRNA data and the corresponding clinical information were downloaded from the TCGA data portal. A total of 344 CRC patients were included in this work after removal of patients without clear clinical information. The lncRNAs derived from TCGA were annotated based on the GENCODE database [39] to reduce redundancy. Expressed lncRNAs were defined as those with an average Fragments Per Kilobase of transcript per Million fragments mapped (FPKM) ≥ 0.1. The lncRNA expression profiles were normalized by log2 transformation. Finally, a total of 14,467 lncRNAs across the 344 CRC patients were retained for analysis.
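
The expression filter and normalization can be sketched as follows. The FPKM ≥ 0.1 threshold comes from the text; the pseudocount of 1 added before the log2 transformation is an assumption made here so that zero FPKM values remain defined, since the text does not say how zeros were handled. The lncRNA names and values are hypothetical.

```python
import math

def filter_and_log2(fpkm, min_mean=0.1, pseudocount=1.0):
    """Keep lncRNAs whose mean FPKM is >= min_mean, then log2-transform them.

    fpkm maps each lncRNA name to its FPKM values across patients.
    The pseudocount is an assumption, not taken from the paper.
    """
    kept = {}
    for name, values in fpkm.items():
        if sum(values) / len(values) >= min_mean:
            kept[name] = [math.log2(v + pseudocount) for v in values]
    return kept

# Hypothetical FPKM values for two lncRNAs across three patients.
fpkm = {
    "LNC_A": [0.0, 1.0, 3.0],    # mean 1.33, retained
    "LNC_B": [0.0, 0.05, 0.1],   # mean 0.05, filtered out
}
normalized = filter_and_log2(fpkm)  # {'LNC_A': [0.0, 1.0, 2.0]}
```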

Identification of prognostic lncRNA signature

We randomly divided the CRC patients into a training set (n = 172) and a testing set (n = 172). In the training set, the association between lncRNA expression and the overall survival of CRC patients was evaluated using a univariate Cox regression analysis. The lncRNAs significantly associated with overall survival were identified at a threshold of P-value < 0.01. Next, the selected lncRNAs were subjected to a multivariate Cox regression analysis. We established a risk-score formula from the lncRNA expression values, weighted by the regression coefficients derived from the multivariate Cox regression analysis. CRC patients in the training set were then divided into high-risk and low-risk groups using the median risk score as the threshold. The survival differences between the high-risk and low-risk groups in each dataset were evaluated by Kaplan-Meier analyses. Multivariate Cox regression and stratified analyses were carried out to evaluate whether the prognostic power of the four-lncRNA signature was independent of other clinical variables. Receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the signature for overall survival prediction, and area under the ROC curve (AUC) values were calculated. All analyses were performed using R packages.
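
The sample-splitting step can be sketched as a seeded shuffle. The 172/172 sizes follow the text; the function name, the seed and the patient identifiers are hypothetical, and the original analysis was run in R rather than Python.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Shuffle patient IDs reproducibly and return (training, testing) lists."""
    rng = random.Random(seed)      # fixed seed so the split can be repeated
    shuffled = list(patient_ids)   # copy; the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 344 TCGA patients.
patient_ids = [f"patient_{i:03d}" for i in range(344)]
train_ids, test_ids = split_cohort(patient_ids, n_train=172)
```

Fixing the seed matters here because every downstream quantity (selected lncRNAs, coefficients, median cutoff) depends on which patients land in the training set.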

Functional enrichment analyses

Because lncRNAs are often co-expressed with neighboring coding genes, we calculated Spearman correlation coefficients to evaluate co-expression relationships between lncRNAs and protein-coding genes. Functional enrichment analyses of the co-expressed protein-coding genes were performed using the DAVID software [40, 41]. Gene Ontology (GO) categories with a P-value < 0.01 were considered significantly enriched functional annotations.
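
The idea behind a GO enrichment P-value can be illustrated with a one-sided hypergeometric test. This is a simplified stand-in: DAVID reports an EASE score, a more conservative modification of Fisher's exact test, so the values from this sketch would not match DAVID's output exactly. The function name and the counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_term carry the GO annotation."""
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 background genes, 4 in the GO term,
# 3 co-expressed genes selected, 2 of which are in the term.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A GO category would be called enriched when its P-value falls below the chosen threshold, 0.01 in this study.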
REFERENCES (10 of 41 shown)

1.  Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses.

Authors:  Moran N Cabili; Cole Trapnell; Loyal Goff; Magdalena Koziol; Barbara Tazon-Vega; Aviv Regev; John L Rinn
Journal:  Genes Dev       Date:  2011-09-02       Impact factor: 11.361

2.  Downregulation of long non-coding RNA LINC01133 is predictive of poor prognosis in colorectal cancer patients.

Authors:  J-H Zhang; A-Y Li; N Wei
Journal:  Eur Rev Med Pharmacol Sci       Date:  2017-05       Impact factor: 3.507

3.  Overexpression of the long non-coding RNA MEG3 impairs in vitro glioma cell proliferation.

Authors:  Pengjun Wang; Zhongqiao Ren; Piyun Sun
Journal:  J Cell Biochem       Date:  2012-06       Impact factor: 4.429

Review 4.  RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts.

Authors:  Sarah Geisler; Jeff Coller
Journal:  Nat Rev Mol Cell Biol       Date:  2013-10-09       Impact factor: 94.444

Review 5.  The hallmarks of cancer: a long non-coding RNA point of view.

Authors:  Tony Gutschner; Sven Diederichs
Journal:  RNA Biol       Date:  2012-06-01       Impact factor: 4.652

6.  Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network.

Authors:  Qi Liao; Changning Liu; Xiongying Yuan; Shuli Kang; Ruoyu Miao; Hui Xiao; Guoguang Zhao; Haitao Luo; Dechao Bu; Haitao Zhao; Geir Skogerbø; Zhongdao Wu; Yi Zhao
Journal:  Nucleic Acids Res       Date:  2011-01-18       Impact factor: 16.971

Review 7.  Long non-coding RNA: a new player in cancer.

Authors:  Hua Zhang; Zhenhua Chen; Xinxin Wang; Zunnan Huang; Zhiwei He; Yueqin Chen
Journal:  J Hematol Oncol       Date:  2013-05-31       Impact factor: 17.388

8.  Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks.

Authors:  Xingli Guo; Lin Gao; Qi Liao; Hui Xiao; Xiaoke Ma; Xiaofei Yang; Haitao Luo; Guoguang Zhao; Dechao Bu; Fei Jiao; Qixiang Shao; RunSheng Chen; Yi Zhao
Journal:  Nucleic Acids Res       Date:  2012-11-05       Impact factor: 16.971

9.  A long non-coding RNA signature to improve prognosis prediction of colorectal cancer.

Authors:  Ye Hu; Hao-Yan Chen; Chen-Yang Yu; Jie Xu; Ji-Lin Wang; Jin Qian; Xi Zhang; Jing-Yuan Fang
Journal:  Oncotarget       Date:  2014-04-30

10.  A four-long non-coding RNA signature in predicting breast cancer survival.

Authors:  Jin Meng; Ping Li; Qing Zhang; Zhangru Yang; Shen Fu
Journal:  J Exp Clin Cancer Res       Date:  2014-10-06
