Literature DB >> 33850764

The influence of lncRNAs on the prognosis of prostate cancer based on TCGA database.

Hang Huang1, Yufan Tang1, Xueting Ye1, Wei Chen1, Hui Xie1, Shengye Chen1.   

Abstract

BACKGROUND: Prostate adenocarcinoma (PRAD) is a common male urinary system cancer globally with a poor prognosis. Our research aims to explore the role of LncRNA in the occurrence and prognosis of prostate cancer and its underlying mechanism.
METHODS: The biomaRt package screened for the differentially expressed lncRNA. The survival package was used to identify lncRNAs related to prognosis. The survminer package completed the Kaplan-Meier survival curves. WGCNA (Weighted Gene Co-Expression Network Analysis) screened for the co-expressed genes. The ClusterProfiler package implemented the analysis results of GO (gene ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes).
RESULTS: We performed differential expression analysis on the TCGA (The Cancer Genome Atlas) database to determine the association between LncRNA and Prostate cancer. The data of 500 Prostate cancer patients were tested. 6 LncRNAs (AC245884.1, LINC01524, AL807752.4, AP000844.2, AC016590.1, LINC00641) were selected as independent prognostic factors using statistical analysis methods, and their value was tested through multivariate COX analysis and Kaplan-Meier survival analysis. Through the study of co-expressed genes, the biological processes of these lncRNA enrichment are the perception and conduction of smell and taste. The specific carcinogenic and cancer-promoting mechanisms need further study.
CONCLUSIONS: This study shows that lncRNA has a certain predictive effect on prostate cancer occurrence and prognosis and can be a new biomarker for prostate cancer survival and potential treatment targets. 2021 Translational Andrology and Urology. All rights reserved.

Entities:  

Keywords:  Prostate cancer; The Cancer Genome Atlas (TCGA); lncRNA; prognosis

Year:  2021        PMID: 33850764      PMCID: PMC8039585          DOI: 10.21037/tau-21-154

Source DB:  PubMed          Journal:  Transl Androl Urol        ISSN: 2223-4683


Introduction

Prostate cancer is a common cause of death in older men. In recent years, the incidence of prostate cancer has increased yearly, and the age of onset is younger, which has become a crucial factor affecting men’s quality of life. Many patients with prostate cancer cannot be diagnosed and are not treated early, making them progress to severe levels (1,2). LncRNA is a kind of non-coding RNA with a length more than 200 nucleotides. LncRNA plays an important role in many life activities such as dose compensation effect, epigenetic regulation, cell cycle regulation and cell differentiation regulation (3). In urinary system tumors, lncRNA has been found to participate in androgen receptor signalling in prostate cancer, hypoxia-inducible factor pathway activation in renal cell carcinoma and invasiveness in bladder cancer (3). Although the non-surgical treatment of prostate cancer is mainly based on the blocking of the androgen pathway, a large number of clinical outcomes show that once prostate cancer progresses to castration-resistant prostate cancer (CRPC), the tumor depends on other ways to maintain growth. The Cancer Genome Atlas (TCGA) is currently the largest cancer gene database. Research on the data in TCGA helps to discover the mechanism of cancer and explore treatment methods. Therefore, finding new therapeutic targets for tumors and better diagnosing them early and evaluating them better is the most urgent clinical problem. lncRNA, a new class of molecule, provides us with another choice (4). We present the following article in accordance with the REMARK reporting checklist (available at http://dx.doi.org/10.21037/tau-21-154).

Methods

Data collection

Transcriptome data and clinical information from 499 cases cancer tissue and 25 cases normal tissues were retrieved from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).

Statistical analysis

lncRNA differential expression profiles

LncRNA-seq data were retrieved using the TCGAbiolinks. 14083 LncRNAs were screened using the biomaRt package (5,6). The samples of normal tissue and cancer tissue were divided randomly into two groups, and DESeq2 was used for differential expression analysis. The screening criteria for up-regulated genes were P<0.05 and log2 fold change >1, and the criteria for down-regulated genes were P<0.05 and log2 fold change <−1. After retrieving the differentially expressed lncRNA in each group, use the Venn Diagram package to draw a Venn diagram to show the intersection (7).

Risk scoring model construction and evaluation

We used the survival package to perform a single-factor Cox analysis on the data downloaded through TCGAbiolinks. We screened out 13 LncRNAs related to overall survival (OS). The standard is P<0.001. Multivariate Cox regression analysis was carried out through the backward stepwise method to determine six independent prognostic factors LncRNA, and Lasso regression was implemented through glmnet for further verification. Finally, a risk-scoring model is established based on the results of multi-factor Cox regression. According to the median of risk scores, patients were divided into high-risk groups and low-risk groups. The K-M survival curve was implemented with the survminer package. The time-dependent ROC curve was based on the risk score drawn by the survivalROC package.

Identification and functional analysis of co-expressed coding genes

We used WGCNA to screen the coding genes co-expressed with prognostic-independent lncRNA (the soft threshold was set to β=3, the adjacency threshold was set to 0.1; the Cytoscape software was used for visualization) (8,9). The functional enrichment analysis of GO and KEGG encoding genes is realized with the clusterProfiler package (10). Select “c2.all.v7.0.symbols.gmt” in the MSigDB gene set, and the screening criterion is P<0.05. GOplot visualized the results.

Screening of key coding genes and analysis of immune infiltration

We used the STRING database (https://string-db.org/) to retrieve the interactions of the genetically encoded proteins (score >0.4), and then used CytoHubba, a plug-in in the Cytoscape software, to identify 10 key coding genes (11,12). Immune infiltration was analyzed in TIMER2.0 (http://cistrome.dfci.harvard.edu/TIMER/) online tool (13). The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Results

Abnormal expression of lncRNAs

We classified and grouped the transcriptome data of the PRAD patients () in the TCGA database, which was N1 (26 normal tissues), N2 (26 normal tissues), T1 (250 cancerous tissues), and T2 (249 cancerous tissues). The difference analysis of 14,083 lncRNAs in the normal group and the cancerous group, the 461 lncRNAs that were significantly up-regulated in cancer tissues (), and 653 lncRNAs that were significantly down-regulated () were finally retrieved. Compared with lncRNA, whose expression level has not changed significantly, we define that the expression level of up-regulated lncRNA is significantly increased in cancer tissue (T), while the expression level of down-regulated lncRNA is higher in normal tissue (N) (, https://cdn.amegroups.cn/static/public/tau-21-154-1.xlsx). The volcano map results also verified the correctness of the differentially expressed lncRNA we defined ().
Table 1

Baseline data sheet of PRAD patients in TCGA database

Clinical characteristicsNumber (%)
Vital status
   Alive488 (97.6)
   Dead10 (2.0)
   Not reported2 (0.4)
Age
   40–4927 (5.4)
   50–59177 (35.4)
   60–69244 (48.8)
   70–7952 (10.4)
Morphology
   8140/3487 (97.4)
   8255/33 (0.6)
   8480/31 (0.2)
   8500/39 (1.8)
Primary Gleason grade
   Pattern 21 (0.2)
   Pattern 3200 (40.0)
   Pattern 4250 (50.0)
   Pattern 549 (9.8)
Secondary Gleason grade
   Pattern 3152 (30.4)
   Pattern 4238 (47.6)
   Pattern 5110 (22.0)
Pathology T stage
   T2188 (37.6)
   T3295 (59.0)
   T410 (2.0)
   Not reported7 (1.4)
Pathology N stage
   N0348 (69.6)
   N179 (15.8)
   Not reported73 (14.6)
Figure 1

Identify differentially expressed lncRNA based on transcriptome data of normal and cancerous tissues from PRAD patients. (A) Identification of significantly up-regulated genes (N1, N2: normal tissue grouping, TI, T2: cancerous tissue grouping); (B) identification of significantly down-regulated genes; (C) display of the relative expression of genes whose expression level is unchanged, up-regulated and down-regulated (N: normal tissue, T: cancer tissue); (D) Volcano map display of differentially expressed genes.

Identify differentially expressed lncRNA based on transcriptome data of normal and cancerous tissues from PRAD patients. (A) Identification of significantly up-regulated genes (N1, N2: normal tissue grouping, TI, T2: cancerous tissue grouping); (B) identification of significantly down-regulated genes; (C) display of the relative expression of genes whose expression level is unchanged, up-regulated and down-regulated (N: normal tissue, T: cancer tissue); (D) Volcano map display of differentially expressed genes.

Construction of prognostic model

Combining the results of differential expression analysis and the clinical data of PRAD patients, through univariate Cox regression analysis, we retrieved 13 lncRNAs that were significantly related to overall survival. Further, through multivariate Cox regression analysis, 6 lncRNAs (AC245884.1, LINC01524, AL807752.4, AP000844.2, AC016590.1, LINC00641) were screened out as independent prognostic factors (). This result can be verified by Lasso regression (). The prognostic risk model based on these 6 lncRNAs () is: risk score = 0.259 × AC245884.1 + 0.187 × LINC01524 -0.100 × AL807752.4 + 0.00256 × AP000844.2 + 0.0588 × AC016590.1 + 0.00557 × LINC00641. Then, we analyzed patients’ survival based on the risk score and found that more patients died in the high-risk group than in the low-risk group (), and the survival time decreased as the risk score increased (). Further analysis of K-M survival showed that the high-risk group’s prognosis was significantly worse than that of the low-risk group (). The risk model has good predictive power, with an AUC value of 0.778 ().
Table 2

Univariate and multivariate Cox regression analysis results

GeneUnivariate regression modelMultivariate regression model
Hazard ratioCoefficientP valueHazard ratioCoefficientP value
AC245884.1 1.07660.0737870.00034841.29612320.25937770.083409
LINC01524 1.1450.135380.00007621.20602140.18732680.014544
AL807752.4 1.05240.0510460.00076060.904675−0.10017950.054436
AP000844.2 1.00220.00223160.00000351.00256680.00256350.000152
AC016590.1 1.05680.0552670.00002311.06058960.0588250.015543
LINC00641 1.00530.0052660.00075801.00558650.0055710.028062
WASIR1 1.07660.0737870.0003484
AL031674.1 1.26730.236880.0002725
AC114296.1 1.08570.0822590.0000521
AC137834.2 1.46860.384340.0004462
AL133325.3 1.01080.0107350.0001391
LINC00839 1.02890.0284920.0009408
AC004448.2 1.20560.186990.0003778
Figure 2

Univariate and multivariate Cox analysis to identify independent prognostic lncRNA and risk models. (A) Lasso regression to verify independent prognostic factors; (B) Forest plot of multivariate Cox regression model; (C) the distribution of risk scores and patient survival status; (D) the relationship between risk score, survival time and status; (E) K-M survival curve of high and low risk groups (P value: log-rank test); (F) time-dependent ROC curve of 5-year survival prediction.

Univariate and multivariate Cox analysis to identify independent prognostic lncRNA and risk models. (A) Lasso regression to verify independent prognostic factors; (B) Forest plot of multivariate Cox regression model; (C) the distribution of risk scores and patient survival status; (D) the relationship between risk score, survival time and status; (E) K-M survival curve of high and low risk groups (P value: log-rank test); (F) time-dependent ROC curve of 5-year survival prediction.

Relationship between 6 lncRNAs and clinical data

For these 6 lncRNAs, only LINC00641 was significantly downregulated in cancer tissues, and the expression of the remaining lncRNAs was significantly increased (). Simultaneously, we found that AC245884.1 was significantly correlated with the Gleason score, and the other 5 lncRNAs were not found to be significantly correlated (). We verified these results in the GSE115414 data set and found that the expression levels of AC245884.1, LINC01524, and AP000844.2 in PRAD tissues were significantly higher than those in normal tissues, while AC016590.1, although not significant, has the same trend, AL807752. 4 and LINC00641 have no obvious difference (). As for the relationship between gene expression and Gleason score, we can observe that the expression level of AC245884.1 may have a potential correlation with Gleason score (). Since the conclusions drawn by the two data sets of TCGA and GSE115414 are not completely consistent, we can explain that the number of lncRNA samples is smaller.
Figure 3

Clinical analysis of independent prognostic lncRNA. (A) Independent prognostic lncRNA expression in normal and cancer tissues in the TCGA database (P value: t test); (B) the correlation between the expression of independent prognostic IncRNA (TCGA)and Gleason score (P value: F test); (C) independent prognostic IncRNA expression in normal and cancer tissues in the GSE1 15414 data set (P value: t test); (D) the correlation between the expression of independent prognostic IncRNA (GSE1 15414) and Gleason score (P value: F test).

Clinical analysis of independent prognostic lncRNA. (A) Independent prognostic lncRNA expression in normal and cancer tissues in the TCGA database (P value: t test); (B) the correlation between the expression of independent prognostic IncRNA (TCGA)and Gleason score (P value: F test); (C) independent prognostic IncRNA expression in normal and cancer tissues in the GSE1 15414 data set (P value: t test); (D) the correlation between the expression of independent prognostic IncRNA (GSE1 15414) and Gleason score (P value: F test).

The biological functions of 6 lncRNAs

To further clarify the biological functions of these six lncRNAs, we used WGCNA to identify the coding genes co-expressed with them. To make the constructed network more in line with the scale-free network’s characteristics, we choose a soft threshold of 3 (). Among them, 14 coding genes were co-expressed with LINC01524 (), 35 were co-expressed with AL807752.4 (), and 555 were related to LINC00641 (). The remaining 3 lncRNAs were not found any co-expressed coding genes (https://cdn.amegroups.cn/static/public/tau-21-154-2.xlsx). Next, we performed GO analysis on these coding genes and found that the biological process of their significant enrichment is the sense of smell and taste (). The molecular function is the receptor activity of smell and taste, and alditol: NADP + 1-oxidation Reductase activity (). However, cell components were not significantly enriched. Enrichment analysis of the KEGG pathway also showed that these genes are related to smell and taste transmission (). We used the MSigDB gene set for enrichment analysis and found that these genes were significantly related to the C/3 metabotropic glutamate pheromone receptor (, https://cdn.amegroups.cn/static/public/tau-21-154-3.xlsx). We screened out 10 key genes among these genes, of which 7 are related to taste receptors (). Through immune infiltration analysis of PRAD, it was found that the expression levels of TAS2R4 (as a representative of taste receptor-related genes), P2RY14, GPR18 and CCR9 were related to immune cell (B cell, CD4+ T cells, CD8+ T cells, and macrophages) infiltration levels were significantly correlated (). Except that the expression level of TAS2R4 was not significantly correlated with tumor purity, the expression levels of P2RY14, GPR18 and CCR9 were negatively correlated with tumor purity.
Figure 4

WGCNA analysis and identification of independent prognostic lncRNA co-expressed coding genes. (A) Scale-free fit index under different soft thresholds; (B) average connectivity under different soft thresholds; (C) the coding gene co-expressed with LINC01524; (D) co-expressed coding genc with AL807752.4; (E) the coding gene co-expressed with LINC00641.

Figure 5

Functional analysis of coding genes related to independent prognosis. (A) GO terms for biological processes; (B) GO terms for molecular functions; (C) KEGG pathway; (D) MSigDB gene set.

Figure 6

Identification of key coding genes and immune infiltration. (A) Protein interaction network of key coding genes; (B) the correlation between the expression of some key coding genes (TAS2R4, P2RY14, GPR18, CCR9) and immune infiltration.

WGCNA analysis and identification of independent prognostic lncRNA co-expressed coding genes. (A) Scale-free fit index under different soft thresholds; (B) average connectivity under different soft thresholds; (C) the coding gene co-expressed with LINC01524; (D) co-expressed coding genc with AL807752.4; (E) the coding gene co-expressed with LINC00641. Functional analysis of coding genes related to independent prognosis. (A) GO terms for biological processes; (B) GO terms for molecular functions; (C) KEGG pathway; (D) MSigDB gene set. Identification of key coding genes and immune infiltration. (A) Protein interaction network of key coding genes; (B) the correlation between the expression of some key coding genes (TAS2R4, P2RY14, GPR18, CCR9) and immune infiltration.

Discussion

Prostate cancer is one of the most common tumors in the male urinary system. The increasing incidence and poor prognosis have increasingly become a crucial factor limiting human life quality. LncRNA was originally thought to be a by-product of genome transcription and did not have biological functions. However, in later studies, it was found that it can interact with DNA and various RNAs, affecting the transcription and translation processes of cells. The function of most lncRNAs is still unclear, and many lncRNAs are still not taken seriously. Recent studies have shown that the amount of lncRNA in cancer differs significantly from that in normal tissues, and there are many differential expressions related to prognosis. Therefore, the study of lncRNA is helpful to improve the prognosis research of prostate cancer. In this study, the differential expression analysis of lncRNA in PRAD patients in the TCGA database was performed. It was found that there are many differentially expressed lncRNAs in prostate cancer. Through COX regression analysis, we finally screened out 6 lncRNAs (AC245884.1, LINC01524, AL807752.4, AP000844.2, AC016590.1, LINC00641) as independent prognostic factors and successfully constructed a risk prognosis model. By calculating the risk score, the survival outcome of the patient can be effectively evaluated. Through further analysis, it was found that some of these lncRNAs were related to the Gleason score (AC245884.1), while others were not found to be related to common prognostic indicators. To understand the biological functions of these lncRNAs, WGCNA was used to determine the existence of a certain number of co-expressed genes in 3 lncRNAs. Through GO and KEGG enrichment analysis, it is found that the main enrichment is the sense and conduction of smell and taste. Also, enrichment analysis with MSigDB showed that these genes are related to C/3 metabotropic glutamate pheromone receptors. Through immune infiltration analysis, the expression levels of AS2R4, P2RY14, GPR18, and CCR9 are significantly related to the tumor’s purity and immune cell infiltration level. There is a limitation in this article that the specific role of LncRNA in prostate cancer remains unclear. In existing studies, the overexpression of LINC00641 can inhibit the growth and invasion of prostate cancer cells (14). There is no relevant basic research on the other five genes in the results. Therefore, the lncRNA in the prediction, diagnosis, and treatment of prostate cancer has enormous potential. Some calculation results were run than can be included in the article. Interested readers can find them in a supplementary appendix online: https://cdn.amegroups.cn/static/public/tau-21-154.zip. The article’s supplementary files as
  14 in total

1.  clusterProfiler: an R package for comparing biological themes among gene clusters.

Authors:  Guangchuang Yu; Li-Gen Wang; Yanyan Han; Qing-Yu He
Journal:  OMICS       Date:  2012-03-28

Review 2.  Considering the potential for gene-based therapy in prostate cancer.

Authors:  Justin R Gregg; Timothy C Thompson
Journal:  Nat Rev Urol       Date:  2021-02-26       Impact factor: 14.432

3.  Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.

Authors:  Steffen Durinck; Paul T Spellman; Ewan Birney; Wolfgang Huber
Journal:  Nat Protoc       Date:  2009-07-23       Impact factor: 13.491

Review 4.  Long non-coding RNAs in genitourinary malignancies: a whole new world.

Authors:  Ronan Flippot; Guillaume Beinse; Alice Boilève; Julien Vibert; Gabriel G Malouf
Journal:  Nat Rev Urol       Date:  2019-08       Impact factor: 14.432

5.  Cytoscape 2.8: new features for data integration and network visualization.

Authors:  Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

6.  STRING v10: protein-protein interaction networks, integrated over the tree of life.

Authors:  Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi P Tsafou; Michael Kuhn; Peer Bork; Lars J Jensen; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2014-10-28       Impact factor: 16.971

7.  cytoHubba: identifying hub objects and sub-networks from complex interactome.

Authors:  Chia-Hao Chin; Shu-Hwa Chen; Hsin-Hung Wu; Chin-Wen Ho; Ming-Tat Ko; Chung-Yen Lin
Journal:  BMC Syst Biol       Date:  2014-12-08

8.  An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

Authors:  Juan A Botía; Jana Vandrovcova; Paola Forabosco; Sebastian Guelfi; Karishma D'Sa; John Hardy; Cathryn M Lewis; Mina Ryten; Michael E Weale
Journal:  BMC Syst Biol       Date:  2017-04-12

Review 9.  Global Trends of Latent Prostate Cancer in Autopsy Studies.

Authors:  Takahiro Kimura; Shun Sato; Hiroyuki Takahashi; Shin Egawa
Journal:  Cancers (Basel)       Date:  2021-01-19       Impact factor: 6.639

10.  TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.

Authors:  Antonio Colaprico; Tiago C Silva; Catharina Olsen; Luciano Garofano; Claudia Cava; Davide Garolini; Thais S Sabedot; Tathiane M Malta; Stefano M Pagnotta; Isabella Castiglioni; Michele Ceccarelli; Gianluca Bontempi; Houtan Noushmehr
Journal:  Nucleic Acids Res       Date:  2015-12-23       Impact factor: 16.971

View more
  2 in total

Review 1.  Role of Long Non-Coding RNA LINC00641 in Cancer.

Authors:  Xue Han; Shitai Zhang
Journal:  Front Oncol       Date:  2022-01-27       Impact factor: 6.244

2.  Comprehensive Characterization of Ageing-Relevant Subtypes Associated With Different Tumorigenesis and Tumor Microenvironment in Prostate Cancer.

Authors:  Liang Huang; Zhenzhou Xu; Yu Xie; Shusuan Jiang; Weiqing Han; Zhengyan Tang; Quan Zhu
Journal:  Front Mol Biosci       Date:  2022-02-21
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.