Literature DB >> 36203430

Glycosyltransferase-related long non-coding RNA signature predicts the prognosis of colon adenocarcinoma.

Jiawei Zhang1,2, Yinan Wu2,3, Jiayi Mu2, Dijia Xin1, Luyao Wang1, Yili Fan1, Suzhan Zhang2, Yang Xu1,4.   

Abstract

Purpose: Colon adenocarcinoma (COAD) is the most common type of colorectal cancer (CRC) and is associated with poor prognosis. Emerging evidence has demonstrated that glycosylation by long noncoding RNAs (lncRNAs) was associated with COAD progression. To date, however, the prognostic values of glycosyltransferase (GT)-related lncRNAs in COAD are still largely unknown.
Methods: We obtained the expression matrix of mRNAs and lncRNAs in COAD from The Cancer Genome Atlas (TCGA) database. Then, the univariate Cox regression analysis was conducted to identify 33 prognostic GT-related lncRNAs. Subsequently, LASSO and multivariate Cox regression analysis were performed, and 7 of 33 GT-related lncRNAs were selected to conduct a risk model. Gene set enrichment analysis (GSEA) was used to analyze gene signaling pathway enrichment of the risk model. ImmuCellAI, an online tool for estimating the abundance of immune cells, and correlation analysis were used to explore the tumor-infiltrating immune cells in COAD. Finally, the expression levels of seven lncRNAs were detected in colorectal cancer cell lines by reverse transcription-quantitative polymerase chain reaction (RT-qPCR).
Results: A total of 1,140 GT-related lncRNAs were identified, and 7 COAD-specific GT-related lncRNAs (LINC02381, MIR210HG, AC009237.14, AC105219.1, ZEB1-AS1, AC002310.1, and AC020558.2) were selected to conduct a risk model. Patients were divided into high- and low-risk groups based on the median of risk score. The prognosis of the high-risk group was worse than that of the low-risk group, indicating the good reliability and specificity of our risk model. Additionally, a nomogram based on the risk score and clinical traits was built to help clinical decisions. GSEA showed that the risk model was significantly enriched in metabolism-related pathways. Immune infiltration analysis revealed that five types of immune cells were significantly different between groups, and two types of immune cells were negatively correlated with the risk score. Besides, we found that the expression levels of these seven lncRNAs in tumor cells were significantly higher than those in normal cells, which verified the feasibility of the risk model.
Conclusion: The efficient risk model based on seven GT-related lncRNAs has prognostic potential for COAD, which may be novel biomarkers and therapeutic targets for COAD patients.
Copyright © 2022 Zhang, Wu, Mu, Xin, Wang, Fan, Zhang and Xu.

Entities:  

Keywords:  colorectal cancer; glycosyltransferase; lncRNAs; overall survival; risk model

Year:  2022        PMID: 36203430      PMCID: PMC9530784          DOI: 10.3389/fonc.2022.954226

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   5.738


Introduction

Colorectal cancer (CRC) is the third leading cause of cancer mortality worldwide. The 5-year overall survival (OS) rate for localized CRC is about 90%, while that for CRC with metastasis is <15% (1, 2). Pathologically, colon adenocarcinoma (COAD) is the most common subtype of CRC (3, 4). At present, the major treatment options for COAD include surgical resection, neoadjuvant chemoradiotherapy, and postoperative chemoradiotherapy (5). Recently, targeted therapy and immunotherapy have shown promising efficacies in patients with COAD (6–8). However, the clinical outcome remains unsatisfactory, largely due to disease heterogeneity arising from the complex interplay between numerous genetic and environmental factors. Thus, it is important to identify novel suitable biomarkers that allow better risk-stratified treatment for COAD patients. Glycosylation, the most abundant and complex post-translational modification of proteins and lipids, is an enzymatic process regulated by numerous glycosyltransferases (GTs) and glycosidases. In this process, GTs catalyze the transfer of carbohydrate chains to glycoproteins, which is essential for cell adhesion, protein stability, and signal transduction (9). Notably, there are approximately 200 GTs involved in 14 different human protein glycosylation pathways, which are further grouped into N-glycosylation, O-glycosylation, C-mannosylation, and glypiation (10). Aberrant glycosylation signatures on cell surface are closely related to the initiation and development of cancer, primarily through sustaining cell proliferation, enhancing tumor invasion, and immune escape (11–13). More importantly, altered expression of GTs has been shown to correlate with tumorigenesis. For instance, ST6GAL1, a sialyltransferase that catalyzes the transfer of sialic acid to galactose-containing substrates, is highly expressed in CRC and positively associated with microsatellite instability (MSI), BRAF mutations, and mucinous phenotype (14). Fernández et al. confirmed that the upregulation of GCNT3 conferred a better prognosis and improved response to initial therapy in CRC (15). In addition, Gu et al. found that, compared with control mice, the CRC mice showed decreased expression in a subset of GTs, including Mgat1, Mgat2, Mgat3, St6gal1, St3gal4, Fut8, and B4galt1 (16). Cumulative evidence indicates that abnormal expression of GTs seems to be a “hallmark of cancer” and contributes to cancer development. Long noncoding RNAs (lncRNAs) are RNA species with more than 200 nucleotides (nt) that are not translated to proteins and are involved in the regulation of many essential cellular processes, such as chromatin remodeling, transcription, and post-transcription regulation (17). Studies have demonstrated that lncRNAs play crucial roles in cancer initiation and progression by acting as oncogenes and tumor suppressors (18, 19). They can affect not only the proliferation, migration, and invasion but also energy metabolism of cancer cells (20, 21). Moreover, emerging evidence suggests that lncRNAs play important roles in regulating post-translational modifications of metabolic enzymes, transcription factors, and cancer-associated metabolic pathways (22). Recently, emerging evidence has demonstrated that lncRNAs has been associated with cancer progression via regulating the expression level of glycosyltransferases or glycosidases and subsequently influencing the glycosylation pattern. For example, LINC01296 increased the proliferation and metastasis of CRC cells by modulating the O-glycosylated MUC1 via PI3K/AKT pathway (23). However, the biological functions of GT-related lncRNAs have rarely been studied. In this study, we first explored the role of glycosyltransferases in COAD patients from The Cancer Genome Atlas (TCGA) database; then, we screened out seven GT-related lncRNAs with prognostic value, performed a prognostic GT-related lncRNA model, and explored the correlation between the GT-related lncRNA and tumor immune microenvironment (TIM) as well.

Methods

Data acquisition

The RNA sequence transcriptome data of COAD patients and relevant clinical information were extracted from The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/) database (24). Patients without clinical information were excluded. A total of 429 COAD samples and 37 normal samples were included in the study. The clinical characteristics of the patients are shown in . The corresponding gene set of 210 glycosyltransferases was downloaded from GlycoGene DataBase (GGDB; https://acgg.asia/ggdb2/) (25). The GEO dataset (https://www.ncbi.nlm.nih.gov/geo/) GSE39582, which contains 579 samples, was used to verify the prognostic value of the risk model (26).
Table 1

Clinical information of COAD patients in the TCGA database.

VariablesNo. of patientsPercentage (%)
Age (years)
 ≤6013429.2
 >6032570.8
Gender
 Female21947.7
 Male24052.3
Pathological stage
 I7817.0
 II17939.0
 III12527.2
 IV6413.9
 Unknown132.8
T stage
 Tis10.2
 T1112.4
 T27817.0
 T331067.5
 T45712.4
 Unknown20.4
N stage
 N027259.3
 N110422.7
 N28117.6
 Unknown20.4
M stage
 M033773.4
 M16413.9
 Unknown5812.6
Clinical information of COAD patients in the TCGA database.

Identification, functional classification, and mutation analysis of differentially expressed GT genes

We used the limma software package in R version (3.6) to identify the differentially expressed GT genes (p value < 0.05 and | log2 (fold change) | > 1.5) between COAD and normal colorectal tissues (27). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were conducted to analyze the enrichment pathways associated with differentially expressed genes (DEGs). Protein–protein interaction (PPI) network of DEGs was analyzed by the Search Tool for Interaction Genes (STRING) database (28). Single nucleotide variant (SNV) mutation frequency of DEGs was downloaded from GSCALite (http://bioinfo.life.hust.edu.cn/web/GSCALite/) (29). Copy number variation (CNV) mutation frequencies of GT DEGs in COAD patients were extracted from the TCGA database and analyzed by R software.

GT-related lncRNA identification

We used “limma” R package to identify GT-related lncRNAs (27). Pearson correlation analysis was conducted to detect the association between the expression levels of GT genes and lncRNAs. A total of 1,140 GT-related lncRNAs were identified based on the correlation coefficient >0.3 and p < 0.001.

Establishing the prognostic risk model of GT-related lncRNAs

To identify GT-related lncRNAs with prognostic value, we conducted the univariate and multivariate COX analysis. We first screened GT-related lncRNAs with potential survival impact on COAD patients based on the standard of p < 0.05 and subsequently incorporated those lncRNAs into LASSO Cox regression analysis and multivariate COX regression analysis (30). The following formula was used to calculate the risk score of each patient: risk score = (coefficient lncRNA1 × expression of lncRNA1) + (coefficient lncRNA2 × expression of lncRNA2) +…+ (coefficient lncRNAn × expression of lncRNAn). The COAD patients were divided into high- and low-risk groups based on the median value of risk score. R package “timeROC” was used to evaluate the accuracy of risk model (31). Additionally, we used the chi-square test to analyze the relationship between the model and clinical features in order to evaluate the prognostic role of this model.

Identification of independent prognostic factors

Univariate Cox and multivariate Cox regression analyses were used to evaluate whether the risk score was an independent prognostic factor. Receiver operating characteristic (ROC) curve was made to evaluate the risk score and different clinical characteristics in predicting outcomes (32). Kaplan–Meier (KM) plot was used to analyze the outcome of high- and low-risk groups in subgroups according to clinical characteristics. A p-value <0.05 was considered statistically significant.

Nomogram and calibration

With “rms” R package, all significantly independent prognostic factors, including age, stage, and risk score were used to build up a nomogram for the 1-, 3-, and 5-year OS (33). The calibration plot was applied to evaluate whether the prediction outcome of the nomogram showed good consistency with practical application.

Gene set enrichment analyses

With curated gene set (kegg.v7.4.symbols.gmt) and GSEA software (https://www.gsea-msigdb.org/gsea/login.jsp), GSEA analysis was conducted in the COAD cohort to explore the significantly enriched pathways between the high- and low-risk groups based on the criterion: p<0.05 and false discovery rate (FDR) <0.25 (34, 35).

Analysis of immune cell characteristics and immune-specific gene expression in the risk model

Immune Cell Abundance Identifier (ImmuCellAI) is an analytical tool that provides the quantitative infiltration of immune cells and predict the response of immune checkpoint blockade (ICB) therapy by using gene expression matrix data (36). We estimated the relative abundance of immune cell subtypes for each sample based on gene expression data. The abundance of immune cell subtypes and response to ICB therapy were compared between the high- and low-risk groups based on p < 0.05. The correlation between risk scores and immune infiltration was calculated by Pearson correlation analysis. Besides, the expression of immune-specific genes was compared between high- and low-risk groups.

Cell culture and reverse transcription and quantitative PCR analysis

The COAD cell lines (HCT116, DLD1, and HT-29) and normal human colon mucosal epithelial cell line NCM460 were acquired from the American Type Culture Collection (ATCC, USA). All cells were cultured in Roswell Park Memorial Institute (RPMI)-1640 medium (Gibco, CA, USA) with 10% fetal bovine serum (FBS, Gibco, CA, USA), 100 U/ml penicillin, and 100 mg/ml streptomycin. To evaluate the expression levels of GT-related lncRNAs, we used RNA Trizol reagent (Invitrogen, USA) to extract total cellular RNA, which were then reverse transcribed to cDNA using PrimeScript™ RT reagent Kit (Takara, Japan). Real-time fluorescent quantitative PCR was performed by using SYBR Green Master Mix (Yeasen, China). The β-actin mRNA was chosen as an endogenous control. The expression level of GT-related lncRNAs was analyzed using 2−ΔΔCT. Each PCR reaction was performed in triplicate. The primer sequences are listed in .

Statistical analysis

The data analysis of this study was performed by R software (version 3.6; https://www.r-project.org/) and GraphPad Prism 8 software. One-way analysis of variance (ANOVA) was used to compare the expression level of GT in 429 COAD tissues and 37 normal tissues. The two-tailed t-test was performed to analyze continuous variables between the two subgroups, and the Chi-square test was conducted to evaluate categorical data. The DEGs co-expression network and GT-lncRNA-mRNA co-expression network were conducted by the Cytoscape software (version 3.6.0; https://cytoscape.org/). The Kaplan–Meier method was used to compare the OS of each group based on the median value, and log-rank tests were applied to evaluate the significance of differences. The RT-qPCR results were analyzed using one-way ANOVA, a p < 0.05 on both sides was considered statistically significant.

Results

Differentially expressed GTs in the COAD and normal colorectal tissues

To identify the differentially expressed GTs and explore the biological functions of GTs in COAD, we used the TCGA database to analyze 429 COAD and 37 normal tissues. We uncovered 46 differentially expressed GTs (18 upregulated and 28 downregulated, ; ). The GO analyses revealed that these DEGs were most enriched in glycoprotein biosynthetic and metabolic processes ( ). KEGG analysis showed that 46 DEGs mainly involved in the mucin type O-glycan biosynthesis and glycosphingolipid biosynthesis including lacto and neolacto series, globo and isoglobo series, and ganglio series. We also clarified the relationship between these differentially expressed GTs via STRING database ( ). The incidence of single nucleotide variant (SNV) and copy number variations (CNVs) of differentially expressed GTs were analyzed in COAD. As shown in , SNV mutations were found in all 46 DEGs, among which WBSCR17 had the highest mutation frequency (31%), followed by UGGT2 (29%) and HS6ST3 (14%). We also found that CNV alterations were common in DEGs. Nearly half of the DEGs had copy number deletion, while the CNV amplification frequencies of PIGU, DPM1, PIGZ, ALG3, ST6GALNAC1, UGGT2, LFNG, HS6ST3, B4GALNT2, UST, HS3ST4, and GALNT6 were widespread ( ).
Figure 1

Landscape of genetic and expression variation of glycosyltransferase genes in COAD. (A, B) Volcano and heatmap visually showed differentially expressed genes between normal and COAD. (C, D) GO and KEGG analysis of DEGs. (E) Protein–protein interaction (PPI) network showing the interaction between DEGs among glycosyltransferases. (F) The SNV mutation frequency of DEGs in the COAD cohort. (G) The CNV variation frequency of DEGs in COAD. The height of the column represented the alteration frequency. N, normal tissues; T, tumor; BP, biological process; CC, cellular component; MF, molecular function; FC, fold change.

Landscape of genetic and expression variation of glycosyltransferase genes in COAD. (A, B) Volcano and heatmap visually showed differentially expressed genes between normal and COAD. (C, D) GO and KEGG analysis of DEGs. (E) Protein–protein interaction (PPI) network showing the interaction between DEGs among glycosyltransferases. (F) The SNV mutation frequency of DEGs in the COAD cohort. (G) The CNV variation frequency of DEGs in COAD. The height of the column represented the alteration frequency. N, normal tissues; T, tumor; BP, biological process; CC, cellular component; MF, molecular function; FC, fold change.

Identification of GT-related lncRNAs with prognostic value

From the TCGA database, we identified 1,140 GT-related lncRNAs. We further combined these lncRNAs with COAD survival data in univariate COX regression analysis to identify 33 GT-related lncRNAs with significant prognostic value. As a result, 32 GT-related lncRNAs were found to be associated with increased cancer risk (p < 0.05 and HR > 1), while only 1 lncRNA, AC124067.4, was a protective factor for COAD (p < 0.05 and HR < 1) ( ; ). In correlation analysis, almost all prognostic lncRNAs have a weak to moderate correlation with other lncRNAs, whereas LINC00174 and AC026471.4, and LINC00174 and AC107375.1 had the strongest correlation (r =0.58, and 0.57, respectively) ( ).
Table 2

Thirty-three GT-related lncRNA associated with prognosis in patients with COAD.

GeneHRHR.95LHR.95Hp-value
LINC023811.252421.0688141.4675670.00539
LBX2-AS11.09641.0043621.1968720.039662
AC005083.11.076991.001731.1579040.044778
AC002310.11.9423661.4911012.53028.58E-07
AC068580.31.3947481.03081.8871960.031037
AC068870.21.2200031.0754561.3839780.001998
AL354707.11.1633731.0165031.3314640.027972
AC073869.11.0963031.0114861.1882320.025224
AC011462.41.6698051.2308892.265230.000984
PCAT61.2510751.1215861.3955145.86E-05
AL035587.11.4674351.081921.9903190.013651
LINC009571.3883021.0568061.8237820.018429
AL162586.11.5294391.2054521.9405020.000468
AL392172.11.2030271.0325441.4016580.017754
LINC010111.7132641.2283652.3895780.001516
ZEB1-AS12.3532071.6345733.3877874.17E-06
AC020558.21.6183021.1197272.3388740.010414
AC007128.11.5382971.0957262.1596270.012843
RPARP-AS11.2236191.0247641.4610620.025727
AC107375.11.2728581.0044961.6129150.045815
AC105219.11.159471.0347161.2992650.010848
AL451050.21.5378511.0853842.1789390.015485
AL161729.41.3688321.1055111.6948720.003975
MIR210HG1.1611891.0719141.2578990.000251
LINC001741.4261971.1240881.8095010.003466
AC063948.11.8696231.3222022.6436890.0004
AC026471.41.1725911.0435941.3175340.007416
AL354836.11.1081251.0135581.2115140.024079
LINC011381.6320561.1641182.288090.00449
AC005261.31.2098381.0180991.4376870.030487
AC124067.40.9645750.9306480.9997380.048344
AC009237.141.18311.0790351.2972010.000345
AL450326.11.4714681.0037442.1571440.047804
Thirty-three GT-related lncRNA associated with prognosis in patients with COAD.

Generation of a GT-related lncRNA risk model

To better clarify the prognostic potential of GT-related lncRNAs, we conducted the LASSO and multivariate COX analysis on 33 GT-related lncRNAs ( ). Ultimately, seven lncRNAs, namely, LINC02381, AC002310.1, ZEB1AS1, AC020558.2, AC105219.1, MIR210HG, and AC009237.14, were identified to construct the prediction model ( ). The prognostic risk model was established based on the following formulation: risk score = (0.2552 × expression value of LINC02381) + (0.5370 × expression value of AC002310.1) + (0.4937 × expression value of ZEB1AS1) + (0.3940 × expression value of AC020558.2) + 0.1416 × expression value of AC105219.1) + (0.1187 × expression value of MIR210HG) + (0.1524 × expression value of AC009237.14). The risk score of patients was calculated, and COAD patients were divided into high- and low-risk groups according to the median risk score. The OS in high-risk patients was significantly lower than that in low-risk patients (p < 0.001, ). The distribution of the risk score and survival status are shown in . The risk score had a good prognostic predictive ability with the area under the ROC curve (AUC) values at 1, 3, and 5 years of 0.726, 0.748, and 0.844, respectively ( ). Furthermore, we investigated the prognostic implication of the expression of lncRNAs. The OS of patients in the high-expression group of seven lncRNAs was significantly lower than that in the low-expression group (all p < 0.05, ). We conducted a lncRNA–mRNA co-expression network to visualize the relationship among seven prognostic lncRNAs and GTs ( ). According to the risk score, a Sanky diagram was produced to exhibit the association among GTs, GT-related lncRNA, and outcome types of lncRNAs, indicating a single lncRNA corresponding to one or more mRNA, and these seven GT-related lncRNAs were all risk factors for a poor prognosis ( ). To verify the prognostic value of the risk model, we acquired the GSE39582 dataset from GEO database and then extracted the expression levels of seven GT-related lncRNAs and the survival data of patients. The results showed that patients with low-risk score had a better prognosis compared to those with high-risk score ( ).
Figure 2

Construction of a prognostic GT-related lncRNA risk model. (A) Screening of optimal parameters (lambda) in the LASSO regression model. (B) LASSO coefficient profiles of 11 candidate GT-related lncRNAs. (C) Forest map of seven GT-related lncRNAs significantly correlated with outcome and identified by multivariate cox regression. (D) Kaplan–Meier curve for the OS of COAD patients in the high- and low-risk group. (E) Distribution of survival status and risk score in the patient cohort. (F) ROC curves at 1, 3, and 5 years in COAD patients.

Figure 3

Kaplan–Meier survival curve of the selected GT-related lncRNAs and co-expression network between lncRNA and mRNA. (A–G) The overall survival curve of LINC02381, AC002310.1, ZEB1AS1, AC020558.2, AC105219.1, MIR210HG, and AC009237.14 of COAD patients in the high- and low-risk groups. (H) Co-expression network of GT genes and prognostic lncRNAs. Red nodes represent GT-related lncRNAs, while blue nodes represent GT genes. (I) The relationships among GT genes, GT-related lncRNAs, and risk type in the Sankey diagram.

Construction of a prognostic GT-related lncRNA risk model. (A) Screening of optimal parameters (lambda) in the LASSO regression model. (B) LASSO coefficient profiles of 11 candidate GT-related lncRNAs. (C) Forest map of seven GT-related lncRNAs significantly correlated with outcome and identified by multivariate cox regression. (D) Kaplan–Meier curve for the OS of COAD patients in the high- and low-risk group. (E) Distribution of survival status and risk score in the patient cohort. (F) ROC curves at 1, 3, and 5 years in COAD patients. Kaplan–Meier survival curve of the selected GT-related lncRNAs and co-expression network between lncRNA and mRNA. (A–G) The overall survival curve of LINC02381, AC002310.1, ZEB1AS1, AC020558.2, AC105219.1, MIR210HG, and AC009237.14 of COAD patients in the high- and low-risk groups. (H) Co-expression network of GT genes and prognostic lncRNAs. Red nodes represent GT-related lncRNAs, while blue nodes represent GT genes. (I) The relationships among GT genes, GT-related lncRNAs, and risk type in the Sankey diagram.

Correlation between risk model and clinicopathological features

In the next step, we explored the relationship between the risk model and clinical variables. Significant differences were found between high- and low-risk groups in the pathological stage, T stage, M stage, and N stage, whereas the two groups showed no significant differences in age and gender ( ; ). The high-risk group underwent significant upstaging compared with those in the low-risk group (52.2% vs. 2.1%, p=0.004). With the respect to the distribution of the American Joint Committee on Cancer (AJCC) stage, the high-risk group also comprised much poorer AJCC stage (T3/4, M1, and N1/2) than the low-risk group (85.1% vs. 74.7%, p = 0.078; 14.9% vs. 9.9%, p = 0.021; 51.9% vs. 28.4%, p < 0.001, respectively). Next, we performed an in-depth analysis of the correlation between seven lncRNAs and clinical variables ( ). In the pathological stage, the expression level of four lncRNAs (ZEB-AS1, AC105219.1, MIR210HG, and AC009237.14) was gradually elevated as the tumor stage progressed. For T stage, only one lncRNA was identified with significant differences across subgroups. Regarding M and N stage, three and five lncRNAs, respectively, were found with different expressions among different subgroups. Interestingly, we noticed that the expression level of ZEB1-AS1 was significantly increased with the progression of all stages (T, M, N, and S), suggesting that this lncRNA might be a core prognostic factor in COAD.
Figure 4

Correlation between risk model and clinicopathological factors. (A–D) Distribution of stage I/II and III/IV, T1/2 and T3/4, M0 and M1, and N0 and N1/2 tumors between high- and low-risk group. (E–H) Expression levels of seven prognostic GT-related lncRNAs in T, M, N, and S stage groups. *p < 0.05, **p < 0.01, and ***p < 0.001.

Correlation between risk model and clinicopathological factors. (A–D) Distribution of stage I/II and III/IV, T1/2 and T3/4, M0 and M1, and N0 and N1/2 tumors between high- and low-risk group. (E–H) Expression levels of seven prognostic GT-related lncRNAs in T, M, N, and S stage groups. *p < 0.05, **p < 0.01, and ***p < 0.001.

Independent prognostic role of the GT-related lncRNA signature

We used univariate and multivariate COX regression analyses to evaluate whether the risk model had an independent prognosis value. The results of univariate and multivariate analysis revealed that age (HR=1.622 and 2.519, respectively; 95%CI, 0.930–2.829 and 1.382–4.592, respectively; p = 0.088 and 0.003, respectively), stage (HR=3.163 and 2.383, respectively; 95%CI, 1.939–5.158 and 1.176–4.831, respectively; p < 0.001 and 0.016, respectively), and risk score (HR=3.064 and 2.271, respectively; 95%CI, 1.843–5.092 and 1.341–3.847, respectively; p < 0.001 and 0.002, respectively) can be independent prognostic factors ( ). Time-dependent ROC curves were conducted to evaluate the prognostic and predictive ability of clinical features and risk score ( ). Results showed that the AUC value of risk scores at 1, 3, and 5 years was 0.694, 0.741, and 0.834, respectively. More importantly, the risk score had higher precision compared with other predictors. As we all know, age, gender, and clinical stage are identified as independent prognostic characteristics of COAD patients. Then, patients were stratified according to age (≤60 and >60), gender (female and male), stage (I/II and III/IV), T stage (T1/2 and T3/4), M stage (M0 and M1), and N stage (N0 and N1/2) ( ; ). Among the subgroups, patients in the high-risk group exhibited much poorer survivability than those in the low-risk group.
Figure 5

Independent prognostic role of risk model signature. (A, B) Univariate and multivariate Cox regression analyses of risk scores and clinical characteristics. (C–E) Determination of the area under the ROC curve (AUC) of the risk score and clinical features based on the ROC curve. (F–H) The Kaplan–Meier curve shows the prognostic value of the risk model for COAD patients categorized by age, stage, and T stage.

Independent prognostic role of risk model signature. (A, B) Univariate and multivariate Cox regression analyses of risk scores and clinical characteristics. (C–E) Determination of the area under the ROC curve (AUC) of the risk score and clinical features based on the ROC curve. (F–H) The Kaplan–Meier curve shows the prognostic value of the risk model for COAD patients categorized by age, stage, and T stage.

Construction of GT-related lncRNAs nomogram based on the prognostic model and clinical characters

To evaluate the clinical characters and risk model for colorectal cancer prognosis, we integrated age, stage, and risk score to build a nomogram ( ). Additionally, a time-dependent ROC analysis was performed to evaluate whether risk score and clinical traits were capable for survival prediction ( ). The AUC value of risk score in time of 5-year follow-up was 0.726, 0.703, 0.729, 0.757, and 0.825, respectively. The AUC value of stage in time of 5-year follow-up was 0.755, 0.720, 0.758, 0.704, and 0.656, respectively. For age, the average AUC was below 0.6 during a follow-up of 5 years. These results showed that risk score exhibited much more powerful capacity of survival prediction compared with other clinical traits. Besides, calibration plots in 1, 3, and 5 years are shown in , and it revealed that the observed vs. predicted rates of 1-, 3-, and 5-year overall survival were in perfect concordance.
Figure 6

Construction and calibration of the nomogram. (A) Construction of a nomogram based on age, stage, and risk score as independent prognostic factors. (B) Time-dependent ROC analysis based on the nomogram and clinical characteristics. (C–E) The calibration plot for internal validation of the nomogram within 1, 3, and 5 years, respectively. AUC, the area under the ROC curve.

Construction and calibration of the nomogram. (A) Construction of a nomogram based on age, stage, and risk score as independent prognostic factors. (B) Time-dependent ROC analysis based on the nomogram and clinical characteristics. (C–E) The calibration plot for internal validation of the nomogram within 1, 3, and 5 years, respectively. AUC, the area under the ROC curve.

GSEA enrichment analysis and immune cell infiltration in GT-related lncRNA signature

To further explore the abnormally activated signal pathways between the high- and low-risk groups, we performed GSEA enrichment analysis. The results showed that the low-risk group was involved in several important pathways, including citrate cycle TCA cycle, fatty acid metabolism, glutathione metabolism, glycolysis gluconeogenesis, oxidative phosphorylation, P53 signaling pathway, proteasome, pyrimidine metabolism, ribosome, and valine leucine and isoleucine degradation ( ). Given the important role of tumor immune microenvironment (TIM), we looked into the expression levels of immune-related genes between the high- and low-risk score groups. The result showed that CD200R1, ARG1, CD160, ADORA2A, IL4, VEGFA, TNFRSF4, TNFRSF14, TNFRSF25, TBX2, VSIR, TGFB1, and NOS3 are increased, while CXCL8, NOS2, EZH2, and HHLA2 are decreased in the high-risk group compared with the low-risk group ( ). In addition, Pearson correlation analysis was conducted to analyzed the correlation between expression levels of immune-related genes and seven GT-related lncRNAs ( ). The results turn out that AC020558.2, AC105219.1, and MIR210HG were positively correlated with TNFRSF25 and TNFRSF14, which are immune suppression genes, while LINC02381, AC002310.1, and ZEB1-AS1 were negatively associated with immune response gene NOS2 and HHLA2. ImmuCellAI is an online tool to estimate the response of immune checkpoint blockade (ICB) therapy and the infiltration of immune cells based on the gene expression dataset (36). We used ImmuCellAI to predict the response of ICB therapy between the high- and low-risk groups. The result revealed that patients in the low-risk group tend to have a better response to ICB therapy than those in the high-risk group, even though the difference was not significant (p=0.0723, ). The abundance of immune cells with significant differences between the high- and low-risk groups is shown in . The levels of B cell and CD4+ T cell in the high-risk group were higher than those in the low-risk group, while the levels of T-helper 1 cells (Th1), neutrophil, and natural killer T (NKT) in the high-risk group were lower than those in the low-risk group (p<0.05). To determine the correlation between the risk score and tumor-infiltrating immune cells, we found that the risk score was positively correlated with B cell and CD4+ T cell and negatively correlated with Th1 and neutrophil (p<0.05) ( ). Taken together, these data suggest that our risk model can distinguish different characteristics of tumor immune cells in COAD.
Figure 7

Gene set enrichment analysis (GSEA) and the correlation between risk model and tumor-infiltrating immune cells. (A) Enrichment analysis of signaling pathways in the risk model. (B) The expression levels of immune-related genes between the high- and low-risk groups. (C) The response of immune checkpoint blockade (ICB) therapy between the high- and low-risk group. R, response; NR, no response. (D) The fraction of tumor immune infiltrating cells in the high- and low-risk group. (E) Correlation of risk score with five tumor-infiltrating immune cell subtypes. *p < 0.05, **p < 0.01, ***p < 0.001. Th1, T helper 1; NKT, natural killer T.

Gene set enrichment analysis (GSEA) and the correlation between risk model and tumor-infiltrating immune cells. (A) Enrichment analysis of signaling pathways in the risk model. (B) The expression levels of immune-related genes between the high- and low-risk groups. (C) The response of immune checkpoint blockade (ICB) therapy between the high- and low-risk group. R, response; NR, no response. (D) The fraction of tumor immune infiltrating cells in the high- and low-risk group. (E) Correlation of risk score with five tumor-infiltrating immune cell subtypes. *p < 0.05, **p < 0.01, ***p < 0.001. Th1, T helper 1; NKT, natural killer T.

Expression levels of GT-related lncRNAs in COAD cell lines

To further verify our results, we used RT-qPCR assay to analyze the expression level of seven GT-related lncRNAs in COAD cell lines ( ). Three human COAD cell lines, namely, HCT116, DLD1, and HT-29, were used, and NCM460, a normal human colon mucosal epithelial cell line, was used as control. Almost all seven prognostic GT-related lncRNAs were upregulated in COAD cell lines, indicating that these GT-related lncRNAs play key roles during progression of COAD. The results were generally consistent with our work.
Figure 8

The expression level of LINC02381 (A), MIR210HG (B), AC009237.14 (C), AC105219.1 (D), ZEB1-AS1 (E), AC002310.1 (F), AC020558.2 (G) between normal cell line NCM460 and HCT116, DLD1, and HT-29 CRC cell lines.

The expression level of LINC02381 (A), MIR210HG (B), AC009237.14 (C), AC105219.1 (D), ZEB1-AS1 (E), AC002310.1 (F), AC020558.2 (G) between normal cell line NCM460 and HCT116, DLD1, and HT-29 CRC cell lines.

Discussion

COAD, as the most frequent type of colorectal cancer, have been widely investigated in terms of its rapid development and treatment. The circulating tumor DNA, tumor-infiltrating immune cells, and microsatellite status have emerged as promising biomarkers for the diagnosis and treatment of CRC (37–39). As an important epigenetic player in cancer pathogenesis, lncRNA together with N6-methyladenosine (m (6)A) RNA methylation has been shown to predict treatment response and disease prognosis in CRC patients (40, 41). Epigenetic modifications have gained much attention in the research on the development of CRC, which is regarded as a diagnostic and predictive biomarker of CRC patients (42, 43). Protein glycosylation is the most common post-translational modification, in which GTs catalyze the formation of glycoproteins (44). Thus, dysregulation of GTs may contribute to aberrant glycosylation, leading to tumorigenesis. However, the role of GT-related lncRNAs in COAD remains poorly defined. For this purpose, it is necessary to construct a useful risk model based on the GT-related lncRNAs. In this study, we acquired RNA-sequencing data and clinical information of COAD from the TCGA database and extracted 210 GT genes from the GlycoGene DataBase. First, we identified 46 differentially expressed GT genes and explore their mutation frequency in COAD. Then, we screened 1,140 GT-related lncRNAs based on the co-expression of lncRNAs and GTs. Subsequently, seven prognostic GT-related lncRNAs (LINC02381, AC002310.1, ZEB1AS1, AC020558.2, AC105219.1, MIR210HG, and AC009237.14) were identified through LASSO and Cox regression analysis (30). The risk signature was determined based on the seven GT-related lncRNAs, which stratified COAD patients into two high- and low-risk groups. Survival analysis revealed that patients in the high-risk group had a much worse prognosis. Moreover, patients with high expression levels of seven GT-related lncRNAs turned out to have poorer survivability, suggesting these lncRNAs as risk factors in COAD prognosis. Next, we performed univariate and multivariate Cox regression analyses to identify whether risk score can be an independent prognostic factor. In subgroup analyses, patients in the high-risk subgroup had poorer clinical phenotypes compared with those in the low-risk subgroup, indicating that our risk signature could affect the progression of COAD patients. A nomogram based on risk score, age, and stage was built to predict the prognosis of patients over 1, 3, and 5 years, which showed good performance in clinical practice. Moreover, a time-dependent ROC analysis was conducted to validate the accuracy of the risk model. Compared with the AUC value of age and stage, risk score presented more powerful capacity in survival prediction. These results turned out that our prognostic risk signature can be used as a potential predictor in COAD prognosis. Currently, only LINC02381, ZEB1AS1, and MIR210HG have been studied extensively. Huang et al. found that LINC02381 was upregulated in breast cancer, and knockdown of LINC02381 impaired the malignant phenotypes of breast cancer cells, including cell proliferation, migration, and invasion (45). Moreover, LINC02381 plays an oncogenic role in cervical cancer and osteosarcoma (46, 47). ZEB1AS1 is a famous cancer-related lncRNA, and its overexpression is widely found in various cancers, including colorectal cancer (48). Lv et al. demonstrated that ZEB1AS1 was increased in CRC, and it promoted CRC cell proliferation via mediating Wnt/β-catenin signaling (49). CRC patients with high ZEB1AS1 expression led to a poor prognosis and lower OS rate than those with low ZEB1AS1 expression, suggesting that ZEB1-AS1 is a promising biomarker in predicting clinical outcomes (50, 51). Studies about MIR210HG in tumorigenesis have been widely reported. In pancreatic cancer, MIR210HG acted as oncogenic regulator and promoted pancreatic cell proliferation and migration (52). In addition to pancreatic cancer, MIR210HG can also promote various cancer types, such as breast cancer, gastric cancer, and ovarian cancer (53–55). Additionally, previous studies have identified seven GT-related lncRNAs as prognostic factors in colon cancer patients. For example, ZEB1-AS1, LINC02381, AC105219.1, and AC002310.1 were identified as independent prognostic factors in predicting survival in colon cancer patients (56). MIR210HG and AC009237.14 also play an important role in predicting prognosis in colon cancer patients (57). This evidence make our results more reliable, suggesting that our GT-related lncRNA signature has a promising potential for predicting COAD prognosis. To further explore the potential biological mechanism of our prognostic risk signature, we further used GSEA to determine the enriched signaling pathways. The results showed that the prognostic risk signature was mainly enriched in metabolism-related signaling pathways. Abnormal metabolism plays a critical functional role in colorectal cancer development and progression; thus, targeting key metabolic factors might be a potential treatment in colorectal cancer. Ag120 (ivosidenib), an inhibitor for glutamine uptaking protein ASCT/SLC1A5, was found to block glutamine transport and metabolism, thus leading to impaired cell proliferation, increased autophagy, and oxidative stress (58). Besides, our prognostic risk model was also enriched in P53 pathway signaling. P53 was encoded by TP53 gene, and mutations of TP53 have been associated with poor prognosis in various cancer types, including colorectal cancers (59). In advanced CRC patients with distant metastases, the TP53 mutation rate reached 80% (60). The specific mechanism by GT-related lncRNAs participating in the regulation of COAD biological process requires further in vivo and in vitro experimentation. In addition, we compared immune-related genes between high- and low-risk groups. Interestingly, several immunosuppression-related genes were significantly higher expressed in the high-risk group, including CD200R1, ARG1, TNFRSF4, TNFRSF14, and VSIR. These genes have been reported to be associated with the inhibition of T-cell responses and cell apoptosis (61–65). Besides, several tumor-related genes, like VEGFA, TBX2, and TGFB1, were also enriched in the high-risk group (66–68). These findings suggested that patients in the high-risk group might have a poorer prognosis compared with those in the low-risk group. Recently, studies about the role of lncRNAs in TIM have been widely reported. The prognostic value of immune-related lncRNAs has been found in various cancers. Currently, a prognostic model has been conducted based on eight immune-related lncRNAs pairs that can effectively predict the prognosis of patients with COAD (69). Here, we performed an in-depth analysis of the relationship between the risk model and the distribution of immune-infiltrating cells by ImmuCellAI. We found that the high- and low-risk groups significantly distinguished the characteristics of B cells, CD4+ T cells, Th1, neutrophils, and NKT cells, among which B cells and CD4+ T cells exhibited a higher degree of infiltration in the high-risk group than in the low-risk group. Alternatively, Th1, neutrophils, and NKT cells exhibited a higher degree of infiltration in the low-risk group than in the high-risk group. In this study, the results of correlation between risk score and immune-infiltrating cells showed that the risk score was positively correlated with B cells and CD4+ T cells and negatively correlated with Th1 and neutrophils. Altogether, our results showed that the risk model could evaluate the tumor infiltrating-immune cells to analyze the tumor immune characteristics, thereby determining the prognosis of patients with COAD. To further validate the accuracy of our model, we employed GSE39582 as the validation set, and the same results were obtained, suggesting that our risk model had a good prognostic predictive ability in COAD patients. Moreover, we conducted RT-qPCR assay to analyze the expression level of seven GT-related lncRNAs in COAD cell lines. The expression of seven GT-related lncRNAs were significantly upregulated in COAD cells than that in the control cell, which were consistent with our model. Combined with and , high expression levels of seven GT-related lncRNAs were associated with worse survival, HR>1, indicating that they might play a role as cancer‐promoting genes, which were consistent with our risk model.

Conclusion

The risk model based on seven GT-related lncRNAs has independent prognostic value and high reliability in predicting the prognosis of patients with colorectal cancer. Owing to the important role of lncRNAs in the cellular process, these GT-related lncRNAs might be promising biomarkers and targets for CRC diagnosis and treatment, respectively. Moreover, the risk model for COAD was transformed into a nomogram, providing a convenient tool for clinical practice and new sights for a better understanding of the mechanism of immune cell-specific genes and tumor infiltrating-immune cells in cancer regulation.

Data availability statement

The original contributions presented in the study are included in the article/ . Further inquiries can be directed to the corresponding authors.

Author contributions

JZ, YW, and YX designed the study. JZ, YW, and JM acquired the data and performed data analysis and interpretation. JZ and YW drafted the manuscript. DX, LW, and YF provided suggestion to improve it. SZ and YX critically revised it for important intellectual content. All authors contributed to the article and approved the submitted version.

Funding

This study was supported by the Natural Science Foundation of Zhejiang Province of China (LY21H080005) and National Natural Science Foundation of China (81572920).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  68 in total

1.  Time-dependent ROC curves for censored survival data and a diagnostic marker.

Authors:  P J Heagerty; T Lumley; M S Pepe
Journal:  Biometrics       Date:  2000-06       Impact factor: 2.571

2.  Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks.

Authors:  Paul Blanche; Jean-François Dartigues; Hélène Jacqmin-Gadda
Journal:  Stat Med       Date:  2013-09-12       Impact factor: 2.373

3.  ELK1-induced upregulation lncRNA LINC02381 accelerates the osteosarcoma tumorigenesis through targeting CDCA4 via sponging miR-503-5p.

Authors:  Xia Bian; Yun-Ming Sun; Li-Min Wang; Ying-Lie Shang
Journal:  Biochem Biophys Res Commun       Date:  2021-02-25       Impact factor: 3.575

Review 4.  Global view of human protein glycosylation pathways and functions.

Authors:  Katrine T Schjoldager; Yoshiki Narimatsu; Hiren J Joshi; Henrik Clausen
Journal:  Nat Rev Mol Cell Biol       Date:  2020-10-21       Impact factor: 94.444

Review 5.  Immunotherapy efficacy on mismatch repair-deficient colorectal cancer: From bench to bedside.

Authors:  Darleny Y Lizardo; Chaoyuan Kuang; Suisui Hao; Jian Yu; Yi Huang; Lin Zhang
Journal:  Biochim Biophys Acta Rev Cancer       Date:  2020-10-06       Impact factor: 10.680

6.  Randomized Trial of Irinotecan and Cetuximab With or Without Vemurafenib in BRAF-Mutant Metastatic Colorectal Cancer (SWOG S1406).

Authors:  Scott Kopetz; Katherine A Guthrie; Van K Morris; Heinz-Josef Lenz; Anthony M Magliocco; Dipen Maru; Yibing Yan; Richard Lanman; Ganiraju Manyam; David S Hong; Alexey Sorokin; Chloe E Atreya; Luis A Diaz; Carmen Allegra; Kanwal P Raghav; Stephen E Wang; Christopher H Lieu; Shannon L McDonough; Philip A Philip; Howard S Hochster
Journal:  J Clin Oncol       Date:  2020-12-23       Impact factor: 50.717

7.  Development of an Immune-Related Gene Signature for Prognosis in Melanoma.

Authors:  Jia-An Zhang; Xu-Yue Zhou; Dan Huang; Chao Luan; Heng Gu; Mei Ju; Kun Chen
Journal:  Front Oncol       Date:  2021-01-21       Impact factor: 6.244

8.  Canonical TGFβ signaling induces collective invasion in colorectal carcinogenesis through a Snail1- and Zeb1-independent partial EMT.

Authors:  Marion Flum; Severin Dicks; Yu-Hsiang Teng; Monika Schrempp; Alexander Nyström; Melanie Boerries; Andreas Hecht
Journal:  Oncogene       Date:  2022-01-24       Impact factor: 8.756

Review 9.  LncRNAs, the Molecules Involved in Communications With Colorectal Cancer Stem Cells.

Authors:  Boyang Fan; Qian Zhang; Ning Wang; Guiyu Wang
Journal:  Front Oncol       Date:  2022-01-27       Impact factor: 6.244

10.  Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana.

Authors:  Daishin Ueno; Harunori Kawabe; Shotaro Yamasaki; Taku Demura; Ko Kato
Journal:  BMC Bioinformatics       Date:  2021-07-22       Impact factor: 3.169

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.