Novel Prognostic Model for Gastric Cancer using 13 Co-Expression Long Non-Coding RNAs (LncRNAs).

Xi Luo1, Kuan-Jui Su2, Chuan Qiu2, Xing Liu1, Fang Yang1.   

Abstract

BACKGROUND The established clinical criteria for gastric cancer prognosis are insufficient because of the molecular heterogeneity of the disease. Constructing a robust prognostic model is therefore essential for predicting the survival of gastric cancer patients. MATERIAL AND METHODS A comprehensive method combining weighted gene co-expression network analysis (WGCNA) with elastic-net Cox regression was used to identify prognostic long non-coding RNAs (lncRNAs) from the Gene Expression Omnibus database for overall survival (OS) prediction. Methods using WGCNA or elastic-net Cox regression alone were treated as "contrast" methods. Univariate and multivariate Cox regression were used to identify independent prognostic clinical factors. To assess the predictive ability of the comprehensive method, we compared the 3-year and 5-year areas under the curve (AUC) of the time-dependent receiver operating characteristic across the 3 methods in gene and clinical-gene models. The optimal model identified in the training set was validated in the validation set. Biological information analysis for the optimal model was also performed. RESULTS The clinical-gene model containing 13 co-expression lncRNAs identified by the comprehensive method and 3 clinical factors (molecular subtype, recurrence status, and operation type) was found to be the optimal model in the study, with 3-year and 5-year AUCs of 0.832 and 0.830 in the training set and 0.764 and 0.778 in the validation set, respectively. Biological information analysis suggested that lipid metabolism plays an important role in the occurrence and development of gastric cancer. CONCLUSIONS We constructed a novel prognostic model containing 13 co-expression lncRNAs and 3 clinical factors for gastric cancer patients.

Year:  2020        PMID: 32480397      PMCID: PMC7288832          DOI: 10.12659/MSM.923295

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Gastric cancer (GC) is a common malignancy and the third leading cause of cancer death worldwide. In 2015, approximately 1 313 000 people were diagnosed with GC and 819 000 patients died worldwide, with a mortality rate of nearly 50% [1]. Although advances in neoadjuvant therapy and surgical techniques can improve the survival of GC patients, the 5-year overall survival (OS) rate remains unsatisfactory [2]. Earlier studies have shown that the time of initial diagnosis, disease stage, recurrence or distant metastasis, Helicobacter pylori infection status, and other demographic and clinical factors are associated with poor prognosis [3,4]. However, established clinical criteria based on these factors, such as the tumor/node/metastasis (TNM) classification, the Lauren classification, and the World Health Organization (WHO) classification, are insufficient to predict the OS of patients whose tumors involve complex genetic alterations [5-7]. Therefore, constructing a robust prognostic model that includes genetic factors is essential to effectively predict the prognosis of GC patients.

Recent studies have used high-throughput sequencing technology to identify biomarkers related to the survival of GC patients. Less than 2% of the human genome encodes proteins, and increasing evidence indicates that non-coding RNA is involved in tumor occurrence and progression by regulating gene expression [8,9]. Long non-coding RNA (lncRNA), a class of non-coding RNA longer than 200 nucleotides that is about 3 times more numerous than protein-coding RNA, has been shown in recent years to play a vital role in cancer prognosis [10]. Zhu et al. [11] identified 24 lncRNAs related to the prognosis of GC patients using a multivariable Cox regression model, with an area under the curve (AUC) of 0.85 when combined with the American Joint Committee on Cancer (AJCC) stage. Fan et al. [12] selected 5 lncRNAs as OS biomarkers through a random survival forest algorithm, with an AUC of 0.86. Cheng et al. [13] screened 3 prognostic lncRNAs using the least absolute shrinkage and selection operator (LASSO) Cox regression, with a 19-month AUC of 0.737. Peng et al. [14] identified 7 prognostic lncRNA pairs as a prognostic signature using a permutation method and LASSO Cox regression, with a concordance index of 0.872. Zhang et al. [15] identified 11 lncRNAs as an independent survival signature for GC patients using gene co-expression and LASSO Cox regression.

However, most of these studies faced the challenges of high dimensionality and collinearity in data analysis and might not reflect the interconnection among genes, which can cause model over-fitting and the loss of meaningful molecules during analysis [16]. Therefore, it is necessary to consider the interconnection among genes and avoid model over-fitting simultaneously when predicting GC prognosis. In recent years, various statistical methods have been introduced to reduce over-fitting in microarray data analysis [10,12,17]. Penalized regression combined with the Cox proportional hazards model, including LASSO Cox regression, ridge-regularized Cox regression, and elastic-net Cox regression, can outperform traditional Cox regression in genomic survival analysis by tuning the penalty parameters [18]. LASSO Cox regression can reduce the dimensions of microarray data but cannot solve the collinearity problem, whereas ridge Cox regression can address multicollinearity but cannot perform variable selection. Elastic-net Cox regression, which combines the advantages of both LASSO and ridge Cox regressions, has been used in a number of studies to screen genes associated with cancer prognosis [19,20]. In addition, weighted gene co-expression network analysis (WGCNA) has been widely used to identify highly interconnected genes and to explore the correlation between co-expression modules and clinical traits [21].

Thus, to construct a robust prognostic model of GC, we used a comprehensive method that combined WGCNA with elastic-net Cox regression to identify lncRNAs predictive of OS. Methods using WGCNA or elastic-net Cox regression alone were treated as “contrast” methods. The 3-year and 5-year AUCs of the time-dependent receiver operating characteristic (ROC) were calculated to evaluate the predictive ability of the different models and to identify an optimal model as the robust prognostic model for GC patients. Stratification analysis based on independent clinical factors was used to validate the independence of the optimal model. Biological information analyses, including Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, were used to identify the biological functions of the lncRNAs in the optimal model, so as to provide a more comprehensive reference for future prognostic research and the treatment of GC patients.
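The trade-off between LASSO, ridge, and elastic-net penalties described above can be written compactly. The paper does not spell out the penalty, so the form below is the commonly used elastic-net parameterization and should be read as an assumption; the symbols $\ell$, $\lambda$, and $\alpha$ are introduced here for illustration.

```latex
\hat{\beta} = \arg\max_{\beta}\;
  \Big[\, \ell(\beta) \;-\; \lambda \Big( \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big) \Big]
```

Here $\ell(\beta)$ is the Cox log partial likelihood (up to a scaling constant), $\lambda \ge 0$ sets the overall penalty strength, and $\alpha \in [0,1]$ mixes the two penalties: $\alpha = 1$ gives LASSO (sparse coefficients, variable selection), $\alpha = 0$ gives ridge (shrinkage that tolerates collinearity), and intermediate values give the elastic net.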

Material and Methods

Data resource and preprocessing

The microarray data were obtained from the Gene Expression Omnibus (GEO) database. Because of the chip platform requirement of the lncRNA re-annotation pipeline, the inclusion criteria were as follows: gene expression profiles of GC specimens were accessible; the data set contained more than 50 GC samples; the chip platform was GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array, Santa Clara, CA, USA); and basic clinical data including OS information were available. The data set GSE62254, which consists of 300 GC samples, was selected, and the corresponding clinical variables were obtained from the original study. We then randomly selected 70% of the samples without replacement as the training set and used the remaining 30% as the validation set. All statistical analyses were conducted in R 3.5.0, and the significance level was set at P<0.05. In general, lncRNA expression levels are lower than those of protein-coding genes, and the robust multichip average (RMA) method is an effective way to obtain consistent estimates of lncRNA expression profiles [22,23]. Therefore, the raw CEL files of GSE62254 were processed with background correction, quantile normalization, and log2 transformation using the RMA method of the R package “affy”. Several missing clinical values were imputed using the R package “rpart”.
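The 70%/30% split without replacement can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the sample labels and the random seed are hypothetical, since the paper reports neither.

```python
import random

def split_train_validation(sample_ids, train_fraction=0.7, seed=0):
    """Randomly split sample IDs into training and validation sets.

    Sampling is without replacement: every sample lands in exactly one set.
    The seed is an assumption made here for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical labels standing in for the 300 GSE62254 samples.
samples = [f"GC_{i:03d}" for i in range(300)]
train, validation = split_train_validation(samples)
print(len(train), len(validation))  # 210 90
```

With 300 samples this reproduces the 210/90 partition sizes reported in the Results.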

LncRNA re-annotation

Probe sets of the Affymetrix HG-U133 Plus 2.0 array were re-annotated as lncRNAs based on the NetAffx, RefSeq, and Ensembl databases [24,25]. First, we mapped the chip probe set IDs to the NetAffx Annotation Files (HG-U133 Plus 2.0 Annotations, CSV format, release 36, January 2017) and retained the probe sets whose RefSeq IDs were labeled “NR” or “XR”. Second, we retained the probe sets annotated in the latest RefSeq database with the gene type “long non-coding RNA”. Next, we retained the probe sets from both the NetAffx and RefSeq annotations. Then, from the Ensembl annotation, we retained the IDs with the types “3prime-overlapping-ncRNA”, “antisense”, “bidirectional_promoter_lncRNA”, “lincRNA”, “macro-lncRNA”, “misc-RNA”, “processed transcripts”, “sense-overlapping”, and “sense_intronic”. Finally, the probe sets assigned a RefSeq transcript ID or an Ensembl gene ID were retained for annotation.
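The retention rule in the final step can be illustrated with a small filter. This is a sketch under stated assumptions, not the authors' pipeline: the probe names, accession numbers, and the function `is_lncrna_probe` are hypothetical, and the rule is read as "keep a probe set if it has a non-coding RefSeq ID or a listed Ensembl type".

```python
# RefSeq accession prefixes for non-coding transcripts ("NR" and "XR" in the text).
REFSEQ_NONCODING_PREFIXES = ("NR_", "XR_")

# Type labels copied as written in the text; the exact spelling in a given
# Ensembl release may differ (for example, underscores instead of hyphens).
ENSEMBL_LNCRNA_TYPES = {
    "3prime-overlapping-ncRNA", "antisense", "bidirectional_promoter_lncRNA",
    "lincRNA", "macro-lncRNA", "misc-RNA", "processed transcripts",
    "sense-overlapping", "sense_intronic",
}

def is_lncrna_probe(refseq_ids, ensembl_type):
    """Return True if the probe set has a non-coding RefSeq ID or a listed Ensembl type."""
    has_refseq = any(r.startswith(REFSEQ_NONCODING_PREFIXES) for r in refseq_ids)
    has_ensembl = ensembl_type in ENSEMBL_LNCRNA_TYPES
    return has_refseq or has_ensembl

# Hypothetical probe annotations: (RefSeq IDs, Ensembl type).
probes = {
    "probe_a": (["NM_000001"], "protein_coding"),
    "probe_b": (["NR_000002"], None),
    "probe_c": ([], "lincRNA"),
}
kept = sorted(p for p, (ids, etype) in probes.items() if is_lncrna_probe(ids, etype))
print(kept)  # ['probe_b', 'probe_c']
```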

Prognostic model construction and comparison

Construction of weighted gene co-expression networks

The lncRNA expression profiles were used to construct weighted co-expression networks with the R package “WGCNA” [26,27]. First, a co-expression similarity matrix was constructed from Pearson’s correlation coefficients for all pairs of lncRNAs. Second, we transformed the co-expression similarity matrix into an adjacency matrix by choosing the soft threshold power β=7 (scale-free topology fitting index R²=0.92) for scale-free topology network construction. A topological overlap matrix (TOM) was then created using the topological overlap measure, and the corresponding dissimilarity (dissTOM) was calculated for further analysis. Average linkage hierarchical clustering was used to define the network modules by node dissimilarity, and the hybrid dynamic tree cutting method was used to cut branches of the resulting dendrogram, with a minimum module size of 30 genes and a cut height of 0.99. Module eigengenes (MEs), defined as the first principal component of the expression profiles of the lncRNAs within a given module, were used to represent the gene expression profile of each module. The dissimilarity between modules was calculated from the MEs and used to merge modules with similar expression profiles (threshold of 20%). A univariate Cox regression analysis was performed to identify prognostic modules with P<0.05. Genes in the prognostic modules with P<0.01 in univariate Cox regression analysis were then selected as hub genes.
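The first steps of the network construction (correlation, soft-threshold adjacency, topological overlap) can be sketched in plain Python. This is an illustration of the standard formulas, not the WGCNA package itself; it assumes an unsigned network, which the paper does not state, and the three toy expression profiles are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def adjacency(profiles, beta=7):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|**beta, zero diagonal."""
    g = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(g)] for i in range(g)]

def diss_tom(adj):
    """Dissimilarity 1 - TOM, with TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # node connectivity (diagonal is zero)
    out = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(g) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            out[i][j] = 1.0 - tom
    return out

# Toy profiles: genes 0 and 1 are perfectly correlated, gene 2 is not.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
d = diss_tom(adjacency(profiles, beta=7))
print(d[0][1] < d[0][2])  # True: the correlated pair is far less dissimilar
```

The resulting dissimilarity matrix is what hierarchical clustering and tree cutting would operate on to define modules.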

Development of prognostic models

To construct a robust prognostic model, we used 3 methods to identify candidate prognostic biomarkers: the comprehensive method, which used the WGCNA algorithm and elastic-net Cox regression together, and the “contrast” methods, which used WGCNA or elastic-net Cox regression alone. The hub genes of co-expression modules play important roles in biological processes and are generally highly interconnected. Therefore, for the WGCNA method, we used the hub genes of the prognostic modules as candidate prognostic genes. For the elastic-net Cox regression method, all lncRNAs were entered into an elastic-net Cox regression, and the lncRNAs with non-zero regression coefficients were considered candidate genes. For the comprehensive method, we entered the lncRNAs of the prognostic modules identified by WGCNA into the elastic-net Cox regression to further screen candidate genes. The significant independent prognostic clinical variables were selected using univariate and multivariate Cox regression analysis with a threshold of α=0.05. We then constructed clinical, gene, and clinical-gene models for the 3 methods using the risk score (RS) formula for GC patients. The risk score was calculated as follows [27,28]:

$$RS = \sum_{i=1}^{n} \beta_i \times \mathrm{exp}_i$$

where $\mathrm{exp}_i$ indicates the expression of candidate variable $i$, $n$ is the number of candidate variables in the model, and $\beta_i$ is the regression coefficient of $i$, which was calculated using ridge-regularized Cox regression to ensure consistency across all models in our study [16]. The penalized Cox regressions, including the elastic-net Cox regression and ridge-regularized Cox regression, were performed using the R package “glmnet” with 10 000 iterations and 10-fold cross-validation.
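As a worked example of the risk score, the sketch below computes a coefficient-weighted sum of expression values. The coefficients are the ridge coefficients reported for the 3-lncRNA gene model RSe in Table 4; the expression values and the function name `risk_score` are hypothetical.

```python
def risk_score(expression, coefficients):
    """Risk score as the sum over variables of coefficient times expression."""
    return sum(coefficients[name] * expression[name] for name in coefficients)

# Ridge coefficients of the RSe gene model as listed in Table 4.
rse_coefficients = {
    "LINC00930": -1.4153,
    "AP000550.1": -0.6437,
    "AC009052.1": -1.9817,
}

# Hypothetical log2 expression values for one patient.
patient_expression = {"LINC00930": 2.0, "AP000550.1": 1.0, "AC009052.1": 0.5}

rs = risk_score(patient_expression, rse_coefficients)
print(rs)  # approximately -4.46515
```

For a clinical-gene model, the same sum would simply include the coded clinical variables and their coefficients alongside the lncRNAs.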

Prediction ability comparison of prognostic models

The 3-year and 5-year AUCs of each model were calculated with the R package “timeROC” to assess predictive ability. A Z-test of the AUC values was used to compare the predictive ability of the models and to identify the optimal prognostic model of GC in our study. Bonferroni correction was used for multiple comparisons, with a threshold of α=0.025. Furthermore, all GC patients were divided into 2 groups (high-risk and low-risk) using the median risk score as the cutoff value. The OS of the 2 risk groups was compared using Kaplan-Meier (K-M) analysis and a log-rank test to assess the prognostic value of the RS index. In addition, stratification analysis based on the independent prognostic clinical variables was performed to assess the independence of the optimal model.
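The median split and the Kaplan-Meier estimate can be sketched as below. This is a plain-Python illustration with made-up risk scores and follow-up times; assigning scores equal to the median to the low-risk group is an assumption, since the paper does not say how ties with the cutoff were handled.

```python
from statistics import median

def split_by_median(scores):
    """Split patients into high- and low-risk groups at the median risk score."""
    cutoff = median(scores.values())
    high = [p for p, s in scores.items() if s > cutoff]
    low = [p for p, s in scores.items() if s <= cutoff]  # ties go to low risk (assumption)
    return cutoff, high, low

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t  # deaths and censored cases both leave the risk set
        i += n_at_t
    return curve

# Hypothetical risk scores and follow-up data (months).
scores = {"p1": 0.1, "p2": 0.5, "p3": 0.9, "p4": 1.3}
cutoff, high, low = split_by_median(scores)
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(cutoff, high, low)
print(curve)
```

In practice one curve would be estimated per risk group and the two compared with a log-rank test.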

Biological function analysis

The 200 mRNAs with the highest Spearman correlation coefficients between the lncRNAs of the optimal model and the corresponding mRNA expression data were identified as model target genes [28]. GO function and KEGG pathway enrichment analyses of the target genes were used to explore the potential biological functions involved in the optimal model. The analysis was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8.
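The target-gene selection can be sketched as a ranking by Spearman correlation. The paper does not say whether the ranking used absolute values or how the 13 lncRNAs were combined, so the sketch below assumes a single lncRNA profile and ranks by absolute coefficient; all names and values are hypothetical.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def top_correlated(lncrna, mrnas, k=200):
    """Return the k mRNAs with the largest absolute Spearman correlation."""
    scored = [(name, spearman(lncrna, profile)) for name, profile in mrnas.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:k]

# Hypothetical expression profiles across 5 samples.
lncrna = [1, 2, 3, 4, 5]
mrnas = {"A": [2, 4, 6, 8, 10], "B": [3, 1, 2, 5, 4], "C": [1, 3, 2, 5, 4]}
top = top_correlated(lncrna, mrnas, k=2)
print([name for name, _ in top])  # ['A', 'C']
```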

Results

Basic characteristics of the microarray data

A total of 14 clinical factors were used in this study: age, sex, T stage, M stage, N stage, AJCC stage, WHO classification, Lauren subtype, molecular subtype, recurrence status, tumor site, adjuvant concurrent chemoradiation therapy (adjuvant CCRT), operation (OP) type, and Epstein-Barr virus (EBV) status. The basic characteristics of these clinical variables are shown in Table 1. Based on random sampling, 210 samples were assigned to the training data set and 90 samples to the validation data set. Using the data re-annotation method, a total of 7150 probes (covering 5238 lncRNAs) were identified for further analysis.
Table 1

Basic characteristics of the clinical variables in GC patients.

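| Clinical variables | Total | Training dataset | Validation dataset |
|---|---|---|---|
| Sample size | 300 | 210 | 90 |
| Survival status | | | |
| Dead | 152 | 102 | 50 |
| Survived | 148 | 108 | 40 |
| Median age (year) | 61.94 | 47.59 | 62.76 |
| Sex | | | |
| Female | 101 | 74 | 27 |
| Male | 199 | 136 | 63 |
| T stage | | | |
| 2 | 188 | 130 | 58 |
| 3 | 91 | 67 | 24 |
| 4 | 21 | 13 | 8 |
| N stage | | | |
| 0 | 38 | 28 | 10 |
| 1 | 131 | 94 | 37 |
| 2 | 80 | 49 | 31 |
| 3 | 51 | 39 | 12 |
| M stage | | | |
| 0 | 273 | 191 | 82 |
| 1 | 27 | 19 | 8 |
| AJCC stage | | | |
| I | 30 | 21 | 9 |
| II | 97 | 70 | 27 |
| III | 96 | 65 | 31 |
| IV | 77 | 54 | 23 |
| Recurrent status | | | |
| Yes | 125 | 110 | 53 |
| No | 157 | 77 | 29 |
| Unknown | 18 | 19 | 8 |
| Tumor site | | | |
| Antrum | 155 | 110 | 45 |
| Body | 107 | 77 | 30 |
| Cardia | 32 | 19 | 13 |
| Whole | 6 | 4 | 2 |
| WHO classification | | | |
| W/D and M/D tubular | 114 | 89 | 25 |
| P/D tubular | 116 | 73 | 43 |
| Signet ring cell | 37 | 25 | 12 |
| Mucinous | 8 | 5 | 3 |
| Papillary | 9 | 6 | 3 |
| Other | 16 | 12 | 4 |
| Lauren type | | | |
| Intestinal | 150 | 110 | 40 |
| Diffuse | 142 | 95 | 47 |
| Mixed | 8 | 5 | 3 |
| Molecular subtype | | | |
| MSS/TP53− | 107 | 79 | 28 |
| MSS/TP53+ | 79 | 53 | 26 |
| MSI | 68 | 44 | 24 |
| MSS/EMT | 46 | 34 | 12 |
| EBV status | | | |
| Positive | 18 | 181 | 76 |
| Negative | 257 | 13 | 5 |
| Missing | 12 | 16 | 9 |
| Adjuvant CCRT | | | |
| Completed | 73 | 46 | 27 |
| Not completed | 7 | 5 | 2 |
| Not done | 220 | 129 | 61 |
| OP type | | | |
| TG | 135 | 94 | 41 |
| STG | 165 | 116 | 49 |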

GC – gastric cancer; AJCC – American Joint Committee on Cancer; WHO – World Health Organization; W/D – well-differentiated; M/D – moderately differentiated; P/D – poorly differentiated; MSS – microsatellite stable; TP53− – tumor protein 53 inactive; TP53+ – tumor protein 53 active; MSI – microsatellite instability; EMT – epithelial-mesenchymal transition; TG – total gastrectomy; STG – subtotal gastrectomy.

Construction and comparison of prognostic models for GC patients

Clinical model

Univariate and multivariate Cox regression analyses of the clinical information identified only 3 clinical variables related to OS: molecular subtype, recurrence status, and OP type (Table 2). The clinical model RSclinical was constructed from these 3 independent prognostic factors; its 3-year and 5-year AUCs were 0.765 and 0.780, respectively.
Table 2

Prognostic clinical factors of OS for GC patients estimated by univariate and multivariate Cox regression.

| Clinical variables | Univariate HR | Univariate P value | Univariate 95% CI | Multivariate HR | Multivariate P value | Multivariate 95% CI |
|---|---|---|---|---|---|---|
| Sex | 0.80 | 0.311 | 0.53–1.23 | 0.66 | 0.074 | 0.42–1.04 |
| Age | 1.01 | 0.416 | 0.99–1.03 | 1.03 | 0.014 | 1.01–1.05 |
| Molecular subtype* | 0.79 | 0.028 | 0.64–0.98 | 0.70 | 0.003 | 0.55–0.88 |
| Lauren subtype | 1.25 | 0.220 | 0.87–1.79 | 1.15 | 0.506 | 0.76–1.75 |
| WHO classification | 1.11 | 0.148 | 0.96–1.28 | 1.03 | 0.752 | 0.87–1.22 |
| T stage | 1.68 | 0.001 | 1.24–2.28 | 1.51 | 0.059 | 0.98–2.32 |
| N stage | 1.99 | 0.000 | 1.55–2.56 | 1.33 | 0.266 | 0.81–2.19 |
| M stage | 3.81 | 0.002 | 1.61–9.04 | 1.83 | 0.257 | 0.64–5.19 |
| AJCC stage | 2.20 | 0.000 | 1.7–2.86 | 1.37 | 0.307 | 0.75–2.49 |
| Tumor site | 1.48 | 0.003 | 1.14–1.91 | 1.18 | 0.352 | 0.83–1.66 |
| Recurrence status* | 1.39 | 0.013 | 1.07–1.81 | 1.57 | 0.007 | 1.13–2.17 |
| OP type* | 0.53 | 0.001 | 0.36–0.78 | 0.56 | 0.049 | 0.32–0.99 |
| EBV status | 0.58 | 0.441 | 0.14–2.35 | 0.59 | 0.471 | 0.14–2.49 |
| Adjuvant CCRT | 1.07 | 0.576 | 0.85–1.33 | 1.00 | 1.000 | 0.78–1.28 |

*: Independent prognostic clinical variables with statistical significance at P<0.05 level both in univariate and multivariate Cox analyses.

OS – overall survival; GC – gastric cancer; HR – hazard ratio; CI – confidence interval; WHO – World Health Organization; AJCC – American Joint Committee on Cancer; OP – operation; EBV – Epstein-Barr virus; CCRT – concurrent chemoradiation therapy.

Gene model

Seven modules including blue, brown, green, red, turquoise, yellow, and gray modules were identified by WGCNA in our study. The modules were represented by branches of different colors, shown in Figure 1. The univariate Cox regression based on MEs of co-expression modules indicated only the red module (containing 55 lncRNAs) had a significant association with OS, and the increased expression of lncRNAs in the red module was associated with poor prognosis (hazard ratio [HR]=42.25, P=0.013, Table 3).
Figure 1

Gene clustering dendrogram and merged module colors based on a dissimilarity measure (1-TOM).

Table 3

Prognostic modules of OS for GC patients estimated by univariate Cox regression analysis.

| Modules# | HR | P value | 95% CI | Gene numbers | Hub genes |
|---|---|---|---|---|---|
| Blue | 0.18 | 0.234 | 0.01–3.090 | 826 | |
| Brown | 1.03 | 0.981 | 0.06–17.34 | 110 | |
| Turquoise | 1.87 | 0.671 | 0.10–33.44 | 2064 | |
| Yellow | 1.72 | 0.712 | 0.10–30.39 | 86 | |
| Green | 3.29 | 0.392 | 0.22–50.32 | 65 | |
| Red* | 42.25 | 0.013 | 2.20–811.4 | 55 | 11 |
| Gray | 0.05 | 0.053 | 0.00–1.040 | 2032 | |

#: Modules identified by WGCNA; *: prognostic module with statistical significance at P<0.05 level in univariate Cox analysis.

OS – overall survival; GC – gastric cancer; HR – hazard ratio; CI – confidence interval. WGCNA – weighted gene co-expression network analysis.

First, the 11 hub genes of the red module were identified as the candidate lncRNAs of the WGCNA method and used to construct the gene model RSW. The 3-year and 5-year AUCs of the RSW model were 0.689 and 0.682, respectively. Next, the 3 lncRNAs (LINC00930, AP000550.1, and AC009052.1) screened by the elastic-net Cox regression were used to construct the gene model RSe, with 3-year and 5-year AUCs of 0.715 and 0.694, respectively. Then, the 13 candidate lncRNAs screened by the comprehensive method were used to construct the gene model RSc, with 3-year and 5-year AUCs of 0.731 and 0.732, respectively. The candidate prognostic lncRNAs of the different methods are shown in Table 4.
Table 4

The candidate LncRNAs identified by 3 different methods.

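| RSW lncRNA | RSW Coef | RSe lncRNA | RSe Coef | RSc lncRNA | RSc Coef |
|---|---|---|---|---|---|
| LOC644656 | 0.0375 | LINC00930 | −1.4153 | LOC644656 | 0.1710 |
| VWA8.AS1 | 0.0435 | AP000550.1 | −0.6437 | VWA8.AS1 | 0.2159 |
| LINC01085 | 0.0216 | AC009052.1 | −1.9817 | LOC101928069 | −0.4790 |
| LINC00606 | 0.0302 | | | LINC01206 | −0.2111 |
| KMT2E.AS1 | 0.0679 | | | LINC01085 | 0.1170 |
| DLG1.AS1 | 0.0186 | | | KMT2E.AS1 | 0.3330 |
| BVES.AS1 | 0.0666 | | | DAPK1.IT1 | 0.1238 |
| ADAMTSL4.AS1 | 0.0275 | | | AC139713.2 | 0.3901 |
| AC139713.2 | 0.0773 | | | AC023509.1 | −0.7042 |
| AC017091.1 | 0.1007 | | | AC017091.1 | 0.2683 |
| PXN.AS1 | 0.0679 | | | PXN.AS1 | 0.1654 |
| | | | | PTPRD.AS1 | −0.5058 |
| | | | | PRKAG2.AS1 | −0.3523 |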

Coef was the corresponding ridge regression coefficient of the lncRNA (long noncoding RNA).

Clinical-gene model

The integrated clinical-gene models consisted of the candidate lncRNAs with 3 independent clinical factors for further analysis. The model RSW-clinical, which combined the candidate lncRNAs of RSW with 3 independent clinical factors, had 3-year and 5-year AUC of 0.816 and 0.805, respectively. The model RSe-clinical, containing 3 lncRNAs of RSe, had 3-year and 5-year AUC of 0.814 and 0.796, respectively. Furthermore, the 3-year and 5-year predictive ability of the model RSc-clinical, containing 13 lncRNAs identified by the comprehensive method, were 0.832 and 0.830, respectively.

Comparison of predictive ability

The comparisons between the different models are illustrated in Figure 2. The 3-year and 5-year AUC comparisons of the gene models indicated that the model RSc performed significantly better than the models RSW and RSe (P<0.05). To account for the influence of clinical factors on predictive ability, we also compared the clinical-gene models of the different methods. The model RSc-clinical exhibited a higher predictive ability than the RSW-clinical and RSe-clinical models in 3-year and 5-year survival prediction (P<0.001, Figure 2A). Furthermore, the comparison between the clinical, gene, and clinical-gene models of the comprehensive method indicated that RSc-clinical had the highest predictive ability for the OS of GC patients (Figure 2B). Therefore, the model RSc-clinical was identified as the optimal, robust prognostic model in the study. The median of the RSc-clinical index was used as the cutoff value to divide GC patients into 2 groups (high-risk and low-risk). The K-M survival analysis indicated that the low-risk group had significantly better survival than the high-risk group (P<0.001, Figure 3A).
Figure 2

The 3-year and 5-year area under the curve (AUC) comparison. (A) The 3-year and 5-year AUC comparison between 3 methods in gene and clinical-gene models in the training set. (B) The 3-year and 5-year AUC comparison between clinical, gene, and clinical-gene models of the comprehensive method in the training set. (C) The 3-year and 5-year AUC comparison between 3 methods in gene and clinical-gene models in the validation set. (D) The 3-year and 5-year AUC comparison between clinical, gene, and clinical-gene models of the comprehensive method in the validation set.

Figure 3

Kaplan-Meier curve. (A) Kaplan-Meier curve of high-risk and low-risk groups in the training set (the median of RSc-clinical as a cutoff value for patients grouping); (B) Kaplan-Meier curve of high-risk and low-risk groups in the validation set (the median of RSc-clinical as a cutoff value for patients grouping).

Validation of predictive ability

The predictive ability of the aforementioned models was validated using the remaining 30% of samples. The 3-year and 5-year AUCs of the gene model RSc were 0.646 and 0.552, respectively. A comparison between the gene models showed that the model RSc performed better than the models RSW and RSe in 3-year OS prediction for GC patients (3-year AUC of 0.580 for RSW and 0.492 for RSe; P<0.001, Figure 2C). There was no significant difference in 5-year survival prediction between RSc and RSe (5-year AUC of RSe: 0.555; P=0.768). Among the clinical-gene models, the model RSc-clinical, with 3-year and 5-year AUCs of 0.764 and 0.778, demonstrated better OS predictive ability than RSW-clinical (0.750 and 0.751) and RSe-clinical (0.725 and 0.740) (Figure 2C). The 3-year and 5-year AUCs of the clinical model in the validation set were 0.747 and 0.714, respectively. Additionally, the comparison between the clinical, gene, and clinical-gene models of the comprehensive method indicated that the model RSc-clinical had the highest predictive ability in the validation set (P<0.05, Figure 2D). Using the model RSc-clinical, the K-M survival analysis in the validation set also indicated that high-risk patients had a lower survival rate than low-risk patients (Figure 3B).

Stratification analysis

Stratification analysis of the optimal model RSc-clinical was performed based on molecular subtype, recurrence status, and operation type (Table 5). The molecular subtypes of GC have been defined in several studies based on different patterns of disease occurrence, progression, and prognosis [29-33]. The 4 molecular subtypes, namely microsatellite instability (MSI), microsatellite stable with epithelial-mesenchymal transition (MSS/EMT), MSS with active tumor protein 53 (MSS/TP53+), and MSS with inactive tumor protein 53 (MSS/TP53−), have been shown to differ significantly in survival rates. Of the 4 subtypes, MSI has the best outcome, followed by MSS/TP53+ and MSS/TP53−, and MSS/EMT has the worst prognosis [34]. In the stratification analysis, the model performed more stably and reliably in the MSS/TP53+ and MSS/TP53− subgroups than in the other 2 subgroups: the 3-year and 5-year AUC values were 0.860 and 0.816 for MSS/TP53+ and 0.843 and 0.800 for MSS/TP53−. The AUC values of the MSI and MSS/EMT subgroups fluctuated widely: the 3-year and 5-year AUC values were 0.677 and 0.717 for MSI and 0.782 and 0.881 for MSS/EMT. In addition, stratification by recurrence status and operation type indicated that the model performed well in patients without recurrence (3-year and 5-year AUCs of 0.812 and 0.730) and had good predictive ability for patients with total gastrectomy (3-year and 5-year AUCs of 0.765 and 0.804). Stratification analysis was not performed in the validation set because of the insufficient sample size of the subgroups.
Table 5

Stratification analysis of the optimal model based on 3 independent clinical factors in the training set.

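| Independent clinical factors | 3-year AUC | 5-year AUC |
|---|---|---|
| Molecular type | | |
| MSS/TP53+ | 0.860 | 0.816 |
| MSS/TP53− | 0.843 | 0.800 |
| MSI | 0.677 | 0.717 |
| MSS/EMT | 0.782 | 0.881 |
| Recurrent status | | |
| Yes | 0.664 | 0.748 |
| No | 0.812 | 0.730 |
| Operation type | | |
| TG | 0.765 | 0.804 |
| STG | 0.770 | 0.684 |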

AUC – area under the curve; MSS – microsatellite stable; TP53+ – tumor protein 53 active; TP53− – tumor protein 53 inactive; MSI – microsatellite instability; EMT – epithelial-mesenchymal transition; TG – total gastrectomy; STG – subtotal gastrectomy.

Biological function analysis of the 13 lncRNAs

The list of 200 target genes was submitted to the DAVID database for GO and KEGG enrichment analysis, and the results are shown in Figure 4. The GO analysis was performed for 3 categories: biological process (BP), cellular component (CC), and molecular function (MF). The top 10 BP terms indicated that these lncRNAs were associated with “cytoskeleton organization” and “fatty acid metabolic process” (Figure 4A). The top 10 CC terms (Figure 4B), such as “intracellular part” and “cytoplasmic part”, indicated that the lncRNAs might be involved in the composition of intracellular components. Among the top 10 MF terms, items such as “oxidoreductase activity”, “phospholipase A2 activity”, and “catalytic activity” indicated that these activities potentially play a significant role in the occurrence of GC (Figure 4C). The top 5 significant KEGG pathways, namely “linoleic acid metabolism”, “alpha-linolenic acid metabolism”, “arachidonic acid metabolism”, “ether lipid metabolism”, and “valine, leucine and isoleucine degradation”, are displayed in Figure 4D. The pathway analysis indicated that alterations in lipid metabolism were associated with cell proliferation in GC and might play distinct roles at various stages of tumor development [35].
Figure 4

The most significantly enriched Gene Ontology (GO) annotations and pathways of 13 lncRNAs which we identified in the study. The length of bars and the size of dots represent the numbers of genes; the color of bars and dots corresponds to P value according to legend. (A) Top 10 significantly enriched biological process GO annotations. (B) Top 10 significantly enriched cellular component GO annotations. (C) Top 10 significantly enriched molecular function GO annotations. (D) Top 5 significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway.

Discussion

Gastric cancer (GC) is a widely known cancer with unsatisfied survival. It is crucial to predict the survival of GC patients by constructing a robust prognostic model. However, most of the studies could not cope with the problems of high dimensionality and collinearity in data analysis, and they did not consider the interconnection among genes. Therefore, we combined the weighted co-expression gene analysis with elastic-net Cox regression based on the lncRNA expression, and we identified 13 co-expression lncRNAs as prognostic biomarkers of GC, which were LOC644656, VWA8-AS1, LOC101928069, LINC01206, LINC01085, KMT2E-AS1, DAPK1-IT1, AC139713.2, AC023509.1, AC017091.1, PXN-AS1, PTPRD-AS1, and PRKAG2-AS1. The 3-year and 5-year AUC comparison of 3 different methods in gene and clinical-gene models indicated that the 13 co-expression lncRNAs identified by the comprehensive method were the best prognostic biomarkers in the study. The comparison between clinical, gene, and clinical-gene models of the comprehensive method identified the model RSc-clinical was the optimal prognostic model, with 0.832 and 0.830 for 3-year and 5-year AUC in the training set, respectively. Compared with previous prognostic research, although the 3-year and 5-year AUC of RSc-clinical were not the highest, the comprehensive method performed well in avoiding model over-fitting in high dimensional data and considering the association between genes, which were the limitations in previous studies. The results also indicated that both the clinical and genetic factors were indispensable in prognosis prediction. Analysis in the validation set further validated our findings. The comparison between different methods in the clinical-gene models validated the value of the comprehensive method in prognostic prediction for GC. The 3-year and 5-year AUC of the optimal model RSc-clinical were 0.764 and 0.778 in the validation set, respectively. 
However, the predictive ability of model RSc was performed unstably in the 3-year and 5-year survival prediction, which might be due to insufficient sample size. Furthermore, the K-M analysis of GC patients indicated that the RSc-clinical index might be an effective prognostic factor to distinguish high-risk and low-risk patients with GC. In addition, stratification analysis indicated the model RSc-clinical performed stably in the MSS/TP53+ and MSS/TP53− subgroups. And the optimal model had a good prediction ability for patients without recurrence and patients with total gastrectomy. However, the model performed inaccurately in other subgroups. The large fluctuations in the 3-year and 5-year survival prediction of the optimal model in other subgroups might be caused by the small sample size and the extreme survival status of subgroup GC patients. A more reliable predictive model is required in future research for patients with special subtypes of GC. One of the 13 prognostic lncRNAs PXN-AS1 has been reported to play an important role in tumor development, apoptosis, metastasis, and drug-resistance in several previous studies. Yuan et al. [36] identified PXN-AS1 as an alternative splicing factor which was modulated by Muscleblind-like-3. It was associated with focal adhesion protein, involved in transducing signals of the extracellular matrix, post-transcriptional gene regulation, and promoted cell proliferation in hepatocellular carcinoma. Zhang et al. [37] reported the mechanisms of PXN-AS1-L in non-small cell lung cancer (NSCLC). The over-expression of PXN-AS1-L increased the diversity of NSCLC cell and was significantly associated with advanced TNM stages and poor prognosis of NSCLC patients, and could be a potential prognostic biomarker and therapeutic target of NSCLC. 
Furthermore, the lncRNA LINC01206 has been reported to be over-expressed in lung squamous cell carcinoma or lung adenocarcinoma, and might be involved in cancer-related pathways such as apoptosis and cell migration [38]. However, the other lncRNAs identified in this study have not yet been reported, and their biological role in GC remains unknown. Therefore, we performed GO and KEGG pathway enrichment analyses to briefly describe the potential molecular mechanisms of these 13 prognostic lncRNAs in GC. The results of the GO analysis indicated that these lncRNAs were associated with cytoskeleton components and intracellular components such as cytoplasm and organelle membrane, and participated in the fatty acid metabolic process and in various biological activities such as oxidoreductase activity and catalytic activity. The KEGG enrichment analysis suggested that lipid metabolism plays an important role in GC cell proliferation, differentiation, and survival. Lipids are a diverse group of hydrophobic molecules that includes fats, oils, waxes, phospholipids, and steroids. Several studies have confirmed that various human cancers display aberrant activation of lipid metabolism, which enables cancer cells to proliferate, grow, and metastasize [35]. Alterations in lipid metabolism might be associated with GC progression and prognosis, and might provide a novel diagnostic and therapeutic target in the clinic. The biological functions of the 13 lncRNAs included in this study require further investigation to provide a better understanding of the molecular mechanism in GC.
In addition, some limitations of our study need to be addressed in future research. Firstly, the lncRNA re-annotation pipeline was based on the Affymetrix HG-U133 Plus 2.0 platform, which represents only a subset of lncRNAs. A more comprehensive and reliable lncRNA re-annotation pipeline for all platforms is required.
Secondly, we used only a 30% internal sample as our validation set; large external cohorts of GC patients are required to further assess the robustness of the optimal model. Corresponding cell experiments and clinical trials are also needed to validate our findings in future investigations. Finally, some biases might exist in selecting prognostic biomarkers based on lncRNA profiles alone; thus, analysis based on multi-omics data is required to better understand the molecular functions and disease etiology.
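The GO and KEGG enrichment analyses discussed in this section rest on an over-representation test. This section does not state which tool or test was used; a common choice is the one-sided hypergeometric test, sketched below in Python. The function name `enrichment_p_value` and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, in_pathway, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background : number of genes in the background set
    in_pathway : background genes annotated to the pathway
    selected   : genes in the list of interest
    overlap    : selected genes that are annotated to the pathway
    """
    total = comb(background, selected)
    upper = min(in_pathway, selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in a lipid metabolism
# pathway, 4 genes of interest, 3 of which fall in the pathway.
p_value = enrichment_p_value(background=20, in_pathway=5, selected=4, overlap=3)
```

A small p-value indicates that the pathway contains more of the selected genes than expected by chance; with many pathways tested, the p-values would also need multiple-testing correction.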

Conclusions

In summary, we used a comprehensive method that combined WGCNA with elastic-net Cox regression to identify potential biomarkers for OS prediction in GC patients. The clinical-gene model, which contained 13 co-expression lncRNAs (LOC644656, VWA8-AS1, LOC101928069, LINC01206, LINC01085, KMT2E-AS1, DAPK1-IT1, AC139713.2, AC023509.1, AC017091.1, PXN-AS1, PTPRD-AS1, and PRKAG2-AS1) and 3 independent clinical variables (molecular subtype, recurrence status, and operation type), was identified as a robust prognostic model in this study. The novel prognostic model might provide molecular knowledge to improve the clinical findings for the OS of GC patients. Further studies are required to validate our findings and explain the biological functions of these lncRNAs.
References (38 in total; first 10 shown)

Review 1.  The development of the TNM classification of gastric cancer.

Authors:  Christian Wittekind
Journal:  Pathol Int       Date:  2015-06-03       Impact factor: 2.534

2.  Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes.

Authors:  Razvan Cristescu; Jeeyun Lee; Michael Nebozhyn; Kyoung-Mee Kim; Jason C Ting; Swee Seong Wong; Jiangang Liu; Yong Gang Yue; Jian Wang; Kun Yu; Xiang S Ye; In-Gu Do; Shawn Liu; Lara Gong; Jake Fu; Jason Gang Jin; Min Gew Choi; Tae Sung Sohn; Joon Ho Lee; Jae Moon Bae; Seung Tae Kim; Se Hoon Park; Insuk Sohn; Sin-Ho Jung; Patrick Tan; Ronghua Chen; James Hardwick; Won Ki Kang; Mark Ayers; Dai Hongyue; Christoph Reinhard; Andrey Loboda; Sung Kim; Amit Aggarwal
Journal:  Nat Med       Date:  2015-04-20       Impact factor: 53.440

3.  A long non-coding RNA signature to improve prognosis prediction of gastric cancer.

Authors:  Xiaoqiang Zhu; Xianglong Tian; Chenyang Yu; Chaoqin Shen; Tingting Yan; Jie Hong; Zheng Wang; Jing-Yuan Fang; Haoyan Chen
Journal:  Mol Cancer       Date:  2016-09-20       Impact factor: 27.401

4.  Intrinsic subtypes of gastric cancer, based on gene expression pattern, predict survival and respond differently to chemotherapy.

Authors:  Iain Beehuat Tan; Tatiana Ivanova; Kiat Hon Lim; Chee Wee Ong; Niantao Deng; Julian Lee; Sze Huey Tan; Jeanie Wu; Ming Hui Lee; Chia Huey Ooi; Sun Young Rha; Wai Keong Wong; Alex Boussioutas; Khay Guan Yeoh; Jimmy So; Wei Peng Yong; Akira Tsuburaya; Heike Grabsch; Han Chong Toh; Steven Rozen; Jae Ho Cheong; Sung Hoon Noh; Wei Kiat Wan; Jaffer A Ajani; Ju-Seog Lee; Manuel Salto Tellez; Patrick Tan
Journal:  Gastroenterology       Date:  2011-04-28       Impact factor: 22.682

Review 5.  Treatment of gastric cancer.

Authors:  Michele Orditura; Gennaro Galizia; Vincenzo Sforza; Valentina Gambardella; Alessio Fabozzi; Maria Maddalena Laterza; Francesca Andreozzi; Jole Ventriglia; Beatrice Savastano; Andrea Mabilia; Eva Lieto; Fortunato Ciardiello; Ferdinando De Vita
Journal:  World J Gastroenterol       Date:  2014-02-21       Impact factor: 5.742

Review 6.  LncRNAs: emerging biomarkers in gastric cancer.

Authors:  Yu Gu; Tianxiang Chen; Guangliang Li; Xiongfei Yu; Yimin Lu; Haohao Wang; Lisong Teng
Journal:  Future Oncol       Date:  2015-08-20       Impact factor: 3.404

7.  A combined comparative genomic hybridization and expression microarray analysis of gastric cancer reveals novel molecular subtypes.

Authors:  Su Ting Tay; Siew Hong Leong; Kun Yu; Amit Aggarwal; Soo Yong Tan; Chee How Lee; Keith Wong; Jaya Visvanathan; Dennis Lim; Wai Keong Wong; Khee Chee Soo; Oi Lian Kon; Patrick Tan
Journal:  Cancer Res       Date:  2003-06-15       Impact factor: 12.701

Review 8.  Evolutionary conservation of long non-coding RNAs; sequence, structure, function.

Authors:  Per Johnsson; Leonard Lipovich; Dan Grandér; Kevin V Morris
Journal:  Biochim Biophys Acta       Date:  2013-10-27

9.  Characterization of long noncoding RNA and messenger RNA signatures in melanoma tumorigenesis and metastasis.

Authors:  Siqi Wang; Wenliang Fan; Bing Wan; Mengqi Tu; Feng Jin; Fang Liu; Haibo Xu; Ping Han
Journal:  PLoS One       Date:  2017-02-22       Impact factor: 3.240

10.  Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

Authors:  Yun Hak Kim; Dae Cheon Jeong; Kyoungjune Pak; Tae Sik Goh; Chi-Seung Lee; Myoung-Eun Han; Ji-Young Kim; Liu Liangwen; Chi Dae Kim; Jeon Yeob Jang; Wonjae Cha; Sae-Ock Oh
Journal:  Oncotarget       Date:  2017-08-24