Development of a prognostic nomogram based on an eight-gene signature for esophageal squamous cell carcinoma by weighted gene co-expression network analysis (WGCNA).

Jiahong Xie¹, Pingshan Yang², Hongjian Wei¹, Peiwen Mai³, Xiaoli Yu¹.

Abstract

Background: Esophageal squamous cell carcinoma (ESCC) is a highly aggressive malignant tumor. This study aims to develop a robust prognostic model for ESCC.
Methods: Expression profiles of ESCC were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. Co-expressed modules were constructed by weighted gene co-expression network analysis (WGCNA). Differentially expressed genes (DEGs) between ESCC and normal samples were identified with the screening criteria of adjusted P value <0.05 and log |fold change (FC)| >1. After univariate and multivariate Cox regression analysis, an 8-gene module was constructed. A receiver operating characteristic (ROC) curve for overall survival (OS) was used to assess the prediction efficacy of the risk score. A nomogram was developed based on the risk score, age, gender, and stage for 1-, 2- and 3-year survival. The potential biological functions and pathways of the 8 genes were predicted using the Metascape database.
Results: The 2 ESCC-related co-expression modules were built via WGCNA. Among all DEGs, 55 survival-related genes were identified for ESCC. Based on these genes, an 8-gene module was constructed, composed of CFAP53, FCGR2A, FCGR3A, GNGT1, IGF2, LINC01524, MAGEA3, and MAGEA6. The area under the curve (AUC) was 0.961, suggesting that the risk score could effectively predict the OS of patients with ESCC. Furthermore, the nomogram exhibited high accuracy in predicting the survival rate of ESCC patients at 1, 2, and 3 years. These genes were mainly involved in ESCC-related pathways such as extracellular matrix organization, collagen formation, and blood vessel development.
Conclusions: Our nomogram based on the 8-gene risk score could be a reliable prognostic tool for ESCC.
2022 Annals of Translational Medicine. All rights reserved.

Keywords: Esophageal squamous cell carcinoma (ESCC); nomogram; prognosis; risk score; weighted gene co-expression network analysis (WGCNA)

Year: 2022 PMID: 35282133 PMCID: PMC8848369 DOI: 10.21037/atm-21-6935

Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839

Introduction

Esophageal cancer is the eighth most common cancer globally (1). Due to recurrence and metastasis, its 5-year survival rate is <20% (2). Esophageal squamous cell carcinoma (ESCC) is the main malignant subtype of esophageal cancer, accounting for over 90% of esophageal cancer cases (3). Despite advances in diagnosis and treatment techniques for ESCC, the 5-year survival rate is still very low (4). Current treatment methods include chemotherapy, radiation therapy, and surgery. There is still a lack of approved targeted therapy drugs for ESCC (5). The tumor node metastasis (TNM) staging system remains the gold standard for ESCC prognosis. Due to the heterogeneity of ESCC, the prognosis of patients in the same clinical stage varies (6). That is to say, relying on the TNM staging system alone to predict the prognosis of ESCC is often not accurate enough. Therefore, predictors that can accurately assess ESCC prognosis will be of great value for the individualized management of ESCC. With the development of high-throughput technologies such as microarray and RNA-seq, gene expression profiling has become a powerful tool for identifying prognostic biomarkers of ESCC (7-9). Furthermore, various differentially expressed genes (DEGs) and signaling pathways involved in the progression of ESCC have been identified (7-9). Nevertheless, few of these findings have been translated into clinical practice guidance. In this study, to obtain reliable results, we first used 2 independent datasets to build 2 ESCC-related co-expression modules via weighted gene co-expression network analysis (WGCNA). By combining DEGs and genes in the ESCC-related modules, an 8-gene module was developed. Due to the heterogeneity and complexity of ESCC, multi-parameter markers are more accurate than a single marker for ESCC prognosis (10). Therefore, this study established a prognostic nomogram based on the 8-gene module and other factors.
Furthermore, we explored the underlying mechanisms of the 8 genes during ESCC progression. Our findings may provide novel clues for the development of a promising prognostic tool for ESCC. We present the following article in accordance with the TRIPOD reporting checklist (available at https://atm.amegroups.com/article/view/10.21037/atm-21-6935/rc).

Methods

Datasets

ESCC microarray and RNA-seq expression profiles were retrieved from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) database (accession: GSE23400 and GSE130078) and The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). The GSE23400 dataset contained 53 ESCC samples and 53 matched normal samples (11). The GSE130078 dataset contained 23 ESCC samples and 23 corresponding normal samples (12). The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

WGCNA

The GSE23400 and GSE130078 datasets were used for WGCNA, which was performed with the WGCNA package in R (13). To ensure a scale-free co-expression network, the soft threshold power β (candidate range 0–30) was selected with the pickSoftThreshold function. The correlation coefficient matrix between genes (called an adjacency matrix) was constructed. Genes with similar expression patterns were assigned to the same module using the dynamic tree cut method. Co-expression modules were constructed from the topological overlap matrix (TOM). The minimum number of genes in each gene module was set to 30. The correlation between gene significance (GS) and module significance (MS) was assessed.
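The role of the soft threshold β can be illustrated with a small sketch. It assumes an unsigned network in which the adjacency is the absolute Pearson correlation raised to the power β; the network type is not stated in the text, so this is an assumption. The gene names and expression values are hypothetical toy data, and the code does not reproduce the pickSoftThreshold selection procedure or the TOM step.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (assumed network type)."""
    genes = list(expr)
    adj = {}
    for gi in genes:
        for gj in genes:
            if gi == gj:
                adj[(gi, gj)] = 1.0
            else:
                adj[(gi, gj)] = abs(pearson(expr[gi], expr[gj])) ** beta
    return adj

# Hypothetical toy data: three genes measured in four samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
adjacency = soft_threshold_adjacency(toy_expr, beta=6)
```

Raising the correlation to a power keeps strongly correlated pairs (geneA and geneB here) close to 1 while pushing weakly correlated pairs toward 0, which is why a larger β yields a sparser, more scale-free-like network.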

Differential expression analysis

Differential expression analysis between ESCC and normal samples was performed in the GSE23400 and GSE130078 datasets using GEO2R and the DESeq2 package in R (14). The threshold of DEGs was set as adjusted P value <0.05 and log |fold change (FC)| >1. P values were corrected by Bonferroni’s method.
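The screening rule can be sketched as follows. The sketch reads the fold-change criterion as an absolute log2 fold change greater than 1, which is an assumption about the intended notation, and applies a plain Bonferroni correction. The gene names, fold changes, and P values are hypothetical; this is not the GEO2R or DESeq2 workflow itself.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def screen_degs(results, alpha=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted P < alpha and |log2 FC| > lfc_cutoff.

    results: list of (gene, log2_fold_change, raw_p) tuples.
    """
    adjusted = bonferroni([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), adj_p in zip(results, adjusted)
        if adj_p < alpha and abs(lfc) > lfc_cutoff
    ]

# Hypothetical test results for four genes.
toy_results = [
    ("gene1", 2.3, 0.0001),    # passes both criteria
    ("gene2", -1.8, 0.004),    # down-regulated, passes both criteria
    ("gene3", 0.4, 0.00001),   # significant but fold change too small
    ("gene4", 1.5, 0.03),      # large fold change but fails after correction
]
degs = screen_degs(toy_results)
```

Bonferroni correction is conservative, so a gene such as gene4 with a raw P value below 0.05 can still be excluded once the number of tests is taken into account.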

Functional enrichment analysis

Functional enrichment analysis was carried out via the Metascape online database (15). Metascape integrates multiple authoritative data resources such as Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt, and DrugBank. It performs pathway enrichment and biological process annotation as well as gene-related protein network analysis. By integrating the information from these databases, it annotates the biological pathways and protein complexes represented in a gene list. An adjusted P value <0.05 was considered significant.

Construction of a prognostic risk score

Gene expression data and clinical information were obtained from TCGA database. Univariate Cox regression analysis was utilized to screen survival-related genes (P<0.05). The results were visualized by the forest package in R. Using multivariate Cox regression analysis, a prognostic model was built and the risk score was calculated according to the expression level of each gene and its regression coefficient. The Akaike information criterion (AIC) was calculated to assess the model. All ESCC samples from TCGA database were divided into high- and low-risk score groups. Kaplan-Meier survival analysis was then performed using the survival package in R. The prediction efficacy of the model was assessed by construction of a time-dependent receiver operating characteristic (ROC) curve utilizing the survivalROC package in R.
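The risk score calculation can be sketched as a weighted sum of expression values, with each regression coefficient taken as the natural logarithm of the Exp (coef) value listed in Table 2, followed by a median split into risk groups. The patient identifiers and expression values below are hypothetical, and the sketch assumes expression is already on the scale used to fit the model.

```python
import math
from statistics import median

# Exp (coef) values as reported in Table 2 of this article.
EXP_COEF = {
    "MAGEA6": 0.271936,
    "MAGEA3": 2.102895,
    "LINC01524": 2737999,
    "CFAP53": 0.080066,
    "IGF2": 2.014585,
    "GNGT1": 0.338051,
    "FCGR3A": 0.372708,
    "FCGR2A": 3.446703,
}
# Regression coefficient = ln(Exp(coef)).
COEF = {gene: math.log(value) for gene, value in EXP_COEF.items()}

def risk_score(expression):
    """Sum of coefficient * expression over the 8 genes."""
    return sum(COEF[gene] * expression[gene] for gene in COEF)

def split_by_median(scores):
    """Label each patient high or low risk relative to the median score."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

def profile(**overrides):
    """Hypothetical helper: expression profile with all genes at 0 unless overridden."""
    base = {gene: 0.0 for gene in EXP_COEF}
    base.update(overrides)
    return base

# Hypothetical patients for illustration only.
patients = {
    "p1": profile(),
    "p2": profile(IGF2=1.0),
    "p3": profile(CFAP53=1.0),
    "p4": profile(FCGR2A=1.0),
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Genes with Exp (coef) above 1 raise the score as their expression increases, and genes with Exp (coef) below 1 lower it.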

Nomogram

Based on the Cox proportional hazards regression model, a nomogram was constructed by integrating gender, age, stage, and the risk score through the rms package in R. The Bootstrap self-sampling method was utilized to verify the prediction effect of the model, which was assessed by the C-index.
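As an illustration of the C-index used to assess the nomogram, the following is a simplified sketch of Harrell's concordance index. It ignores tied survival times and does not reproduce the bootstrap validation performed with the rms package; the survival times, event indicators, and risk values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient with the earlier event has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical follow-up data: time in months, 1 = death observed, 0 = censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
good_risks = [0.9, 0.7, 0.3, 0.1]   # higher risk for earlier deaths
bad_risks = [0.1, 0.3, 0.7, 0.9]    # ranking reversed

c_good = concordance_index(toy_times, toy_events, good_risks)
c_bad = concordance_index(toy_times, toy_events, bad_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.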

mRNA-lncRNA co-expression network

The correlation between mRNAs and lncRNAs was analyzed based on the disease-related co-expression gene modules. Then, the mRNA-lncRNA co-expression network was visualized using Cytoscape software (version 3.7.2) (16). Functional enrichment analysis of the co-expressed mRNAs was achieved using the Metascape database.

Statistical analysis

All analyses were performed with R version 4.0.2 (https://www.r-project.org/) and the corresponding packages. OS was estimated with the Kaplan-Meier method, and differences between groups were compared with the log-rank test. A P value less than 0.05 was considered statistically significant.
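The Kaplan-Meier method mentioned above can be sketched as a product-limit estimator. This is a minimal illustration, not the survival package implementation used in the study, and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival_probability) pairs.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data for five patients.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk until their last follow-up time but never count as events, which is what distinguishes this estimate from a simple proportion of survivors.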

Results

Identification of ESCC-related co-expression modules

In this study, 2 datasets were utilized for WGCNA. In the GSE23400 dataset, to ensure the network was scale-free, the optimal soft threshold β was determined as 6. Highly similar genes were assigned to a module. Finally, a total of 13 modules were determined by dynamic tree cutting. A total of 400 genes were randomly selected for the heatmap, in which 1 module was independent from the others. Among the 13 modules, the brown module was significantly correlated with ESCC (r=0.74 and P=4e−10) and was considered a disease-related module. The genes in the brown module were highly related to ESCC (r=0.82 and P<1e−200). Furthermore, we performed WGCNA in the GSE130078 dataset. The optimal soft threshold β was set to 20. Following module assignment by dynamic tree cutting, 13 modules were constructed. The heatmap of the 400 randomly selected genes showed that 1 module was independent from the others. The yellow module had the highest correlation with ESCC (r=0.9 and P=3e−17). In this module, the genes had a highly positive relationship with ESCC (r=0.87 and P<1e−200).

Figure 1

Construction of a co-expression network for esophageal squamous cell carcinoma (ESCC) in the GSE23400 dataset. (A) Determination of soft threshold β. (B) Gene dendrogram through average linkage hierarchical clustering. Different colors below the tree diagram indicate the assigned modules determined by dynamic tree cutting. The gray module contains genes that cannot be assigned to any module. (C) Heatmap of topological overlap in a gene network. Each row and column correspond to a gene. The depth of the color is proportional to the degree of topological overlap. The lower and right sides of the tree diagram express the modules marked in different colors. (D) A module-trait relationship network. Red expresses positive correlation and blue expresses negative correlation. In the box, the first line is the correlation coefficient, and the second line is the P value. (E) Scatter plot of the correlation between module membership and gene significance in the brown module.

Figure 2

Construction of a co-expression network for esophageal squamous cell carcinoma (ESCC) in the GSE130078 dataset. (A) Determination of soft threshold β. (B) Gene dendrogram through average linkage hierarchical clustering. (C) Heatmap of topological overlap in a gene network. (D) A module-trait relationship network. (E) Scatter plot of the correlation between module membership and gene significance in the yellow module.

DEGs in the ESCC-related co-expression modules

The genes in the ESCC-related “brown” module obtained from the GSE23400 dataset were intersected with the genes in the ESCC-related “yellow” module from the GSE130078 dataset. These overlapping genes were considered ESCC-related genes. With the threshold of adjusted P value <0.05 and log |FC| >1, 222 DEGs were screened between ESCC and normal samples in the GSE23400 dataset (table available at https://cdn.amegroups.cn/static/public/atm-21-6935-1.xlsx). Furthermore, 5,661 DEGs were identified for ESCC in the GSE130078 dataset (table available at https://cdn.amegroups.cn/static/public/atm-21-6935-2.xlsx). Then, these ESCC-related genes were overlapped with DEGs from the 2 datasets. Finally, 3 ESCC-related DEGs (DUXAP10, WDR72, and FST) were identified, which could be critical genes for ESCC. To probe the underlying biological functions and pathways of the genes in the 2 ESCC-related co-expression modules, functional enrichment analysis was carried out using the Metascape database. Genes in the ESCC-related co-expression module from the GSE23400 dataset were mainly involved in mitochondrial gene expression, non-coding RNA (ncRNA) metabolic process, and chromosome segregation. Genes in the module from the GSE130078 dataset were mainly involved in extracellular matrix organization, collagen formation, NABA matrisome associated, skeletal system development, PID integrin 1 pathway, blood vessel development, collagen fibril organization, and regulation of cell adhesion.
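The overlap step described at the start of this subsection amounts to a four-way set intersection. The sketch below uses hypothetical placeholder memberships (names beginning with "other") around the 3 genes reported in the text; the real module and DEG lists are far larger and are not reproduced here.

```python
# Hypothetical gene sets for illustration; only DUXAP10, WDR72, and FST
# come from the text, and the "other..." names are placeholders.
brown_module_gse23400 = {"DUXAP10", "WDR72", "FST", "otherB1", "otherB2"}
yellow_module_gse130078 = {"DUXAP10", "WDR72", "FST", "otherY1"}
degs_gse23400 = {"DUXAP10", "WDR72", "FST", "otherD1", "otherB1"}
degs_gse130078 = {"DUXAP10", "WDR72", "FST", "otherD2", "otherY1"}

# Genes shared by both disease-related modules.
escc_related_genes = brown_module_gse23400 & yellow_module_gse130078

# Module genes that are also differentially expressed in both datasets.
escc_related_degs = escc_related_genes & degs_gse23400 & degs_gse130078
```

Requiring membership in every set is a strict filter, which is consistent with only 3 genes surviving from hundreds or thousands of candidates.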

Figure 3

Differentially expressed genes (DEGs) in the esophageal squamous cell carcinoma (ESCC)-related co-expression modules. (A) Venn diagram depicting overlapping genes between DEGs and genes in the 2 ESCC-related co-expression modules in the GSE23400 and GSE130078 datasets. (B,C) Enrichment bar graph and network diagrams of genes in the “brown” module obtained from the GSE23400 dataset. (D,E) Enrichment bar graph and network diagrams of genes in the “yellow” module from the GSE130078 dataset.

A nomogram based on an 8-gene prognostic model for ESCC

After univariate Cox regression analysis, 55 survival-related genes were identified for ESCC through TCGA database. MAGEA6, MAGEA3, LINC01524, CFAP53, IGF2, GNGT1, FCGR3A, and FCGR2A were used to construct a prognostic model for ESCC following multivariate Cox regression analysis. The risk score was calculated based on coefficients and their expression levels. Among them, MAGEA6 [hazard ratio (HR): 0.270, 95% confidence interval (CI): 0.087–0.850, P=0.026], CFAP53 (HR: 0.080, 95% CI: 0.014–0.460, P=0.004), GNGT1 (HR: 0.340, 95% CI: 0.150–0.920, P=0.009), and FCGR3A (HR: 0.370, 95% CI: 0.150–0.920, P=0.033) were protective factors for ESCC. Also, LINC01524 (HR: 2.7e+06, 95% CI: 518.649–0.850, P<0.01), IGF2 (HR: 2.000, 95% CI: 1.205–3.400, P=0.008), and FCGR2A (HR: 3.400, 95% CI: 1.287–9.200, P=0.014) were risk factors for ESCC. All ESCC patients were divided into high- and low-risk groups according to the median value of the risk score. Kaplan-Meier survival analysis demonstrated that patients with a high risk score had poorer overall survival (OS) than those with a low risk score (P=1.78e−05). An ROC curve was generated to validate the prediction performance for the prognosis of ESCC. The area under the curve (AUC) was 0.961, suggesting that the risk score was highly sensitive and accurate for prognostic prediction. Furthermore, 4 prognostic factors (gender, age, stage, and risk score) were used to establish a nomogram for OS prediction. The predictive ability of the nomogram was accurate for the OS of ESCC patients.

Table 1

The 55 survival-related genes for ESCC

Gene	HR	z	P value
MAGEA6	0.632622	−3.06737	0.00216
MAGEA3	0.64959	−3.04259	0.002346
LINC02154	1.57539	3.030769	0.002439
AMIGO2	2.050024	3.015008	0.00257
LUCAT1	3.411003	2.91137	0.003598
TREML2	13.93435	2.894222	0.003801
LINC02081	2.495196	2.836985	0.004554
LINC01524	9571.259	2.829145	0.004667
CFAP53	0.286082	−2.78916	0.005284
IGF2	1.681858	2.771445	0.005581
GNGT1	0.425633	−2.77041	0.005599
HAS2-AS1	2.432934	2.74082	0.006129
SLC44A5	0.511921	−2.70496	0.006831
IFITM3	1.839623	2.681378	0.007332
FCGR3A	1.722055	2.644744	0.008175
FCGR2A	1.802957	2.623342	0.008707
FCER1G	1.80513	2.608749	0.009087
IFITM1	1.556451	2.582223	0.009817
MSC	1.548024	2.531968	0.011342
LINC00898	0.018806	−2.52327	0.011627
KIAA1324L	0.427248	−2.44412	0.014521
RPL29P19	1.60387	2.443574	0.014543
SLC2A3	1.885667	2.435088	0.014888
GAS1	1.462505	2.402565	0.016281
C3AR1	2.049546	2.379985	0.017313
SPP1	1.313026	2.365654	0.017998
SERPINH1	2.181163	2.354909	0.018527
DENND2D	0.332112	−2.33765	0.019405
CTSL	1.718827	2.319918	0.020345
LY96	1.904947	2.294088	0.021785
APBA2	1.929652	2.292849	0.021857
C1R	1.625295	2.28929	0.022063
IFITM2	1.776594	2.277153	0.022777
HOXC8	2.075444	2.274953	0.022909
POPDC3	1.532657	2.259542	0.02385
MAGEA11	0.595434	−2.25558	0.024097
APLN	1.751567	2.236769	0.025301
STC2	1.705247	2.185285	0.028868
MIR4435-2HG	2.654231	2.166308	0.030288
HAS2	1.608846	2.165967	0.030314
MNDA	1.79297	2.119068	0.034085
PARVB	2.112533	2.115255	0.034408
G0S2	1.502597	2.112957	0.034604
CSF3	0.579595	−2.08523	0.037048
TWIST2	1.729333	2.084705	0.037096
TNFRSF11B	0.301497	−2.08378	0.03718
FAM225A	538.3035	2.071588	0.038304
HOOK1	0.565932	−2.07044	0.038411
TIMP1	1.623557	2.046971	0.040661
HSPD1P6	141.4708	2.03818	0.041532
FCGR1A	2.772473	2.030437	0.042312
HK3	2.14885	1.9925	0.046316
ACAN	4.27515	1.978141	0.047913
OSM	2.206695	1.978062	0.047922
PDLIM7	1.62091	1.969755	0.048866

P values less than 0.05 were considered significant. ESCC, esophageal squamous cell carcinoma; HR, hazard ratio; z, the value of the hypothesis test statistic for the regression coefficients.
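The HR and z columns of Table 1 are enough to approximate a confidence interval for each gene. The sketch below assumes z is the Wald statistic (coefficient divided by its standard error), which matches the footnote's description but is not stated explicitly, and uses the MAGEA6 row as the worked example.

```python
import math

def wald_interval(hr, z, z_crit=1.959964):
    """Approximate 95% CI for a hazard ratio from HR and its Wald z statistic.

    Assumes z = coef / se with coef = ln(HR), so se = coef / z.
    """
    coef = math.log(hr)
    se = abs(coef / z)
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    return lower, upper

# MAGEA6 row of Table 1: HR = 0.632622, z = -3.06737.
magea6_lower, magea6_upper = wald_interval(0.632622, -3.06737)
```

When the whole interval lies below 1, as it does for this row, the univariate estimate is consistent with a protective association at the 5% level.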

Table 2

An 8-gene model for ESCC based on univariate and multivariate Cox regression analysis

Gene	Exp (coef)
MAGEA6	0.271936
MAGEA3	2.102895
LINC01524	2737999
CFAP53	0.080066
IGF2	2.014585
GNGT1	0.338051
FCGR3A	0.372708
FCGR2A	3.446703

ESCC, esophageal squamous cell carcinoma; Exp (coef), weighting factor for gene expression.

Figure 4

A nomogram based on an 8-gene prognostic model for esophageal squamous cell carcinoma (ESCC). (A) A forest diagram depicting the correlation between the 8 genes and the overall survival of ESCC patients. (B) Kaplan-Meier survival analysis of the risk score for ESCC patients. (C) Construction of a receiver operating characteristic (ROC) curve for validation of the prediction performance of the risk score for the prognosis of ESCC patients. (D) A nomogram used to predict the overall survival of ESCC patients. *, P<0.05; **, P<0.01; ***, P<0.001. AIC, Akaike information criterion; AUC, area under the curve.

Identification of 8 prognostic factors for ESCC

We further performed Kaplan-Meier survival analysis for CFAP53, FCGR2A, FCGR3A, GNGT1, IGF2, LINC01524, MAGEA3, and MAGEA6. The results showed that ESCC patients with low CFAP53 (P=1.04e−02), GNGT1 (P=2.059e−02), MAGEA3 (P=1.144e−02), and MAGEA6 (P=3.648e−02) expression had a shorter OS time than those with high expression. Also, high FCGR2A (P=1.001e−01), FCGR3A (P=3.816e−02), IGF2 (P=1.211e−01), and LINC01524 (P=3.139e−02) expression indicated poorer OS compared to low expression.

Figure 5

Kaplan-Meier survival analysis of 8 genes for esophageal squamous cell carcinoma (ESCC). (A) CFAP53; (B) FCGR2A; (C) FCGR3A; (D) GNGT1; (E) IGF2; (F) LINC01524; (G) MAGEA3; (H) MAGEA6.

Construction of an mRNA-lncRNA co-expression network for ESCC

Based on the 8 prognostic signatures, an mRNA-lncRNA co-expression network was constructed for ESCC. Co-expressed RNAs of the 8 prognostic RNAs were enriched in various biological processes and signaling pathways such as extracellular matrix organization, collagen formation, NABA matrisome associated, skeletal system development, and blood vessel development. The pathway enrichment network diagram revealed that the functional network of these RNAs was complex and diverse.

Figure 6

Construction of an mRNA-lncRNA co-expression network for esophageal squamous cell carcinoma (ESCC). (A) The mRNA-lncRNA co-expression network. Triangles indicate 8 prognostic-related RNAs, dots indicate RNAs in the yellow module, and square dots indicate RNAs in the brown module. Yellow-green represents mRNA, and light purple represents lncRNA. (B) Pathway enrichment bar chart. (C) Pathway enrichment network diagram.

Discussion

As the main histological subtype of esophageal cancer, ESCC is a highly aggressive malignant tumor. A variety of environmental factors contribute to ESCC, such as smoking, drinking, and chemical exposure. Genomic studies have confirmed that changes in gene expression in ESCC mediate the biological behavior of tumor cells (17). Despite in-depth studies on its molecular mechanisms, the clinical outcomes of ESCC patients are still unsatisfactory. Thus, in this study, we constructed a robust prognostic nomogram based on the 8-gene signature, age, gender, and stage. This model exhibited good performance for prognostic prediction of ESCC. Hence, our study may provide novel clues for the early detection and treatment of ESCC. WGCNA has been widely applied to explore ESCC-related modules. For instance, TPX2, CDK1, and CEP55 hub genes related to relapse-free survival have been identified in ESCC by WGCNA (18). In this study, we constructed 2 ESCC-related co-expression modules from 2 GEO datasets. Functional enrichment analysis results demonstrated that genes in the 2 co-expression modules were significantly involved in ESCC-related pathways such as mitochondrial gene expression (17), ncRNA metabolic process (19), and chromosome segregation (20), which confirmed the clinical significance of the 2 modules for ESCC. Based on univariate and multivariate Cox regression analyses, an 8-gene model was built for ESCC. TNM staging is the main tool used to guide therapeutic strategies for ESCC as a prognostic indicator. However, due to heterogeneity at the molecular level, the clinical outcomes of patients differ. Our findings suggest that the 8-gene signature could accurately predict the prognosis of ESCC patients: the risk score was able to discriminate high-risk patients, who had worse survival than low-risk patients. The ROC curve confirmed its good performance for the prognostic prediction of ESCC.
In a previous study, an immune-related nomogram was shown to provide more accurate prognostic prediction for patients with operable ESCC, as a supplement to TNM staging (21). In this study, by integrating the risk score and other factors, the nomogram could more accurately predict the OS of patients with ESCC. Our survival analysis revealed that ESCC patients with low CFAP53, GNGT1, MAGEA3, and MAGEA6 expression had a shorter OS time than those with high expression. Moreover, high FCGR2A, FCGR3A, IGF2, and LINC01524 expression indicated poorer OS than low expression. Thus, these 8 genes are considered potential prognostic markers for ESCC. In a previous study, CFAP53 was detected in the bronchial epithelium (22) and was highly expressed in the sputum of asthmatics. FCGR2A gene polymorphism is related to the prognosis and treatment response of a variety of cancer types. For example, the FCGR3A-158 gene polymorphism may predict the efficacy of trastuzumab for early ERBB2/HER2-positive breast cancer patients (23). Our findings suggest that the 8 genes in the prognostic model may be therapeutic targets. The FCGR2A rs1801274 variant is associated with a high risk of gastric cancer in the Chinese population (24). MiR-139-3p is a candidate serum biomarker for predicting the prognosis of ESCC. A previous study showed that FCGR2A could be regulated by miR-139-3p at the post-transcriptional level (25). GNGT1 can predict the response to platinum-based chemotherapy drugs (26). IGF2 could maintain the stem cell characteristics of ESCC cells (27), and the prognostic potential of IGF2 in ESCC has been confirmed (28). IGF2 may promote ESCC cell migration and invasion (29). High expression of IGF2 can enhance the chemoresistance of ESCC (30). In comparison to mRNAs, lncRNAs possess higher tissue specificity, which makes them easier to detect (31). Thus, lncRNAs are also key markers for ESCC diagnosis and prognosis.
Only one study has demonstrated that LINC01524 is up-regulated in Helicobacter pylori-positive gastric cancer tissues compared to Helicobacter pylori-negative tissues (32). MAGEA3 is an independent prognostic factor for ESCC patients (33). Its expression is induced by decitabine, thereby enhancing the recognition of ESCC by T cells (34). The roles of the 8 genes in ESCC require in-depth exploration. An mRNA-lncRNA co-expression network was built based on the 8 genes for ESCC. These co-expressed genes are involved in a variety of biological functions. For example, the extracellular matrix participates in the adhesion and metastasis of ESCC cells (35), which could be mediated by these co-expressed genes. Collagen is a component of the extracellular matrix, and is closely related to tumor growth as well as epithelial-mesenchymal transition (36). Blood vessel development as a key prognostic factor was distinctly enriched by these genes (37). Taken together with previous research, the 8 genes may participate in the progression of ESCC through complex interactions. However, there are still some limitations in this study. Firstly, the 8-gene signature should be verified in an independent dataset. Secondly, more clinical features should be integrated into our nomogram model. Thirdly, the specific functional mechanism of the 8-gene signature and 3 ESCC-related DEGs (DUXAP10, WDR72, and FST) in ESCC needs further study. Fourthly, the relationship between risk levels and disease treatment response remains to be explored in treatment-group samples. In this study, in the GSE23400 and GSE130078 datasets, WGCNA was carried out, and the co-expression gene modules related to ESCC were determined. Then, the genes in these modules were analyzed by Metascape, revealing that these genes might play important roles in ESCC. Combining the genes in these modules and DEGs, we identified 8 survival-related genes in TCGA database.
The Cox regression model composed of these 8 genes demonstrated good performance in predicting prognosis. At the same time, the mRNA-lncRNA co-expression network was analyzed, indicating that these 8 genes exhibited complex interaction relationships. In summary, the 8 genes found by the analysis of multiple datasets can be used as ESCC biomarkers to provide certain theoretical support for ESCC research.

Conclusions

Taken together, WGCNA identified ESCC-related co-expression modules. A robust 8-gene signature could accurately predict the prognosis of ESCC patients. Furthermore, a prognostic nomogram based on risk score, age, gender, and stage was constructed for ESCC, which may be beneficial for early diagnosis and treatment. In future studies, the 8 genes will be verified in more clinical trials.

37 in total

1. Missing value estimation methods for DNA methylation data.

Authors: Pietro Di Lena; Claudia Sala; Andrea Prodi; Christine Nardini
Journal: Bioinformatics Date: 2019-10-01 Impact factor: 6.937

2. MiR-141-3p is upregulated in esophageal squamous cell carcinoma and targets pleckstrin homology domain leucine-rich repeat protein phosphatase-2, a negative regulator of the PI3K/AKT pathway.

Authors: Osamu Ishibashi; Ichiro Akagi; Yota Ogawa; Takashi Inui
Journal: Biochem Biophys Res Commun Date: 2018-05-16 Impact factor: 3.575

3. E-selectin rs5361 and FCGR2A rs1801274 variants were associated with increased risk of gastric cancer in a Chinese population.

Authors: Hong-Zhen Xia; Wei-Dong Du; Qiang Wu; Gang Chen; Yuan Zhou; Xian-Fa Tang; Hua-Yang Tang; Yi Liu; Feng Yang; Jian Ruan; Song Xu; Xian-Bo Zuo; Xue-Jun Zhang
Journal: Mol Carcinog Date: 2011-07-20 Impact factor: 4.784

10. A nomogram-based immunoprofile predicts overall survival for previously untreated patients with esophageal squamous cell carcinoma after esophagectomy.

Authors: Jingjing Duan; Yongwei Xie; Lijuan Qu; Lingxiong Wang; Shunkai Zhou; Yu Wang; Zhongyi Fan; Shengsheng Yang; Shunchang Jiao
Journal: J Immunother Cancer Date: 2018-10-03 Impact factor: 13.751