Literature DB >> 32047546

Identification of key genes and pathways associated with esophageal squamous cell carcinoma development based on weighted gene correlation network analysis.

Mingrui Shao1,2, Wenya Li1,2, Shiyang Wang1,2, Zhenghua Liu1,2.   

Abstract

Background: As one of the most aggressive malignancies, esophageal squamous cell carcinoma(ESCC) remains one of the leading causes of cancer related death worldwide. The majority of ESCCs are diagnosed at advanced stages with poor five-year survival rate, making it urgent to identify specific genes and pathways associated with its initiation and prognosis. Materials and
Methods: The differentially expressed genes in TCGA were analysed to construct a co-expression network by WGCNA. Gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis were performed for the selected genes. Module-clinical trait relationships were analyzed to explore the genes and pathways that associated with clinicopathological parameters of ESCC. Log-rank tests and COX regression were used to identify the prognosis-related genes.
Results: The brown module containing 716 genes which most significantly contributed to ESCC. GO analysis suggested enrichment of adaptive immune response, cyclin-dependent protein serine, regeneration and mRNA metabolic process. KEGG analysis indicated pathways including Cellular senescence, Ribosome biogenesis, Proteasome, Base excision repair and p53 signaling pathway. Clinical stage was associated with cyan module; clinical M was associated with grey60 module; clinical T was associated with darkturquoise module; while clinical N, histological type and cancer location were associated with turquoise module. Key genes of TCP1, COQ3, PTMA and MAPRE1 might be potential prognostic markers for ESCC. Discussion: Differentially expressed genes and key modules contributing to initiation and progression in ESCC were identified by WGCNA. These findings provide novel insights into the mechanisms underlying the initiation, prognosis and treatment of ESCC. © The author(s).

Entities:  

Keywords:  esophageal squamous cell carcinoma; prognosis.; risk; weighted gene correlation network analysis

Year:  2020        PMID: 32047546      PMCID: PMC6995384          DOI: 10.7150/jca.30699

Source DB:  PubMed          Journal:  J Cancer        ISSN: 1837-9664            Impact factor:   4.207


Introduction

As one of the most aggressive malignancies, esophageal cancer remains one of the leading causes of cancer related death worldwide 1, 2. Among the two main histological types of esophageal cancer including esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC), ESCC represents the dominant subtype in Asians and EAC incidence is higher in Western countries 3. These two subtypes have different pathogenesis, biology features and clinical outcomes 4. At present, the majority of ESCCs are diagnosed at advanced stages, the five-year survival rate of which still remains unfavourable with frequently metastasis and recurrence 5, 6. Therefore, comprehensive researches are urgently needed to fully clarify critical molecular mechanisms related to the initiation, progression and prognosis of ESCC. It has been widely accepted that alcohol drinking and tobacco consumption are certain risk factors for ESCC with synergistic effects 7, 8. There is also evidence suggesting that possible association of human papillomavirus (HPV) with ESCC risk according to meta-analysis 9. Studies have found that polymorphisms in aldehyde dehydrogenase 2 (ALDH2) gene also lead to the development of ESCC especially in Asian populations 10-12. Multiple studies including high-throughput analysis revealed that genes and pathways involved in ESCC were related with cell cycle, differentiation, and Epidermal Growth Factor Receptor signalling 13. In addition, epigenetic alterations including DNA methylation of such as APC, RB1 and CDKN2A, histone modification, and loss of genome imprinting also contribute to ESCC 14-16. Although a number of genes and mechanisms have been proved to be closely implicated in the development of ESCC, the comprehensive picture of the whole genes and regulations of ESCC is still unclear. In recent years, bioinformatic methods become increasingly effective in exploration and analysis of multiple genes or proteins of complicated diseases. Weighted gene co-expression network analysis (WGCNA), a new gene co-expression network-based method, has been successfully used to screen biomarkers and pathways that could be applied in susceptibility genes, diagnose and treatment of cancer 17. In this study, WGCNA was conducted to analyse data of TCGA data repository of ESCC to identify gene modules and biomarkers (hub genes) implicated in the pathogenesis, progression and prognosis of ESCC.

Materials and Methods

Acquiring and preparing genetic and clinical data

The RNA sequencing and clinical data of esophageal squamous cell carcinoma patients were downloaded from TCGA data repository (https://cancergenome.nih.gov/). The level of gene expression was measured as fragments per kilobase of transcript per million mapped reads (FPKM). Clinical data included the sample type, clinical TNM stage, histologic grade, cancer position and survival information. Samples with incomplete pathologic stage or histologic grade information were not included. As genes with little variation in expression usually represent noise, the most variant genes were filtered for network construction. Gene variabilities were measured by median absolute deviation (MAD).

Constructing gene co-expression network

Gene co-expression network was constructed by the WGCNA package in R and was visualized by Cytoscape software 18. Power values were screened out by WGCNA algorithm in the construction of co-expression modules. Scale independence and average connectivity analysis of modules with different power value were performed by gradient test (power value ranging from 1 to 20). Appropriate power value was determined when the scale independence value was equal to 0.9. WGCNA algorithm was then used to construct the co-expression network and extract the gene information in the most relevant module. The criterion of co-expression weight> 2 was used to select the candidate network. Heatmap tool package in R language was adopted to describe the strength of the relationship among modules (strong or weak degree).

Relating modules to cancer risk and identifying the prognosis related genes

One of the advantages of WGCNA is that the correlation between modules and clinical parameters can be analyzed. The module eigengene (ME), which can be regarded as a representative of the gene expression profiles from a module, is defined as the first principal component of a given module. Given that the ME can summarize the gene expression profiles, we calculated the correlation between MEs and external sample type data. The gene in the most relevant module was chosen as the risk-related gene. Log-rank tests and COX regression were used to select the prognosis-related genes in the risk-related genes. R package survival was used to carry out log-rank tests and survminer package was used to plot Kaplan-Meier survival curves. In order to validate our findings, microarray dataset of GSE53625 from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) in NCBI (The National Center for Biotechnology Information) containing 179 esophageal squamous cell carcinoma patients with prognosis information were used to confirm our results.

Gene ontology and pathway Enrichment analysis

To explore the potential biological themes and pathways of genes in risk-related module, the clusterprofiler package in R was used to annotate and visualize gene ontology (GO) terms 19 and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways 20.

Screening for candidate module with clinical significance and KEGG analysis

We calculated the correlation between MEs and external clinical data. p <0.05 suggested significant correlation. We selected the most relevant module in each clinical data for analysis. KEGG pathway analysis was used to enrich the related genes.

Results

Gene co-expression network of ESCC

Clinical and level-3 RNA sequencing data for 162 esophageal squamous cell carcinoma samples and 11 normal samples were obtained from TCGA database. For module detection, the 15121 most variant genes were selected according to MAD value for further analysis. When the value of soft thresholding power β was 4, the connectivity between genes met a scale-free network distribution (Fig. S1). Altogether 21 modules were identified by hierarchical clustering and the Dynamic branch Cutting. Each module was assigned a unique color as an identifier. Interaction relationship analysis of co-expression genes was shown in Figure 1. The number of genes in modules ranged from 34 to 2353. The grey module represented a gene set that was not assigned to any of the modules.
Figure 1

The connectivity analysis of critical genes in different module

The risk related module in the WGCNA algorithm

We then use the cancer type as the Clinical phenotype to select the most relevant module with the occurrence of ESCC. Finally, the brown module was selected (Fig. S2). The brown module contained 716 genes. Then we constructed the co-expression network with the criterion of co-expression weight >2. As was shown in Figure 2, a total of 161 genes were selected and their multiple interactions were visualized. The size represents the connection numbers and the color represents the fold change compared by the cancer and normal samples.
Figure 2

The protein-protein interaction network of the genes in the risk-related brown module of ESCC.

Enrichment analysis of the brown module

GO and KEGG enrichment analysis was performed on the genes in the constructed cancer-risk-related module. Altogether 984 terms showed difference in GO enrichment (Table 1). As was illustrated in Figure 3, this module in GO analysis was related with biological process of regulation of adaptive immune response, regulation of cyclin-dependent protein serine, regeneration and positive regulation of mRNA metabolic process. According to the KEGG analysis, 19 pathways were associated with brown module including Cellular senescence, Ribosome biogenesis in eukaryotes, Proteasome, Base excision repair and p53 signaling pathway.
Table 1

Top 5 enrichment analysis of the brown module

OntologyIDDescriptionGeneRatioadjusted PCount
BPGO:0002821positive regulation of adaptive immune response8/6640.0493728
BPGO:1903313positive regulation of mRNA metabolic process7/6640.04857
BPGO:0031099regeneration14/6640.04820814
BPGO:0006730one-carbon metabolic process5/6640.0476675
BPGO:0045737positive regulation of cyclin-dependent protein serine/threonine kinase activity5/6640.0476675
CCGO:0030687preribosome, large subunit precursor4/6840.0370334
CCGO:0000932P-body8/6840.0357638
CCGO:0000315organellar large ribosomal subunit6/6840.0352416
CCGO:0005762mitochondrial large ribosomal subunit6/6840.0352416
CCGO:0036464cytoplasmic ribonucleoprotein granule14/6840.03351714
MFGO:0016273arginine N-methyltransferase activity3/6630.0495953
MFGO:0016274protein-arginine N-methyltransferase activity3/6630.0495953
MFGO:0016889endodeoxyribonuclease activity, producing 3'-phosphomonoesters4/6630.0474194
MFGO:0018024histone-lysine N-methyltransferase activity6/6630.0458996
MFGO:0004540ribonuclease activity10/6630.04589910
KEGGhsa04115p53 signaling pathway11/3280.00235511
KEGGhsa05166Human T-cell leukemia virus 1 infection25/3280.00121325
KEGGhsa03008Ribosome biogenesis in eukaryotes15/3280.00047515
KEGGhsa04218Cellular senescence20/3280.00022120
KEGGhsa03050Proteasome10/3280.00020710
Figure 3

Gene Ontology analysis and KEGG pathway enrichment analysis for genes in the risk-related brown module of ESCC.

Prognosis analysis of the brown module

Since the brown module were selected from the modules related to patients' cancer type. It was of great significance to evaluate the potentiality of them to serve as prognostic biomarkers. Among the 716 genes, 39 genes were associated with the prognosis of esophageal squamous cell carcinoma (Table 2). As was summarized in Figure 4, the top four up-regulated genes which indicated a poor prognosis included TCP1 (HR=2.427, P=0.001), COQ3 (HR=2.247, P=0.003), AC016205.1 (HR=2.227, P=0.004) and MTHFD2 (HR=2.179, P=0.003). Of the top four down-regulated genes, PTMA (HR=1.706, P=0.039), MAPRE1 (HR=1.698, P=0.043) and BOLA3 (HR=1.667, P=0.048) significantly associated with poor prognosis while TTL predicted better survival (HR=0.572, P=0.035).
Table 2

Prognosis analysis of the brown module

GeneGene full nameLocationHRLower limitUpper limitP
TCP1t-complex 16q25.3-q262.4271.4394.0980.001
MTHFD2methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 22p13.12.1791.3023.6500.003
COQ3coenzyme Q3, methyltransferase6q16.22.2471.3183.8310.003
AC016205.1MIR924 Host Gene18q12.22.2271.2843.8610.004
VBP1VHL binding protein 1Xq282.0881.2303.5460.006
MED30Mediator Complex Subunit 308q24.112.1101.2083.6900.009
SNRPCsmall nuclear ribonucleoprotein polypeptide C6p21.311.9761.1783.3110.010
NONOnon-POU domain containing, octamer-bindingXq13.11.9721.1703.3330.011
SNRPBsmall nuclear ribonucleoprotein polypeptides B and B120p131.9611.1573.3110.012
NUDT21nudix hydrolase 2116q12.21.9921.1643.4130.012
ALYREFAly/REF Export Factor17q25.31.8871.1383.1250.014
CCT4chaperonin containing TCP1 subunit 42p151.9311.1333.2890.016
IRAK1interleukin 1 receptor associated kinase 1Xq281.8661.1103.1350.019
AHSA1activator of Hsp90 ATPase activity 114q241.8321.0953.0580.021
CDT1chromatin licensing and DNA replication factor 116q24.31.8351.0953.0670.021
CCT2chaperonin containing TCP1 subunit 212q151.7891.0802.9670.024
HNRNPDheterogeneous nuclear ribonucleoprotein D4q211.8081.0823.0210.024
SAAL1Serum Amyloid A Like 111p15.11.7831.0732.9590.026
SNRPEsmall nuclear ribonucleoprotein polypeptide E1q321.8051.0743.0400.026
SPDL1spindle apparatus coiled-coil protein 15q35.11.7891.0663.0030.028
YARS2tyrosyl-tRNA synthetase 212p11.211.7951.0633.0300.029
E2F7E2F Transcription Factor 712q21.21.8281.0553.1650.031
MRPL18mitochondrial ribosomal protein L186q25.31.7331.0482.8650.032
LSM5LSM5 homolog, U6 small nuclear RNA and mRNA degradation associated7p14.31.7451.0492.9070.032
TTLTubulin Tyrosine Ligase2q14.10.5720.3400.9620.035
PTGES3prostaglandin E synthase 312q13.31.7331.0402.8900.035
SRSF2serine and arginine rich splicing factor 217q25.11.7181.0402.8410.035
HSPD1heat shock protein family D (Hsp60) member 12q33.11.7181.0312.8650.038
SF3A3splicing factor 3a subunit 31p34.31.7121.0292.8490.039
PTMAprothymosin, alpha2q37.11.7061.0262.8330.039
CTPS1CTP synthase 11p34.11.6841.0222.7700.041
MAGOHBmago homolog B, exon junction complex core component12p13.21.7091.0202.8650.042
MAPRE1microtubule associated protein RP/EB family member 120q11.1-q11.231.6981.0162.8330.043
MCM8minichromosome maintenance 8 homologous recombination repair factor20p12.31.6921.0142.8250.044
TRIM37tripartite motif containing 3717q23.20.5960.3600.9890.045
HNRNPABheterogeneous nuclear ribonucleoprotein A/B5q35.31.6691.0102.7620.046
STRAPserine/threonine kinase receptor associated protein12p12.31.7061.0042.9070.048
BOLA3BolA Family Member 32p13.11.6671.0052.7620.048
LSM6LSM6 homolog, U6 small nuclear RNA and mRNA degradation associated4q31.221.6981.0052.8650.048
Figure 4

The correlation between the expression levels of key genes of risk-associated brown module and the survival of ESCC patients.

Altogether 179 esophageal squamous cell carcinoma patients with prognosis information of microarray GSE53625 were used to validate our results. As was shown in Table 4, AC016205.1 (P=0.013), PTMA (P=0.023), TTL (P=0.013) still showed significant relation with prognosis of ESCC patients, while MTHFD2 (P=0.082) indicated borderline significant association (Table 4).
Table 4

Verification for top 4 prognosis gene in GEO datasets

GeneGene full nameHR(95% CI)PGSE53625_P
TCP1t-complex 12.427(1.439-4.098)0.0010.196
MTHFD2methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 22.179(1.302-3.65)0.0030.082
COQ3coenzyme Q3, methyltransferase2.247(1.318-3.831)0.0030.130
AC016205.1MIR924 Host Gene2.227(1.284-3.861)0.0040.013
PTMAprothymosin, alpha1.706(1.026-2.833)0.0390.023
MAPRE1microtubule associated protein RP/EB family member 11.698(1.016-2.833)0.0430.519
BOLA3BolA Family Member 31.667(1.005-2.762)0.0480.781
TTLTubulin Tyrosine Ligase0.572(0.34-0.962)0.0350.013

Module-clinical trait relationships

Identifying genes associated with a certain clinical trait is of great value to explore the molecular mechanisms behind the trait. In the present study, the clinical parameters of ESCC patients, including clinical T, clinical N, clinical M, clinical stage, recrudescence stage, histological type and cancer location were involved in the module-trait relationship (MTR) analysis. As was suggested in Figure 5, clinical stage was associated with cyan module; clinical M was associated with grey60 module; clinical T was associated with darkturquoise module; while clinical N, histological type and cancer location were associated with turquoise module. No module correlated with recrudescence.
Figure 5

The module-clinical trait relationships of genes involved in ESCC.

KEGG analysis of clinical related module

We then performed KEGG pathway analysis of the candidate module. The results indicated that genes of cyan module were enriched in Cell cycle; grey60 module was associated with Ribosome; darkturquoise module was related with RNA transport; turquoise module was associated with Fat digestion and absorption, Tight junction, Maturity onset diabetes of the young, Protein digestion and absorption and Fructose and mannose metabolism (Table 3).
Table 3

KEGG analysis of clinical related module

PhenotypeModuleIDDescriptionGeneRatioAdjusted PCount
Clinical stagecyanhsa04110Cell cycle4/290.00126444
Clinical Mgrey60hsa03010Ribosome4/240.00132574
Clinical Tdarkturquoisehsa03013RNA transport6/450.00052296
Clinical NHistological typeCancer locationturquoisehsa04975Fat digestion and absorption14/6072.21E-0614
hsa04530Tight junction31/6071.499E-0531
hsa04950Maturity onset diabetes of the young10/6071.936E-0510
hsa04974Protein digestion and absorption20/6072.754E-0520
hsa00051Fructose and mannose metabolism11/6073.566E-0511

Discussion

Although significant progresses have been made in studies concerning the risk and development of ESCC, our understanding of the complex mechanisms of ESCC is still limited. In this study, we conducted a differential expression analysis followed by WGCNA to detect genes and pathways associated with the occurrence, clinical parameters and prognosis of ESCC. Finally, key modules were identified which contribute to the risk (brown module) and clinicopathological parameters (cyan module, grey60 module, darkturquoisemodule and turquoise module) of ESCC. In addition, enrichment analysis of the genes in core modules suggested significant involvement of pathways such as Cellular senescence, Ribosome biogenesis in eukaryotes, Proteasome, Base excision repair and p53 signalling pathway. In this present study, RNA sequencing data for 162 esophageal squamous cell carcinoma samples and 11 normal samples from TCGA were systematically analyzed. According to the WGCNA analysis of most variant genes, altogether 21 modules were identified and each module was assigned a unique color as an identifier. The number of genes in modules ranged from 34 to 2353. The brown module containing 716 genes was selected when we used the cancer type as the Clinical phenotype, indicating its close involvement in the development of ESCC. GO enrichment analysis were then performed on the genes in the constructed cancer-risk-related module, suggesting significance of biological process including adaptive immune, cyclin-dependent protein serine, regeneration and mRNA metabolic process. It has been reported that immune response and immune escape might play a critical role in ESCC progression and therapy 21, 22. In addition, key mRNA metabolic procedure such as alternative splicing has also been detected in ESCC. Alternative splicing isoforms of LOXL2, VIL2, OSMRβ and MUC1 have been reported to contribute to ESCC development and progression 23-26. From this point of view, the specific role of immune response and altered mRNA metabolic process in ESCC require future investigations to elucidate. As for key pathways of KEGG enrichment, aberrant base excision repair capacity and altered p53 signaling pathways have been found to be associated with ESCC development by a number of studies 27, 28. Pathways of cellular senescence, ribosome biogenesis in eukaryotes and proteasome might be novel research directions for physicians and surgeons for ESCC. The five-year survival rate of ESCC currently remains unsatisfactory despite recent improvements in treatments of surgical resection and adjuvant chemotherapy. It was therefore of great significance to evaluate the potentiality of key factors to serve as prognostic biomarkers. We eventually screened 39 prognosis-related genes in ESCC after analyzing the risk-associated brown module. Aberrant TCP1 gene status has been detected in hepatocellular carcinoma, breast cancer and colorectal cancer, but no study has been performed for its effect in ESCC 29-31. MTHFD2 RNA and protein are remarkably increased in many types of cancers and correlated with worse survival in breast cancer according to a comprehensive analysis of RNA profiles of 1,454 metabolic enzymes across 1,981 tumors spanning 19 cancer types 32. Although some of the prognosis-related genes we identified have been investigated in various types of cancers, limited information was known about their roles in ESCC until now. One limitation of this study is that most results were generated by analysis of publicly available data, further studies based on larger populations and molecular mechanism researches are therefore needed to confirm the results of our study. In order to explore the genes and modules that associated with clinicopathological parameters of ESCC, we also performed module-clinical trait relationships analysis. We finally detected specific modules which determine the clinical stage, TNM classification, histological type and cancer location. Pathway analysis of the selected modules indicated that these genes enriched in Cell cycle, Ribosome, RNA transport, Fat digestion and absorption, Tight junction, Maturity onset diabetes of the young, Protein digestion and absorption and Fructose and mannose metabolism. Remodeling of energetic metabolism is a hallmark of malignant tumor, by which tumor cells changes themselves and the environment into more suitable conditions for growth and invasion 33. According to our findings, aberrant metabolism of carbohydrate, fat and protein might be closely related to the progression of ESCC. In addition, alternations in functions of ribosome, RNA transport might also be promising research directions for ESCC. In summary, differentially expressed genes and key modules contributing to initiation and progression in ESCC were identified by means of WGCNA. Pathways including cellular senescence, ribosome biogenesis in eukaryotes, Proteasome, base excision repair, fat digestion and absorption and p53 signalling might be closely related with ESCC development. Key genes of TCP1, COQ3, PTMA and MAPRE1 might be potential prognostic markers for ESCC. These findings provide novel insights into the mechanisms underlying the etiology, prognosis and treatment of ESCC. Supplementary figures and tables. Click here for additional data file.

Supplementary Material

Acknowledgements

This study was supported by Natural Science Foundation of Liaoning Province (grant no. 2015020561), Wu JiePing Medical Found (grant no. 320.6750.18293) and the Fund for Scientific Research of The First Hospital of China Medical University (grant no. fsfh1514).
  5 in total

1.  Analysis of Differentially Expressed Genes in a Chinese Cohort of Esophageal Squamous Cell Carcinoma.

Authors:  Gang Liu; Yuan Zhao; Huili Chen; Jinru Jia; Xiaomin Cheng; Fengjie Wang; Qiang Ji; Rick F Thorne; Song Chen; Xiaoying Liu
Journal:  J Cancer       Date:  2020-04-06       Impact factor: 4.207

2.  A Prognostic Signature Based on Immunogenomic Profiling Offers Guidance for Esophageal Squamous Cell Cancer Treatment.

Authors:  Jianyao Gao; Ting Tang; Baohui Zhang; Guang Li
Journal:  Front Oncol       Date:  2021-02-24       Impact factor: 6.244

3.  Levels of Coenzyme Q10 and Several COQ Proteins in Human Astrocytoma Tissues Are Inversely Correlated with Malignancy.

Authors:  Hsiu-Chuan Yen; Bing-Shian Chen; Si-Ling Yang; Shin-Yu Wu; Chun-Wei Chang; Kuo-Chen Wei; Jee-Ching Hsu; Yung-Hsing Hsu; Tzung-Hai Yen; Chih-Lung Lin
Journal:  Biomolecules       Date:  2022-02-20

4.  Upregulation of Centromere Proteins as Potential Biomarkers for Esophageal Squamous Cell Carcinoma Diagnosis and Prognosis.

Authors:  Xiao Wang; Minshan Lai; Yue Wang; Ruihuan Chai; Nan Li; Ling Ou; Kai Zheng; Jieling Li; Guifeng Xu; Shaoqi Wang; Yun Dong; Shaoxiang Wang
Journal:  Biomed Res Int       Date:  2022-04-20       Impact factor: 3.246

5.  Identification and Validation of a Dysregulated miRNA-Associated mRNA Network in Temporal Lobe Epilepsy.

Authors:  Xing Li; Yunli Han; Dejun Li; Hai Yuan; Shiqin Huang; Xiaolan Chen; Yuanhan Qin
Journal:  Biomed Res Int       Date:  2021-10-22       Impact factor: 3.411

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.