
Combination of meta-analysis and graph clustering to identify prognostic markers of ESCC.

Hongyun Gao, Lishan Wang, Shitao Cui, Mingsong Wang.

Abstract

Esophageal squamous cell carcinoma (ESCC) is one of the most malignant gastrointestinal cancers and occurs at a high frequency in China and other Asian countries. Recently, several molecular markers were identified for predicting ESCC. Nevertheless, additional prognostic markers, with a clear understanding of their underlying roles, are still required. Here, the graph-clustering method DPClus was used to detect co-expressed modules. The aim was to identify a set of discriminating genes that could be used for predicting ESCC through graph-clustering and GO-term analysis. The results showed that CXCL12, CYP2C9, TGM3, MAL, S100A9, EMP-1 and SPRR3 were highly associated with ESCC development. In our study, all their predicted roles were in line with previous reports, supporting the assumption that a combination of meta-analysis, graph-clustering and GO-term analysis is effective both for identifying differentially expressed genes and for inferring their functions in ESCC.


Keywords:  esophageal squamous cell carcinoma; graph clustering; meta-analysis

Year:  2012        PMID: 22888304      PMCID: PMC3389543          DOI: 10.1590/S1415-47572012000300021

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Introduction

Esophageal squamous cell carcinoma (ESCC) is one of the six most common cancers worldwide and is especially frequent in China and other Asian countries (Lehrbach; Zhi). Despite improvements in detection, surgical techniques and chemoradiotherapy, the five-year survival rate remains low (Kato). Prognosis prediction is usually based on the tumor, node and metastasis (TNM) classification system. However, TNM classification merely reflects the status of cancer progression at the time of diagnosis. In contrast, molecular biological analysis clarifies biological behavior during cancer progression. Thus, the combination of the two could reflect the clinical outcome more accurately (Takeno). Recently, it was discovered that certain genes could be associated with poor ESCC prognosis. For example, high expression of CCR7 (Ding), COX2 (Takatori), Beclin-1 (Chen), TLR9 (Takala) and FOXC2 (Nishida) could be significantly correlated with invasion, stage, tumor depth, lymph node metastasis and poor survival. Overexpression of steroid receptor coactivator-3 (SRC-3) was more frequently observed in primary ESCC in the late T stages (T3/T4) than in the earlier T1/T2 stages (Xu). Although over-expression of cysteine-rich 61 (Cyr61) was related to shorter overall survival of patients in stage I/II, it had no effect on the overall survival of patients in stage III/IV (Xie). Low claudin-4 expression was found to be significantly associated with histological differentiation, invasion depth and lymph node metastasis, and had an unfavorable influence on disease-free and overall survival (Sung). PRL-1 protein expression correlated significantly with the stage of ESCC, with expression in 79.4% of stage III cases (27/34) and 33.3% of stage I cases (1/3) (Yuqiong). However, additional molecular prognostic markers are essential as aids in developing more effective therapeutic strategies for better prognosis.
In this study, the aim was to identify more differentially expressed genes in ESCC, and to predict their underlying functions. Meta-analysis provides a powerful tool for analyzing microarray experiments by combining data from multiple studies, although it presents unique computational challenges. The Bioconductor package RankProd provides a novel and intuitive tool for this purpose, detecting differentially expressed genes by means of the non-parametric rank product method (Hong). The graph-clustering approach was used to identify gene expression profiles that distinguish ESCC from normal samples. Furthermore, the relevant pathways in each cluster were analyzed by GO term analysis, in order to explain potential mechanisms underlying ESCC.

Data and Methods

Meta-analysis for expression profile and differentially expressed gene (DEG) analysis

Two ESCC-related expression profiles, GSE23400 and GSE20347, were obtained from the public functional genomics data repository GEO. They are based on the Affymetrix Human Genome U133A Array and the Affymetrix Human Genome U133A 2.0 Array, respectively. In the GSE23400 dataset, 53 ESCC and 53 matched-normal samples were analyzed; the contributors chose not to include clinical phenotypes in their GEO submission (Su). In the GSE20347 dataset, 17 ESCC and 17 matched-normal samples were analyzed, in a study approved by the Institutional Review Boards of the Shanxi Cancer Hospital and the US National Cancer Institute (NCI). Cases diagnosed with ESCC between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, and considered candidates for curative surgical resection, were identified and recruited for participation in the study. None of the cases had undergone prior therapy, and Shanxi was the ancestral home of all of them. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (e.g., smoking, alcohol consumption, and a detailed family history of cancer), and clinical information (Yoo; Hu).

Statistical analysis

DEGs for the GSE23400 and GSE20347 datasets were first identified independently with the limma method, after which the RankProd package was applied to overcome heterogeneity between the datasets. Only genes meeting a percentage of false-positives (PFP) (Hong) threshold of 1% were considered differentially expressed between treatments and controls. The Spearman rank correlation (r) was used to assess pairwise correlations between DEGs and thus reveal potential connections among them. This coefficient (Cureton 1965), which is conceptually similar to the Pearson correlation but is computed on ranks, measures the strength of the association between two variables. The threshold was set at r >0.9, which is more stringent than the empirical value (Fukushima). Modules at this threshold were detected with DPClus. All statistical tests were carried out with the R program. A detailed workflow is shown in Figure 1.
Figure 1

Workflow of our study. The RankProd package was used for merging the GSE23400 and GSE20347 datasets, as was the Spearman Rank Correlation for constructing a co-expression network based on their expression profiles. The Graph-clustering approach was applied for identifying enriched clusters. Finally, the function annotation of each cluster was found through GO-Term enrichment analysis.
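The rank product idea behind the meta-analysis step can be illustrated with a minimal Python sketch. This is not the RankProd package itself (which runs in R and additionally estimates the PFP by permutation); the function names, gene labels and fold-change values below are hypothetical and serve only to show how ranks from several studies are combined into one geometric-mean rank per gene.

```python
import math

def rank_desc(values):
    """Rank values so that the largest gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def rank_product(fold_changes):
    """Combine per-study ranks into a geometric-mean rank for each gene.

    fold_changes maps a study name to a list of log fold changes,
    with genes in the same order in every study.
    """
    per_study_ranks = [rank_desc(values) for values in fold_changes.values()]
    n_studies = len(per_study_ranks)
    n_genes = len(per_study_ranks[0])
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in per_study_ranks) / n_studies)
        for g in range(n_genes)
    ]

# Made-up log fold changes for four hypothetical genes in two studies.
genes = ["geneA", "geneB", "geneC", "geneD"]
log_fc = {
    "study_1": [2.5, 0.1, -1.8, 0.9],
    "study_2": [1.9, -0.2, -2.2, 1.1],
}
rp = rank_product(log_fc)
top_up_regulated = genes[rp.index(min(rp))]
```

A small rank product means a gene ranks consistently near the top across studies. Down-regulated genes would be scored the same way after reversing the sort direction.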

Co-expression Network analysis and graph-clustering

DPClus (Altaf-Ul-Amin), a graph-clustering algorithm that extracts densely connected nodes as clusters, was used to identify co-expressed groups. It is based on tracking the density and periphery of clusters. In this study, DPClus was run in overlapping mode, with the cluster property (cp) and density parameters set to 0.5 (Fukushima).
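To make the two DPClus parameters more concrete, the following Python sketch computes a cluster density and a cluster property value on a toy graph. It assumes the definitions as we understand them from the DPClus description: density is the fraction of possible node pairs in a cluster that are connected, and the cluster property of a candidate node is its number of links into the cluster divided by (density × cluster size). The full algorithm (seed selection, iterative growth, overlap handling) is not reproduced, and all names and the toy edge list are hypothetical.

```python
def cluster_density(cluster, edges):
    """Fraction of possible node pairs inside the cluster that are connected."""
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(1 for a, b in edges if a in cluster and b in cluster)
    return 2 * internal / (n * (n - 1))

def cluster_property(node, cluster, edges):
    """Links from a candidate node into the cluster, scaled by density * size."""
    links = sum(
        1 for a, b in edges
        if (a == node and b in cluster) or (b == node and a in cluster)
    )
    density = cluster_density(cluster, edges)
    return links / (density * len(cluster)) if density > 0 else 0.0

def accepts(node, cluster, edges, d_in=0.5, cp_in=0.5):
    """Check both thresholds before adding a neighbouring node to a cluster."""
    enlarged = set(cluster) | {node}
    return (cluster_density(enlarged, edges) >= d_in
            and cluster_property(node, cluster, edges) >= cp_in)

# Toy undirected graph: A, B, C form a triangle; D hangs off C; F links to A and B.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"), ("D", "E"), ("F", "A"), ("F", "B")]
core = {"A", "B", "C"}
d_accepted = accepts("D", core, toy_edges)  # one link into the core: cp = 1/3
f_accepted = accepts("F", core, toy_edges)  # two links into the core: cp = 2/3
```

With both thresholds at 0.5, the peripheral node D is rejected even though the enlarged cluster would still be dense enough, which is the role the cluster property plays in keeping loosely attached nodes out.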

GO Term enrichment analysis

The use of Gene Ontology (GO) terms by collaborating databases facilitates uniform queries. The controlled vocabularies are structured so that queries can be made at different levels, and properties can likewise be assigned to genes or gene products at different levels, depending on the depth of knowledge available. DAVID (Huang da) was used to identify which GO terms in the biological process category were significantly over-represented. Terms with p-value <0.05 and gene count >2 were considered significant (Boyle; Guo).
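The over-representation test behind this kind of GO analysis can be sketched with a plain hypergeometric tail probability. This is only an illustration in Python: DAVID applies its own modified test, so the numbers it reports will differ, and the function names and example counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(background, annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `background` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, cluster_size)
    total = comb(background, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def is_significant(p_value, count, max_p=0.05, min_count=2):
    """Apply the selection rule from the text: p-value <0.05 and count >2."""
    return p_value < max_p and count > min_count

# Made-up counts: 3 of 10 background genes carry the term, and all 3 are in a cluster of 3.
p_small = hypergeom_enrichment_p(background=10, annotated=3, cluster_size=3, hits=3)
term_selected = is_significant(p_small, count=3)
```

Because many GO terms are tested per cluster, a multiple-testing correction such as the FDR column reported in Table 1 is normally examined alongside the raw p-value.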

Results

Differentially expressed gene selection and correlation network construction

The publicly available microarray datasets GSE23400 and GSE20347 were obtained from GEO. Differentially expressed genes (DEGs) with fold change >2 and p-value <0.05 were selected. Following limma analysis, 519 genes from GSE23400 and 1360 from GSE20347 were selected as DEGs. On applying the RankProd package for meta-analysis, 9 up-regulated and 1876 down-regulated genes, at a percentage of false-positives (PFP) threshold of 1% and fold change >2, were considered differentially expressed. In total, 1885 DEGs were collected after meta-analysis (Figure 2).
Figure 2

Differentially expressed genes (DEGs) in GSE23400, GSE20347 and meta-datasets. The Venn diagram shows significant genes in ESCC.
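The per-dataset selection rule (fold change >2 and p-value <0.05) amounts to a simple two-condition filter, sketched here in Python. The sketch assumes that the fold-change cutoff is applied symmetrically on the log2 scale, so that both up- and down-regulated genes pass; the source does not state this explicitly. The gene names and values are made up.

```python
import math

def select_degs(results, min_fold_change=2.0, max_p_value=0.05):
    """Return genes whose absolute log2 fold change and p-value pass both cutoffs."""
    log2_cutoff = math.log2(min_fold_change)
    return sorted(
        gene for gene, (log2_fc, p_value) in results.items()
        if abs(log2_fc) > log2_cutoff and p_value < max_p_value
    )

# Made-up (log2 fold change, p-value) pairs for illustration only.
toy_results = {
    "geneA": (-2.3, 0.001),   # strongly down-regulated and significant
    "geneB": (0.4, 0.0005),   # significant but small change
    "geneC": (1.7, 0.20),     # large change but not significant
    "geneD": (1.2, 0.03),     # up-regulated and significant
}
selected = select_degs(toy_results)
```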

To obtain the relationships among DEGs, a correlation coefficient of r = 0.9 and a corrected p-value of 0.01 were used as thresholds. A correlation network was then constructed from 724 relationships among 202 DEGs (Figure 3).
Figure 3

Correlation network of ESCC. Yellow dots indicate DEGs and blue lines the correlation of two neighboring points, with r > 0.9.
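The construction of such a correlation network can be sketched as follows: compute the Spearman rank correlation for every pair of genes and keep the pairs above the threshold as edges. This Python sketch covers only the r >0.9 criterion; the corrected p-value filter used in the study is not implemented, and the expression values and gene names are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = spearman(expression[g1], expression[g2])
        if r > threshold:
            edges.append((g1, g2, r))
    return edges

# Made-up expression values across five samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene2": [1.1, 2.3, 2.9, 4.8, 5.2],  # same sample ordering as gene1
    "gene3": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated ordering
}
edges = coexpression_edges(expression)
```

The resulting edge list is the kind of input a graph-clustering tool such as DPClus takes; only gene1 and gene2 are linked in this toy example.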

Graph-clustering identifies modules significantly enriched in biochemical pathways

At r >0.9, DPClus (Altaf-Ul-Amin) identified 6 clusters, ranging in size from 8 to 22 genes, in the correlation network of ESCC (Figure 3). Clusters 1, 4, 5 and 6, in particular, are mutually connected because they share genes. The more genes two clusters share, the greater their mutual connectivity (shown as thicker lines). Graph-clustering results are presented in Figure 4. Over-represented GO terms were used to assess the significance of each cluster, and the results of this GO term enrichment analysis appear in Table 1.
Figure 4

Graph-clustering of correlated modules in ESCC (threshold r = 0.9). Red lines indicate connections to other clusters that contain the same genes; green lines indicate connections within the current cluster, without outside connections.

Table 1

List of enriched GO terms in clusters 1 to 6 detected by DPClus.

| Category | GO Term | Description | P-value | FDR |
| Cluster 1 | GO:0019317 | fucose catabolic process | 0.010304 | 0.598071 |
| | GO:0042355 | L-fucose catabolic process | 0.010304 | 0.598071 |
| | GO:0042354 | L-fucose metabolic process | 0.012353 | 0.421275 |
| | GO:0006004 | fucose metabolic process | 0.018477 | 0.421345 |
| | GO:0031424 | keratinization | 0.043613 | 0.625078 |
| Cluster 2 | GO:0007010 | cytoskeleton organization | 2.06E-04 | 0.063509 |
| | GO:0030198 | extracellular matrix organization | 3.31E-04 | 0.051459 |
| | GO:0043062 | extracellular structure organization | 0.001227 | 0.122371 |
| | GO:0030036 | actin cytoskeleton organization | 0.003119 | 0.22052 |
| | GO:0030029 | actin filament-based process | 0.003738 | 0.212548 |
| | GO:0048251 | elastic fiber assembly | 0.005312 | 0.246622 |
| | GO:0007155 | cell adhesion | 0.012164 | 0.427484 |
| | GO:0022610 | biological adhesion | 0.012223 | 0.38762 |
| | GO:0006875 | cellular metal ion homeostasis | 0.027439 | 0.626993 |
| | GO:0055065 | metal ion homeostasis | 0.029813 | 0.61921 |
| | GO:0007044 | cell-substrate junction assembly | 0.030184 | 0.588854 |
| | GO:0031032 | actomyosin structure organization | 0.03663 | 0.629181 |
| | GO:0030199 | collagen fibril organization | 0.037915 | 0.612667 |
| | GO:0030003 | cellular cation homeostasis | 0.044093 | 0.642105 |
| | GO:0006940 | regulation of smooth muscle contraction | 0.049403 | 0.659545 |
| Cluster 3 | - | - | - | - |
| Cluster 4 | GO:0008544 | epidermis development | 1.91E-07 | 2.65E-05 |
| | GO:0007398 | ectoderm development | 2.82E-07 | 1.96E-05 |
| | GO:0030855 | epithelial cell differentiation | 3.14E-06 | 1.46E-04 |
| | GO:0009913 | epidermal cell differentiation | 2.31E-05 | 8.04E-04 |
| | GO:0060429 | epithelium development | 2.32E-05 | 6.46E-04 |
| | GO:0030216 | keratinocyte differentiation | 0.001253 | 0.028635 |
| | GO:0022405 | hair cycle process | 0.033638 | 0.493107 |
| | GO:0022404 | molting cycle process | 0.033638 | 0.493107 |
| | GO:0001942 | hair follicle development | 0.033638 | 0.493107 |
| | GO:0042633 | hair cycle | 0.034427 | 0.455943 |
| | GO:0031424 | keratinization | 0.034427 | 0.455943 |
| | GO:0042303 | molting cycle | 0.034427 | 0.455943 |
| Cluster 5 | GO:0009611 | response to wounding | 0.00186 | 0.476806 |
| | GO:0030855 | epithelial cell differentiation | 0.002068 | 0.302477 |
| | GO:0008544 | epidermis development | 0.003694 | 0.349054 |
| | GO:0007398 | ectoderm development | 0.004307 | 0.313061 |
| | GO:0060429 | epithelium development | 0.005569 | 0.322047 |
| | GO:0006954 | inflammatory response | 0.011156 | 0.478307 |
| | GO:0006928 | cell motion | 0.022979 | 0.685173 |
| | GO:0050900 | leukocyte migration | 0.029131 | 0.723626 |
| | GO:0030216 | keratinocyte differentiation | 0.033663 | 0.733943 |
| | GO:0009913 | epidermal cell differentiation | 0.036674 | 0.72754 |
| | GO:0006952 | defense response | 0.037216 | 0.698761 |
| Cluster 6 | - | - | - | - |

Significant GO terms (p-value <0.05, using the hypergeometric test) included fucose catabolic process, cytoskeleton organization, epidermis development and response to wounding, among others (Table 1). Ectoderm development (GO:0007398), epidermis development (GO:0008544) and epidermal cell differentiation (GO:0009913) were enriched in both clusters 4 and 5. Keratinization (GO:0031424) was enriched in clusters 1 and 4. No enriched GO terms are listed for clusters 3 and 6.

Discussion

In this study, 1885 DEGs were first identified through meta-analysis. From these, a correlation network was constructed in which 202 DEGs were linked by 724 relationships. By applying graph-clustering, these 202 DEGs were then grouped into six clusters. Clusters 1, 4, 5 and 6 appeared to be mutually connected. The functions of the clusters were examined by GO-term enrichment analysis, which showed that the genes of cluster 1 may be involved in the fucose catabolic process, whereas those in clusters 4 and 5 were mainly associated with epidermis development and differentiation. Although cluster 2 had no connection with the others, it may also play a significant role in ESCC invasion and metastasis, since the genes in this cluster were related to cytoskeleton and extracellular matrix organization. We therefore discuss selected genes from these clusters that have been implicated in ESCC.

Cluster 2: CXCL12

CXCL12, also known as stromal cell-derived factor-1, was first discovered among the chemokines secreted by mouse bone-marrow stromal cells, with CXCR4 as its specific receptor. The two interact to form a coupled molecular pair, which plays a prominent role in regulating directional migration and proliferation in ESCC, where both are expressed in the membrane and cytoplasm (Wang). Expression of the two is significantly correlated with lymph node metastasis, tumor stage, gender and lymphatic invasion. Overall and disease-free survival rates are significantly lower in patients with positive CXCL12 expression than in those with negative expression (Sasaki). Furthermore, the CXCL12-CXCR4 pair possibly induces up-regulated expression of matrix metalloproteinases, which in turn are involved in extracellular matrix remodeling and the mediation of metastasis in ESCC (Zhang; Bartolomé; Lu). MMP-7, MMP-9 and MT1-MMP are also reported to be closely associated with invasion depth and venous invasion in ESCC (Samantaray). This predicted mechanism was in accordance with our GO-term analysis.

Cluster 3: CYP2C9

In ESCC, CYP2C9 expression levels are higher in the early tumor stages (pT1/pT2) than in more advanced local tumors (pT3/pT4). Its selective inhibition decreases tumor-cell proliferation and induces G0/G1 phase cell-cycle arrest (Schmelzle). Although GO-term analysis did not reveal the role of the genes in cluster 3, we infer from the known function of CYP2C9 that they may be involved in the tumor-cell cycle.

Cluster 4: TGM3 and MAL genes

TGM3 is a member of a family of Ca2+-dependent enzymes thought to be critically involved in the cross-linking of structural proteins and formation of the cell envelope (CE), thereby contributing to the rigid structures that play vital roles in shape determination and barrier functions. The down-regulation of TGM3 in human ESCC tissues may lead to an inability to form CEs and to withstand toxic material (Chen et al. 2000; Luo). Although TGM3 expression is significantly inversely correlated with the histological grade of esophageal carcinoma, there is no obvious correlation with lymph node metastasis or depth of invasion. This gene may therefore be an important adhesion molecule expressed by epithelial cells, and may be regarded as an invasion suppressor molecule in ESCC (Liu). Although the details of its role in ESCC are not clear, it may be involved in esophageal epithelial cell differentiation and keratinocyte differentiation, as previously described (Zhang). The MAL gene, which encodes a T-cell differentiation antigen, was found to be down-regulated in 10 ESCC patients bearing tumors at different stages of development (Kazemi-Noureini). It has been shown to be expressed as four alternatively spliced transcripts during the intermediate and late stages of T-cell differentiation, and during epithelial cell differentiation (Alonso and Weissman, 1987). Its ectopic expression in TE3 carcinoma cells could lead to repressed tumor formation in nude mice, inhibition of cell motility, and induction of apoptosis through the Fas pathway (Mimori), suggesting that MAL may be a tumor suppressor gene in ESCC development.

Cluster 5: S100A9 gene

S100A9, a calcium-binding protein belonging to the S100 family, was found to be significantly down-regulated in ESCC (Ji). Moreover, S100A9 staining is decreased in poorly and moderately differentiated ESCCs compared with well-differentiated ones, hence the inference that loss of S100A9 expression in ESCC generally accompanies worsening histological grade of esophageal epithelial differentiation (Kong). S100A9, together with the other 11 S100 genes and the epidermal differentiation complex (EDC) genes, has been mapped to human chromosome 1q21, a region of structural and numerical aberration involved in esophageal carcinogenesis and progression (Luo). In brief, down-regulation of S100A9 is a common event in esophageal carcinogenesis and progression, acting by affecting epithelial cell differentiation, proliferation and apoptosis, as well as the expression of genes encoding epidermal structural proteins in a calcium-dependent manner (Luo et al. 2003; Zhi; Zhou).

Common genes in clusters 4 and 5: EMP-1 and SPRR3

Clusters 4 and 5 were closely connected by several shared genes (Figure 4), such as EMP-1 and SPRR3. The expression of EMP-1, a member of the PMP22 family, is down-regulated in esophageal cancers (Zinovyeva). Over-expression of this gene in the EC9706 cell line leads to inhibited cell proliferation, S-phase arrest and G1-phase prolongation, thus indicating its participation in the cell cycle. In addition, EMP-1 transfection could induce the expression of various genes, such as integrin beta 7 (ITGB7), integrin beta 8 (ITGB8) and cadherin 5 (CDH5), which are involved in cell signaling, cell adhesion and cell-cell communication. Retinoic acid receptor beta (RAR-β) is another gene up-regulated by EMP-1. Retinoids can modulate epithelial cell growth, differentiation and apoptosis in vitro and in vivo by binding to specific nuclear retinoid receptors, such as RAR-β. They can also prevent abnormal squamous cell differentiation in non-keratinizing tissues. Therefore, the EMP-1 and RAR-β interaction might also play important roles in epithelial cell and keratinocyte differentiation (Wang). SPRR3, one of the substrates for TGM3, is a small proline-rich protein, abundantly expressed in oral and esophageal epithelia. Its expression, although less concomitant with TGM3 loss in ESCC (Luo), is highly induced during human epidermal keratinocyte differentiation, and it is thus considered a differentiation marker of squamous epithelium (Abraham). Recent reviews imply that SPRR3 is frequently down-regulated in ESCC compared with adjacent paired mucosa (Chen; de A Simão). Further studies have shown that exogenous expression of SPRR3 significantly suppresses ESCC cell formation by inducing CDK11p46 protein expression and apoptosis (Zhang). Therefore, SPRR3 might play a crucial role in the maintenance of normal esophageal epithelial homeostasis.
Finally, the combination of meta-analysis, graph-clustering and GO term analysis is presumably effective for identifying differentially expressed genes and for inferring their respective roles in ESCC. Other, as yet unidentified genes will be the focus of our future studies.

References (10 of 45 listed)

1.  Activation of Vav/Rho GTPase signaling by CXCL12 controls membrane-type matrix metalloproteinase-dependent melanoma cell invasion.

Authors:  Rubén A Bartolomé; Isabel Molina-Ortiz; Rafael Samaniego; Paloma Sánchez-Mateos; Xosé R Bustelo; Joaquin Teixidó
Journal:  Cancer Res       Date:  2006-01-01       Impact factor: 12.701

2.  Assessment of clinical outcome in patients with esophageal squamous cell carcinoma using TNM classification score and molecular biological classification.

Authors:  Shinsuke Takeno; Tsuyoshi Noguchi; Yoshiaki Takahashi; Shoichi Fumoto; Tomotaka Shibata; Katsunobu Kawahara
Journal:  Ann Surg Oncol       Date:  2007-01-28       Impact factor: 5.344

3.  RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis.

Authors:  Fangxin Hong; Rainer Breitling; Connor W McEntee; Ben S Wittner; Jennifer L Nemhauser; Joanne Chory
Journal:  Bioinformatics       Date:  2006-09-18       Impact factor: 6.937

4.  Functional studies of a novel oncogene TGM3 in human esophageal squamous cell carcinoma.

Authors:  Wei Liu; Zai-Cheng Yu; Wen-Feng Cao; Fang Ding; Zhi-Hua Liu
Journal:  World J Gastroenterol       Date:  2006-06-28       Impact factor: 5.742

5.  Rat toxicogenomic study reveals analytical consistency across microarray platforms.

Authors:  Lei Guo; Edward K Lobenhofer; Charles Wang; Richard Shippy; Stephen C Harris; Lu Zhang; Nan Mei; Tao Chen; Damir Herman; Federico M Goodsaid; Patrick Hurban; Kenneth L Phillips; Jun Xu; Xutao Deng; Yongming Andrew Sun; Weida Tong; Yvonne P Dragan; Leming Shi
Journal:  Nat Biotechnol       Date:  2006-09       Impact factor: 54.908

6.  Proteomic analysis of global alteration of protein expression in squamous cell carcinoma of the esophagus.

Authors:  Ge Zhou; Hongmei Li; Yi Gong; Yingxin Zhao; Jingke Cheng; Peng Lee; Yingming Zhao
Journal:  Proteomics       Date:  2005-09       Impact factor: 3.984

7.  Sonic hedgehog signaling promotes motility and invasiveness of gastric cancer cells through TGF-beta-mediated activation of the ALK5-Smad 3 pathway.

Authors:  Young A Yoo; Myoung Hee Kang; Jun Suk Kim; Sang Cheul Oh
Journal:  Carcinogenesis       Date:  2008-01-03       Impact factor: 4.944

8.  Cyclooxygenase-2 expression is related to prognosis in patients with esophageal squamous cell carcinoma.

Authors:  H Takatori; S Natsugoe; H Okumura; M Matsumoto; Y Uchikado; T Setoyama; K Sasaki; K Tamotsu; T Owaki; S Ishigami; T Aikou
Journal:  Eur J Surg Oncol       Date:  2007-06-05       Impact factor: 4.424

9.  Exogenous expression of Esophagin/SPRR3 attenuates the tumorigenicity of esophageal squamous cell carcinoma cells via promoting apoptosis.

Authors:  Yu Zhang; Yan-Bin Feng; Xiao-Ming Shen; Bao-Sheng Chen; Xiao-Li Du; Man-Li Luo; Yan Cai; Ya-Ling Han; Xin Xu; Qi-Min Zhan; Ming-Rong Wang
Journal:  Int J Cancer       Date:  2008-01-15       Impact factor: 7.396

10.  Development and implementation of an algorithm for detection of protein complexes in large interaction networks.

Authors:  Md Altaf-Ul-Amin; Yoko Shinbo; Kenji Mihara; Ken Kurokawa; Shigehiko Kanaya
Journal:  BMC Bioinformatics       Date:  2006-04-14       Impact factor: 3.169
