
Identification of key genes associated with gastric cancer based on DNA microarray data.

Hui Sun.

Abstract

The present study aimed to identify genes with a differential pattern of expression in gastric cancer (GC), and to find novel molecular biomarkers for GC diagnosis and therapeutic treatment. The gene expression profile of GSE19826, including 12 GC samples and 15 normal controls, was downloaded from the Gene Expression Omnibus database. Differentially-expressed genes (DEGs) were screened in the GC samples compared with the normal controls. Two-way hierarchical clustering of DEGs was performed to distinguish the normal controls from the GC samples. The co-expression coefficient was analyzed among the DEGs using the data from COXPRESdb. The gene co-expression network was constructed based on the DEGs using Cytoscape software, and modules in the network were analyzed by ClusterOne and Bingo. Furthermore, enrichment analysis of the DEGs in the modules was performed using the Database for Annotation, Visualization and Integrated Discovery. In total, 596 DEGs in the GC samples and 57 co-expression gene pairs were identified. A total of 7 genes were enriched in the same module, for which the function was phosphate transport and which was annotated to participate in the extracellular matrix-receptor interaction pathway. These genes were collagen, type VI, α3 (COL6A3), COL1A2, COL1A1, COL5A2, thrombospondin 2, COL11A1 and COL5A1. Overall, the present study identified several biomarkers for GC using the gene expression profiling of human GC samples. The COL family is a promising prognostic marker for GC. Gene expression products represent candidate biomarkers endowed with great potential for the early screening and therapy of GC patients.


Keywords:  co-expression network; differentially-expressed genes; function analysis; gastric cancer; pathway analysis

Year:  2015        PMID: 26870242      PMCID: PMC4727153          DOI: 10.3892/ol.2015.3929

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Gastric cancer (GC) is one of the leading causes of cancer-related mortality worldwide, and is particularly prevalent in East Asian countries, including China, Japan and Korea (1). Each year, ~990,000 people are diagnosed with GC worldwide; ~738,000 of whom succumb to the disease (2). The high patient mortality rate is due to the fact that the clinical manifestations of GC usually only become apparent at an advanced disease stage, when the current available therapies will have a limited effect (3,4). Therefore, it is of utmost importance to understand the associated mechanisms and to identify biomarkers for the development of strategies for the screening, early detection and treatment of GC. GC is a complicated and multifactorial disease, and environmental and genetic factors play important roles in its etiology (5). One of the characteristics of gastric malignant cells is metastasis, whereby cancer cells penetrate vascular channels and invade parenchymal tissue to form satellite tumors in distant organs (6). In this process, the extracellular matrix (ECM) and the basement membrane provide a protective barrier to prevent cancer cell invasion and metastasis (7). Similar to other malignancies, gene expression profiling using complementary DNA microarrays has been used to identify genes involved in gastric carcinogenesis, and to identify novel diagnostic and prognostic markers for GC (8–11). Recent studies have reported genetic alterations in GC, involving tumor suppressor genes, cell adhesion molecules, oncogenes and growth factors, such as p53, trefoil factor 1 and E-cadherin (10,12–15). However, these studies have yielded few useful biomarkers, most likely due to shortcomings concerning the experimental design, the validity of the supporting statistical analysis and the gene selection in the studies. Thus, the present study focused on the gene expression profiling of GC to identify novel biomarkers in this disease. 
With the same gene expression profile, Wang et al performed gene set enrichment analysis and identified that increased INHBA expression was associated with poor survival in GC (16). A study by Liu et al demonstrated that the ECM-receptor and cell cycle pathways may play important roles in GC (17). In addition, a study using the same microarray data revealed high periostin expression in GC tissues, which was associated with gene groups that regulated the cell proliferation and cell cycle (18). The present study analyzed the differentially-expressed genes (DEGs) in GC using gene expression profiling. Comprehensive bioinformatics was used to analyze the significant pathways and functions, and to construct the gene co-expression network and sub-network to investigate the critical DEGs of GC. The study aimed to obtain a better understanding of the molecular circuitry in GC and to identify genes potentially useful as novel diagnostic or therapeutic markers for GC.

Materials and methods

Affymetrix microarray data

The gene expression profile of GSE19826 (16) was downloaded from the Gene Expression Omnibus database (19), which freely distributes high-throughput molecular abundance data, largely gene expression data generated by microarray technology. The platform information is as follows: GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix Inc., Santa Clara, CA, USA). In this dataset, 12 cancerous portions of gastric specimens (from Chinese patients) and 15 normal gastric tissues (controls) were included.

Data preprocessing and screening of DEGs

The preprocessed microarray data were obtained and log2-transformed. The Linear Models for Microarray Data (limma) package (20), a widely used tool in the R language (21), was used to analyze the chip data. Upregulated and downregulated genes were identified between the GC samples and the normal controls. The false discovery rate (FDR) (22) was utilized for multiple testing correction using the Benjamini and Hochberg method (23). The threshold for the DEGs was set as |log2 fold change (FC)|>1.5 and FDR <0.05.
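The screening rule above can be illustrated with a minimal sketch. It assumes that per-gene log2 fold changes and raw P-values are already available (for example, exported from limma); the function names, gene labels and numbers below are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_degs(genes, log2_fc, p_values, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Keep genes with |log2 FC| > fc_cutoff and FDR < fdr_cutoff."""
    fdr = benjamini_hochberg(p_values)
    degs = []
    for gene, lfc, q in zip(genes, log2_fc, fdr):
        if abs(lfc) > fc_cutoff and q < fdr_cutoff:
            direction = "up" if lfc > 0 else "down"
            degs.append((gene, direction, lfc, q))
    return degs


# Hypothetical toy input, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_fc = [2.1, -1.8, 0.4, 1.7]
p_values = [0.001, 0.004, 0.03, 0.20]
toy_degs = screen_degs(genes, log2_fc, p_values)
```

In this toy input, GENE_C fails the fold-change cut-off and GENE_D fails the FDR cut-off, so only GENE_A (up) and GENE_B (down) are retained.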

Hierarchical clustering

Hierarchical clustering methodology is a powerful data mining approach that has been extensively applied to identify groups of similarly expressed genes or conditions from gene expression data. In order to reveal sets of samples in which the closest groups were adjacent, two-way hierarchical clustering analysis (24) was performed on genes and conditions using Euclidean distance (25) by the ‘pheatmap’ package (http://cran.r-project.org/web/packages/pheatmap/index.html) in R language. The result was represented by a heatmap.
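As a rough illustration of two-way clustering with Euclidean distance, the following sketch clusters a small expression matrix once by sample and once by gene. It is not the pheatmap implementation; complete linkage is assumed here because the linkage method is not stated in the text, and all sample names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical matrix: one row per sample, one column per gene.
sample_names = ["normal_1", "normal_2", "gc_1", "gc_2"]
matrix = [
    [1.0, 1.2, 5.0],
    [1.1, 1.0, 5.2],
    [4.0, 4.1, 1.0],
    [4.5, 3.9, 1.1],
]

sample_merges = complete_linkage(matrix)            # cluster the samples
gene_merges = complete_linkage(list(zip(*matrix)))  # cluster the genes (transpose)
first_pair = sorted(sample_names[i] for i in sample_merges[0][0] + sample_merges[0][1])
```

The order of the merges defines the dendrogram on each axis of the heatmap; in this toy matrix the two normal samples merge first, followed by the two GC samples.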

Co-expression network construction of DEGs

From the perspective of systems biology, functionally-related genes are frequently co-expressed across a set of samples (26). COXPRESdb (http://coxpresdb.jp) provides co-expression associations for multiple species of mammals, as comparisons of co-expressed gene lists can increase the reliability of gene co-expression determinations (27). The gene co-expression network was constructed to assess the functional associations between co-expressed genes of DEGs using COXPRESdb, in which genes were indexed by their Entrez Gene IDs. To obtain the co-expression associations, a Pearson Correlation Coefficient >0.6 was chosen as the threshold.
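The thresholding step can be sketched as follows. The study took its coefficients from COXPRESdb rather than computing them; the `pearson` helper is included only to show what the coefficient measures. The gene labels, coefficient values and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_deg_pairs(pairs, degs, threshold=0.6):
    """Keep pairs in which both genes are DEGs and the coefficient exceeds the threshold."""
    deg_set = set(degs)
    edges = []
    for gene_a, gene_b, pcc in pairs:
        if gene_a in deg_set and gene_b in deg_set and pcc > threshold:
            edges.append((gene_a, gene_b, pcc))
    return edges


# Hypothetical pair table (gene, gene, coefficient), for illustration only.
pairs = [
    ("GENE_A", "GENE_B", 0.82),
    ("GENE_A", "GENE_C", 0.41),   # below the threshold
    ("GENE_B", "GENE_X", 0.90),   # GENE_X is not a DEG
]
edges = coexpressed_deg_pairs(pairs, degs=["GENE_A", "GENE_B", "GENE_C"])
```

Each retained pair becomes one edge of the co-expression network.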

Selection of modules in co-expression network

Gene products in the same module often have the same or similar functions, and they work together to perform one biological function (28). Therefore, the network was visualized using Cytoscape (29), and modules were identified using the Cytoscape plugin ClusterOne (30) (parameters: Minimum size, 3; overlap threshold, 0.8). Module function was then annotated using another plugin, BiNGO (31), to obtain the significant functions of each module.

Function and pathway enrichment analysis of DEGs in modules

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses were performed for the DEGs in the co-expression network using the online tool, DAVID (32). P<0.05 was used to indicate statistical significance.
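The idea behind an enrichment P-value can be sketched with a one-sided hypergeometric test. This is a simplified stand-in: DAVID reports an EASE score, a more conservative variant of Fisher's exact test, so the value below is not the statistic DAVID would return. The function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the query list (e.g. DEGs in a module)
    k: query genes annotated to the term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 2 of 4 query genes hit a term that covers
# 5 of 20 background genes.
p_toy = hypergeom_enrichment_p(k=2, n=4, K=5, N=20)
significant = p_toy < 0.05
```

With these toy counts the tail probability is 1205/4845 (about 0.25), so the term would not pass the P<0.05 cut-off.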

Results

DEG screening

Following data preprocessing, 42,450 genes were mapped to the probes; the gene expression profile after normalization is shown in Fig. 1. The black lines in each of the boxes, representing the medians of each dataset, are almost in a straight line, indicating a good degree of standardization. Compared with the normal tissues, a total of 596 genes were differentially expressed, consisting of 182 upregulated and 414 downregulated genes.
Figure 1.

Normalized expressed value data. The black line in each box represents the median of each set of data, which determines the degree of standardization of data through its distribution.

The hierarchical clustering algorithm was used to group the genes and samples on the basis of similarities of gene expression. In the results shown in Fig. 2, one normal sample was grouped into the region of GC samples, indicating that 14 of the 15 normal samples (93.33%) were classified correctly. Thus, the DEGs screened had expression patterns that could distinguish the disease samples from the normal controls.
Figure 2.

Heat-map overview of the two-way hierarchical clustering analysis of the differentially-expressed genes. Blue coloration indicates decreased expression in GC and red coloration indicates increased expression in GC. Samples (12 GC and 15 controls) are represented by columns. The red box marks the normal sample that was clustered into the cancer sample group.

Co-expression network construction and module selection

A total of 57 co-expressed gene pairs were determined between DEGs. In Fig. 3, upregulated genes tended to connect with other upregulated genes, and downregulated genes with other downregulated genes. The study identified 4 modules from the network (Fig. 4). The function of the DEGs in each module is presented in Table I. Module 1 had one upregulated gene, PDZ and LIM domain 7 (PDLIM7). Genes in modules 2 and 4, which mostly belonged to the collagen (COL) family, were significantly associated with phosphate transport.
Figure 3.

Co-expression network of differentially-expressed genes. Green nodes represent downregulated genes. Red nodes indicate upregulated genes.

Figure 4.

Selected modules from the gene co-expression network. Green nodes represent downregulated genes. Red nodes indicate upregulated genes.

Table I.

Functions of the genes in the modules.

GO-ID | Corr P-value | N | Description | Genes in test set
Module 1
  31214 | 4.30×10^-2 | 2 | Biomineral formation | CASR, PDLIM7
  1503 | 4.30×10^-2 | 2 | Ossification | CASR, PDLIM7
Module 2
  6817 | 8.57×10^-8 | 5 | Phosphate transport | COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  15698 | 7.47×10^-7 | 5 | Inorganic anion transport | COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  6820 | 1.25×10^-6 | 5 | Anion transport | COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  6811 | 8.30×10^-4 | 5 | Ion transport | COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  6810 | 3.10×10^-2 | 5 | Transport | COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  48513 | 1.76×10^-4 | 6 | Organ development | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
  48731 | 9.03×10^-4 | 6 | System development | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
  48856 | 1.64×10^-3 | 6 | Anatomical structure development | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
  7275 | 4.45×10^-3 | 6 | Multicellular organismal development | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
  51234 | 1.03×10^-2 | 6 | Establishment of localization | FAP, COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  32502 | 1.29×10^-2 | 6 | Developmental process | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
  51179 | 1.62×10^-2 | 6 | Localization | FAP, COL6A3, COL1A2, COL1A1, COL5A2, COL5A1
  32501 | 2.24×10^-2 | 6 | Multicellular organismal process | FAP, COL6A3, FBN1, COL1A2, COL1A1, COL5A2
Module 3
  6936 | 2.46×10^-3 | 2 | Muscle contraction | MYL1, KBTBD10
  3012 | 2.46×10^-3 | 2 | Muscle system process | MYL1, KBTBD10
Module 4
  6817 | 7.88×10^-4 | 3 | Phosphate transport | COL6A3, COL12A1, COL1A1
  15698 | 2.15×10^-3 | 3 | Inorganic anion transport | COL6A3, COL12A1, COL1A1
  6820 | 2.47×10^-3 | 3 | Anion transport | COL6A3, COL12A1, COL1A1
  22610 | 2.64×10^-2 | 3 | Biological adhesion | COL6A3, COL12A1, THBS2
  7155 | 2.64×10^-2 | 3 | Cell adhesion | COL6A3, COL12A1, THBS2
  6811 | 3.28×10^-2 | 3 | Ion transport | COL6A3, COL12A1, COL1A1
  48513 | 1.16×10^-2 | 4 | Organ development | FAP, COL6A3, COL12A1, COL1A1
  48731 | 2.64×10^-2 | 4 | System development | FAP, COL6A3, COL12A1, COL1A1
  48856 | 3.20×10^-2 | 4 | Anatomical structure development | FAP, COL6A3, COL12A1, COL1A1
  7275 | 3.46×10^-2 | 4 | Multicellular organismal development | FAP, COL6A3, COL12A1, COL1A1
  51234 | 4.04×10^-2 | 4 | Establishment of localization | FAP, COL6A3, COL12A1, COL1A1
  32502 | 4.11×10^-2 | 4 | Developmental process | FAP, COL6A3, COL12A1, COL1A1
  51179 | 4.52×10^-2 | 4 | Localization | FAP, COL6A3, COL12A1, COL1A1

Corr P-value, corrected P-value; N, nodes; GO-ID, gene ontology identification.

Pathway annotation of the DEGs in modules

Two pathways were found to be enriched (Table II), the ECM-receptor interaction and focal adhesion pathways. Of these, the ECM-receptor interaction pathway was most significantly enriched (P=9.44×10^-5), and 7 genes [collagen, type VI, α3 (COL6A3), COL1A2, COL1A1, COL5A2, thrombospondin 2 (THBS2), COL11A1 and COL5A1] were predicted to participate in the pathway.
Table II.

Significant pathways of DEGs in the selected modules.

Term | Count | FDR | Genes
hsa04512: ECM-receptor interaction | 7 | 9.44×10^-5 | COL6A3, COL1A2, COL1A1, COL5A2, THBS2, COL11A1, COL5A1
hsa04510: Focal adhesion | 8 | 9.10×10^-4 | PGF, COL6A3, COL1A2, COL1A1, COL5A2, THBS2, COL11A1, COL5A1

Term represents the pathway name. Count represents the number of DEGs enriched in each pathway. DEGs, differentially-expressed genes; FDR, false discovery rate; ECM, extracellular matrix.

Discussion

GC is the fourth most frequently occurring malignant tumor worldwide, with high incidence and mortality rates. Therefore, it is of great importance to conduct research on the treatment of GC (33). Major efforts are being made to understand GC at a molecular level (34). Since microarrays can simultaneously investigate the expression levels of thousands of genes in the human genome, the technique has been widely applied in the identification of disease biomarkers (26,35). In the present study, a total of 596 DEGs were identified in the GC samples compared with the normal controls. Furthermore, the co-expression interaction network of DEGs was constructed and 4 modules were identified. The upregulated PDLIM7 gene was enriched in module 1, while 7 other upregulated genes (COL6A3, COL1A2, COL1A1, COL5A2, THBS2, COL11A1 and COL5A1) were involved in the ECM-receptor interaction pathway. The COL family of genes was mainly enriched in module 2, for which the function was phosphate transport. COL1A1 and COL1A2 encode the α1 and α2 chains of type I collagen, respectively (36). Collagen is the main constituent of the ECM component in tumors, and a number of collagen types have been found in GC tissues (37). The major constituents of the ECM are collagens, adhesive glycoproteins and proteoglycans (38). Specific interactions between cells and the ECM, mediated by cell-surface-associated components and transmembrane molecules, result in the control of cellular activities, such as adhesion and migration (39). Matsui et al showed that collagen degradation, which was an essential step in the tumor cell invasion of the surrounding tissues, was increased in GC tissues (40). Su et al reported that COL1A1 and COL1A2 were commonly upregulated in GC, and were associated with invasion and metastasis (8).
In line with this previous study, the present results showed that COL1A1 and COL1A2 were upregulated in GC, suggesting that they play an important role in cancer cell invasion and metastasis in this disease. On the other hand, the COL family genes were mainly enriched in modules 2 and 4, for which the function is phosphate transport. COL6A3 was clustered into module 2. COL6A3 encodes one of the three α chains of type VI collagen. Another significant DEG that was enriched in GC was COL11A1, another member of the COL family, which encodes one of the two α chains of type XI collagen. Using microarray technology, COL6A3 and COL11A1 levels have been shown to be elevated in GC endothelium when compared with normal endothelium (41,42). The present study demonstrated that COL6A3 and COL11A1 were upregulated and participated in the ECM-receptor interaction pathway, which was in line with these previous studies. Taken together, the results indicated that the COL family genes identified in the present study may be molecular biomarkers for GC. THBS2, which has demonstrated functions as a potent inhibitor of tumor growth and angiogenesis, is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-matrix and cell-cell interactions (43). Stamper et al reported that genes associated with ECM-receptor interactions, including THBS2, underwent significant changes in expression when comparing craniosynostosis patients and controls (44). In addition, Yasui et al suggested that changes in the ECM could be induced by the degradation of collagen I, which was of great importance to the infiltration and metastasis of cancer cells in GC (45). In line with this previous study, the present results also indicated that THBS2 was upregulated in GC compared with normal controls, suggesting that THBS2 may play a role in ECM changes and promote GC progression. PDLIM7 is a member of a family of proteins composed of PDZ and LIM domains that have been proposed to direct protein-protein interactions.
Wu et al demonstrated that the LIM domains of Enigma recognized tyrosine-containing motifs with specificity residing in the target structures and the LIM domains (46). Another study showed that receptor tyrosine kinases play essential roles in the control of cancer cell growth and differentiation (47). In the present study, PDLIM7 was found to be upregulated, showing enrichment in module 1, and interacted with other DEGs identified in the study. Another hub gene in module 1 was adenosine deaminase, RNA-specific, B1 (ADARB1) (Fig. 4). ADARB1, also known as ADAR2, encodes the enzyme responsible for pre-mRNA editing of the glutamate receptor subunit B by site-specific deamination of adenosines (48). A previous study demonstrated that the dysregulation of adenosine-to-inosine editing in human cancers possibly contributed to the altered transcriptional program required to sustain carcinogenesis (49). Moreover, Camarata et al reported that PDLIM7 could regulate T-box protein 5 transcriptional activity, which is involved in the transcriptional regulation of genes required for mesoderm differentiation (50). In this context, we speculate that PDLIM7 may play a crucial role in GC development via the interaction with ADARB1. In conclusion, the present study investigated the critical genes in GC based on microarray data. The target genes COL1A1, COL1A2, COL6A3, THBS2, COL11A1, PDLIM7 and ADARB1 were involved in the progression of GC. COL6A3, COL1A2, COL1A1, THBS2 and COL11A1 were identified to be involved in the ECM-receptor interaction pathway. Furthermore, the genes of the COL family were associated with phosphate transport. COL1A1 and COL1A2 may play an important role in tumor invasion and metastasis in GC. THBS2 may impact ECM changes and promote GC progression. Moreover, PDLIM7 may play a crucial role in GC development via the interaction with ADARB1.
The genes identified in GC tissues in the present study may prove to be molecular biomarkers for this disease, although further studies must be performed to confirm these results.
  45 in total

Review 1.  Network biology: understanding the cell's functional organization.

Authors:  Albert-László Barabási; Zoltán N Oltvai
Journal:  Nat Rev Genet       Date:  2004-02       Impact factor: 53.242

2.  Gene expression profiling of gastric cancer by microarray combined with laser capture microdissection.

Authors:  Ming-Shiang Wu; Yi-Shing Lin; Yu-Ting Chang; Chia-Tung Shun; Ming-Tsan Lin; Jaw-Town Lin
Journal:  World J Gastroenterol       Date:  2005-12-21       Impact factor: 5.742

3.  FDR control by the BH procedure for two-sided correlated tests with implications to gene expression data analysis.

Authors:  Anat Reiner-Benaim
Journal:  Biom J       Date:  2007-02       Impact factor: 2.207

4.  Collagen: a possible prediction mark for gastric cancer.

Authors:  Ying Yin; Yuan Zhao; Ai-Qing Li; Jian-Min Si
Journal:  Med Hypotheses       Date:  2008-10-31       Impact factor: 1.538

Review 5.  Evolving chemotherapy for advanced gastric cancer.

Authors:  Jaffer A Ajani
Journal:  Oncologist       Date:  2005

Review 6.  Genetic and epigenetic changes in stomach cancer.

Authors:  H Yokozaki; W Yasui; E Tahara
Journal:  Int Rev Cytol       Date:  2001

7.  Upregulated INHBA expression is associated with poor survival in gastric cancer.

Authors:  Quan Wang; Yu-Gang Wen; Da-Peng Li; Jun Xia; Chong-Zhi Zhou; Dong-Wang Yan; Hua-Mei Tang; Zhi-Hai Peng
Journal:  Med Oncol       Date:  2010-12-04       Impact factor: 3.064

8.  Cytoscape 2.8: new features for data integration and network visualization.

Authors:  Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

9.  Differential expression of extracellular matrix-mediated pathways in single-suture craniosynostosis.

Authors:  Brendan D Stamper; Sarah S Park; Richard P Beyer; Theo K Bammler; Frederico M Farin; Brig Mecham; Michael L Cunningham
Journal:  PLoS One       Date:  2011-10-19       Impact factor: 3.240

10.  Analyzing microarray data of Alzheimer's using cluster analysis to identify the biomarker genes.

Authors:  Satya Vani Guttula; Apparao Allam; R Sridhar Gumpeny
Journal:  Int J Alzheimers Dis       Date:  2012-02-14