Differentially Expressed Gene Screening, Biological Function Enrichment, and Correlation with Prognosis in Non-Small Cell Lung Cancer.

He Huang, Qingdong Huang, Tingyu Tang, Xiaoxi Zhou, Liang Gu, Xiaoling Lu, Fang Liu.

Abstract

BACKGROUND The aim of this study was to explore the differentially expressed genes and pathways in non-small cell lung cancer (NSCLC) and their correlation with prognosis. MATERIAL AND METHODS Gene expression data series GSE19804, GSE101929, and GSE33532 were downloaded from the Gene Expression Omnibus (GEO) database. The overlapping differentially expressed genes (DEGs) were identified from these 3 data series. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to examine the biological functions and signaling pathways of the DEGs. The protein-protein interaction (PPI) network was analyzed with the Search Tool for the Retrieval of Interacting Genes (STRING). The relationship between the expression of hub genes and patient prognosis was analyzed with the Kaplan-Meier Plotter online software. RESULTS Twenty-nine DEGs were identified, with 22 upregulated genes and 7 downregulated genes. The enriched biological processes were mainly related to diet-induced thermogenesis and actin filament binding. The KEGG pathways were enriched in calcium signaling, regulation of lipolysis in adipocytes, and PPAR signaling. Two downregulated genes (MMP1 and SPP1) were identified as hub genes by cytoHubba. Twenty-two dysregulated genes were correlated with patient prognosis. CONCLUSIONS Differentially expressed genes are common in NSCLC patients and can be used as biomarkers for patient prognosis.

Year:  2019        PMID: 31181055      PMCID: PMC6582684          DOI: 10.12659/MSM.916962

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Lung cancer, including non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), is the leading cause of malignant tumor-related mortality [1]. Epidemiological studies show that more than 1 million new cases of lung cancer and more than 800 000 deaths occur every year [2,3]. Epidemiological data from China demonstrate that the overall incidence of lung cancer in the country is high, especially in the Dagang district of Tianjin and in Xuanwei city in Yunnan province. The incidence of lung cancer in these 2 areas is significantly higher than the overall global level [4,5]. It is reported that 75–80% of lung cancers are NSCLC, whose biological behavior and treatment differ from those of small cell lung cancer. At present, the molecular mechanisms of the occurrence, development, invasion, and metastasis of NSCLC are still unclear. In recent years, with the development of gene expression profiling chips and second-generation high-throughput sequencing technology, the amount of data on lung cancer expression profiles has greatly expanded, which provides the basis for the comprehensive study of differentially expressed genes and their biological functions in lung cancer [6]. In this study, 3 gene expression profiles of lung cancer were selected from the GEO database [7], and we explored the function of DEGs in the development of lung cancer and their relationship with patient prognosis.

Material and Methods

Microarray data screening

Three gene expression data series relevant to lung cancer (GSE19804 [8], GSE101929 [9], and GSE33532 [10]) were identified in the GEO database and included in the present analysis. The original microarray data of the 3 data series were downloaded. GSE19804 comprises 120 specimens (60 lung cancer tissues and 60 paired normal lung tissues) profiled on the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. GSE101929 includes 41 non-small cell lung cancer cases, with gene expression measured on the same GPL570 platform. For GSE33532, individual primary tumors and matched distant normal lung tissues from 20 patients were used to establish gene expression patterns, again on the GPL570 platform.

Data processing

The microarray data of the 3 included data series were first analyzed separately using R 3.4.4 statistical software. The dysregulated genes identified in each series were then compared to find the genes that overlapped across all 3 data series.
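The overlap step described above amounts to a set intersection across the per-series DEG lists. The following is a minimal Python sketch of that idea, not the authors' R code; the function name `overlapping_degs` and the `placeholder_*` probe IDs are hypothetical, and the thresholds used to call DEGs in each series are not stated in the text, so they are not modeled here.

```python
def overlapping_degs(degs_by_series):
    """Return the probe IDs flagged as differentially expressed in every series."""
    sets = [set(ids) for ids in degs_by_series.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy input: two probe IDs taken from Table 1 plus made-up placeholder IDs
# that stand for series-specific DEGs (assumption, for illustration only).
example = {
    "GSE19804": {"209612_s_at", "204475_at", "placeholder_1"},
    "GSE101929": {"209612_s_at", "204475_at", "placeholder_2"},
    "GSE33532": {"209612_s_at", "204475_at", "placeholder_3"},
}

shared = overlapping_degs(example)
print(sorted(shared))
```

Only probe IDs present in all three sets survive, which is why series-specific placeholders drop out.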

Biological function enrichment and pathway analysis

The biological function enrichment and pathway analyses were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [11]. This analysis included 2 aspects: Gene Ontology (GO) [12, 13] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [14]. The GO enrichment covers biological process (BP), cellular component (CC), and molecular function (MF).
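To illustrate what an enrichment p-value measures, the sketch below computes a plain hypergeometric upper-tail probability: the chance of seeing at least k annotated genes in a list of n genes drawn from a background of N genes, K of which carry the annotation. This is an illustrative stand-in, not DAVID's own statistic (DAVID reports the EASE score, a modified and more conservative Fisher exact test), and the background sizes in the example are made-up numbers.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: annotated genes observed in the gene list
    n: size of the gene list
    K: annotated genes in the background
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 2 of 29 listed genes carry a term that annotates
# 50 genes in an assumed background of 20000 genes.
p = hypergeom_enrichment_p(k=2, n=29, K=50, N=20000)
print(f"{p:.3g}")
```

A small p-value means the term appears in the gene list more often than random sampling from the background would suggest.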

Protein–protein network analysis and hub gene identification

The protein–protein network was built with the Search Tool for the Retrieval of Interacting Genes (STRING) database using the following criteria: a minimum required interaction score of 0.4 and active interaction sources of text mining, experiments, databases, co-expression, neighborhood, gene fusion, and co-occurrence. Hub genes were selected from the top 10 genes ranked by 5 cytoHubba ranking methods in Cytoscape software [15].
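As a rough illustration of hub ranking, the sketch below filters edges by the 0.4 score cutoff and ranks nodes by degree. Degree is one of the ranking methods available in cytoHubba, but the text does not say which 5 methods were used, so this is an assumption. The gene names and scores are invented toy values, not STRING output.

```python
from collections import Counter


def degree_ranking(edges, min_score=0.4):
    """Rank nodes by number of edges whose score meets the cutoff."""
    degree = Counter()
    for a, b, score in edges:
        if score >= min_score:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical network: names and scores are made up for illustration.
toy_edges = [
    ("geneA", "geneB", 0.9),
    ("geneA", "geneC", 0.8),
    ("geneA", "geneD", 0.7),
    ("geneB", "geneD", 0.6),
    ("geneB", "geneE", 0.3),  # below the 0.4 cutoff, ignored
]

ranking = degree_ranking(toy_edges)
top_hubs = [name for name, _ in ranking[:2]]
print(ranking)
print(top_hubs)
```

Taking the genes that appear in the top 10 under several such rankings is one way to arrive at a consensus hub list.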

Survival analysis

Survival analysis in relation to gene expression was performed with the Kaplan-Meier Plotter database [16] and presented as survival curves. According to the median expression of each gene in cancer tissues, the patients were divided into a high-expression group and a low-expression group. The overall survival (OS) was compared between the 2 groups for each included gene.
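The median split and the survival curve behind this comparison can be sketched in a few lines of Python. This is a simplified illustration, not the Kaplan-Meier Plotter implementation: the function names, patient IDs, and follow-up values are hypothetical, and assigning patients exactly at the median to the low group is an assumption.

```python
from statistics import median


def median_split(expression):
    """Split patient IDs into high and low groups at the median expression."""
    cutoff = median(expression.values())
    high = [pid for pid, value in expression.items() if value > cutoff]
    low = [pid for pid, value in expression.items() if value <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times: follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Made-up toy data for illustration only.
expr = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0}
high_group, low_group = median_split(expr)
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(high_group, low_group)
print(curve)
```

In practice one curve is computed per group and the two are compared, for example with a log-rank test and a hazard ratio as reported in Table 3.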

Results

Identification of differentially expressed genes

The GSE19804, GSE101929, and GSE33532 datasets from the GEO database were included in our study. The DEGs were first screened from each dataset, and 40 overlapping differentially expressed gene IDs were identified (Figure 1). However, these 40 gene IDs corresponded to only 30 genes because 10 were duplicates, and 1 gene ID had no gene name. Finally, 29 genes were included for further analysis, of which 22 were upregulated and 7 downregulated (Table 1). The differentially expressed genes between cancer tissue and normal lung tissue are represented as a heat map in Figure 2.
Figure 1

(A–D) Identification of differentially expressed genes from the GSE33532, GSE19804, and GSE101929 data series (A: Volcano plot of GSE33532; B: Volcano plot of GSE19804; C: Volcano plot of GSE101929).

Table 1

The 29 included differentially expressed genes overlapping in GSE33532, GSE19804, and GSE101929 data series.

| Gene ID | Gene symbol | Mean logFC (GSE19804) |
|---|---|---|
| 209612_s_at | ADH1B | 3.36491817 |
| 229309_at | ADRB1 | 3.21379367 |
| 210081_at | AGER | 3.21379367 |
| 206209_s_at | CA4 | 3.86942117 |
| 232578_at | CLDN18 | 4.16088183 |
| 213317_at | CLIC5 | 3.45981 |
| 204320_at | COL11A1 | −3.32311183 |
| 225681_at | CTHRC1 | −3.193161 |
| 204273_at | EDNRB | 3.190866 |
| 203980_at | FABP4 | 3.7473685 |
| 209074_s_at | FAM107A | 3.4444825 |
| 205866_at | FCN3 | 3.40527367 |
| 238222_at | GKN2 | 3.25140117 |
| 209469_at | GPM6A | 3.61581183 |
| 230030_at | HS6ST2 | −3.390935 |
| 204475_at | MMP1 | −3.04218817 |
| 204580_at | MMP12 | −3.17540267 |
| 239650_at | NCKAP5 | 3.105129 |
| 230469_at | RTKN2 | 3.4138185 |
| 205725_at | SCGB1A1 | 3.38117033 |
| 214387_x_at | SFTPC | 3.2855315 |
| 242009_at | SLC6A4 | 3.59241617 |
| 213456_at | SOSTDC1 | 3.38619867 |
| 206239_s_at | SPINK1 | −3.33911383 |
| 209875_s_at | SPP1 | −4.503354 |
| 230560_at | STXBP6 | 3.6274755 |
| 219230_at | TMEM100 | 3.56547367 |
| 209904_at | TNNC1 | 3.11718933 |
| 204712_at | WIF1 | 3.77360267 |

Figure 2

Heat map of the differentially expressed genes between cancer tissue and lung normal tissue.

GO and KEGG analysis

The 29 dysregulated genes showed gene ontology enrichment in terms of biological process (BP), cellular component (CC), and molecular function (MF). The enriched biological processes were mainly related to diet-induced thermogenesis, ventricular cardiac muscle tissue morphogenesis, and brown fat cell differentiation. For cellular component, the 29 genes were enriched in extracellular space, neuron projection, and plasma membrane. For molecular function, only 1 term, actin filament binding, was enriched. KEGG pathway analysis showed that the 29 dysregulated genes were enriched in the calcium signaling pathway, regulation of lipolysis in adipocytes, and the PPAR signaling pathway (Table 2).
Table 2

GO and KEGG analysis of the differentially expressed genes between cancer tissue and lung normal tissue.

| Category | Term | Count | P-value |
|---|---|---|---|
| GOTERM_BP_DIRECT | Diet-induced thermogenesis | 2 | 9.9E-3 |
| GOTERM_BP_DIRECT | Ventricular cardiac muscle tissue morphogenesis | 2 | 3.4E-2 |
| GOTERM_BP_DIRECT | Brown fat cell differentiation | 2 | 4.2E-2 |
| GOTERM_CC_DIRECT | Extracellular space | 6 | 7.7E-3 |
| GOTERM_CC_DIRECT | Neuron projection | 3 | 1.3E-2 |
| GOTERM_CC_DIRECT | Plasma membrane | 8 | 2.5E-2 |
| GOTERM_CC_DIRECT | Extracellular region | 4 | 3.0E-2 |
| GOTERM_CC_DIRECT | Collagen trimer | 2 | 7.5E-2 |
| GOTERM_MF_DIRECT | Actin filament binding | 2 | 6.3E-2 |
| KEGG_PATHWAY | Calcium signaling pathway | 3 | 3.0E-2 |
| KEGG_PATHWAY | Regulation of lipolysis in adipocytes | 2 | 7.7E-2 |
| KEGG_PATHWAY | PPAR signaling pathway | | 9.7E-2 |

PPI network analysis of the 29 genes

The STRING database was used for PPI network analysis, which showed 79 nodes and 336 edges, with an average node degree of 8.51 and a local clustering coefficient of 0.648 (Figure 3). We also used cytoHubba to select the hub genes, which showed that 2 downregulated genes (MMP1 and SPP1) were hub genes (Figure 4).
Figure 3

Protein–protein interaction (PPI) network of the 29 dysregulated genes.

Figure 4

Hub genes identified by cytoHubba.

The prognostic significance of the 29 genes for NSCLC was analyzed in the Kaplan-Meier Plotter database. Significant differences in overall survival (OS) between the high-expression and low-expression groups are shown in Figure 5. Twenty-two dysregulated genes were correlated with patient prognosis (Table 3).
Figure 5

Survival curve of non-small cell lung cancer according to low and high expression of included genes.

Table 3

Survival analysis of the 29 included genes.

| Gene ID | Gene symbol | HR (95% CI) | p-Value |
|---|---|---|---|
| 209612_s_at | ADH1B | 0.67 (0.59–0.76) | 4.5E-10 |
| 229309_at | ADRB1 | 0.68 (0.58–0.80) | 5.2e-6 |
| 210081_at | AGER | 0.76 (0.67–0.86) | 2.3e-5 |
| 206209_s_at | CA4 | 1.03 (0.9–1.165) | 0.69 |
| 232578_at | CLDN18 | 0.75 (0.66–0.86) | 1.4E-5 |
| 213317_at | CLIC5 | 0.68 (0.59–0.77) | 1.3e-9 |
| 204320_at | COL11A1 | 1.2 (1.02–1.42) | 0.028 |
| 225681_at | CTHRC1 | 1.11 (0.94–1.31) | 0.21 |
| 204273_at | EDNRB | 0.72 (0.36–0.81) | 2.5e-7 |
| 203980_at | FABP4 | 1.02 (0.9–1.16) | 0.78 |
| 209074_s_at | FAM107A | 0.80 (0.71–0.91) | 0.00078 |
| 205866_at | FCN3 | 0.99 (0.87–1.12) | 0.88 |
| 238222_at | GKN2 | 0.83 (0.70–0.98) | 0.028 |
| 209469_at | GPM6A | 0.74 (0.65–0.84) | 2.9e-6 |
| 230030_at | HS6ST2 | 0.75 (0.64–0.89) | 0.00071 |
| 204475_at | MMP1 | 1.07 (0.94–1.21) | 0.30 |
| 204580_at | MMP12 | 1.52 (1.34–1.73) | 9.1e-11 |
| 239650_at | NCKAP5 | 0.64 (0.54–0.76) | 1.6e-7 |
| 230469_at | RTKN2 | 1.02 (0.86–1.20) | 0.85 |
| 205725_at | SCGB1A1 | 0.81 (0.71–0.92) | 0.0012 |
| 214387_x_at | SFTPC | 0.81 (0.71–0.92) | 0.0011 |
| 242009_at | SLC6A4 | 0.74 (0.63–0.87) | 0.00035 |
| 213456_at | SOSTDC1 | 1.07 (0.94–1.21) | 0.32 |
| 206239_s_at | SPINK1 | 0.765 (0.67–0.86) | 1.5e-5 |
| 209875_s_at | SPP1 | 1.32 (1.16–1.49) | 1.9e-5 |
| 230560_at | STXBP6 | 0.77 (0.65–0.91) | 0.0017 |
| 219230_at | TMEM100 | 0.62 (0.54–0.71) | 1.2e-13 |
| 209904_at | TNNC1 | 1.28 (1.13–1.45) | 0.00014 |
| 204712_at | WIF1 | 0.67 (0.59–0.76) | 3.2e-10 |

Discussion

With the rapid development of bioinformatics, an increasing amount of microarray and sequencing data is publicly accessible [17]. These data are collected and stored in corresponding databases, such as GEO, TCGA, Kaplan-Meier Plotter, and STRING. Clinical information (e.g., disease type, age, sex, and survival rate) and gene expression data can be freely downloaded or analyzed online, providing a reliable data platform for further data mining, analysis, and solving clinical problems [18,19]. The GEO database was established by the US National Library of Medicine in 2000. It is dedicated to the construction of gene expression databases and online analysis resources [20]. It mainly contains gene chip data and partial sequencing data of various tissues. At present, it is one of the most important databases in the field of bioinformatics data mining [7,21]. Fang et al. [22] performed an integrative bioinformatics analysis that revealed potential long non-coding RNA biomarkers and analyzed their function in non-smoking females with lung cancer. In that study, the authors found that 2 DEGs (LINC00968 and TBX5-AS1) were associated with unfavorable prognosis in never-smoking female lung cancer patients. In our present work, we selected 3 gene chip datasets from the GEO database that describe differential expression between lung cancer tissues and normal lung tissues of NSCLC patients. We identified 29 differentially expressed genes shared by the 3 datasets and further analyzed them for biological function enrichment, pathways, and survival. These 29 dysregulated genes are mainly enriched in the biological functions of diet-induced thermogenesis, ventricular cardiac muscle tissue morphogenesis, and actin filament binding. The KEGG pathway analysis showed that the 29 dysregulated genes were enriched in calcium signaling, regulation of lipolysis in adipocytes, and the PPAR signaling pathway. Further analysis showed that 2 genes (MMP1 and SPP1) were hub genes.
Matrix metalloproteinase-1 (MMP-1) is part of a cluster of MMP genes localized to chromosome 11q22.3. MMP-1 is involved in the breakdown of extracellular matrix, which may play an important role in tumor metastasis by breaking down interstitial collagen types I, II, and III [23,24]. However, SPP1 seems to have no correlation with cancer in terms of biological function enrichment [25,26]. Our survival analysis indicated that 22 of the 29 included dysregulated genes were correlated with patient prognosis, suggesting that these 22 genes could be used as biomarkers for patient prognosis.

Conclusions

Twenty-nine differentially expressed genes were identified in the present work, which were enriched in the biological functions of diet-induced thermogenesis, actin filament binding, and the PPAR signaling pathway. Dysregulated genes were correlated with NSCLC patient survival and might be useful as biomarkers of prognosis. However, this conclusion needs further confirmation by laboratory experiments.
  25 in total

1.  DAVID: Database for Annotation, Visualization, and Integrated Discovery.

Authors:  Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal:  Genome Biol       Date:  2003-04-03       Impact factor: 13.583

2.  Reannotation of array probes at NCBI's GEO database.

Authors:  Tanya Barrett; Ron Edgar
Journal:  Nat Methods       Date:  2008-02       Impact factor: 28.547

3.  MMP1 promotes tumor growth and metastasis in esophageal squamous cell carcinoma.

Authors:  Min Liu; Yi Hu; Mei-Fang Zhang; Kong-Jia Luo; Xiu-Ying Xie; Jing Wen; Jian-Hua Fu; Hong Yang
Journal:  Cancer Lett       Date:  2016-04-26       Impact factor: 8.679

Review 4.  Headful DNA packaging: bacteriophage SPP1 as a model system.

Authors:  Leonor Oliveira; Paulo Tavares; Juan C Alonso
Journal:  Virus Res       Date:  2013-02-16       Impact factor: 3.303

5.  Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women.

Authors:  Tzu-Pin Lu; Mong-Hsun Tsai; Jang-Ming Lee; Chung-Ping Hsu; Pei-Chun Chen; Chung-Wu Lin; Jin-Yuan Shih; Pan-Chyr Yang; Chuhsing Kate Hsiao; Liang-Chuan Lai; Eric Y Chuang
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2010-08-27       Impact factor: 4.254

6.  Global cancer statistics, 2012.

Authors:  Lindsey A Torre; Freddie Bray; Rebecca L Siegel; Jacques Ferlay; Joannie Lortet-Tieulent; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2015-02-04       Impact factor: 508.702

7.  Cancer statistics in China, 2015.

Authors:  Wanqing Chen; Rongshou Zheng; Peter D Baade; Siwei Zhang; Hongmei Zeng; Freddie Bray; Ahmedin Jemal; Xue Qin Yu; Jie He
Journal:  CA Cancer J Clin       Date:  2016-01-25       Impact factor: 508.702

8.  NCBI GEO: mining tens of millions of expression profiles--database and tools update.

Authors:  Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Dmitry Rudnev; Carlos Evangelista; Irene F Kim; Alexandra Soboleva; Maxim Tomashevsky; Ron Edgar
Journal:  Nucleic Acids Res       Date:  2006-11-11       Impact factor: 16.971

9.  NCBI GEO: mining millions of expression profiles--database and tools.

Authors:  Tanya Barrett; Tugba O Suzek; Dennis B Troup; Stephen E Wilhite; Wing-Chi Ngau; Pierre Ledoux; Dmitry Rudnev; Alex E Lash; Wataru Fujibuchi; Ron Edgar
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

Review 10.  Spp1 at the crossroads of H3K4me3 regulation and meiotic recombination.

Authors:  Laurent Acquaviva; Julie Drogat; Pierre-Marie Dehé; Christophe de La Roche Saint-André; Vincent Géli
Journal:  Epigenetics       Date:  2013-03-19       Impact factor: 4.528
