
Survival-based bioinformatics analysis to identify hub genes and key pathways in non-small cell lung cancer.

Chunliang Liu1, Yu Chen2, Yuqi Deng2, Yu Dong2, Jixuan Jiang2, Si Chen1, Wenfeng Kang1, Jiong Deng1, Haipeng Sun1.   

Abstract

BACKGROUND: Lung cancer is one of the leading causes of cancer mortality worldwide. Here, we performed an integrative bioinformatics analysis to screen hub genes and critical pathways in non-small cell lung cancer (NSCLC) based on the overall survival rate of differentially expressed genes (DEGs).
METHODS: Four datasets from the gene expression omnibus (GEO) were used to identify the DEGs. To obtain robust DEGs in NSCLC, only the DEGs that co-existed in the four datasets were selected for subsequent analysis. To identify the genes correlated with overall survival, the overall survival of these genes was then analyzed using the Kaplan-Meier plotter database. The genes significantly correlated with survival were used to perform gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis; next, these genes were used to construct a protein-protein interaction network. MCODE and CytoHubba were used to identify the clusters and hub genes. Finally, the hub genes were validated in the Cancer Genome Atlas (TCGA) and the Human Protein Atlas (HPA).
RESULTS: We found 522 up-regulated DEGs and 989 down-regulated DEGs between NSCLC and normal lung tissue, and 895 of them were significantly correlated with overall survival. GO analysis showed that the DEGs associated with overall survival were enriched in cell division, cell cycle, DNA replication, angiogenesis, and cell migration. KEGG analysis was consistent with GO analysis and showed that the p53 signaling pathway, pyrimidine metabolism, the cGMP-PKG signaling pathway, and the renin secretion pathway were associated with overall survival in NSCLC. In the protein-protein interaction analysis, we identified seven clusters and six hub genes: BUB1B, CCNB1, CENPE, KIF18A, NDC80, and MAD2L1. Of these genes, CENPE and KIF18A had not been reported in NSCLC until now. Finally, the dysregulated expression of the six hub genes was validated with data from the TCGA and HPA.
CONCLUSIONS: We identified the hub genes and potential mechanisms of NSCLC based on multiple-microarray analysis and overall survival; then, validated the hub genes in the TCGA and HPA database. These hub genes may serve as potential therapeutic targets. 2019 Translational Cancer Research. All rights reserved.

Keywords:  Non-small cell lung cancer (NSCLC); differentially expressed genes (DEGs); hub genes; multiple microarray analysis; overall survival; potential mechanism

Year:  2019        PMID: 35116861      PMCID: PMC8797769          DOI: 10.21037/tcr.2019.06.35

Source DB:  PubMed          Journal:  Transl Cancer Res        ISSN: 2218-676X            Impact factor:   1.241


Introduction

Lung cancer is the most prevalent cancer and the leading cause of cancer-associated death (1). Recent global cancer statistics report 2,093,876 new lung cancer cases (11.6% of all cancer cases) and 1,761,007 deaths (18.4% of all cancer deaths) in 2018 (2). Based on histopathologic diagnosis, lung cancer is classified into small cell lung cancer (SCLC, 15% of cases) and non-small cell lung cancer (NSCLC, 85% of cases) (3-5). For NSCLC, the 5-year survival rate is about 4–17%, depending mainly on stage (3-5). The mechanisms underlying NSCLC have been intensively investigated in the past decades. Abnormal expression or mutation of genes such as epidermal growth factor receptor (EGFR) and tumor suppressor protein 53 (TP53) plays an essential role in the carcinogenesis, progression, and metastasis of NSCLC (3-6). However, the molecular interaction mechanisms and dysregulated pathways remain unclear. High-throughput technologies such as microarray or RNA-seq allow researchers to profile gene expression at a global level (7-10). Using these technologies, new hub genes that play essential roles in the pathology of NSCLC have been identified (11,12). However, these studies were each based on the analysis of a single dataset, which might have led to biased results. Here, we screened hub genes based on multiple microarray datasets and overall survival and, in doing so, made new discoveries.

Methods

Gene expression dataset

Four NSCLC mRNA microarray datasets (GSE18842, GSE19188, GSE27262, and GSE33532) were downloaded from the National Center for Biotechnology Information Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/), a major public resource for high-throughput data submitted by scientists under strict standards. All four datasets were based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). Each dataset contained tumor samples and adjacent healthy tissue samples and was validated by its submitter. The sample counts were as follows: 46 tumor samples and 45 normal samples for GSE18842, 91 tumor samples and 65 normal samples for GSE19188, 25 tumor samples and 25 normal samples for GSE27262, and 80 tumor samples and 20 normal samples for GSE33532.

Screening differentially expressed genes (DEGs)

The R software (version 3.4.0) and Bioconductor packages were used to analyze the microarray datasets (13). The analysis proceeded as follows: we imported CEL files using the affy package, assessed microarray data quality using the simpleaffy package, preprocessed the raw data into expression values using the robust multi-array average (RMA) algorithm in the gcrma package, and used the genefilter package to filter out probes with low sensitivity before comparison (14-16). DEGs between the tumor and normal samples were then identified with the t-test in the limma package (17), and P-values were corrected for multiple testing by the Benjamini-Hochberg method to obtain adjusted P-values. Finally, genes with adjusted P<0.001 and |log2 fold-change (FC)|>1 were selected as candidate DEGs. To further enhance the reliability of the DEG analysis, the overlapping DEGs present in all four datasets were identified using FunRich software (version 3.1.3) (18).
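The selection logic described above (Benjamini-Hochberg adjustment, the two cutoffs, and the overlap across datasets) can be illustrated with a minimal Python sketch. It is not the R/limma/FunRich workflow used in this study: the function names and the toy gene labels (GENE_A and so on) are hypothetical, and the raw P-values are assumed to be already available for each gene.

```python
from typing import Dict, List, Set, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    adj_p_cutoff: float = 0.001,
    lfc_cutoff: float = 1.0,
) -> Tuple[Set[str], Set[str]]:
    """Split genes into up- and down-regulated DEGs.

    `stats` maps a gene label to (log2 fold-change, raw P-value).
    """
    genes = list(stats)
    adjusted = benjamini_hochberg([stats[g][1] for g in genes])
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, adj_p in zip(genes, adjusted):
        lfc = stats[gene][0]
        if adj_p < adj_p_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).add(gene)
    return up, down


def overlap(gene_sets: List[Set[str]]) -> Set[str]:
    """Genes present in every set (the centre of the Venn diagram)."""
    return set.intersection(*gene_sets)


# Toy input for two hypothetical datasets (not data from this study).
dataset_1 = {"GENE_A": (2.5, 1e-8), "GENE_B": (-1.8, 1e-7),
             "GENE_C": (0.4, 1e-9), "GENE_D": (1.5, 0.2)}
dataset_2 = {"GENE_A": (3.0, 1e-6), "GENE_B": (-0.5, 1e-6),
             "GENE_C": (1.2, 1e-6), "GENE_D": (-2.0, 1e-6)}

up_1, down_1 = select_degs(dataset_1)
up_2, down_2 = select_degs(dataset_2)
print("Shared up-regulated DEGs:", sorted(overlap([up_1, up_2])))
print("Shared down-regulated DEGs:", sorted(overlap([down_1, down_2])))
```

Up- and down-regulated genes are intersected separately, so a gene that changes direction between datasets is not counted as a shared DEG.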

Screening overlapping DEGs correlated with overall survival in the Kaplan-Meier plotter database

The online survival database Kaplan-Meier plotter (http://kmplot.com/analysis/), which contains both survival data and gene expression data for breast, gastric, ovarian, and lung cancers, was used to evaluate the prognostic significance of each DEG in NSCLC (19). The patient samples were classified into high and low expression groups by the median expression value of each gene. We used the default parameters, which, in brief, applied no subtype restrictions, used the “univariate” option for Cox regression, and used the “exclude biased arrays” option for array quality control. For each gene, the log-rank P value and median survival were calculated. Genes with a log-rank P value <0.001 were considered statistically significant.
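The median split and the two-group log-rank test can be sketched as follows. This is a simplified illustration of the statistical idea only, not the Kaplan-Meier plotter implementation: the function names are hypothetical, samples with expression equal to the median are assigned to the low group by assumption, and the toy follow-up data are invented.

```python
import math
from statistics import median
from typing import List, Tuple

# (expression value, follow-up time, event flag: 1 = death, 0 = censored)
Sample = Tuple[float, float, int]
Outcome = Tuple[float, int]


def split_by_median(samples: List[Sample]) -> Tuple[List[Outcome], List[Outcome]]:
    """Return (low, high) expression groups split at the median expression."""
    cutoff = median(s[0] for s in samples)
    low = [(t, e) for x, t, e in samples if x <= cutoff]
    high = [(t, e) for x, t, e in samples if x > cutoff]
    return low, high


def logrank_test(group_a: List[Outcome], group_b: List[Outcome]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each group.
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, e in group_a if time == t and e == 1)
        d_b = sum(1 for time, e in group_b if time == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy cohort (invented): high expression is paired with shorter survival.
cohort = [(9.1, 1, 1), (8.7, 2, 1), (8.2, 3, 1),
          (3.3, 4, 1), (2.9, 5, 1), (2.1, 6, 1)]
low_group, high_group = split_by_median(cohort)
chi2, p_value = logrank_test(low_group, high_group)
print(f"chi-square = {chi2:.3f}, log-rank P = {p_value:.4f}")
```

In the screening described above, a gene would be retained only when this P value fell below 0.001.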

Function enrichment analysis of DEGs correlated with overall survival

Both Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis are widely used to identify functions and pathways for a large number of genes. Here, we used clusterProfiler, a comprehensive R package, to investigate the enriched GO biological processes and KEGG pathways of the overlapping DEGs correlated with overall survival. The cut-off criteria were set at Benjamini-Hochberg adjusted P<0.05 and q<0.05 (20).
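Over-representation analysis of this kind is commonly computed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of query genes in a term by chance. The sketch below illustrates that calculation for a single term; it is an assumption about the general approach rather than a reproduction of the clusterProfiler call used here, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size: int, term_size: int,
                           query_size: int, overlap: int) -> float:
    """P(X >= overlap) when `query_size` genes are drawn without replacement
    from a universe of `universe_size` genes, `term_size` of which are
    annotated to the term."""
    upper = min(term_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in the universe, 4 annotated to the term,
# 3 genes in the query list, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(universe_size=10, term_size=4,
                                 query_size=3, overlap=2)
print(f"enrichment P = {p_value:.4f}")
```

The resulting per-term P values would then be adjusted for multiple testing before the adjusted P<0.05 cutoff is applied.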

PPI network construction and hub gene analysis

Protein-protein interactions play an essential role in regulating biological processes. These relationships can be represented by a PPI network in which vertices represent proteins and edges represent the interactions between proteins. Densely connected regions may correspond to functionally enriched clusters. Proteins that are highly connected with several other proteins are likely to have a central regulatory role and to act as regulatory “hubs” (21). To obtain protein-protein interaction information, the DEGs correlated with overall survival were imported into the STRING database to construct a PPI network. The confidence score cutoff was set at 0.9, and only experimentally validated interactions were adopted to increase confidence. The interaction information was then imported into the Cytoscape software (version 3.6.1) (22). MCODE was used to explore densely connected regions in the PPI network, with scores >4 and nodes >5 set as the cutoff criteria (23). The clusterProfiler package was used to analyze the functions of the clusters. CytoHubba was used to reveal the hub nodes in the regulation network (24). Genes ranked in the top 10 by both degree and maximal clique centrality (MCC) were selected as hub genes. We then examined their expression levels in the original datasets.
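The hub-selection rule (top-ranked by both degree and MCC) can be sketched in plain Python. This is an illustrative re-implementation and not the CytoHubba plugin itself: it assumes the published MCC definition, in which a node's score is the sum of (|C| - 1)! over the maximal cliques C that contain it, enumerates cliques with a basic Bron-Kerbosch search that is only practical for small networks, and breaks ties alphabetically by assumption. The node labels P1 to P5 are invented.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    graph: Graph = {}
    for a, b in edges:
        if a == b:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques: List[Set[str]] = []

    def expand(current: Set[str], candidates: Set[str], excluded: Set[str]) -> None:
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph: Graph) -> Dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


def top_k(scores: Dict[str, int], k: int) -> Set[str]:
    """The k best-scoring nodes; ties are broken alphabetically (assumption)."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:k])


def hub_genes(edges: Iterable[Tuple[str, str]], k: int = 10) -> Set[str]:
    """Nodes ranked in the top k by both degree and MCC."""
    graph = build_graph(edges)
    degree = {v: len(neighbours) for v, neighbours in graph.items()}
    return top_k(degree, k) & top_k(mcc_scores(graph), k)


# Toy network: a triangle (P1, P2, P3) with a short tail (P3-P4-P5).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
print("MCC scores:", mcc_scores(build_graph(toy_edges)))
print("Hub candidates (k=1):", hub_genes(toy_edges, k=1))
```

Requiring agreement between two ranking methods is a conservative choice: a node that scores highly on only one measure is not reported as a hub.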

Validation of hub genes in the Cancer Genome Atlas (TCGA) and Human Protein Atlas (HPA) database

The mRNA levels of the hub genes and their association with overall survival were validated using TCGA data. The RNA-seq data were downloaded and analyzed with the TCGAbiolinks package in R (25,26). The overall survival analysis of TCGA data was performed using the GEPIA online tool (27). Protein levels in healthy tissue and lung cancer tissue were validated using the immunohistochemical data of the HPA database (28).

Results

Screening DEGs correlated with overall survival in a K-M plotter database

To find DEGs of NSCLC, we selected four mRNA microarray datasets from the GEO database for further analysis. |logFC| >1 and adjusted P value <0.001 were set as strict cutoffs to avoid false positive results. After analysis, we obtained 4,216 DEGs (1,873 up-regulated and 2,253 down-regulated) in GSE18842, 3,114 DEGs (1,134 up-regulated and 1,980 down-regulated) in GSE19188, 3,035 DEGs (1,334 up-regulated and 1,701 down-regulated) in GSE27262, and 3,689 DEGs (1,565 up-regulated and 2,124 down-regulated) in GSE33532. As shown in the gene expression heatmaps (Figure 1A-D), these DEGs showed distinct expression patterns between normal samples and tumor samples in each dataset. To obtain robust DEGs in NSCLC, we constructed Venn diagrams (Figure 1E,F) of the up-regulated DEGs and down-regulated DEGs across the four datasets and found 522 up-regulated DEGs and 989 down-regulated DEGs shared by all of them. These genes were considered the DEGs of NSCLC for further analysis.
Figure 1

Screening the differentially expressed genes. (A,B,C,D) The differentially expressed genes profiles of four datasets. The green bar represents normal sample, and the orange bar represents tumor sample. (E,F) The Venn diagrams of up-regulated genes and down-regulated genes, respectively.

To find out how these DEGs were correlated with overall survival, we submitted them to the K-M plotter database to analyze the median survival and log-rank P value between the low expression cohort and the high expression cohort. DEGs with log-rank P <0.001 were considered DEGs correlated with overall survival (DEGOSs). After screening, we obtained 895 DEGOSs, of which 265 were up-regulated and 630 were down-regulated (Table 1 and online: http://fp.amegroups.cn/cms/tcr.2019.06.35-1.pdf). Importantly, high expression of the up-regulated genes was significantly correlated with poor overall survival, while low expression of the down-regulated genes was correlated with poor overall survival of patients (Table 1 and online: http://fp.amegroups.cn/cms/tcr.2019.06.35-1.pdf).
Table 1

Five representative up- or down-regulated DEGs correlated with overall survival

Gene symbol   Median survival (month)                         Log rank P   Hazard ratio
              Low expression cohort   High expression cohort
Upregulated genes
   ASPM       96                      43.83                    <1E-16       1.76 (1.55–2.01)
   CDC20      96                      42.8                     <1E-16       1.82 (1.6–2.07)
   TPX2       96.2                    42                       <1E-16       1.87 (1.64–2.12)
   KIF2C      96.1                    44                       <1E-16       1.78 (1.57–2.03)
   DLGAP5     94.5                    43.83                    <1E-16       1.73 (1.52–1.97)
Downregulated genes
   CPED1      45.47                   125                      <1E-16       0.66 (0.56–0.77)
   SYNE1      44                      95.5                     <1E-16       0.68 (0.6–0.75)
   CBX7       42.23                   91                       <1E-16       0.53 (0.47–0.59)
   PTEN       48                      124                      <1E-16       0.96 (0.82–1.12)
   PPM1K      46                      134                      <1E-16       0.65 (0.56–0.77)

Function analysis of DEGOSs

To gain insight into the function of the DEGOSs, we performed GO enrichment analysis; the top 10 biological processes with statistical significance are shown in Figure 2A,B. The dysregulated genes of each term are listed in Table S1. The upregulated genes were enriched in biological processes associated with cell division, organelle fission, and chromosome segregation. The downregulated genes were enriched in biological processes associated with angiogenesis and negative regulation of cell migration. The angiogenic switch is governed by countervailing factors that either induce or oppose angiogenesis (29,30). In this study, we found that most angiogenesis inhibitors (MMRN2, RECK, ANGPT1, PECAM1, ROBO4, and S1PR1) and the HIF-1α inhibitor KLF2 were downregulated and enriched in the angiogenesis biological process (31-36). This indicated that the inhibitory side of the angiogenic switch was downregulated in NSCLC.
Figure 2

The Gene Ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis of differentially expressed genes correlated with overall survival. (A,B) The top 10 upregulated and downregulated Gene ontology biological process terms with statistical significance. (C,D) The upregulated and downregulated KEGG pathways with statistical significance (adjusted P<0.05).

Table S1

The GO enrichment analysis

ID  Description  P adjust  Gene ID
Upregulated
   GO:0007059  Chromosome segregation  7.38E-37  NEK2/NUSAP1/BUB1B/CENPE/SPAG5/CCNE1/PTTG1/KIF18B/TOP2A/BIRC5/MKI67/CENPN/BRIP1/KIF14/NCAPG/KIF18A/RACGAP1/CENPF/ESPL1/MAD2L1/MIS18A/ZWINT/BLM/NDC80/RCC1/CCNB1/NDE1/TUBG1/NCAPD3/KIF23/BRCA1/ECT2/FEN1/CCNE2/CDCA8/CDC20/KIF2C/DLGAP5/BUB1/KIF4A/HJURP/TTK/PRC1/AURKB/TRIP13/CDT1/NCAPH/CDC6/SPC25/OIP5
   GO:0140014  Mitotic nuclear division  6.04E-34  NEK2/NUSAP1/BUB1B/CENPE/SPAG5/CCNE1/PTTG1/KIF18B/MKI67/KIF14/NCAPG/KIF18A/CCNA2/RACGAP1/CENPF/ESPL1/MAD2L1/KIF11/AURKA/ZWINT/NDC80/RCC1/CCNB1/NDE1/CCNF/TUBG1/NCAPD3/KNTC1/KIF23/CCNE2/CDCA8/CDC20/TPX2/KIF2C/DLGAP5/BUB1/KIF4A/UBE2C/TTK/CCNB2/CHEK1/PRC1/AURKB/MYBL2/TRIP13/CDT1/NCAPH/CDC6
   GO:0051301  Cell division  2.92E-33  ASPM/NEK2/NUSAP1/BUB1B/CENPE/SPAG5/KIF20A/CCNE1/PTTG1/KIF18B/TOP2A/BIRC5/NCAPG2/BRIP1/KIF14/NCAPG/CCNA2/RACGAP1/CENPF/ESPL1/MAD2L1/MIS18A/HELLS/KIF11/AURKA/ZWINT/BRCA2/BLM/NDC80/MDK/RCC1/CCNB1/NDE1/CKS2/CCNF/NCAPD3/KNTC1/E2F8/KIF23/PIMREG/ECT2/CKAP2/TIMELESS/CCNE2/CDCA8/CDC20/TPX2/KIF2C/BUB1/CEP55/KIF4A/UBE2C/CCNB2/PRC1/AURKB/CDCA3/CDT1/NCAPH/CDC6/SPC25/OIP5
   GO:0000280  Nuclear division  3.08E-33  ASPM/NEK2/NUSAP1/BUB1B/CENPE/SPAG5/CCNE1/PTTG1/KIF18B/TOP2A/MKI67/BRIP1/KIF14/NCAPG/KIF18A/CCNA2/RACGAP1/CENPF/ESPL1/MAD2L1/KIF11/AURKA/ZWINT/BLM/NDC80/RCC1/CCNB1/NDE1/CKS2/CCNF/TUBG1/NCAPD3/KNTC1/KIF23/CCNE2/CDCA8/CDC20/TPX2/KIF2C/DLGAP5/BUB1/KIF4A/UBE2C/TTK/CCNB2/CHEK1/PRC1/AURKB/MYBL2/TRIP13/CDT1/NCAPH/CDC6
   GO:0048285  Organelle fission  3.48E-31  ASPM/NEK2/NUSAP1/BUB1B/CENPE/SPAG5/CCNE1/PTTG1/KIF18B/TOP2A/MKI67/BRIP1/KIF14/NCAPG/KIF18A/CCNA2/RACGAP1/CENPF/ESPL1/MAD2L1/KIF11/AURKA/ZWINT/BLM/NDC80/RCC1/CCNB1/NDE1/CKS2/CCNF/TUBG1/NCAPD3/KNTC1/KIF23/CCNE2/CDCA8/CDC20/TPX2/KIF2C/DLGAP5/BUB1/KIF4A/UBE2C/TTK/CCNB2/CHEK1/PRC1/AURKB/MYBL2/TRIP13/CDT1/NCAPH/CDC6
   GO:0098813  Nuclear chromosome segregation  3.33E-29  NEK2/NUSAP1/BUB1B/CENPE/SPAG5/CCNE1/PTTG1/KIF18B/TOP2A/BRIP1/KIF14/NCAPG/KIF18A/RACGAP1/CENPF/ESPL1/MAD2L1/ZWINT/BLM/NDC80/CCNB1/TUBG1/NCAPD3/KIF23/ECT2/FEN1/CCNE2/CDCA8/CDC20/KIF2C/DLGAP5/BUB1/KIF4A/TTK/PRC1/AURKB/TRIP13/CDT1/NCAPH/CDC6
   GO:0000819  Sister chromatid segregation  3.73E-27  NEK2/NUSAP1/BUB1B/CENPE/SPAG5/PTTG1/KIF18B/TOP2A/KIF14/NCAPG/KIF18A/RACGAP1/CENPF/ESPL1/MAD2L1/ZWINT/NDC80/CCNB1/TUBG1/NCAPD3/KIF23/FEN1/CDCA8/CDC20/KIF2C/DLGAP5/BUB1/KIF4A/TTK/PRC1/AURKB/TRIP13/CDT1/NCAPH/CDC6
   GO:0000070  Mitotic sister chromatid segregation  7.59E-27  NEK2/NUSAP1/BUB1B/CENPE/SPAG5/PTTG1/KIF18B/KIF14/NCAPG/KIF18A/RACGAP1/CENPF/ESPL1/MAD2L1/ZWINT/NDC80/CCNB1/TUBG1/NCAPD3/KIF23/CDCA8/CDC20/KIF2C/DLGAP5/BUB1/KIF4A/TTK/PRC1/AURKB/TRIP13/CDT1/NCAPH/CDC6
   GO:0044770  Cell cycle phase transition  1.95E-24  NEK2/BUB1B/CENPE/GTSE1/CCNE1/TYMS/POLE2/MELK/KIF14/CCNA2/CENPF/MCM2/ESPL1/MAD2L1/AURKA/FGFR1OP/BLM/RPA3/NDC80/RCC1/RECQL4/CCNB1/RRM2/DONSON/NDE1/HMMR/CKS2/TUBG1/KNTC1/E2F8/PLK4/BRCA1/EZH2/CDKN2A/TIMELESS/CCNE2/CDC20/TPX2/DLGAP5/BUB1/UBE2C/TTK/CCNB2/CHEK1/AURKB/FOXM1/CDKN3/MCM4/TRIP13/CDT1/CDC6
   GO:0044772  Mitotic cell cycle phase transition  3.96E-24  NEK2/BUB1B/CENPE/GTSE1/CCNE1/TYMS/POLE2/MELK/KIF14/CCNA2/CENPF/MCM2/ESPL1/MAD2L1/AURKA/FGFR1OP/BLM/RPA3/NDC80/RCC1/RECQL4/CCNB1/RRM2/DONSON/NDE1/HMMR/CKS2/TUBG1/KNTC1/E2F8/PLK4/BRCA1/EZH2/CDKN2A/CCNE2/CDC20/TPX2/DLGAP5/BUB1/UBE2C/TTK/CCNB2/AURKB/FOXM1/CDKN3/MCM4/TRIP13/CDT1/CDC6
Downregulated
   GO:0001525  Angiogenesis  1.87E-06  TEK/MAP3K3/MMRN2/TMEM100/ETS1/PTPRM/SOX17/RECK/PPP1R16B/DCN/CX3CR1/SASH1/NPR1/TGFBR2/AGTR1/SLIT2/GATA2/GATA6/MEOX2/PDE3B/TCF21/JAM3/EMP2/KLF2/ANGPT1/STARD13/KDR/CDH5/FGF2/PTPRB/PECAM1/HGF/S1PR1/FGF18/ROBO4/PRKCB/GJA5/EDNRA/CAV1/ARHGAP24/CYR61/TAL1/CALCRL/ACVRL1/ENG/ANXA3/NR4A1/SEMA5A/HYAL1/EDN1/CX3CL1
   GO:0010632  Regulation of epithelial cell migration  4.99E-06  TEK/EPB41L5/MAP3K3/MMRN2/ETS1/PTPRM/DCN/PRKCE/SASH1/MACF1/TGFBR2/SLIT2/GATA2/MEOX2/EMP2/RAB11A/ANGPT1/STARD13/KDR/FGF2/FGF18/CLASP2/PPM1F/BMPR2/PTPRG/ACVRL1/ANXA3/SEMA5A/HYAL1/EDN1
   GO:0090130  Tissue migration  5.52E-06  TEK/EPB41L5/KANK2/MAP3K3/MMRN2/ETS1/PTPRM/DCN/FOXF1/PRKCE/SASH1/MACF1/TGFBR2/SLIT2/GATA2/MEOX2/EMP2/RAB11A/ANGPT1/STARD13/KDR/FGF2/PECAM1/ZEB2/FGF18/CLASP2/PPM1F/BMPR2/PTPRG/ACVRL1/ANXA3/NR4A1/SEMA5A/HYAL1/EDN1
   GO:0010631  Epithelial cell migration  6.36E-06  TEK/EPB41L5/KANK2/MAP3K3/MMRN2/ETS1/PTPRM/DCN/PRKCE/SASH1/MACF1/TGFBR2/SLIT2/GATA2/MEOX2/EMP2/RAB11A/ANGPT1/STARD13/KDR/FGF2/PECAM1/ZEB2/FGF18/CLASP2/PPM1F/BMPR2/PTPRG/ACVRL1/ANXA3/NR4A1/SEMA5A/HYAL1/EDN1
   GO:0090132  Epithelium migration  6.36E-06  TEK/EPB41L5/KANK2/MAP3K3/MMRN2/ETS1/PTPRM/DCN/PRKCE/SASH1/MACF1/TGFBR2/SLIT2/GATA2/MEOX2/EMP2/RAB11A/ANGPT1/STARD13/KDR/FGF2/PECAM1/ZEB2/FGF18/CLASP2/PPM1F/BMPR2/PTPRG/ACVRL1/ANXA3/NR4A1/SEMA5A/HYAL1/EDN1
   GO:1904018  Positive regulation of vasculature development  2.78E-05  TEK/MAP3K3/TMEM100/ETS1/RAP1A/CFLAR/PPP1R16B/CX3CR1/SASH1/TGFBR2/AGTR1/GATA2/GATA6/KDR/CDH5/FGF2/HGF/FGF18/PRKCB/ACVRL1/ENG/ANXA3/SEMA5A/HYAL1/CX3CL1
   GO:1901888  Regulation of cell junction assembly  3.69E-05  ARHGAP6/TEK/EPB41L5/DLC1/LIMCH1/RAP1A/PEAK1/MACF1/KDR/GPM6B/CLASP2/PPM1F/CAV1/ACVRL1/PRKCH/ACE/CLDN5
   GO:2000146  Negative regulation of cell motility  3.69E-05  RGN/DLC1/MMRN2/PTPRM/ADARB1/LIMCH1/TBX5/SMAD7/RECK/DCN/CX3CR1/LRCH1/PRKG1/SLIT2/MEOX2/CXCL12/STARD13/SCAI/IL33/FGF2/PTGER4/CLASP2/PTPRG/MMP28/FBLN1/DUSP1/ACVRL1/ENG/PPARGC1A/NAV3/CX3CL1
   GO:0051271  Negative regulation of cellular component movement  3.94E-05  RGN/DLC1/MMRN2/PTPRM/ADARB1/LIMCH1/TBX5/SMAD7/RECK/DCN/CX3CR1/LRCH1/PRKG1/SLIT2/MEOX2/CXCL12/STARD13/SCAI/IL33/FGF2/PTGER4/TGFBR3/CLASP2/PTPRG/MMP28/FBLN1/DUSP1/ACVRL1/ENG/PPARGC1A/NAV3/SEMA5A/CX3CL1
   GO:1901342  Regulation of vasculature development  3.94E-05  TEK/MAP3K3/MMRN2/TMEM100/ETS1/PTPRM/RAP1A/CFLAR/PPP1R16B/DCN/CX3CR1/SASH1/NPR1/TGFBR2/AGTR1/GATA2/GATA6/MEOX2/PDE3B/EMP2/KLF2/STARD13/KDR/CDH5/FGF2/HGF/FGF18/PRKCB/ACVRL1/ENG/ANXA3/SEMA5A/HYAL1/CX3CL1
To further explore the dysregulated pathways influenced by the DEGOSs, we performed KEGG pathway enrichment analysis (Figure 2C,D and Table S2). The upregulated genes were enriched in KEGG pathways associated with DNA replication and the cell cycle, while the downregulated genes were enriched in KEGG pathways associated with cell adhesion and cell migration. This was consistent with the GO enrichment analysis.
Table S2

The KEGG analysis

ID  Description  P adjust  Gene ID
Up-regulated
   hsa03440  Homologous recombination  2.08E-04  BRIP1/BRCA2/BLM/EME1/RPA3/BRCA1
   hsa00240  Pyrimidine metabolism  1.07E-03  NME1/TYMS/DTYMK/TK1/RRM2/DCTPP1
   hsa04914  Progesterone-mediated oocyte maturation  1.72E-02  CCNA2/MAD2L1/AURKA/CCNB1/BUB1/CCNB2
   hsa03030  DNA replication  9.69E-06  RNASEH2A/POLE2/MCM2/RPA3/FEN1/MCM4/RFC4
   hsa04115  p53 signaling pathway  7.90E-05  GTSE1/CCNE1/CCNB1/RRM2/CDKN2A/CCNE2/CCNB2/CHEK1
   hsa04218  Cellular senescence  2.23E-03  CCNE1/CCNA2/CCNB1/CDKN2A/CCNE2/CCNB2/CHEK1/FOXM1/MYBL2
   hsa03460  Fanconi anemia pathway  5.58E-08  BRIP1/FANCI/FANCD2/BRCA2/BLM/EME1/RPA3/RMI2/BRCA1/UBE2T
   hsa04114  Oocyte meiosis  7.90E-05  CCNE1/PTTG1/ESPL1/MAD2L1/AURKA/CCNB1/CCNE2/CDC20/BUB1/CCNB2
   hsa05166  Human T-cell leukemia virus 1 infection  2.94E-04  BUB1B/CCNE1/PTTG1/CCNA2/ESPL1/MAD2L1/SLC2A1/CDKN2A/CCNE2/CDC20/CCNB2/FGFR1OP
   hsa04110  Cell cycle  1.68E-11  BUB1B/CCNE1/PTTG1/CCNA2/MCM2/ESPL1/MAD2L1/CCNB1/CDKN2A/CCNE2/CDC20/BUB1/TTK/CCNB2/FGFR1OP/MCM4/BRIP1
Down-regulated
   hsa04270  Vascular smooth muscle contraction  1.41E-03  MYH11/PRKCE/GUCY1A2/NPR1/AGTR1/PRKG1/CACNA1D/PLA2G1B/ADCY4/PRKCB/EDNRA/PPP1R12B/CALCRL/PPP1R14A/PRKCH/EDN1
   hsa04924  Renin secretion  1.58E-03  ADRB2/GUCY1A2/NPR1/AGTR1/CACNA1D/PDE3B/PTGER4/EDNRA/ADRB1/ACE/EDN1
   hsa04514  Cell adhesion molecules (CAMs)  6.26E-03  JAM2/PTPRM/ITGA8/CADM1/JAM3/CDH5/ITGAL/PECAM1/ESAM/CLDN18/ICAM2/HLA-E/SELPLG/NFASC/CLDN5
   hsa04022  cGMP-PKG signaling pathway  6.26E-03  ADRB2/ATP1A2/PRKCE/GUCY1A2/NPR1/AGTR1/PRKG1/CACNA1D/PDE3B/PDE5A/ADCY4/EDNRB/PIK3R5/TRPC6/EDNRA/ADRB1
   hsa04670  Leukocyte transendothelial migration  1.41E-02  JAM2/RAP1A/JAM3/CXCL12/CDH5/ITGAL/PECAM1/ITK/PRKCB/ESAM/CLDN18/CLDN5

PPI network construction and cluster analysis

The PPI network provided a convenient way to visualize the protein-protein interactions and to find potential molecular mechanisms. In the PPI network composed of DEGOSs, we found 7 functional clusters ranked by score. We performed a GO analysis of each cluster to define its function (Figure 3 and Table S3). Cluster 1 and cluster 6 took part in cell division, mainly through chromosome segregation and organelle fission. Cluster 2 seemed to have a function in blood coagulation. Cluster 3 and cluster 7 were functionally enriched in the G-protein coupled receptor signaling pathway. Cluster 4 played a role in the G2/M transition of the cell cycle. In cluster 5, protein polyubiquitination was the only term with statistical significance. These clusters provided an overview of the potential molecular mechanisms in NSCLC and may help us to understand its pathogenesis.
Figure 3

A–G represent the cluster 1–7. The nodes represent the protein. The edges represent the interaction between the proteins.

Table S3

The top 5 GO biological terms of each cluster

Cluster    ID          Description  P adjust
Cluster 1  GO:0007059  Chromosome segregation  1.01E-13
           GO:0140014  Mitotic nuclear division  1.01E-13
           GO:0000280  Nuclear division  2.81E-12
           GO:0048285  Organelle fission  6.48E-12
           GO:0000070  Mitotic sister chromatid segregation  2.40E-11
Cluster 2  GO:0007059  Chromosome segregation  1.01E-13
           GO:0140014  Mitotic nuclear division  1.01E-13
           GO:0000280  Nuclear division  2.81E-12
           GO:0048285  Organelle fission  6.48E-12
           GO:0000070  Mitotic sister chromatid segregation  2.40E-11
Cluster 3  GO:0035589  G protein-coupled purinergic nucleotide receptor signaling pathway  0.021666
           GO:0035590  Purinergic nucleotide receptor signaling pathway  0.021986
           GO:0035588  G protein-coupled purinergic receptor signaling pathway  0.021986
           GO:0050921  Positive regulation of chemotaxis  0.021986
           GO:0050927  Positive regulation of positive chemotaxis  0.021986
Cluster 4  GO:0010389  Regulation of G2/M transition of the mitotic cell cycle  5.27E-09
           GO:1902749  Regulation of cell cycle G2/M phase transition  7.15E-09
           GO:0000086  G2/M transition of the mitotic cell cycle  1.91E-08
           GO:0044839  Cell cycle G2/M phase transition  3.41E-08
           GO:0000226  Microtubule cytoskeleton organization  7.26E-08
Cluster 5  GO:0000209  Protein polyubiquitination  0.000605
           GO:0043687  Post-translational protein modification  0.066311
           GO:0006511  Ubiquitin-dependent protein catabolic process  0.096811
           GO:0019941  Modification-dependent protein catabolic process  0.096811
Cluster 6  GO:0000280  Nuclear division  3.10E-07
           GO:0048285  Organelle fission  3.10E-07
           GO:0140014  Mitotic nuclear division  9.30E-07
           GO:0051301  Cell division  9.30E-07
           GO:0030261  Chromosome condensation  9.69E-07
Cluster 7  GO:0007188  Adenylate cyclase-modulating G protein-coupled receptor signaling pathway  1.83E-08
           GO:0019932  Second-messenger-mediated signaling  1.83E-08
           GO:0007187  G protein-coupled receptor signaling pathway coupled to cyclic nucleotide second messenger  1.91E-08
           GO:0019935  Cyclic-nucleotide-mediated signaling  4.95E-07
           GO:0007189  Adenylate cyclase-activating G protein-coupled receptor signaling pathway  2.60E-06

Hub gene identification

Hub genes are highly interconnected genes that play central roles in the PPI network. They have the potential to be biomarkers and therapeutic targets. To obtain a robust result, we calculated the degree and MCC scores for each node. Genes ranked in the top 10 by both algorithms were selected as hub genes and used to construct a PPI network to visualize their interactions. We identified six hub genes: BUB1B, CCNB1, CENPE, KIF18A, MAD2L1, and NDC80 (Figure 4). All of these genes have functions associated with chromosome and spindle behavior in mitotic cell division and showed up-regulated expression in NSCLC tissue (Table S4) (37-39).
Figure 4

A represents the six hub genes and their interaction.

Table S4

The logFC and P adjust of six hub genes in the datasets

Datasets   GSE18842            GSE19188            GSE27262            GSE33532
           LogFC   P adjust    LogFC   P adjust    LogFC   P adjust    LogFC   P adjust
BUB1B      4.73    1.92E-36    3.61    8.79E-32    3.37    1.58E-12    3.87    7.69E-17
CCNB1      4.72    4.17E-41    3.81    7.10E-35    3.17    1.11E-12    4.33    2.31E-22
CENPE      3.39    8.97E-37    2.61    1.04E-27    1.58    1.30E-05    2.77    1.55E-11
KIF18A     2.27    1.02E-18    1.57    7.68E-14    1.47    1.95E-06    1.59    4.44E-08
MAD2L1     4.64    1.61E-37    2.98    2.10E-22    2.81    7.32E-12    3.79    1.88E-14
NDC80      3.64    3.92E-32    3.17    1.12E-27    2.54    2.79E-09    3.78    2.17E-16

Validation of hub genes in TCGA and HPA

To validate whether the hub genes identified above were of universal significance, we analyzed their mRNA expression level and overall survival in TCGA lung cancer RNA-seq data and their protein level in HPA immunohistochemical data. We found that all of these hub genes were abnormally up-regulated at the mRNA level (Figure 5A-F), and the protein level was up-regulated consistently with the mRNA level (Figure 5G). They were also correlated with overall survival (Figure 6). This evidence supports the universal significance of our results.
Figure 5

The validation of the mRNA expression level and the protein expression level of hub genes. (A,B,C,D,E,F) The mRNA expression level of six hub genes in the Cancer Genome Atlas (TCGA); (G) the protein level of the four hub genes that were found in Human Protein Atlas (HPA).

Figure 6

The overall survival of six hub genes in the Cancer Genome Atlas (TCGA) database.

Discussion

In this study, we explored, for the first time, the hub genes and potential mechanisms of NSCLC based on the DEGs correlated with overall survival. We found that cell division, cell cycle, cell migration, DNA replication, and angiogenesis might be the major dysregulated functions in NSCLC, and that the p53 signaling pathway and the cGMP-PKG signaling pathway might be the major dysregulated signaling pathways. Furthermore, six genes associated with cell division were identified as hub genes. Finally, we performed an independent validation of the hub genes in TCGA and HPA and found that they showed abnormal expression patterns in NSCLC and correlated with the overall survival of NSCLC patients. In our function analysis, the upregulated genes were enriched in the cell division, cell cycle, and chromosome segregation processes. The spindle assembly checkpoint plays important roles in these processes, and the hub genes are key components of the spindle assembly checkpoint. The downregulated genes were enriched in the negative regulation of cell migration and in angiogenesis. In the cell migration process, cell adhesion molecules such as JAM2 and PECAM1 were downregulated, which may break cell junctions and increase cell motility (33,40). In the angiogenesis process, key inhibitors of the angiogenic switch were downregulated. ANGPT1/TEK and ROBO4/SLIT2 are both receptor/ligand systems that promote vascular stability under physiological conditions (32,35,36,41-43). Many clinical studies have shown that ANGPT1 and TEK expression is downregulated in NSCLC tissue, and loss of the ANGPT1/TEK system increased metastasis and angiogenesis in mouse models (32,44,45), but the mechanism remains to be clarified. The ROBO4/SLIT2 system suppressed tumor growth and metastasis through tumor angiogenesis in breast cancer (41). SLIT2 could suppress cell migration through the β-catenin and AKT-GSK3β signaling pathways in colorectal cancer (42).
The downregulation of SLIT2 expression has also been observed in clinical lung cancer cases (44,45). An in vitro experiment showed that SLIT2 could promote the metastasis of NSCLC cells to the brain by downregulating CDH2. The KEGG pathway enrichment results were consistent with the GO analysis. The NSCLC tissue showed upregulated pathways associated with cell division and DNA replication and downregulated pathways involved in cell adhesion and negative regulation of cell migration. The KEGG analysis also provided a new clue that the renin secretion pathway might play a role in NSCLC progression. Many reports have shown that ACE protein and mRNA levels are decreased in NSCLC patients and associated with angiogenesis and tumor growth (46,47), but the molecular mechanism has not been reported. Among the six hub genes, CENPE and KIF18A were identified for the first time. Both are kinesin superfamily members that take part in mitotic spindle function (48,49). The mitotic spindle is a validated drug target in cancer chemotherapy. Taxanes and vinca alkaloids, which target the tubulin in the mitotic spindle, have been used successfully in the clinic (49). Centromeric protein E (CENPE) is a kinesin-like motor protein that establishes and maintains the connection between spindle microtubules and chromosomes and moves the chromosomes to the metaphase plate, powered by its ATPase activity. It is required for the transition from metaphase to anaphase. Aurora B and protein phosphatase 1 control the phosphorylation and dephosphorylation of CENPE to modulate its operation, which is essential for chromosome congression. BUB1B also controls its action at the kinetochore as a mitotic checkpoint event (49,50). The protein level of CENPE should be kept in balance to maintain cell division. Up-regulation of CENPE has been reported to be associated with worse overall survival in breast cancer (51), whereas down-regulation of CENPE has been found in liver cancer (50).
However, reports about CENPE in NSCLC are rare. Because of its importance, CENPE is considered a drug target in cancer therapy. Three inhibitors of CENPE have shown promising effects (50). GSK923295 is an allosteric inhibitor of the CENPE kinesin motor ATPase activity that induced tumor cell apoptosis and tumor regression. In a phase I clinical study involving 39 patients with refractory cancer (including 4 NSCLC patients), GSK923295 exhibited dose-proportional pharmacokinetics with a low incidence of myelosuppression and neuropathy. PF-2771 is a highly specific inhibitor of CENPE motor activity that shows no inhibition of protein kinases or of the ATPase activity of highly related kinesins. It inhibits the survival of basal-like breast cancer cells without inhibiting normal breast epithelial cells. However, there are no reports on the effect of PF-2771 in NSCLC. Compound A is a CENPE ATPase inhibitor that shows time-dependent anti-proliferative activity. It has been shown to inhibit the proliferation of multiple lung cancer cell lines.

KIF18A takes part in chromosome congression by reducing the amplitude of preanaphase oscillations and slowing poleward movement during anaphase. KIF18A has been reported to be overexpressed in colorectal cancer, and its overexpression is also correlated with grade, metastasis, and worse survival in breast cancer (52-54). There is still no report about KIF18A in NSCLC. BTB-1 is the only inhibitor of KIF18A known to date, and it has shown high specificity and a strong inhibitory effect on KIF18A (50). Based on the clues above and the high correlation with NSCLC survival, we think CENPE and KIF18A are worthy of further study and may serve as therapeutic targets in NSCLC.

In summary, we identified some new pathways and two new hub genes in NSCLC based on the DEGs correlated with overall survival. The two hub genes, CENPE and KIF18A, are associated with solid tumors, and an inhibitor of CENPE has been evaluated in a phase I clinical study.
Screening hub genes on the basis of survival might be more clinically relevant, and our method might therefore be more powerful. However, the functions of these hub genes in NSCLC still need further investigation.
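The screening logic summarized above (keep only DEGs shared by all datasets, retain those correlated with overall survival, then look for highly connected genes in the interaction network) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names (GENE_A to GENE_F), expression directions, p-values, and edges are hypothetical placeholders rather than data or results from this study; the 0.05 cutoff is an assumed threshold; and ranking by node degree is a simple stand-in for the MCODE and CytoHubba analyses, not a reproduction of them.

```python
from collections import Counter

# Hypothetical toy input: one dict per dataset, gene -> direction of change.
datasets = [
    {"GENE_A": "up", "GENE_B": "up", "GENE_C": "up", "GENE_D": "down", "GENE_E": "up", "GENE_F": "up"},
    {"GENE_A": "up", "GENE_B": "up", "GENE_C": "up", "GENE_D": "down", "GENE_E": "down"},
    {"GENE_A": "up", "GENE_B": "up", "GENE_C": "up", "GENE_D": "down", "GENE_E": "up", "GENE_F": "up"},
    {"GENE_A": "up", "GENE_B": "up", "GENE_C": "up", "GENE_D": "down", "GENE_E": "up", "GENE_F": "up"},
]

# Hypothetical survival p-values (placeholders, not real results).
survival_p = {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.03, "GENE_D": 0.40}

# Hypothetical protein-protein interaction edges.
ppi_edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
             ("GENE_A", "GENE_D"), ("GENE_B", "GENE_E")]


def robust_degs(datasets):
    """Return genes present in every dataset with the same direction."""
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)
    first = datasets[0]
    return {g: first[g] for g in common
            if all(d[g] == first[g] for d in datasets)}


def survival_filtered(degs, pvals, alpha=0.05):
    """Keep genes whose survival p-value is below alpha (assumed cutoff)."""
    return {g for g in degs if pvals.get(g, 1.0) < alpha}


def rank_by_degree(genes, edges):
    """Rank genes by degree in the subnetwork induced by `genes`."""
    degree = Counter()
    for a, b in edges:
        if a in genes and b in genes:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for reproducibility.
    return sorted(genes, key=lambda g: (-degree[g], g))


robust = robust_degs(datasets)
kept = survival_filtered(robust, survival_p)
ranking = rank_by_degree(kept, ppi_edges)
print(ranking)
```

In this toy example GENE_F is dropped because it is missing from one dataset, GENE_E because its direction is inconsistent, and GENE_D because its placeholder p-value exceeds the cutoff; the remaining genes are then ordered by how many interaction partners they have among the survivors.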
  51 in total

1.  Identifying hub genes and dysregulated pathways in hepatocellular carcinoma.

Authors:  B Jin; W Wang; G Du; G-Z Huang; L-T Han; Z-Y Tang; D-G Fan; J Li; S-Z Zhang
Journal:  Eur Rev Med Pharmacol Sci       Date:  2015-02       Impact factor: 3.507

2.  Activation of Robo1 signaling of breast cancer cells by Slit2 from stromal fibroblast restrains tumorigenesis via blocking PI3K/Akt/β-catenin pathway.

Authors:  Po-Hao Chang; Wendy W Hwang-Verslues; Yi-Cheng Chang; Chun-Chin Chen; Michael Hsiao; Yung-Ming Jeng; King-Jen Chang; Eva Y-H P Lee; Jin-Yuh Shew; Wen-Hwa Lee
Journal:  Cancer Res       Date:  2012-07-23       Impact factor: 12.701

Review 3.  Identification of Commonly Dysregulated Genes in Non-small-cell Lung Cancer by Integrated Analysis of Microarray Data and qRT-PCR Validation.

Authors:  Zi-Qiang Tian; Zhen-Hua Li; Shi-Wang Wen; Yue-Feng Zhang; Yong Li; Jing-Ge Cheng; Gui-Ying Wang
Journal:  Lung       Date:  2015-04-08       Impact factor: 2.584

Review 4.  Genomic alterations in lung adenocarcinoma.

Authors:  Siddhartha Devarakonda; Daniel Morgensztern; Ramaswamy Govindan
Journal:  Lancet Oncol       Date:  2015-07       Impact factor: 41.316

5.  Chemogenetic evaluation of the mitotic kinesin CENP-E reveals a critical role in triple-negative breast cancer.

Authors:  Pei-Pei Kung; Ricardo Martinez; Zhou Zhu; Michael Zager; Alessandra Blasina; Isha Rymer; Jill Hallin; Meirong Xu; Christopher Carroll; John Chionis; Peter Wells; Kirk Kozminski; Jeffery Fan; Oivin Guicherit; Buwen Huang; Mei Cui; Chaoting Liu; Zhongdong Huang; Anand Sistla; Jennifer Yang; Brion W Murray
Journal:  Mol Cancer Ther       Date:  2014-06-13       Impact factor: 6.261

6.  Chromosome missegregation rate predicts whether aneuploidy will promote or suppress tumors.

Authors:  Alain D Silk; Lauren M Zasadil; Andrew J Holland; Benjamin Vitre; Don W Cleveland; Beth A Weaver
Journal:  Proc Natl Acad Sci U S A       Date:  2013-10-16       Impact factor: 11.205

7.  Expression analysis of angiogenesis-related genes in Bulgarian patients with early-stage non-small cell lung cancer.

Authors:  Svetlana Nikolova Metodieva; Dragomira Nikolaeva Nikolova; Radostina Vlaeva Cherneva; Ivanka Istalianova Dimova; Danail Borisov Petrov; Draga Ivanova Toncheva
Journal:  Tumori       Date:  2011 Jan-Feb

8.  cytoHubba: identifying hub objects and sub-networks from complex interactome.

Authors:  Chia-Hao Chin; Shu-Hwa Chen; Hsin-Hung Wu; Chin-Wen Ho; Ming-Tat Ko; Chung-Yen Lin
Journal:  BMC Syst Biol       Date:  2014-12-08

9.  GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses.

Authors:  Zefang Tang; Chenwei Li; Boxi Kang; Ge Gao; Cheng Li; Zemin Zhang
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

Review 10.  Novel endogenous angiogenesis inhibitors and their therapeutic potential.

Authors:  Nithya Rao; Yu Fei Lee; Ruowen Ge
Journal:  Acta Pharmacol Sin       Date:  2015-09-14       Impact factor: 6.150
