Comprehensive analysis reveals a four-gene signature in colorectal cancer.

Bin Zhao^1,2, Zheng Wan^1,2, Xiaohong Zhang^1,2, Yilin Zhao^1,2.

Abstract

BACKGROUND: Colorectal cancer (CRC) is one of the major malignant diseases of the gastrointestinal system worldwide, yet current therapeutic regimens are not always effective. This study was designed to identify and characterize potential molecular biomarkers and related signal pathways in CRC.
METHODS: The gene expression profiles of GSE21510 were obtained from the Gene Expression Omnibus website, and 44 samples were selected from GSE21510 to identify differentially expressed genes (DEGs) between CRC tissues and noncancerous tissues. Subsequently, function and signal pathway enrichment analyses were performed, the protein-protein interaction (PPI) networks of the DEGs were constructed, and hub genes were screened with the MCODE plug-in of Cytoscape. Lastly, we validated the gene expression and overall survival analyses of these hub genes in related datasets, namely colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ), from the TCGA/GTEx databases.
RESULTS: A total of 166 up-regulated genes and 260 down-regulated genes met the criteria |log2 fold change| ≥2 and adjusted P value <0.01. Among them, we identified AURKA, BUB1, DLGAP5 and HMMR, which were associated with the regulation of mitotic cell cycle phase transition and the oocyte meiosis pathway.
CONCLUSIONS: These findings may shed light on the mechanisms of these four genes as potential drug-sensitive therapeutic targets for patients with CRC. 2020 Translational Cancer Research. All rights reserved.

Keywords: Differentially expressed genes (DEGs); biomarkers; colon cancer; colorectal cancer (CRC); survival analyses

Year: 2020 PMID: 35117487 PMCID: PMC8799256 DOI: 10.21037/tcr.2020.01.18

Source DB: PubMed Journal: Transl Cancer Res ISSN: 2218-676X Impact factor: 1.241

Introduction

Colorectal cancer (CRC) is one of the most common non-skin cancers diagnosed in both men and women worldwide. CRC often begins as a polyp inside the colon or rectum. Adenomas, a form of polyp, are benign tumors within the tissue of the colon or rectum (1). Although most polyps stay benign, some can convert into cancer over time. More than 9.4 million new cases were reported in 2015, along with nearly 832,000 deaths in developed countries (2,3). As a heterogeneous disease, CRC is associated with gene aberrations and with the microenvironment of tumor initiation, progression and metastasis (4). Over the last decade, many molecular biomarkers and signal pathways associated with the occurrence and progression of CRC have been reported, and some have been applied in clinical therapy. Owing to the high incidence and mortality of CRC, there is an urgent need to uncover its causes and molecular characteristics in order to discover biomarkers for initial diagnosis, prevention and personalized therapy. Recent high-throughput techniques for analyzing differential gene expression and alternative splicing variants, such as microarrays and RNA-seq, are increasingly valued in medical oncology for clinical applications such as molecular diagnosis, prognosis prediction and drug target discovery. Many gene expression profiling studies have focused on CRC in the last decade (5), and hundreds of potential gene biomarkers have been obtained (6), which are involved in different functional enrichments [including biological process (BP), cellular components (CC) and molecular function (MF)], signal pathways and protein-protein interaction (PPI) networks.
In this study, we obtained the dataset GSE21510 from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (NCBI-GEO) (website: https://www.ncbi.nlm.nih.gov/geo/), and we chose 19 CRC and 25 noncancerous tissue samples for differentially expressed gene (DEG) analysis, gene ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, PPI network construction, and hub gene exploration. In addition, we validated the gene expression levels and overall survival (OS) analyses of these hub genes in related datasets [colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ)] from the TCGA/GTEx databases. The filtered candidate genes associated with CRC may be recognized as biomarkers for diagnosis and prognosis and as drug targets. Figure 1A shows the workflow of this study.

Figure 1

The pipeline for screening differentially expressed genes (DEGs) and a description of the DEGs. (A) The workflow of the research. (B) Volcano plot of DEGs. Red: up-regulated DEGs; Blue: down-regulated DEGs. (C) Heatmap of the 30 most significant DEGs (15 up-regulated and 15 down-regulated).

Methods

Data collection

The gene expression microarray dataset GSE21510 (7) was collected from the NCBI-GEO database (8,9). GSE21510, generated on the Affymetrix GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array), was submitted by Kaoru Mogushi et al. and features 104 CRC patients. We chose 19 patients (CRC tissues, cancer group) and 25 patients (noncancerous tissues, normal group) to identify the genes and pathways.

Data preprocessing

After GSE21510 was obtained, probe identification numbers (IDs) were converted into gene symbols or Entrez IDs. When multiple probes corresponded to the same gene, the most significantly expressed value was taken as the gene expression value. Non-mRNA probes were discarded. The gene expression values were then normalized with the affy package, using robust multi-array average (RMA) signal intensities with log2 transformation and normalization (10,11).
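The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes that "most significantly expressed value" means the probe with the highest mean expression across samples, which the text does not specify; the function name, probe IDs and values are hypothetical.

```python
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Keep one probe per gene: the probe with the highest mean expression.

    expr: dict mapping probe ID -> list of log2 expression values (one per sample).
    probe_to_gene: dict mapping probe ID -> gene symbol; unannotated probes are absent.
    """
    best = {}  # gene symbol -> (mean expression, probe ID)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            # Probe has no gene annotation, so it is discarded.
            continue
        score = mean(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    return {gene: expr[probe] for gene, (_, probe) in best.items()}


# Toy data (hypothetical probe IDs and values, two samples each).
expr = {
    "probe_1": [5.0, 6.0],
    "probe_2": [7.0, 9.0],
    "probe_3": [3.0, 3.5],
    "probe_4": [8.0, 8.0],  # not annotated, will be dropped
}
probe_to_gene = {"probe_1": "AURKA", "probe_2": "AURKA", "probe_3": "BUB1"}
gene_expr = collapse_probes(expr, probe_to_gene)
```

Other collapsing rules (for example, the probe with the largest variance or the smallest P value) would fit the same structure by changing how `score` is computed.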

Identification of DEGs

Linear models for microarray data (limma) is an R package for analyzing gene expression matrices, in which linear models are fitted to assess differential expression under the designed experimental conditions (12). The limma package (http://bioconductor.org/packages/2.4/bioc/html/limma.html) was applied in R to identify the DEGs between CRC tissues (cancer group) and noncancerous tissues (normal group). Significant DEGs were selected for further analyses using the thresholds |log2 fold change (FC)| ≥2 and adjusted P value (adj.P.Val) <0.01 (13).
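The selection thresholds can be sketched in Python as follows. This is only an illustration of the filtering step, not of limma's moderated statistics: the raw P values and log2 fold changes are assumed to come from an upstream model, the Benjamini-Hochberg procedure is assumed as the adjustment method (the text does not name it), and all gene names and numbers are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, adj_p, lfc_cutoff=2.0, padj_cutoff=0.01):
    """Return (up, down) gene lists meeting |log2 FC| >= cutoff and adj. P < cutoff."""
    up, down = [], []
    for gene, lfc, p in zip(genes, log2fc, adj_p):
        if p >= padj_cutoff:
            continue
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return up, down


# Toy input (hypothetical values).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [3.1, -2.5, 4.0, -3.0]
raw_p = [0.001, 0.002, 0.04, 0.5]

adj_p = bh_adjust(raw_p)
up, down = select_degs(genes, log2fc, adj_p)
```

Applying the adjusted rather than the raw P value matters here: `geneC` has a large fold change but does not pass the adjusted threshold in this toy input.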

Function and signal pathway enrichment analysis

GO provides a controlled vocabulary of terms to describe the characteristics of gene products through annotation. GO terms reflect what is currently known about a gene in terms of BP, CC and MF (14,15). Moreover, KEGG (16) provides data resources of known biological pathways to annotate a gene or a set of genes/proteins with their respective KEGG pathways. To characterize the functions and signal pathways of the DEGs, GO and KEGG pathway enrichment analyses were carried out using the clusterProfiler and ReactomePA packages (17,18) in R, and a P value <0.05 was considered significant.
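Enrichment of a term in a gene list is commonly assessed by over-representation analysis with a hypergeometric test. The sketch below shows that test in plain Python under this assumption; it is not the clusterProfiler or ReactomePA implementation, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, n_degs, term_size, background):
    """One-sided hypergeometric P value, P(X >= overlap).

    overlap:    DEGs annotated to the term
    n_degs:     DEGs in the tested list
    term_size:  background genes annotated to the term
    background: all genes in the background set
    """
    upper = min(n_degs, term_size)
    total = comb(background, n_degs)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Toy example (hypothetical counts): 3 of 4 DEGs fall in a term that
# covers 5 of 20 background genes.
p_value = enrichment_p_value(overlap=3, n_degs=4, term_size=5, background=20)
```

In practice one P value is computed per term and the set of P values is then corrected for multiple testing, which is what the FDR columns in the enrichment tables report.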

PPI network and gene module analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) (19) is an open-access database used here to evaluate the PPI information of the DEGs. STRING (version 10.5) covers 9.6 million proteins from 2,031 organisms. First, we uploaded and mapped the list of DEGs to the STRING website. Then, PPIs of DEGs with a combined score >0.4 (medium confidence) were retained, and genes closely connected with other genes were chosen using a degree ≥10 (20). After that, PPI networks were constructed using the Cytoscape software (21). The Molecular Complex Detection (MCODE) plug-in of Cytoscape was applied to pick out the significant gene modules of the PPI networks. The parameters were set as follows: MCODE score >3 and number of nodes >4. Finally, we selected two significant gene modules (comprising 46 genes) from the PPI networks for further validation analyses.
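The edge and degree filters described above can be sketched as follows. This is a minimal illustration, not the STRING or Cytoscape workflow itself: it assumes the combined score has been scaled to the 0 to 1 range, it does not implement MCODE clustering, and the protein names and scores are hypothetical.

```python
from collections import defaultdict


def filter_ppi(edges, min_score=0.4, min_degree=10):
    """Keep edges above a combined-score cutoff and list high-degree nodes.

    edges: iterable of (protein_a, protein_b, combined_score) with scores in [0, 1].
    Returns (kept_edges, degree_by_node, hub_nodes).
    """
    kept = [(a, b, s) for a, b, s in edges if s > min_score and a != b]
    neighbors = defaultdict(set)
    for a, b, _ in kept:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    hubs = sorted(node for node, d in degree.items() if d >= min_degree)
    return kept, degree, hubs


# Toy network (hypothetical proteins and scores); a small degree cutoff
# is used so that the example stays readable.
edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.70),
    ("B", "C", 0.30),  # below the score cutoff, removed
    ("C", "D", 0.41),
]
kept, degree, hubs = filter_ppi(edges, min_score=0.4, min_degree=2)
```

With the thresholds from the text, the call would use `min_score=0.4` and `min_degree=10`.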

Validation of four genes in TCGA/GTEx

To further screen for precise biomarkers, we validated these 46 genes at the gene expression level and for OS time on the GEPIA web server, which integrates the COAD and READ datasets. The consistency of gene expression and the survival analyses of these candidate genes were tested and evaluated in GEPIA. For gene expression validation, which involved 367 tumor samples (COAD: 275, READ: 92) and 667 normal samples (COAD: 349, READ: 318), the threshold |log2 FC| ≥2 and P value <0.01 was considered statistically significant. For the OS analyses in the integrated COAD and READ datasets, the 362 patients with available OS time data were sorted into low- and high-expression groups by the median transcripts per million (TPM), and significance was determined by the log-rank test with P<0.05.
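The median split and log-rank comparison can be sketched in plain Python as below. This is an illustrative two-group log-rank test under stated assumptions, not GEPIA's implementation: samples with expression exactly at the median are assigned to the low group, the P value uses the chi-square distribution with one degree of freedom, and all data are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if its expression exceeds the median, else 'low'."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    groups: 'high' or 'low' per patient
    """
    records = list(zip(times, events, groups))
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in records if e == 1}):
        at_risk = [(tt, e, g) for tt, e, g in records if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for _, _, g in at_risk if g == "high")
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_high = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == "high")
        # Observed minus expected deaths in the high group at this time.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; both groups must be at risk")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Toy cohort (hypothetical TPM values and follow-up times, no censoring).
tpm = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]

groups = split_by_median(tpm)
chi2, p_value = logrank_test(times, events, groups)
```

The test only reports whether the two survival curves differ; the direction of the difference has to be read from the curves or from a hazard ratio.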

Results

DEGs identification

Microarray data of 19 CRC tissues (cancer group) and 25 noncancerous tissues (normal group) were analyzed with the limma package in R, using a linear model and a contrast model, to identify DEGs. A total of 426 DEGs were selected by the criteria of adjusted P value <0.01 and |log2 FC| ≥2, including 166 up-regulated genes and 260 down-regulated genes (Figure 1B, Table 1). The hierarchical cluster analysis in Figure 1C shows the 15 most significantly up-regulated genes and the 15 most significantly down-regulated genes. The top ten genes with the most significant expression changes were CLDN1, CEMIP, UGP2, EPB41L3, CDH3, ENC1, PPM1H, GLTP, SLC6A6 and SEMA6A (Table 1).

Table 1

The identified differentially expressed genes

DEGs	Gene symbol
Up-regulated	CLDN1, CEMIP, CDH3, ENC1, PPM1H, SLC6A6, ASCL2, VSNL1, FOXQ1, FABP6, PPAT, NUFIP1, NUDCD1, RAD54B, C2, HILPDA, MYC, CRNDE, NFE2L3, CKAP2, AJUBA, IRAK1BP1, CDK4, ATP11A, SLC7A5, EPHX4, LOC101060264, NEBL, PUS7, TRIB3, HOMER1, FAM60A, UTP14A, AXIN2, CSE1L, PPIL1, LYAR, DGAT2, SLC39A10, PAICS, CCDC113, TOP1MT, PROCR, MCM2, ZAK, ZC3HAV1L, HSP90AB1, TMPRSS3, UBE2C, LGR5, NUF2, RFC3, CDC25B, PRMT3, HELLS, DSCC1, PALD1, SLCO4A1, PLAU, ATAD2, BICD1, PARPBP, E2F7, TGFBI, PMAIP1, RPP40, AZGP1, DPEP1, KRT23, SPC25, MET, SP5, KIF20A, TRIP13, CDKN3, SLC12A2, PABPC1L, MMP1, ANLN, HS2ST1, CDK1, FAM92A1, CXCL3, SLC22A3, RNF43, ZNRF3, DUSP14, PSAT1, TPX2, ACSL4, AURKA, CKS2, MKI67, LDLRAD3, RAD51AP1, PHLDA1, SUPT16H, INHBA, SRPX2, MMP7, CXCL1, RASSF10, KIF4A, CXCL8, MMP3, TOP2A, BMP4, TESC, TTK, CTHRC1, HMMR, CDCA7, BUB1, DACH1, SLC38A5, CYP4X1, C2CD4A, GNG4, ELOVL5, CXCL2, FAM83D, CHI3L1, KLK10, TCFL5, LEF1, TACSTD2, DLGAP5, PRC1, ACSL6, NKD1, CADPS, NEK2, CENPK, MMP12, WT1, SLC35D3, CLDN2, PRR11, GINS1, COL11A1, CXCL10, CCNB1, CEP55, PTPRO, PLEKHB1, RGCC, CXCL11, SLCO1B3, GDF15, HS6ST2, RRM2, GAL, APCDD1, CXCL5, KRT6B, CKMT2, COL4A1, COL1A2, EDNRA, SPP1, TCN1, REG3A, AMIGO2, PPBP, WDR72, COL15A1
Down-regulated	UGP2, EPB41L3, GLTP, SEMA6A, SCARA5, PPAP2A, UGCG, IL6R, RELL1, TP53INP2, GCNT2, SPPL2A, ETFDH, ADH1B, SGK1, MXI1, CNTN3, NR3C2, MMP28, ABCA8, ZZEF1, C2orf88, RHOU, TMEM220, SYTL4, KLF4, AGPAT9, SMIM14, KAT2B, ABCG2, METTL7A, TMCC3, FRMD3, SLC2A13, PARM1, SMPD1, ARRDC4, TSPAN7, ABI3BP, RUNDC3B, SLC30A4, SLC4A4, GRAMD3, ADCY9, SRI, HPGD, GUCA2A, MIER3, PLCE1, SLCO2A1, LIFR, PCSK5, PRKACB, FAM107B, TMEM100, WDR78, SCIN, FAM46A, MT1M, GUCA2B, SLC51B, PDE9A, PLCL2, SPIB, C1orf115, NR5A2, CDHR5, USP2, CA7, PADI2, CD177, BEST4, CPNE8, ENTPD5, HOXD1, PDE3A, PHLPP2, ANO5, CXCL12, SLC30A10, TEX11, KIAA1211, SEMA6D, SFRP1, TLCD2, TMEM56, RMDN2, MAOA, APPL2, ITM2C, MXD1, RHOF, NPY1R, SCNN1B, LDHD, PDK4, BHLHE41, PTPRH, BMP2, ADTRP, CCDC68, ARHGAP44, CHP2, ARHGAP42, PAG1, MT1E, PKIB, MEIS3P1, MT1HL1, MT1X, HDAC9, SSPN, SMPDL3A, ENDOD1, CLDN23, EDIL3, EPB41L4A, OSBPL1A, MT2A, HRCT1, MT1H, GBA3, CMAHP, SLC17A4, VLDLR, ITM2A, MALL, NAAA, GDPD3, AHCYL2, AQP8, TMEM72, CHST5, THRB, TBC1D9, SLC1A1, HSD17B2, SRPX, FGL2, HAGLR, CLU, CDHR2, KIF16B, VSIG2, CCL28, CA4, BEST2, MT1G, TNFSF10, MT1F, VILL, F2RL1, CES2, LAMA1, SLC41A2, RNF125, EGLN3, DHRS11, RETSAT, EMP1, OGN, CA2, CHRDL1, TMEM171, CWH43, CEACAM7, ST6GALNAC6, B3GALT5, LRRC19, ADH1C, FMO5, LGALS2, PTPRR, PRSS12, ARL14, LRRC66, PTGDR, SPON1, ZBTB7C, ANPEP, PIGR, EDN3, DHRS9, ZG16, HHLA2, PROM2, BCAS1, AKR1B10, CLCA4, LINC01133, BTNL8, MUC12, MALAT1, SLC16A9, TUBAL3, UGT2A3, HEPACAM2, NXPE4, CLIC6, TTC22, FAM134B, MS4A12, CA1, TRPM6, SI, CEACAM1, CYBRD1, IQGAP2, SLC16A14, SLC26A2, LYPD8, FOXP2, MYLK, HMGCS2, CFD, RASSF6, HSD11B2, GHR, AKAP7, DEFB1, DSC2, SULT1B1, PCK1, B3GNT7, NXPE1, MOGAT2, MEIS1, C4orf19, GGT6, IL1R2, MGP, SCNN1G, FHL1, LEPREL1, PLAC8, ENPP3, CLDN8, GCG, TSPAN1, ABCB1, NR1H4, MFAP5, FCGBP, VIP, RARRES1, SLC51A, SPINK5, CAPN13, INSL5, MEP1A, IGJ, GCNT3, ISX, MUC2, SYNPO2, AGR3, SFRP2, CLCA1, ITLN1, HSD3B2

To investigate the functions and signal pathway enrichment of the identified DEGs, we further analyzed them using the clusterProfiler and ReactomePA packages in R, with P<0.05 as the criterion. Figure 2A shows the ten most significant BP, CC and MF enrichment terms, respectively, and Figure 2B shows the top 30 significant GO terms.

Figure 2

The gene ontology and signal pathway enrichments. (A) The top ten functional enrichment terms of DEGs in the biological process, cellular component and molecular function groups, respectively; (B) the top 30 significantly enriched GO terms of DEGs; (C) the significantly enriched signal pathways of DEGs in CRC. DEG, differentially expressed gene; GO, gene ontology; SLC, solute carrier.

As shown in Table 2, in the BP category, the up-regulated genes were mainly enriched in nuclear division, mitotic nuclear division and organelle fission, while the down-regulated genes were enriched in detoxification of copper ion, stress response to copper ion and detoxification of inorganic compound. In the CC category, the up-regulated genes were mainly enriched in spindle pole, spindle, and chromosome, centromeric region, while the down-regulated genes were enriched in microvillus membrane, apical part of cell and apical plasma membrane. In the MF category, the up-regulated genes were mainly enriched in CXCR chemokine receptor binding, chemokine activity and chemokine receptor binding, while the down-regulated genes were enriched in oxidoreductase activity acting on the CH-OH group of donors, phosphoric diester hydrolase activity, and oxidoreductase activity acting on the CH-OH group of donors with NAD or NADP as acceptor.

Table 2

The top three gene ontology and pathway enrichment terms of up-regulated and down-regulated genes, respectively

Terms	Category	Description	FDR	Count
Up-regulated genes
GO: 0000280	BP	Nuclear division	4.27E-07	21
GO: 0140014	BP	Mitotic nuclear division	5.16E-07	17
GO: 0048285	BP	Organelle fission	9.75E-07	21
GO: 0000922	CC	Spindle pole	0.000108	10
GO: 0005819	CC	Spindle	0.000108	14
GO: 0000775	CC	Chromosome, centromeric region	0.000574	10
GO: 0045236	MF	CXCR chemokine receptor binding	2.53E-10	8
GO: 0008009	MF	Chemokine activity	1.84E-06	8
GO: 0042379	MF	Chemokine receptor binding	1.28E-05	8
R-HSA-380108	KEGG	Chemokine receptors bind chemokines	7.09E-06	8
R-HSA-1442490	KEGG	Collagen degradation	3.65E-05	8
R-HSA-1474228	KEGG	Degradation of the extracellular matrix	0.001122	9
Down-regulated genes
GO: 0010273	BP	Detoxification of copper ion	5.97E-09	8
GO: 1990169	BP	Stress response to copper ion	5.97E-09	8
GO: 0061687	BP	Detoxification of inorganic compound	1.10E-08	8
GO: 0031528	CC	Microvillus membrane	5.58E-05	6
GO: 0045177	CC	Apical part of cell	5.58E-05	19
GO: 0016324	CC	Apical plasma membrane	5.58E-05	17
GO: 0016614	MF	Oxidoreductase activity, acting on CH-OH group of donors	0.00075	11
GO: 0008081	MF	Phosphoric diester hydrolase activity	0.00075	9
GO: 0016616	MF	Oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor	0.00075	10
R-HSA-5661231	KEGG	Metallothioneins bind metals	1.59E-08	7
R-HSA-5660526	KEGG	Response to metal ions	7.97E-08	7
R-HSA-2672351	KEGG	Stimuli-sensing channels	0.000575	10

GO, gene ontology; FDR, false discovery rate; BP, biological process; CC, cellular components; MF, molecular function; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 2C shows the significantly enriched pathways; the most strongly enriched were metallothioneins bind metals, response to metal ions and chemokine receptors bind chemokines. The up-regulated genes were mainly enriched in chemokine receptors bind chemokines, collagen degradation and degradation of the extracellular matrix, and the down-regulated genes were enriched in metallothioneins bind metals, response to metal ions and stimuli-sensing channels (Table 2).

Module screening from the PPI network

First, the 426 DEGs were uploaded to the STRING website and analyzed with the Cytoscape software. A total of 269 DEGs with a score >0.4 (medium confidence) were selected for the construction of the PPI networks (Figure 3A). Then, two significant gene modules were clustered with the MCODE app in Cytoscape. Module 1 was made up of 31 up-regulated genes/nodes and 427 edges (Figure 3B), while module 2 consisted of 15 genes/nodes (10 up-regulated and 5 down-regulated genes) and 105 edges (Figure 3C).

Figure 3

The protein-protein interaction (PPI) network construction and significant gene module analysis. (A) The PPI networks of DEGs (orange and blue represent the two most significant gene modules, respectively); (B) module 1 consists of 31 nodes/genes (orange indicates an up-regulated gene); (C) module 2 consists of 15 nodes/genes (orange indicates an up-regulated gene and light blue indicates a down-regulated gene).

Genes from these two modules were chosen for validation in the COAD and READ datasets of TCGA/GTEx. We found that only four genes, AURKA, BUB1, DLGAP5 and HMMR, showed expression in the COAD and READ datasets that was consistent with that in GSE21510 (Figure 4A). In addition, OS differed significantly between their high- and low-expression groups (Figure 4B). GO and KEGG analyses show that these four candidate genes are significantly enriched in the regulation of mitotic cell cycle phase transition and oocyte meiosis (Figure 4C and Table 3).

Figure 4

The validation of the final four candidate genes and their functional annotation. (A) Validation of the gene expression of AURKA, BUB1, DLGAP5 and HMMR in the COAD and READ datasets. The cutoff: |log2 fold change (FC)| ≥2 and P<0.01 (* indicates P<0.01). (B) Overall survival (OS) analysis of AURKA, BUB1, DLGAP5 and HMMR in the COAD and READ datasets. (C) Chord plot of the functional enrichments of the four genes. COAD, colon adenocarcinoma; READ, rectum adenocarcinoma; HR, hazard ratio; TPM, transcripts per million.

Table 3

Gene ontology and pathways enrichment of identified four genes

Terms	Category	Description	FDR	Gene	Count
GO: 1901990	BP	Regulation of mitotic cell cycle phase transition	0.00025	AURKA, BUB1, DLGAP5, HMMR	4
GO: 0007088	BP	Regulation of mitotic nuclear division	0.00073	AURKA, BUB1, DLGAP5	3
GO: 0000280	BP	Nuclear division	0.001	AURKA, BUB1, DLGAP5	3
GO: 0030071	BP	Regulation of mitotic metaphase/anaphase transition	0.0031	BUB1, DLGAP5	2
GO: 0045840	BP	Positive regulation of mitotic nuclear division	0.0031	AURKA, DLGAP5	2
GO: 0032436	BP	Positive regulation of proteasomal ubiquitin-dependent protein catabolic process	0.0058	AURKA, DLGAP5	2
GO: 1903047	BP	Mitotic cell cycle process	0.0058	AURKA, BUB1, DLGAP5	3
GO: 0051781	BP	Positive regulation of cell division	0.0066	AURKA, DLGAP5	2
GO: 0000819	BP	Sister chromatid segregation	0.0076	BUB1, DLGAP5	2
GO: 0140014	BP	Mitotic nuclear division	0.0082	AURKA, DLGAP5	2
GO: 0007093	BP	Mitotic cell cycle checkpoint	0.0089	AURKA, BUB1	2
GO: 0010389	BP	Regulation of G2/M transition of mitotic cell cycle	0.0089	AURKA, HMMR	2
GO: 0140013	BP	Meiotic nuclear division	0.0089	AURKA, BUB1	2
GO: 1901991	BP	Negative regulation of mitotic cell cycle phase transition	0.0091	AURKA, BUB1	2
GO: 0051276	BP	Chromosome organization	0.0128	AURKA, BUB1, DLGAP5	3
GO: 0051301	BP	Cell division	0.0453	AURKA, BUB1	2
GO: 0031616	CC	Spindle pole centrosome	0.00041	AURKA, DLGAP5	2
GO: 0000780	CC	Condensed nuclear chromosome, centromeric region	0.00062	AURKA, BUB1	2
GO: 0005813	CC	Centrosome	0.0038	AURKA, DLGAP5, HMMR	3
GO: 0043232	CC	Intracellular non-membrane-bounded organelle	0.0395	AURKA, BUB1, DLGAP5, HMMR	4
hsa04114	KEGG	Oocyte meiosis	0.0028	AURKA, BUB1	2
hsa04914	KEGG	Progesterone-mediated oocyte maturation	0.0028	AURKA, BUB1	2

GO, gene ontology; FDR, false discovery rate; BP, biological process; CC, cellular components; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Discussion

In this study, we aimed to identify potential genetic biomarkers and pathways by comparing array data between the cancer group (CRC tissues) and the normal group (noncancerous tissues); 166 up-regulated and 260 down-regulated DEGs were identified. Function (GO) and signal pathway (KEGG) annotation analyses were then performed. Moreover, the PPI networks of the DEGs were constructed, in which 269 DEGs/nodes were connected by 1,169 edges, and the two most significant modules were chosen from the PPI networks; from these, 46 central nodes/genes were selected to validate their gene expression and OS time in TCGA/GTEx. Mutations in the Wnt signaling pathway and inflammatory bowel disease are known to be the two major causes of CRC (22). Of the 46 central nodes/genes identified through bioinformatics analyses, the first significant gene module (Figure 3B) consists of 31 genes, including TOP2A, CDK1, CCNB1, MYC, AURKA and BUB1, and the second significant gene module (Figure 3C) consists of 15 genes, including CXCL12, CCL28, CXCL2, GNG4, CXCL1 and INSL5. Some of the genes in these two modules have already been studied and associated with CRC in past years. Finally, we screened four genes (AURKA, BUB1, DLGAP5 and HMMR) whose expression was consistent with that in tumor and normal samples of COAD and READ, and we discarded the other 42 genes, which did not meet the threshold we set.
Aurora kinase A (AURKA), encoded by the AURKA gene, is a member of the serine/threonine kinase family (23,24). AURKA has been shown to interact with Wnt and Ras-MAPK signaling in CRC (25). It has also been reported that AURKA is associated with CRC liver metastasis (CRCLM) (26,27). Budding uninhibited by benzimidazoles 1 (BUB1) is also a member of the serine/threonine-protein kinase family (28). Over 90 percent of human solid tumors share a common feature, namely mutations in the spindle checkpoints (29). Jaffrey et al. have suggested that mutations in BUB1 can lead to chromosome instability in cancer cell lines (30). Disks large-associated protein 5 (DLGAP5) is a kinetochore protein that plays a role in stabilizing microtubules in chromosomes, controlling spindle dynamics, promoting interkinetochore tension and executing efficient kinetochore capture (31). Schneider et al. reported that AURKA and DLGAP5 may correlate with poor prognosis in non-small cell lung cancer patients (32). Regulating DLGAP5 gene expression could enhance the efficacy of epirubicin in invasive breast cancer (33). Hyaluronan-mediated motility receptor (HMMR), also known as receptor for hyaluronan-mediated motility (RHAMM), is highly expressed in breast cancer and correlates with poor outcome (34). Moreover, HMMR may be associated with breast cancer risk in patients who carry a BRCA1 mutation (35).
As shown in Figure 4B, the low-TPM groups for these four genes unexpectedly had lower percent survival than the high-TPM groups, which is contrary to what we expected. The differences between the survival curves were nevertheless statistically significant, and the reasons for this pattern need to be investigated. The four genes were chosen through comprehensive bioinformatics analyses and were mainly enriched in the regulation of mitotic cell cycle phase transition and the oocyte meiosis pathway. However, further molecular and cellular experiments are required to verify the function of these four gene biomarkers in CRC.

34 in total

1. Clinical significance of osteoprotegerin expression in human colorectal cancer.

Authors: Shunsuke Tsukamoto; Toshiaki Ishikawa; Satoru Iida; Megumi Ishiguro; Kaoru Mogushi; Hiroshi Mizushima; Hiroyuki Uetake; Hiroshi Tanaka; Kenichi Sugihara
Journal: Clin Cancer Res Date: 2011-01-26 Impact factor: 12.531

2. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression.

Authors: Daniel R Rhodes; Jianjun Yu; K Shanker; Nandan Deshpande; Radhika Varambally; Debashis Ghosh; Terrence Barrette; Akhilesh Pandey; Arul M Chinnaiyan
Journal: Proc Natl Acad Sci U S A Date: 2004-06-07 Impact factor: 11.205

Review 3. Cyclin-dependent protein serine/threonine kinase inhibitors as anticancer drugs.

Authors: Robert Roskoski
Journal: Pharmacol Res Date: 2018-11-30 Impact factor: 7.658

Review 4. Colorectal Cancer: Epidemiology, Disease Mechanisms and Interventions to Reduce Onset and Mortality.

Authors: Veronica Aran; Ana Paula Victorino; Luiz Claudio Thuler; Carlos Gil Ferreira
Journal: Clin Colorectal Cancer Date: 2016-02-13 Impact factor: 4.481

5. Stromal contribution to the colorectal cancer transcriptome.

Authors: Claudio Isella; Andrea Terrasi; Sara Erika Bellomo; Consalvo Petti; Giovanni Galatola; Andrea Muratore; Alfredo Mellano; Rebecca Senetta; Adele Cassenti; Cristina Sonetto; Giorgio Inghirami; Livio Trusolino; Zsolt Fekete; Mark De Ridder; Paola Cassoni; Guy Storme; Andrea Bertotti; Enzo Medico
Journal: Nat Genet Date: 2015-02-23 Impact factor: 38.330

6. Cytoscape 2.8: new features for data integration and network visualization.

Authors: Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal: Bioinformatics Date: 2010-12-12 Impact factor: 6.937

7. AURKA, DLGAP5, TPX2, KIF11 and CKAP5: Five specific mitosis-associated genes correlate with poor prognosis for non-small cell lung cancer patients.

Authors: Marc A Schneider; Petros Christopoulos; Thomas Muley; Arne Warth; Ursula Klingmueller; Michael Thomas; Felix J F Herth; Hendrik Dienemann; Nikola S Mueller; Fabian Theis; Michael Meister
Journal: Int J Oncol Date: 2017-01-02 Impact factor: 5.650

8. Nucleolar and Spindle Associated Protein 1 (NUSAP1) Inhibits Cell Proliferation and Enhances Susceptibility to Epirubicin In Invasive Breast Cancer Cells by Regulating Cyclin D Kinase (CDK1) and DLGAP5 Expression.

Authors: Xi Zhang; Yuliang Pan; Huiqun Fu; Juan Zhang
Journal: Med Sci Monit Date: 2018-11-26

9. Aurora kinase A (AURKA) expression in colorectal cancer liver metastasis is associated with poor prognosis.

Authors: J A C M Goos; V M H Coupe; B Diosdado; P M Delis-Van Diemen; C Karga; J A M Beliën; B Carvalho; M P van den Tol; H M W Verheul; A A Geldof; G A Meijer; O S Hoekstra; R J A Fijneman
Journal: Br J Cancer Date: 2013-10-08 Impact factor: 7.640

10. A prognostic classifier for patients with colorectal cancer liver metastasis, based on AURKA, PTGS2 and MMP9.

Authors: Jeroen A C M Goos; Veerle M H Coupé; Mark A van de Wiel; Begoña Diosdado; Pien M Delis-Van Diemen; Annemieke C Hiemstra; Erienne M V de Cuba; Jeroen A M Beliën; C Willemien Menke-van der Houven van Oordt; Albert A Geldof; Gerrit A Meijer; Otto S Hoekstra; Remond J A Fijneman
Journal: Oncotarget Date: 2016-01-12