Liuli Wang1, Deming Liu2, Shuo Liu3, Tianyi Liao3, Yajun Jiao2, Xianglai Jiang2, Yongfeng Wang4, Yaqiong Chen5, Haizhong Ma6, Hui Cai7. 1. The First Clinical Medical College of Lanzhou University, No. 199, Donggang WestRoad, Chengguan District, Lanzhou, Gansu 730000, China; Department of General Surgery, Gansu Provincial Hospital, Lanzhou 730000, China; Key Laboratory of Molecular Diagnosis and Precision Treatment of Surgical tumor, Gansu Provincial Hospital, Lanzhou 730000, China. 2. Ningxia Medical University, Ningxia 750004, China. 3. The First Clinical Medical College of Lanzhou University, No. 199, Donggang WestRoad, Chengguan District, Lanzhou, Gansu 730000, China. 4. Department of Clinical Medicine, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China. 5. Medical Department of Gansu Provincial Hospital, Lanzhou 730000, China. 6. Department of Quality Control, Gansu Provincial Hospital, Lanzhou 730000, China. 7. The First Clinical Medical College of Lanzhou University, No. 199, Donggang WestRoad, Chengguan District, Lanzhou, Gansu 730000, China; Department of General Surgery, Gansu Provincial Hospital, Lanzhou 730000, China; Key Laboratory of Molecular Diagnosis and Precision Treatment of Surgical tumor, Gansu Provincial Hospital, Lanzhou 730000, China. Electronic address: caialonteam@163.com.
Abstract
BACKGROUND AND OBJECTIVES: Colorectal cancer (CRC) is one of the most common malignant tumors worldwide with high incidence and mortality rate, while colorectal liver metastasis (CRLM) is one of the major causes of cancer-related deaths. Therefore, the present study aims to identify the hub gene associated with CRC carcinogenesis and liver metastasis, and then explore its diagnostic and prognostic value as well as the potential regulation mechanism. METHODS: The overlapping differential co-expression genes among CRC, CRLM, and normal tissues were explored on the GSE49355 and GSE81582 datasets from the Gene Expression Omnibus (GEO) database by integrated bioinformatics analysis. Then, the hub prognostic genes were selected from the overlapping genes by univariate Cox proportional hazard analysis and online database Gene Expression Profiling Interactive Analysis 2 (GEPIA2). Subsequently, the clinical value of the hub genes was evaluated in the TCGA and GSE39582 cohorts. Finally, the underlying mechanisms of the hub gene regulating CRC carcinogenesis and metastasis were explored by Gene function annotation and DNA methylation analysis. RESULTS: Inositol mono-phosphatase 2 (IMPA2) was identified as the hub gene associated with CRC carcinogenesis and liver metastasis. IMPA2 had an excellent diagnostic efficiency, and its expression was significantly decreased in CRC and liver metastasis samples, being positively correlated with poor prognosis. Moreover, its low expression was associated with AJCC stage III+IV, T4, N1+2, and M1. In addition, our results revealed that the potential mechanisms used by IMPA2 to mediate CRC carcinogenesis and metastasis could be associated with lipid metabolism and epithelial mesenchymal transition (EMT). Finally, IMPA2 expression could be regulated by DNA methylation. CONCLUSIONS: IMPA2 was identified and reported for the first time as a hub gene biomarker in the diagnosis and prognosis of CRC, which could regulate CRC carcinogenesis and liver metastasis through the regulation of lipid metabolism, EMT, and DNA methylation.
BACKGROUND AND OBJECTIVES: Colorectal cancer (CRC) is one of the most common malignant tumors worldwide with high incidence and mortality rate, while colorectal liver metastasis (CRLM) is one of the major causes of cancer-related deaths. Therefore, the present study aims to identify the hub gene associated with CRC carcinogenesis and liver metastasis, and then explore its diagnostic and prognostic value as well as the potential regulation mechanism. METHODS: The overlapping differential co-expression genes among CRC, CRLM, and normal tissues were explored on the GSE49355 and GSE81582 datasets from the Gene Expression Omnibus (GEO) database by integrated bioinformatics analysis. Then, the hub prognostic genes were selected from the overlapping genes by univariate Cox proportional hazard analysis and online database Gene Expression Profiling Interactive Analysis 2 (GEPIA2). Subsequently, the clinical value of the hub genes was evaluated in the TCGA and GSE39582 cohorts. Finally, the underlying mechanisms of the hub gene regulating CRC carcinogenesis and metastasis were explored by Gene function annotation and DNA methylation analysis. RESULTS: Inositol mono-phosphatase 2 (IMPA2) was identified as the hub gene associated with CRC carcinogenesis and liver metastasis. IMPA2 had an excellent diagnostic efficiency, and its expression was significantly decreased in CRC and liver metastasis samples, being positively correlated with poor prognosis. Moreover, its low expression was associated with AJCC stage III+IV, T4, N1+2, and M1. In addition, our results revealed that the potential mechanisms used by IMPA2 to mediate CRC carcinogenesis and metastasis could be associated with lipid metabolism and epithelial mesenchymal transition (EMT). Finally, IMPA2 expression could be regulated by DNA methylation. CONCLUSIONS: IMPA2 was identified and reported for the first time as a hub gene biomarker in the diagnosis and prognosis of CRC, which could regulate CRC carcinogenesis and liver metastasis through the regulation of lipid metabolism, EMT, and DNA methylation.
Colorectal cancer (CRC) is one of the most common malignant tumors worldwide, with high incidence of morbidity and mortality. Although the diagnostic methods and comprehensive treatments have been improved over the past decades, the mortality rate of CRC is still ranked second and the morbidity rate is ranked third worldwide [1]. As the Global Cancer Incidence, Mortality and Prevalence (GLOBOCAN 2020) statistics reported more than 1.9 million new CRC cases and 935,000 deaths in 2020 [1]. One of the major causes of cancer-related death in patients with CRC is colorectal liver metastasis (CRLM) [2]. Approximately 25–30% of patients with CRC already have CRLM at the time of the initial diagnosis and up to 60% develop CRLM during the course of treatment [3]. CRLM leads to the death of more than half of CRC patients and contributes to a low 5-year overall survival rate of less than 20% in patients with liver metastasis, which is significant lower compared with that of patients without liver metastasis [4,5]. Therefore, it is crucial to find relevant genes working as biomarkers of CRC carcinogenesis and liver metastasis, as well as discover their potential molecular mechanism, providing an additional approach for an effective prevention and treatment to improve the prognosis and quality of life of patients with CRC.Gene expression profile analysis based on bioinformatics has been used to identify the key genes associated with progression and prognosis of tumors thanks to the popularization and gradual development of gene microarray and RNA sequencing [6]. One important analysis to identify appropriate key genes is the evaluation of differentially expressed genes (DEGs), which provides the possibility to study the molecular mechanisms underlying genome regulation and identify quantitative changes in their expression between tumors and normal tissues [7,8]. However, previous studies focused more on the role of a single DEG than on the complex connections between DEGs [9], [10], [11], [12]. Weighted gene co-expression network analysis (WGCNA) is another novel and effective method to identify the significant modules of highly correlated genes and interested modules associated with clinical traits [13,14]. In the WGCNA network, highly correlated genes are clustered into a module, and the relationships between gene modules and clinical traits are determined to identify significant modules. Therefore, the integration analysis of WGCNA and DEGs allows the identification of the differentially co-expressed genes as the novel candidate biomarkers.In the present study, the differentially co-expressed genes related to CRC and CRLM were identified on the two datasets of the Gene Expression Omnibus (GEO) database using the integrated analysis of WGCNA and DEGs. Then, the key prognostic genes were identified by univariate Cox proportional hazard analysis and the public database Gene Expression Profiling Interactive Analysis 2 (GEPIA2). The expression pattern, prognostic value, and correlation with clinical characteristics were verified in The Cancer Genome Atlas (TCGA) database and other GEO datasets. In addition, the potential molecular mechanism of CRC progression and metastasis were further explored through the functional enrichment analysis and DNA methylation. Therefore, the present study contributed to the identification of novel biomarkers associated with the diagnosis and prognosis of CRC in patients with liver metastases, as well as to the discovery of their molecular mechanism of action.
Materials and methods
GEO data set and data processing
Gene expression raw microarray profiles of CRC were evaluated in five independent datasets (accession nos. GSE49355 [15], GSE81582 [16], GSE39582 [17], GSE41258 [18], and GSE17536 [19]) from the GEO database (www.ncbi.nlm.nih.gov/geo). GSE49355 included 57 samples (19 CRLM, 20 primary CRC and 18 normal tissues); GSE81582 included 51 samples (19 CRLM, 23 primary CRC and 9 normal tissues); GSE39582 included 585 tumor samples; GSE41258 included 287 samples (47 CRLM, 186 primary CRC and 54 normal tissues); and GSE17536 included 177 tumor samples. GSE49355 and GSE81582 datasets were used to construct WGCNA and DEGs analyses for the present study. GSE39582, GSE41258, and GSE17536 datasets were used as the training cohort for the selection of hub genes and the validation cohort for the expression, clinical correlation, diagnostic performance, and prognostic value of the hub genes. The expression profiles of the above three datasets were obtained using the “GEOquery” package [20] and the normalization for the microarray data was completed using the “limma” package [21]. Probes were converted to the gene symbols based on a manufacture-provided annotation file, and duplicated probes for the same gene were removed by determining the median expression of all its corresponding probes.
TCGA data set
Transcriptome data of RNA-seq and paired clinical data were downloaded from the TCGA-COAD dataset. The exclusion criteria were the following: 1) patients with unknown survival status, survival time, American Joint Committee on Cancer (JACC) stage, and Tumor Node Metastasis (TNM) stage; 2) patients who died within a follow-up period of 30-day. Finally, a total of 434 samples (including 393 colon tumor tissues and 41 normal tissues) from the TCGA dataset were used as another validation cohort.This study did not require ethical approval since all data were downloaded from the GEO and TCGA, which are publicly available databases. The flowchart of the present research is shown in Fig. 1.
The genes in the GSE49355 and GSE81582 datasets were identified for the construction of the gene co-expression networks. A weighted gene co-expression network was constructed using the “WGCNA” package by choosing 6 and 8 as the soft thresholds for GSE49355 and GSE81582, respectively [22]. This package ensured that the network satisfied a scale-free topology (R2 > 0.85) on the basis of the linear regression model fitting index obtained from the function ‘pickSoftThreshold’ operation. The weighted adjacency matrix was transformed into a topological overlap matrix (TOM) to estimate network connectivity, and the hierarchical clustering method was used to construct the cluster tree structure of the TOM matrix. Different branches of the cluster tree represent different gene modules, and different colors represent different modules. A correlation analysis was performed based on each module eigengene to identify modules that were significantly associated with the sample traits. Then, the modules with high correlation coefficient, which were considered as the candidates relevant to clinical traits, were selected for subsequent analysis.
Identification of DEGs and interaction with the modules of interest
The “limma” package was used to evaluate the GSE49355 and GSE81582 datasets and identify the DEGs among CRLM, primary CRC, and normal tissues. Genes with the cut-off criteria of |logFC| ≥ 1 and p value < 0.05 were considered as the DEGs. The DEGs of the GSE49355 and GSE81582 datasets were visualized as a volcano plot using the “ggplot” package. Subsequently, the overlapping genes between DEGs and co-expression genes that were extracted from the co-expression networks were used to identify potential prognostic genes, which were presented as a Venn diagram using the “VennDiagram” package.
Selection and validation of key genes
The 117 overlapping genes obtained above were screened in the GSE39582 cohort using the univariate Cox proportional hazard analysis to select the key genes with a prognostic value. The 4 genes with a p value < 0.01 were selected as the candidate genes. Then, the public database GEPIA2 (http://GEPIA2.cancer-pku.cn/) [23] was used to further examine the reliability of the selected candidate genes in CRC and inositol mono-phosphatase 2 (IMPA2) was finally identified as the key gene with a prognostic value. Furthermore, the expression patterns, clinical correlation, and survival rates associated to IMPA2 in CRC were validated in the TCGA and GEO datasets. The receiver operating characteristic (ROC) curve was performed to verify the diagnostic performance of IMPA2 using the “pROC” package. Differences in overall survival (OS) and disease-free survival (DFS) were evaluated by the Kaplan-Meier survival curve using the “survival” package.
Functional enrichment and GSEA analysis
The genes positively and negatively correlated with IMPA2 were analyzed in the GSE39582 dataset using the “cor.test” function (stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.html) in R [24]. In this correlation analysis, the genes with the absolute value of a Pearson's correlation coefficient ≥ 0.3 and p value < 0.05 were included. Then, the Gene Ontology Term (GO), pathway enrichment, and Gene Set Enrichment Analysis (GSEA) were performed to elucidate the potential biological functions of IMPA2. The GO and pathway analysis of the genes correlated to IMPA2 were performed using Metascape (http://metascape.org/), and GSEA analysis was performed using the “clusterProfiler” package with a cut-off criterion of p < 0.05 [25] and was visualized using the “ggplot2” package [26].
Correlation between IMPA2 expression and DNA methylation
The correlation analysis was performed to further investigate the correlation between IMPA2 expression and DNA methylation using the MEXPRESS database (http://mexpress.be/), an online tool for the integration and visualization of gene expression, DNA methylation and clinical data in the TCGA dataset [27,28]. Moreover, the online tool cBioPortal (http://www.cbioportal.org/) [29] analyzing the copy number alterations (CNA) and gene mutations of IMPA2 in the colorectal adenocarcinoma (TCGA, PanCancer Atlas, 526 samples) was used to investigate whether the correlation between IMPA2 expression and DNA methylation contributes to the differences in genetic alterations.
Results
Identification of the WGCNA modules
A gene co-expression network was constructed using WGCNA to identify significant gene modules. First, a hierarchical clustering tree was generated in the GSE49355 and GSE81582 cohorts (Supplementary Fig. S1). Then, the scale independent value exceeded 0.85 with a low average connectivity when the soft threshold power was 6 and 8 in the above datasets, respectively (Fig. 2A and Fig. 2D). Finally, a total of 18 modules in the GSE49355 cohort (Fig. 2B) and 21 modules in the GSE81582 cohort (Fig. 2E) were identified, in which a gray module was excluded because it was not assigned to any cluster. In addition, the heatmap of module-trait relationships was plotted to evaluate the association between each module and the three clinical traits (CRLM, primary CRC, and normal tissues). The brown module (including 947 genes) in the GSE49355 dataset and green module (including 631 genes) in the GSE81582 dataset were closely correlated with CRLM (brown module: r = 0.90, p = 8e - 22; green module: r = 0.79, p = 6e - 12) (Fig. 2C and F).
Fig. 2
WGCNA for construction and identification of the interested modules. (A) Construction of scale-free network in the GSE49355 dataset; (B) Hierarchical clustering tree of genes based on topological overlap different color branches of the cluster tree represent different modules; (C) Module-trait relationships. Each row corresponds to a color module and each column correlates to a clinical trait (CRLM, primary CRC and normal tissues). Each cell contains the corresponding correlation and P-value; (D) Construction of scale-free network in the GSE81582 dataset; (E) Hierarchical clustering tree of genes based on topological overlap different color branches of the cluster tree represent different modules; (F) Module-trait relationships. Each row corresponds to a color module and each column correlates to a clinical trait (CRLM, primary CRC and normal tissues). Each cell contains the corresponding correlation and p-value. Abbreviations: WGCNA, weighted gene co-expression network analysis; CRC, colorectal cancer; CRLM, colorectal liver metastasis.
WGCNA for construction and identification of the interested modules. (A) Construction of scale-free network in the GSE49355 dataset; (B) Hierarchical clustering tree of genes based on topological overlap different color branches of the cluster tree represent different modules; (C) Module-trait relationships. Each row corresponds to a color module and each column correlates to a clinical trait (CRLM, primary CRC and normal tissues). Each cell contains the corresponding correlation and P-value; (D) Construction of scale-free network in the GSE81582 dataset; (E) Hierarchical clustering tree of genes based on topological overlap different color branches of the cluster tree represent different modules; (F) Module-trait relationships. Each row corresponds to a color module and each column correlates to a clinical trait (CRLM, primary CRC and normal tissues). Each cell contains the corresponding correlation and p-value. Abbreviations: WGCNA, weighted gene co-expression network analysis; CRC, colorectal cancer; CRLM, colorectal liver metastasis.
Identification of overlapping genes between the DEGs and co-expression modules
A total of 1587 DEGs screened from the GSE49355 dataset and 1605 DEGs screened from the GSE81582 dataset were found using |log2FC| >1 and p < 0.05 as the threshold. The DEGs of the two datasets were depicted in a volcano plot (Fig. 3A–B). Then, 117 overlapping genes from the intersection of the DEGs and co-expression modules were identified and selected for further analysis by Venn diagram analysis (Fig. 3C).
Fig. 3
DEGs analysis and the key gene selection (A) Volcano plot of DEGs in the GSE49355; (B) Volcano plot of DEGs in the GSE81582; (C) The Venn diagram of genes among DEGs and co-expression modules; (D) The forest plot of univariate Cox proportional hazard analysis and 4 genes were considered as candidate genes associated with prognosis; (E-H) OS analysis for HSD11B2, IMPA2, PIGR, and PXMP2 in the GEPIA2 database. (I-L) DFS analysis for HSD11B2, IMPA2, PIGR, and PXMP2 in the GEPIA2 database. Abbreviations: DEG, differentially expressed genes; OS, overall survival; DFS, diseases-free survival; GEPIA2, Gene Expression Profiling Interactive Analysis 2; HSD11B2, hydroxysteroid 11-beta dehydrogenase 2; IMPA2, inositol Mono-phosphatase 2; PIGR, polymeric immunoglobulin receptor; PXMP2, peroxisomal membrane protein 2; HR, hazard ratio; CI, confidence interval.
DEGs analysis and the key gene selection (A) Volcano plot of DEGs in the GSE49355; (B) Volcano plot of DEGs in the GSE81582; (C) The Venn diagram of genes among DEGs and co-expression modules; (D) The forest plot of univariate Cox proportional hazard analysis and 4 genes were considered as candidate genes associated with prognosis; (E-H) OS analysis for HSD11B2, IMPA2, PIGR, and PXMP2 in the GEPIA2 database. (I-L) DFS analysis for HSD11B2, IMPA2, PIGR, and PXMP2 in the GEPIA2 database. Abbreviations: DEG, differentially expressed genes; OS, overall survival; DFS, diseases-free survival; GEPIA2, Gene Expression Profiling Interactive Analysis 2; HSD11B2, hydroxysteroid 11-beta dehydrogenase 2; IMPA2, inositol Mono-phosphatase 2; PIGR, polymeric immunoglobulin receptor; PXMP2, peroxisomal membrane protein 2; HR, hazard ratio; CI, confidence interval.
Identification of IMPA2 as the key gene
In this step, all the 117 overlapping genes contributed to the construction of the univariate Cox proportional hazards mode. Four candidate genes such as hydroxysteroid 11-beta dehydrogenase 2 (HSD11B2), inositol mono-phosphatase 2 (IMPA2), polymeric immunoglobulin receptor (PIGR), and peroxisomal membrane protein 2 (PXMP2) were identified (Fig. 3D, p < 0.01), which were further validated using the online GEPIA2 dataset. Among the above 4 candidate genes, only IMPA2 was significantly related to OS and DFS, and was previously rarely reported in the CRC (Fig. 3E–L). Therefore, IMPA2 was finally selected as the key gene to comprehensively explore its value in CRC.
Verification of IMPA2 expression patterns and clinical correlation
In this part, the expression and clinical correlation of IMPA2 were analyzed among the CRC patients in the TCGA, GSE39582, and GSE41258 datasets. The mRNA expression of IMPA2 in the GSE41258 dataset had a decline trend among normal tissues, primary tumors, and liver metastases. IMPA2 expression was lower in the liver metastases but higher in the normal tissues when compared to the primary tumors (Fig. 4A). Similar results were found in the TCGA (Fig. 4B) and GSE39582 datasets (Fig. 4G). In addition, the correlation between clinical characteristics (including AJCC stage and TNM stage) and the expression of IMPA2 was analyzed. The analysis of the TCGA dataset revealed that IMPA2 expression was remarkably decreased in AJCC stage III+IV compared to stage I+II (Fig. 4C, p = 0.028). As regards the TNM stage, patients in T4, N1+2, and M1 stage showed a significantly lower IMPA2 expression than those in T2 and T3 (Fig. 4D, p < 0.05), N0 (Fig. 4E, p = 0.014), and M0 (Fig. 4F, p = 0.03). Similar results were obtained from the GSE39582 dataset (Fig. 4H–K).
Fig. 4
Expression patterns of IMPA2 in CRC. (A) IMPA2 expression among liver metastases, primary tumors and normal tissues in the GSE41258 dataset; (B) IMPA2 expression between tumors and normal tissues in the TCGA cohort; (C-F) Correlation between IMPA2 expression and clinical characteristics, which showed that IMPA2 expression were significantly associated with AJCC Stage (C), T stage (D), N stage (E), and M stage (F); (G-K) Similar results were observed from the GSE39582 cohort. Abbreviations: CRC, colorectal cancer; TCGA, The Cancer Genome Atlas; AJCC, American Joint Committee on Cancer; IMPA2, inositol Mono-phosphatase 2.
Expression patterns of IMPA2 in CRC. (A) IMPA2 expression among liver metastases, primary tumors and normal tissues in the GSE41258 dataset; (B) IMPA2 expression between tumors and normal tissues in the TCGA cohort; (C-F) Correlation between IMPA2 expression and clinical characteristics, which showed that IMPA2 expression were significantly associated with AJCC Stage (C), T stage (D), N stage (E), and M stage (F); (G-K) Similar results were observed from the GSE39582 cohort. Abbreviations: CRC, colorectal cancer; TCGA, The Cancer Genome Atlas; AJCC, American Joint Committee on Cancer; IMPA2, inositol Mono-phosphatase 2.
Verification of the prognostic performance and prognostic value of IMPA2 in CRC
The ROC curve was generated to verify the diagnostic performance of IMPA2 based on the GSE41258 database. The AUC showed that IMPA2 had an excellent diagnostic efficiency for CRC patients (Fig. 5A, B). The patients were then divided into low-expression group and high-expression group based on the optimal cutoff value of IMPA2 expression in the GSE39582 dataset to further investigate the prognostic value of IMPA2 in CRC. The OS and DFS analyses of IMPA2 were performed by the Kaplan-Meier survival curve using the “survival” package. The survival analysis indicated that CRC patients in the low-expression group had a significant worse OS and DFS than those in the high-expression group (Fig. 5C, D), which was consistent with the GEPIA2 analysis above.
Fig. 5
Verification of the prognostic performance and prognostic value of IMPA2 in CRC. (A-B) ROC curves and AUC statistics were used to evaluate the capacity to discriminate CRC from normal controls (A), and CRLM from CRC (B) with excellent specificity and sensitivity in the GSE41258 dataset. (C-D) Low IMPA2 expression was significantly correlated with OS and DFS in the GSE39582 cohort, which was consistent with survival analysis in the GEPIA2 database; (E-F) Low expression IMPA2 was an independent unfavorable prognostic factor in TCGA cohort by univariate (E) and multivariate (F) Cox proportional hazard analysis; (G-H) Low expression IMPA2 was an independent unfavorable prognostic factor in the GSE17358 dataset by univariate (G) and multivariate (H) Cox proportional hazard analysis. Abbreviations: CRC, colorectal cancer; CRLM, colorectal liver metastasis; IMPA2, inositol Mono-phosphatase 2; ROC, receiver operating characteristic; AUC, area under the curve; OS, overall survival; DFS, disease-free survival; GEPIA2, Gene Expression Profiling Interactive Analysis 2; TCGA, The Cancer Genome Atlas; HR, hazard ratio; CI, CI, confidence interval; WD, Well differentiated; MD, Moderately differentiated; PD, Poorly differentiated.
Verification of the prognostic performance and prognostic value of IMPA2 in CRC. (A-B) ROC curves and AUC statistics were used to evaluate the capacity to discriminate CRC from normal controls (A), and CRLM from CRC (B) with excellent specificity and sensitivity in the GSE41258 dataset. (C-D) Low IMPA2 expression was significantly correlated with OS and DFS in the GSE39582 cohort, which was consistent with survival analysis in the GEPIA2 database; (E-F) Low expression IMPA2 was an independent unfavorable prognostic factor in TCGA cohort by univariate (E) and multivariate (F) Cox proportional hazard analysis; (G-H) Low expression IMPA2 was an independent unfavorable prognostic factor in the GSE17358 dataset by univariate (G) and multivariate (H) Cox proportional hazard analysis. Abbreviations: CRC, colorectal cancer; CRLM, colorectal liver metastasis; IMPA2, inositol Mono-phosphatase 2; ROC, receiver operating characteristic; AUC, area under the curve; OS, overall survival; DFS, disease-free survival; GEPIA2, Gene Expression Profiling Interactive Analysis 2; TCGA, The Cancer Genome Atlas; HR, hazard ratio; CI, CI, confidence interval; WD, Well differentiated; MD, Moderately differentiated; PD, Poorly differentiated.Moreover, the clinical characteristics and IMPA2 were systematically analyzed by univariate and multivariate Cox proportional hazard analysis to confirm the independent prognostic value of IMPA2 in CRC patients. The results from the TCGA dataset revealed that the low expression IMPA2 was an independent unfavorable prognostic factor (Fig. 5E, F). Similar results were found in the GSE17536 dataset (Fig. 5G, H).
Functional enrichment analysis of IMPA2 co-expressed genes
The genes positively and negatively associated with IMPA2 were analyzed in the GSE39583 dataset to explore the potential role and regulation mechanism of IMPA2 in CRC, and 1013 IMPA2 co-expressed genes were identified. After screening GO and pathway enrichment analysis, up to 100 enriched clusters with IMPA2 co-expressed genes were found (Supplementary Fig. S2). The top 20 enriched clusters, originating from three categories (Reactome gene sets, GO biological process and COSUM) are shown in Fig. 6A. The IMPA2-associated genes were mainly enriched in metabolism processes, including metabolism of lipids, carbohydrate metabolic process, small molecule catabolic process, lipid biosynthetic process, organic hydroxy compound metabolic process, amino sugar and nucleotide sugar metabolism, valine, leucine, and isoleucine degradation, multicellular organismal homeostasis, phospholipid metabolism, and regulation of hormone levels. The GSEA analysis was performed for those co-expressed genes to further identify the molecular signaling pathways in CRC. Epithelial mesenchymal transition (EMT), estrogen response late, fatty acid metabolism, and oxidative phosphorylation were differentially enriched in the gene sets both positively and negatively correlated with IMPA2 (Fig. 6B).
Fig. 6
Functional enrichment of IMPA2-correlated co-expression genes. (A) GO and pathway enrichment analysis of IMPA2 co-expressed genes. The top 20 pathway enrichment clusters based on Metascape analysis of IMPA2 co-expressed genes carried out with GO Biological Processes, Reactome Gene Sets and CORUM. Length of bars represent log10 (p-value) determined by the best-scoring term within each cluster; (B) GSEA for IMPA2 co-expressed genes in the GSE39582 dataset. Abbreviations: GO, Gene Ontology Term; GSEA, Gene Set Enrichment Analysis; IMPA2, inositol Mono-phosphatase 2.
Functional enrichment of IMPA2-correlated co-expression genes. (A) GO and pathway enrichment analysis of IMPA2 co-expressed genes. The top 20 pathway enrichment clusters based on Metascape analysis of IMPA2 co-expressed genes carried out with GO Biological Processes, Reactome Gene Sets and CORUM. Length of bars represent log10 (p-value) determined by the best-scoring term within each cluster; (B) GSEA for IMPA2 co-expressed genes in the GSE39582 dataset. Abbreviations: GO, Gene Ontology Term; GSEA, Gene Set Enrichment Analysis; IMPA2, inositol Mono-phosphatase 2.
Correlation of IMPA2 expression with DNA methylation
In this part, the correlation between IMPA2 expression and DNA methylation in CRC was explored using the MEXPRESS database. The tumor tissues (primary, recurrent, and metastatic) had lower IMPA2 expression than normal tissues (Wilcoxon rank-sum test, p = 4.188e-14) (Fig. 7A). It was clearly shown that IMPA2 expression was inversely associated with DNA methylation in its four promoter regions located close to the transcription start site (TSS) (Pearson correlation coefficients ranging from −0.247 to −0.231; p < 0.01). However, a positive correlation existed between IMPA2 expression and DNA methylation in its another promoter region (Pearson r = 0.177; p < 0.01). Furthermore, the CNA and gene mutations of IMPA2 were analyzed by the cBioPortal database to investigate whether the lower expression of IMPA2 in CRC was caused by genetic alterations. Only 6 (1.1%, including 2 DNA amplification and 4 missense mutation) of the 526 patients with CRC were identified with existing genetic alterations, suggesting that the lower IMPA2 expression was regulated by DNA methylation rather than CNA and gene mutations (Fig. 7B).
Fig. 7
Genetic features of IMPA2 in CRC. (A) Visualization of the association between IMPA2 expression and DNA methylation in TCGA-COAD cohort using the MEXPRESS tool. The sample are ordered by IMPA2 expression; (B) Analyzation of IMPA2 gene alteration in 526 CRC patients using cBioPortal database. Abbreviations: CRC, colorectal cancer; TCGA, The Cancer Genome Atlas; COAD, Colon adenocarcinoma; IMPA2, inositol Mono-phosphatase 2.
Genetic features of IMPA2 in CRC. (A) Visualization of the association between IMPA2 expression and DNA methylation in TCGA-COAD cohort using the MEXPRESS tool. The sample are ordered by IMPA2 expression; (B) Analyzation of IMPA2 gene alteration in 526 CRC patients using cBioPortal database. Abbreviations: CRC, colorectal cancer; TCGA, The Cancer Genome Atlas; COAD, Colon adenocarcinoma; IMPA2, inositol Mono-phosphatase 2.
Discussion
CRLM is the major cause of poor prognosis and tumor recurrence. It is well known that liver is the most common metastatic organ of CRC, and only 10% of patients with CRLM can benefit from surgical treatment. However, the recurrence rate after surgical resection is as high as 50–75% [30], and recurrence is the leading cause of cancer-related death [31]. Therefore, it is of utmost importance the identification of novel biomarkers for the diagnosis and prediction of CRC with liver metastasis. In the present study, a total of 117 differentially co-expressed genes associated with the carcinogenesis and liver metastasis of CRC were identified from the GSE49355 and GSE41258 datasets using the integrated analysis of WGCNA and DEGs. Then, 4 candidate genes were identified by constructing the univariate Cox proportional hazards mode. Furthermore, the online database GEPIA2 was further used to estimate the key genes with prognostic value. Finally, IMPA2 was identified as a potential gene to evaluate the risk of liver metastasis in CRC and to predict the prognosis of CRC patients.The IMPA2 gene is a protein-coding gene for myo-inostitol monophosphatase [32], which is one of the key enzymes in inositol phosphate metabolism. It catalyzes the dephosphorylation of inositol monophosphate and plays an important role in phosphatidylinositol signaling [33,34]. Previous studies suggested that IMPA2 is associated with neuropsychiatric diseases, such as schizophrenia [35], epilepsy [36], bipolar disorder [37], Huntington's disease [38] and ischemic stroke [39]. Some studies have recently shown that IMPA2 expression is closely related to the progression and metastasis of malignant tumors. IMPA2 expression is predominantly downregulated in clear cell renal cell carcinoma (ccRCC) compared to normal tissues, and the downregulation of IMPA2 is strongly correlated with an unfavorable prognosis in ccRCC patients [40]. Importantly, IMPA2 expression is inversely associated with the metastatic potential of ccRCC cells. Moreover, Zhang et al. [41] indicated that IMPA2 expression is upregulated in cervical cancer. However, the role of IMPA2 in CRC has been rarely studied. In the present study, the role of IMPA2 in CRC progression, metastasis, and prognosis was comprehensively investigated.The expression, diagnostic and prognostic value of IMPA2 was evaluated by bioinformatics analysis. The GEO and TCGA analyses indicated that the expression of IMPA2 was significantly decreased in CRC and liver metastases compared to that in the normal tissues. This result suggested that IMPA2 may be negatively correlated with CRC progression, however, it is needed to further validated by laboratory experiments. Moreover, IMPA2 expression was inversely correlated with AJCC and TNM stages. The ROC curve indicated that IMPA2 had an excellent diagnostic efficiency for CRC. The prognostic value of IMPA2 in CRC was then validated in the GSE39582 dataset by the Kaplan-Meier survival curve using the “survival” package. The results suggested that the high expression of IMPA2 was positively correlated with favorable OS and DFS in CRC patients. The univariate and multivariate Cox proportional hazard modes were constructed among clinical characteristics and the expression of IMPA2 in the TCGA and GSE17356 datasets to further confirm the independent prognostic value of IMPA2 in CRC patients. Interestingly, the results indicated that the hazard ratio of the low IMPA2 expression group was significantly higher than that in the high IMPA2 expression group, indicating that the low IMPA2 expression was an independent unfavorable prognostic factor.The functional and pathway enrichment analyses for IMPA2-associated genes were performed to elucidate the potential role and regulation mechanism of IMPA2 in CRC. Metascape and GSEA analyses suggested that IMPA2-correlated genes were associated with the metabolism of lipids and EMT. Lipid metabolism (including lipid synthesis, lipid utilization, and lipolysis) is associated with the development and progression of various cancers, contributing to tumorigenesis and metastasis by activating specific cell signaling pathways [42,43]. Lipid metabolism is often drastically altered during the transformation of the cells to a malignant phenotype. Tumor cells increase the synthesis of fatty acids for the synthesis of more lipids necessary for the plasma membrane of the newborn cells, as well as energy production and signal transduction to support the rapid proliferation and metastasis of tumor cells [44]. An increasing amount of studies demonstrated that the abnormal lipid metabolism is significantly associated with the occurrence and malignant transformation of CRC [45], [46], [47], [48]. The EMT is an important process that occurs prior to tumor metastasis, and is considered as a required condition for tumor invasion and metastasis [49,50]. It is well-documented that lipid metabolism provides the energy and adequate environment for the EMT process of tumor cells and contributes to the metastasis of tumors [51]. Several studies investigated the correlation between EMT induction and lipid metabolism in cancer. For instance, Jiang et al. [52] suggested that PRRX1‐induced EMT in salivary adenoid cystic carcinoma activates the metabolic reprogramming of free fatty acids to promote invasion and metastasis. In addition, Wang et al. [53] indicated that apolipoprotein C-II induced EMT and lipid metabolism to promote gastric cancer peritoneal metastasis through the PI3K/AKT/mTOR pathway. However, the correlation among IMPA2, EMT, and lipid metabolism and the underlying mechanisms regulating their link together to induce the metastasis of CRC have not been investigated to date.DNA methylation regulates gene expression in cancer and usually induces suppressor gene silencing and oncogene activation through hyper/hypomethylation [54], [55], [56]. The present study revealed that DNA methylation in most promoter regions was associated with low IMPA2 expression. Taking into consideration that the low IMPA2 expression could be caused by genetic alterations, the CNA and gene mutations of IMPA2 were analyzed, and the result showed that only 6 (1.1%) of the 526 patients with CRC were identified as carrying genetic alterations, indicating that the low IMPA2 expression was regulated by DNA methylation rather than genetic alterations. These findings suggested that DNA methylation could be a potential mechanism regulating IMPA2 expression.Our study has some limitations. Firstly, the present study is a retrospective study, since all the data included in this study were obtained from publicly available databases. Therefore, prospective trials should be performed to further evaluate the diagnostic and prognostic value of IMPA2. Secondly, the number of samples in the normal and liver metastasis groups was quite smaller than that in the primary CRC group regarding the evaluation of the expression of IMPA2, which could result in a statistical problem. Thus, a consistent sample size in the CRLM, primary CRC, and normal groups should be considered to further confirm IMPA2 expression in our future research. In addition, further studies including clinical, in vivo, and in vitro experiments are needed to explore the underlying molecular mechanisms of IMPA2 for clinical applications.
Conclusions
In conclusion, IMPA2 was identified as the hub gene with a diagnostic and prognostic value, and its expression was significantly decreased in CRC and CRLM samples, while that low IMPA2 expression was significantly associated with poor prognosis. In addition, the potential mechanism of IMPA2 regulating tumorigenesis and liver metastasis of CRC could be associated with lipid metabolism, EMT, and DAN methylation.
Fundings
This study was supported by the (Grant Numbers ZYYDDFFZZJ-1), Natural Science Foundation of Gansu Province, China (Grant Numbers 18JR3RA052) and National Key Research and Development Program (grant numbers 2018YFC1311500).
Authors: Paul J Bloch; Andrew E Weller; Glenn A Doyle; Thomas N Ferraro; Wade H Berrettini; Rachel Hodge; Falk W Lohoff Journal: Prog Neuropsychopharmacol Biol Psychiatry Date: 2010-08-26 Impact factor: 5.067
Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971
Authors: Honor Hugo; M Leigh Ackland; Tony Blick; Mitchell G Lawrence; Judith A Clements; Elizabeth D Williams; Erik W Thompson Journal: J Cell Physiol Date: 2007-11 Impact factor: 6.384
Authors: Ethan Cerami; Jianjiong Gao; Ugur Dogrusoz; Benjamin E Gross; Selcuk Onur Sumer; Bülent Arman Aksoy; Anders Jacobsen; Caitlin J Byrne; Michael L Heuer; Erik Larsson; Yevgeniy Antipin; Boris Reva; Arthur P Goldberg; Chris Sander; Nikolaus Schultz Journal: Cancer Discov Date: 2012-05 Impact factor: 39.397