
Integrative Analysis of Identifying Methylation-Driven Genes Signature Predicts Prognosis in Colorectal Carcinoma.

Hao Huang1, Jinming Fu1, Lei Zhang1, Jing Xu1, Dapeng Li1, Justina Ucheojor Onwuka1, Ding Zhang1, Liyuan Zhao1, Simin Sun1, Lin Zhu1, Ting Zheng1, Chenyang Jia1, Binbin Cui2, Yashuang Zhao1.   

Abstract

BACKGROUND: Aberrant DNA methylation is a critical regulator of gene expression and plays a crucial role in the occurrence, progression, and prognosis of colorectal cancer (CRC). We aimed to identify methylation-driven genes by integrative epigenetic and transcriptomic analysis to predict the prognosis of CRC patients.
METHODS: Methylation-driven genes for CRC were selected using the MethylMix algorithm followed by LASSO regression screening, and were then used to construct a prognostic risk-assessment model. The Cancer Genome Atlas (TCGA) dataset served as the training set, both for screening methylation-driven genes and for assessing the effect of the gene signature on CRC prognosis. The prognostic gene signature was then validated in three independent CRC expression arrays from the Gene Expression Omnibus (GEO).
RESULTS: We identified 143 methylation-driven genes. Among them, the combination of BATF, PHYHIPL, RBP1, and PNPLA4 expression levels gave the prognostic model with the highest area under the curve (AUC = 0.876). Compared with patients in the low-risk group, CRC patients in the high-risk group had significantly poorer overall survival in the training set (HR = 2.184, 95% CI: 1.404-3.396, P < 0.001). Similar results were observed in the validation set. Moreover, VanderWeele's mediation analysis indicated that the effect of methylation on prognosis was mediated by the expression levels of these genes (HRindirect = 1.473, P = 0.001; proportion mediated, 69.10%).
CONCLUSIONS: We identified a four-gene prognostic signature by integrative analysis and developed a risk-assessment model that is significantly associated with patients' survival. Methylation-driven genes might be a potential prognostic signature for CRC patients.
Copyright © 2021 Huang, Fu, Zhang, Xu, Li, Onwuka, Zhang, Zhao, Sun, Zhu, Zheng, Jia, Cui and Zhao.


Keywords:  colorectal cancer; integrative analysis; methylation-driven genes; overall survival; prognostic risk model

Year:  2021        PMID: 34178621      PMCID: PMC8231008          DOI: 10.3389/fonc.2021.629860

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   6.244


Introduction

Colorectal cancer (CRC) is the most common malignant tumor of the digestive system (1). Although recent advances in diagnostic and therapeutic modalities have greatly improved survival in early-stage colorectal carcinoma, the 5-year overall survival (OS) rate remains low in late-stage CRC (2, 3). According to the SEER database (1973–2014, 2017 release), the 5-year survival rate for stage IV patients with metastases is only 11% (4). The tumor-node-metastasis (TNM) staging system is currently regarded as the gold standard for determining the prognosis of CRC patients. However, treatment response and prognosis differ widely among patients at the same stage who receive the same treatment. This shows that prognosis is heterogeneous within a stage and that the traditional TNM staging system fails to reflect tumor heterogeneity or to assess the prognosis of CRC patients accurately (5, 6). Therefore, more effective prognostic biomarkers are needed to evaluate CRC prognosis. DNA methylation is one of the most frequent epigenetic modifications and plays a crucial role in regulating gene expression and genome function (7). A series of studies have reported significant biomarkers for predicting the prognosis of CRC patients at different omics levels, including DNA methylation (8), microRNAs (9), gene expression (10), and proteins (11). Each of these studies relies on a single omics level to describe the complicated process of tumor development (12). Multi-omics approaches, in contrast, may capture the biological behavior of tumors more systematically across multiple dimensions, helping to reveal the complex molecular mechanisms behind different phenotypes and to discover molecular candidates with prognostic value (13). Recent studies increasingly integrate omics data to better screen potential prognostic biomarkers (14, 15).
Currently, there is a driven regulation mode for selective recognition of hypermethylated or hypomethylated genes that can regulate gene expression and form specific tissue types during development (16). This mode may identify methylation-driven genes, which serve as key indicators in the development, progression, and prognosis of tumors. Studies that use methylation-driven genes to evaluate patient prognosis have been reported for bladder (17), hepatocellular (18), and gastric cancers (19). Therefore, it is imperative to combine DNA methylation and expression profiles to identify CRC-related methylation-driven genes and evaluate the prognosis of CRC patients. Here, CRC-specific methylation-driven genes were identified using the MethylMix algorithm. These genes were selected from genome-wide DNA methylation and gene expression profiles in The Cancer Genome Atlas (TCGA) and were validated in data from the ArrayExpress database. We further constructed a prognostic model to predict the overall survival (OS) of CRC patients in the TCGA datasets and validated this model in Gene Expression Omnibus (GEO) datasets. Time-dependent receiver operating characteristic (ROC) curves and nomograms were used to estimate the predictive capability of the prognostic model in the two datasets.

Materials and Methods

Study Population and Data Preprocessing

All subjects in this study were obtained from publicly available databases, namely TCGA, GEO, and ArrayExpress. Methylation-driven genes for CRC were identified from the DNA methylation and gene expression profiles in TCGA (N = 431), comprising 386 CRC tissue samples and 45 corresponding adjacent non-tumor tissue samples. These candidate genes were then further validated in the ArrayExpress database (N = 214), which contains 214 CRC tissue samples. A prognostic risk-assessment model was developed on the TCGA datasets (N = 367) and validated in Gene Expression Omnibus (GEO) datasets (N = 355) from three independent gene expression arrays [GSE17536 (N = 177), GSE17537 (N = 55), and GSE72970 (N = 123)], for which the CRC clinical information included sex, age, TNM stage, and survival. Level 3 methylation data from the TCGA Methylation 450k Bead chip were obtained with the DownloadMethylationData function in the TCGA-Assembler 2 Bioconductor package (18, 20). Using the CalculateSingleValueMethylationData function, we calculated the average value of all CpG sites in the promoter region between transcription start site (TSS) 200 and TSS 1,500 bps. RNA-seq expression data were also collected from the TCGA database and were normalized with the ProcessRNASeqData function.
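The promoter-level summary described above can be illustrated with a minimal Python sketch. It is not the TCGA-Assembler implementation: the probe tuples, the `promoter_methylation` helper, and the reading of the promoter window as "within 1,500 bp upstream of the TSS" are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical probe-level beta values for one sample:
# (gene symbol, position relative to the TSS in bp, beta value)
probes = [
    ("BATF", -150, 0.20),
    ("BATF", -900, 0.40),
    ("BATF", -2500, 0.90),  # outside the assumed promoter window, ignored
    ("RBP1", -100, 0.60),
    ("RBP1", -1400, 0.80),
]

def promoter_methylation(probes, window_bp=1500):
    """Average the beta values of probes lying within `window_bp` upstream of the TSS."""
    by_gene = {}
    for gene, position, beta in probes:
        if -window_bp <= position <= 0:
            by_gene.setdefault(gene, []).append(beta)
    return {gene: mean(values) for gene, values in by_gene.items()}

gene_level = promoter_methylation(probes)
```

Each gene ends up with a single methylation value per sample, which is the form of input the downstream correlation with expression requires.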

Identification and Validation of Methylation-Driven Genes for CRC

MethylMix is an R package that identifies methylation-driven genes by analyzing the correlation between methylation level and gene expression level (21). Using the Bioconductor package MethylMix, we integrated DNA methylation data from tumor and normal tissue samples with gene expression data from CRC tissue samples in the TCGA datasets to screen for the most likely CRC-specific driven genes. Highly correlated genes were selected for further analysis. We compared DNA methylation status between tumor and normal samples using the Wilcoxon rank-sum test. The screening conditions were absolute log fold change (FC) ≥0, correlation coefficient (Cor) < −0.5, and adjusted P < 0.05. In total, 143 methylation-driven genes met the requirements of the MethylMix algorithm and were retained for further analysis. To narrow the set of predictors, we applied least absolute shrinkage and selection operator (LASSO) regression. The candidate variables are often strongly correlated, which implies high dimensionality and collinearity, and the LASSO method reduces the feature dimension. A multivariable Cox regression model was then constructed to select the driven genes most closely associated with survival, and six methylation-driven genes were retained (22, 23). For validation, we used 214 CRC patients with both DNA methylation and expression data, who had undergone surgery at the Royal Brisbane and Women’s Hospital in Brisbane, Australia, and were recruited consecutively between 2009 and 2012 (24). For these six methylation-driven genes, we analyzed the correlation between the methylation levels of promoter probes and gene expression to validate whether they are candidate methylation-driven genes. The correlation between the methylation level in the promoter region and the corresponding gene expression level was calculated using Pearson’s correlation.
The data are stored in the ArrayExpress database at EMBL-EBI (https://www.ebi.ac.uk/arrayexpress/). The accession numbers are E-MTAB-7036 (methylation) and E-MTAB-8148 (expression).
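As a rough illustration of the final screening conditions (Cor < −0.5 and adjusted P < 0.05), the following Python sketch applies them to made-up data. It does not reproduce MethylMix itself, which fits mixture models to the methylation values. The gene names, values, and the `is_methylation_driven` helper are hypothetical, and the adjusted P-values are assumed to have been computed elsewhere.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical per-gene data: promoter methylation and expression across
# the same five tumor samples, plus an adjusted P-value from the
# tumor-versus-normal methylation comparison.
genes = {
    "GENE_A": {"methylation": [0.1, 0.3, 0.5, 0.7, 0.9],
               "expression": [9.0, 7.5, 6.0, 4.0, 2.0],
               "adj_p": 0.001},
    "GENE_B": {"methylation": [0.2, 0.4, 0.6, 0.8, 0.5],
               "expression": [3.0, 3.2, 2.9, 3.1, 3.3],
               "adj_p": 0.20},
}

def is_methylation_driven(record, cor_cutoff=-0.5, p_cutoff=0.05):
    """Keep genes whose expression falls as methylation rises and whose
    differential methylation is significant after adjustment."""
    cor = pearson(record["methylation"], record["expression"])
    return cor < cor_cutoff and record["adj_p"] < p_cutoff

selected = [name for name, record in genes.items() if is_methylation_driven(record)]
```

Only genes passing both conditions are kept, which mirrors the logic of requiring an inverse methylation-expression relationship in addition to differential methylation.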

Construction and Validation of a Prognostic Risk-Assessment Model

To better assess the prognostic predictive power of these methylation-driven genes, we constructed a prognostic risk-score model by multivariable Cox analysis:

$$\text{Risk score} = \sum_{i=1}^{N} \left( \text{Exp}_i \times \text{Coef}_i \right)$$

where N is the number of methylation-driven genes in the signature, Exp_i is the expression level of the i-th driven gene, and Coef_i is its coefficient from the multivariable Cox regression analysis. The risk score (RS) is thus a weighted sum of expression levels computed for each sample. Six methylation-driven genes can form 2^6 − 1 = 63 signatures, so every CRC patient has 63 prognostic risk scores. In the training set, the hazard ratios (HRs) and the area under the curve (AUC) values of the prognostic scores of the 63 signatures were analyzed. We selected the best prognostic risk model by comparing the AUC values of the 63 signatures. To validate the predictive capability of the best risk-assessment model, we obtained three gene expression arrays of human CRC datasets [GSE17536 (N = 177), GSE17537 (N = 55), and GSE72970 (N = 123)] from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) to serve as a validation cohort (N = 355) (25–27). To minimize batch effects from different microarray platforms, the samples in the three datasets were selected from the same chip platform (Affymetrix Human Genome U133 Plus 2.0 Array) and normalized with the Bioconductor package sva (28, 29).
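The count of 63 candidate signatures and the weighted-sum form of the risk score can be checked with a short Python sketch. The coefficients and expression values in the example call are hypothetical, and the sketch does not refit a Cox model for each subset, which the actual analysis would presumably require.

```python
from itertools import combinations

# The six methylation-driven genes retained after the multivariable Cox step.
genes = ["ANXA9", "BATF", "PHYHIPL", "RBP1", "PNPLA4", "SERPINA1"]

# Every non-empty subset of the six genes is one candidate signature.
signatures = [
    combo
    for size in range(1, len(genes) + 1)
    for combo in combinations(genes, size)
]

def risk_score(expression, coefficients):
    """Weighted sum of expression levels over the genes in a signature."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())

# Hypothetical coefficients and expression values, for illustration only.
example_score = risk_score(
    expression={"BATF": 4.0, "RBP1": 2.0},
    coefficients={"BATF": 0.5, "RBP1": -0.25},
)
```

Enumerating the subsets explicitly makes it straightforward to loop over them and record an AUC per signature.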

Gene Set Enrichment Analysis (GSEA)

To explore the potential biological function and promising signaling pathways correlated with the methylation of driven genes, GSEA was conducted to analyze the biological function of four genes using the Java GSEA v4.0.1 software (http://software.broadinstitute.org/gsea/datasets.jsp). The files of ontology gene sets were collected from the Gene Ontology (GO) (c5.all.v7.1.symbols) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (c2.cp.kegg.v7.1.symbols) databases. The screening conditions of significant pathways and biological functions were the absolute value of normalized enrichment score (NES) >1, P-value <0.05, and false discovery rate (FDR) q value <0.05.
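The screening conditions stated above amount to a simple three-way filter. A minimal Python sketch follows; the gene-set names and statistics are invented for illustration and are not GSEA output from this study.

```python
# Hypothetical GSEA result rows: normalized enrichment score (NES),
# nominal P-value, and FDR q-value for each gene set.
results = [
    {"gene_set": "SET_A", "nes": 1.8, "p": 0.01, "fdr": 0.03},
    {"gene_set": "SET_B", "nes": -1.4, "p": 0.02, "fdr": 0.30},
    {"gene_set": "SET_C", "nes": 0.7, "p": 0.001, "fdr": 0.01},
]

def is_significant(row, nes_cutoff=1.0, p_cutoff=0.05, fdr_cutoff=0.05):
    """Apply the three screening conditions: |NES| > 1, P < 0.05, FDR q < 0.05."""
    return (
        abs(row["nes"]) > nes_cutoff
        and row["p"] < p_cutoff
        and row["fdr"] < fdr_cutoff
    )

significant = [row["gene_set"] for row in results if is_significant(row)]
```

Here SET_B fails on the FDR threshold and SET_C on the NES threshold, so only SET_A is retained.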

Statistical Analysis

CRC patients were divided into high-risk and low-risk groups at the median risk score. Time-dependent ROC curves and Kaplan-Meier survival analysis were used to compare survival at different follow-up time points and the difference in OS between the two groups. Univariable and multivariable Cox regression analyses were then performed to determine whether the methylation signature model serves as an independent prognostic indicator. Before fitting the multivariable Cox regression models, we checked that the proportional hazards assumption was met. Moreover, to further evaluate the survival probability of individual patients, the clinical factors (age, gender, and TNM staging) and the risk score of the gene signature were used to build a nomogram with the rms and Hmisc packages in R. In the nomogram, each patient receives a score for predicting survival probability, and a higher total number of points represents a worse outcome. Calibration curves were used to assess the performance of the nomogram. VanderWeele’s mediation analysis was used to explore whether the effect of the methylation signature on prognosis is mediated by mRNA expression (30). The total effect of methylation on prognosis (HRTotal) was split into two components: the direct effect (HRDirect), which represents the effect of methylation on prognosis that does not act through expression, and the indirect effect (HRIndirect), which represents the prognostic effect of methylation mediated through gene expression. All analyses were performed with the R statistical program (version 3.6.1). P-values <0.05 were considered statistically significant.
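For orientation, the decomposition described above is commonly written on the hazard-ratio scale as follows. This is a sketch of the standard ratio-scale expressions from the mediation literature, assuming a rare outcome. The text does not state which formula was used to obtain the proportion mediated, so the second expression in particular is an assumption.

```latex
\mathrm{HR}_{\mathrm{Total}} = \mathrm{HR}_{\mathrm{Direct}} \times \mathrm{HR}_{\mathrm{Indirect}},
\qquad
\text{Proportion mediated} =
\frac{\mathrm{HR}_{\mathrm{Direct}}\,\bigl(\mathrm{HR}_{\mathrm{Indirect}} - 1\bigr)}
     {\mathrm{HR}_{\mathrm{Direct}} \times \mathrm{HR}_{\mathrm{Indirect}} - 1}
```

Under this form, the proportion mediated approaches 1 as the direct hazard ratio approaches 1, that is, when nearly all of the effect passes through expression.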

Results

Clinical Characteristics of the Patients

Clinical information was available for a training cohort (N = 367) extracted from the TCGA database and a validation cohort (N = 355) obtained from GEO datasets (GSE17536, GSE17537, and GSE72970). The patients’ characteristics are summarized in Table 1.
Table 1

Summary of patient demographics and clinical characteristics.

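Values are No. (%) unless otherwise indicated.

| Characteristics | Groups | Total (N = 722) | Training set (N = 367) | Testing set (N = 355) |
|---|---|---|---|---|
| Age at diagnosis | Median | 65.3 | 64.4 | 63.7 |
| | Range | 21.0–97.0 | 31.0–90.0 | 21.0–94.0 |
| | <65 years | 354 (49.0) | 172 (46.9) | 182 (51.3) |
| | ≥65 years | 368 (51.0) | 195 (53.1) | 173 (48.7) |
| Gender | Male | 394 (54.6) | 199 (54.2) | 195 (54.9) |
| | Female | 328 (45.4) | 168 (45.8) | 160 (45.1) |
| TNM stage | I | 86 (11.9) | 55 (15.0) | 31 (8.7) |
| | II | 216 (29.9) | 141 (38.4) | 75 (21.1) |
| | III | 208 (28.8) | 117 (31.9) | 91 (25.6) |
| | IV | 212 (29.4) | 54 (14.7) | 158 (44.5) |
| Vital status | Living | 458 (63.4) | 287 (78.2) | 171 (48.2) |
| | Dead | 264 (36.6) | 80 (21.8) | 184 (51.8) |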

Identification and Validation of CRC Methylation−Driven Genes

Using the MethylMix algorithm, we identified 143 methylation-driven genes whose transcription was regulated by methylation status. The process of determining and analyzing the methylation-driven gene signature is displayed in . These genes are summarized in and . We then entered these 143 methylation-driven genes into the LASSO model. The cross-validation error of the model was lowest at λ = 0.038, which corresponded to ten genes (ANXA9, BATF, PHYHIPL, RBP1, PNPLA4, FCGBP, GIPC2, FGC2, FAM131A, and SERPINA1) (Figures 1B, C). These 10 genes were then incorporated into the multivariable Cox model, which finally yielded six methylation-driven genes (ANXA9, BATF, PHYHIPL, RBP1, PNPLA4, and SERPINA1) ( ). We further validated the correlation between the methylation level of probes in the promoter region and the corresponding gene expression level in 214 patients from the ArrayExpress database. Because some data were missing from the methylation 450K bead chip, we could validate only four methylation-driven genes (ANXA9, BATF, RBP1, and SERPINA1). Nevertheless, the results for these candidate genes were stable and similar to those in the training set ( ).
Figure 1

Identification of methylation-driven genes in CRC patients. (A) Heat map of 143 CRC-related methylation-driven genes. The color change from green to red illustrates a trend from hypomethylation to hypermethylation. |log FC|≥0, adjusted P < 0.05, and Cor <−0.5. CRC, colorectal cancer; FC, fold change. (B) Selection of driven genes in the LASSO model. (C) Tuning parameter (λ) selection in the LASSO model used cross-validation via the maximum criteria. The dotted vertical lines were drawn at the optimal values using the maximum criteria and the one standard error of the maximum criteria.


Construction and Validation of the Prognostic Risk-Assessment Model in the Training and Testing Sets

In the training set, these six methylation-driven genes have 2^6 − 1 = 63 possible combinations, each with a corresponding prognostic risk score. By calculating the AUC values of the 63 signatures, we found that the expression signature consisting of BATF, PHYHIPL, PNPLA4, and RBP1 was the best-performing prognostic signature ( ). The prognostic risk score of this four-gene combination was determined as follows: Risk score = (0.253 × expression level of BATF) + (0.147 × expression level of PHYHIPL) + (−0.183 × expression level of PNPLA4) + (−0.172 × expression level of RBP1) (Table 2). The AUC value of the four-gene methylation-driven signature was 0.876, demonstrating good predictive capability for the 9-year OS of CRC patients. The Kaplan-Meier survival analysis demonstrated that CRC patients in the high-risk group had poorer survival than those in the low-risk group (HR = 2.184, 95% CI: 1.404–3.396, P < 0.001) (Figure 2D). Moreover, we compared the expression levels of the four genes between tumor and normal tissues and found that the expression of PHYHIPL was lower in CRC tissue than in normal tissue (P = 0.002), whereas the expression of BATF was lower in normal tissue than in CRC tissue (P = 0.002). The expression levels of PNPLA4 and RBP1 did not differ significantly between CRC tissue and normal tissue ( ).
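The four-gene risk score and the median split into risk groups can be sketched in Python as follows. The coefficients are those reported in the formula above. The patient identifiers and expression values are hypothetical, and assigning scores strictly above the median to the high-risk group is an assumption, since the text does not state how ties at the median are handled.

```python
from statistics import median

# Coefficients reported for the four-gene signature.
COEFFICIENTS = {"BATF": 0.253, "PHYHIPL": 0.147, "PNPLA4": -0.183, "RBP1": -0.172}

def risk_score(expression):
    """Weighted sum of the four genes' expression levels."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

# Hypothetical expression levels for four patients (illustration only).
patients = {
    "P1": {"BATF": 6.0, "PHYHIPL": 4.0, "PNPLA4": 2.0, "RBP1": 3.0},
    "P2": {"BATF": 2.0, "PHYHIPL": 1.0, "PNPLA4": 6.0, "RBP1": 7.0},
    "P3": {"BATF": 4.0, "PHYHIPL": 3.0, "PNPLA4": 4.0, "RBP1": 4.0},
    "P4": {"BATF": 5.0, "PHYHIPL": 5.0, "PNPLA4": 3.0, "RBP1": 2.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
cutoff = median(scores.values())

# Assumption: scores strictly above the median are labeled high risk.
groups = {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}
```

Because BATF and PHYHIPL carry positive coefficients and PNPLA4 and RBP1 negative ones, patients with high BATF/PHYHIPL and low PNPLA4/RBP1 expression land in the high-risk group.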
Table 2

Identified four methylation-driven genes in the prognostic signature and their multivariable Cox associated with prognosis.

| Gene symbol | Coefficient (a) | HR | HR (95% Low) | HR (95% High) | P-value (a) |
|---|---|---|---|---|---|
| BATF | 0.253 | 1.288 | 1.088 | 1.526 | 0.003 |
| PHYHIPL | 0.147 | 1.158 | 1.046 | 1.282 | 0.005 |
| PNPLA4 | −0.183 | 0.833 | 0.691 | 1.003 | 0.053 |
| RBP1 | −0.172 | 0.842 | 0.732 | 0.968 | 0.015 |

(a) Derived from the multivariable Cox regression analysis in the training set.
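In a Cox model the hazard ratio is the exponential of the coefficient, so the HR column of Table 2 can be cross-checked against the coefficient column. The short Python sketch below does this; the dictionary name `table2` is only a label for the values copied from the table.

```python
from math import exp

# (coefficient, reported HR) pairs copied from Table 2.
table2 = {
    "BATF": (0.253, 1.288),
    "PHYHIPL": (0.147, 1.158),
    "PNPLA4": (-0.183, 0.833),
    "RBP1": (-0.172, 0.842),
}

# HR = exp(coefficient); compare with the reported value to three decimals.
recomputed = {gene: exp(coef) for gene, (coef, _) in table2.items()}
consistent = {
    gene: abs(recomputed[gene] - reported) < 0.001
    for gene, (_, reported) in table2.items()
}
```

All four reported hazard ratios agree with the exponentiated coefficients to within rounding.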

Figure 2

Construction of four-gene risk score model in the TCGA dataset. (A) Distribution of risk scores in the high-risk and low-risk groups. (B) Survival overview in two high-risk and low-risk groups. (C) Heatmap of the four-gene expression profiles corresponding risk scores in the high-risk and low-risk groups in the TCGA database. (D) Comparison of OS between the high-risk and low-risk groups. OS, overall survival.

To validate the predictive capability of the expression-based prognostic gene signature, the same prognostic model was used to calculate the risk scores of all 355 CRC patients in the independent testing set from the GEO database. The Kaplan-Meier survival analysis showed that CRC patients in the high-risk group had significantly poorer survival than those in the low-risk group (HR = 1.963, 95% CI: 1.456–2.647, P < 0.001) ( ). These results were similar to those in the training set. Furthermore, we built a mediation model of the pathway linking methylation, mRNA expression, and OS using VanderWeele’s mediation analysis (Figure 3A). The effect of the methylation signature of the four combined genes on prognosis was mostly mediated by their corresponding mRNA expression (HRindirect = 1.473, 95% CI: 1.165–1.862, P = 0.001; proportion mediated, 69.10%). In sensitivity analyses that excluded the methylation and expression of each gene in turn, the indirect effect remained statistically significant (Figure 3B).
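The Kaplan-Meier estimate behind such survival comparisons can be sketched in a few lines of Python. This is a generic product-limit estimator, not the R code used in the study, and the follow-up times and event indicators are made up.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data for five patients (years, death indicator).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Computing one such curve per risk group and comparing them is what the reported high-risk versus low-risk contrast summarizes.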
Figure 3

Mediation analysis for methylation-driven prognostic signature through mRNA expression. (A) Diagram of a mediation model. (B) The risk score of four methylation-driven genes’ methylation level was considered as “exposure” (scoremethylation); the mediator was the linear combination of the corresponding four genes’ expression level (scoreexpression) (Overall model). Total prognostic effect in the hazard ratio (HR) was described as direct effect (HRdirect), indirect effect (HRindirect), corresponding 95% CI, and the proportion of effect mediated (M%). Furthermore, sensitivity analyses were performed by excluding each gene, respectively, which retained statistical significance for the mediation effect. CI, confidence interval.


Assessment of the Predictive Performance of the Expression Prognostic Model by Time-Dependent ROC Curves and the Nomogram

In the time-dependent ROC curve analysis, the AUC values in the training set were 0.626 at 3 years, 0.670 at 5 years, and 0.885 at 10 years (Figure 4A). In the testing set, the AUC values at 3, 5, and 8 years were 0.695, 0.716, and 0.803, respectively (Figure 4B). We then used univariable and multivariable Cox analyses to investigate whether the risk score of the gene signature is an independent indicator for CRC patients, and found that the prognostic score was an independent prognostic factor in the training set (high-risk group vs low-risk group, HR = 2.221, 95% CI: 1.382–3.571, P = 0.001). The effect was weaker, though still significant, in the testing set (high-risk group vs low-risk group, HR = 1.436, 95% CI: 1.051–1.962, P = 0.023) (Table 3).
Figure 4

Predictive OS performance of the signature using time-dependent ROC analysis and the nomogram in training and validation sets. (A) Time-dependent ROC curves analysis for the 3-, 5-, and 10-year OS prediction by signature in the training set. (B) Time-dependent ROC curves analysis for the 3-, 5-, and 8-year OS prediction by signature in the testing set. (C) Nomogram to predict the 1-, 5-, and 10-year OS of CRC patients in the training set. (D) Calibration curves of 5-year OS nomogram model in the training set. (E) Nomogram to predict the 1-, 3-, and 5-year OS of CRC patients in the testing set. (F) Calibration curves of 5-year OS nomogram model in the testing set. The gray line represents the ideal predictive model, and the red line represents the observed model.

Table 3

Univariable and multivariable Cox regression analyses of the four methylation-driven genes signature and survival of CRC patients in the training and testing sets.

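| Analysis | Variable | Training set (N = 367): HR (95% CI) | Training set: P | Testing set (N = 355): HR (95% CI) | Testing set: P |
|---|---|---|---|---|---|
| Univariable | Age: ≥65 years vs <65 years | 2.170 (1.328–3.547) | 0.002 | 0.938 (0.702–1.253) | 0.664 |
| Univariable | Sex: male vs female | 1.449 (0.923–2.274) | 0.107 | 0.958 (0.717–1.282) | 0.774 |
| Univariable | TNM stage: III + IV vs I + II | 2.765 (1.741–4.391) | <0.001 | 4.251 (2.742–6.591) | <0.001 |
| Univariable | Four-gene signature: high risk vs low risk | 2.351 (1.472–3.755) | <0.001 | 1.963 (1.456–2.647) | <0.001 |
| Multivariable | Age: ≥65 years vs <65 years | 2.355 (1.421–3.903) | 0.001 | 1.270 (0.942–1.712) | 0.117 |
| Multivariable | Sex: male vs female | 1.123 (0.712–1.771) | 0.618 | 0.942 (0.702–1.264) | 0.690 |
| Multivariable | TNM stage: III + IV vs I + II | 3.291 (2.049–5.286) | <0.001 | 3.967 (2.508–6.274) | <0.001 |
| Multivariable | Four-gene signature: high risk vs low risk | 2.221 (1.382–3.571) | 0.001 | 1.436 (1.051–1.962) | 0.023 |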
We further built a nomogram that includes the risk score of the signature and clinical factors (age, gender, and TNM stage). The nomogram serves as an individual prognostic predictor of the 1-, 5-, and 10-year overall survival probability of CRC patients (Figure 4C). Moreover, in the training set, calibration plots showed that the nomogram performed similarly to an ideal model in predicting the 5-year OS of CRC patients (Figure 4D). Similar results were observed in the testing set (Figures 4E, F) (concordance index: 0.747 in the training set and 0.707 in the testing set). Additionally, compared with the TNM staging system, the nomogram had a higher C-index in predicting the OS of CRC patients in the training and testing sets ( ).
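The concordance index reported for the nomogram can be illustrated with a minimal Python sketch of Harrell's C for right-censored data. The study computed it in R; the function below and its example data are illustrative assumptions, and pairs with tied event times are simply skipped here.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs in which the patient
    who died earlier also has the higher predicted risk.

    A pair (i, j) is comparable when patient i died (events[i] == 1)
    strictly before patient j's follow-up time. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: follow-up (years), death indicator, predicted risk.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values of 0.747 and 0.707.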

Subgroup Analyses of the Prognostic Performance of the Methylation-Driven Genes Signature

To determine whether our model is broadly applicable and predicts the OS of CRC patients precisely, we performed subgroup analyses based on clinical characteristics (age, gender, and TNM stage). In the different age groups, the female group, and the TNM stage groups, CRC patients in the high-risk group had significantly poorer survival than those in the low-risk group (P < 0.001). In the male group, however, this difference was not observed in the training set ( ). Similar results were also observed in the testing set ( ).

Comparison of Prognostic Risk Model With Other Prognostic Biomarkers in CRC

ROC curve analysis was applied to other known prognostic biomarkers in the same way as to our expression prognostic risk model. The results indicated that the AUC value of our four-gene signature was higher than that of the other known prognostic biomarkers (AUC = 0.794). The AUC values of these biomarkers are summarized in and . These results suggest that our gene signature has better predictive performance for the long-term OS of CRC patients.

Functional Enrichment Analysis of Four Methylation-Driven Genes

We further explored the biological functions of the four genes with GSEA 4.0.1 software and found that the expression level of BATF may be related to the “regulation of viral process” and “non-small cell lung cancer.” The expression level of PHYHIPL may be related to “blastocyst growth” and the “WNT signaling pathway.” However, because the FDR value is greater than 0.25, these may be false-positive results. Moreover, we found that the expression level of PNPLA4 may be related to “peroxisome” in both the GO and KEGG functional enrichment analyses. The expression level of RBP1 may be related to the “morphogenesis of a polarized epithelium” and the “WNT signaling pathway.” However, because the FDR value is 1.000, these may be false-positive results ( ).

Discussion

Because CRC patients with the same pathological staging often differ in survival, a new prognostic assessment model is required to indicate biological heterogeneity, appropriately guide clinical assessment and intervention, and individualize treatment (6). Previous studies have indicated that DNA methylation, an epigenetic modification, regulates gene expression in the development and progression of cancer (31). Moreover, the comprehensive analysis of DNA methylation and gene expression data can better analyze the regulatory function of methylation and effectively predict the prognosis of tumor patients (32). Therefore, methylation-driven genes may be identified as potential prognostic biomarkers with involvement in pathogenesis (17, 33). Besides, the development and progression of tumors involve the process of a complex regulatory network. Compared with a single biomarker, integrating multiple biomarkers into a combined model could better assess the prognostic value (34). We construct a prognostic model based on four methylation-driven genes and provide a comprehensive prospect for both basic research and clinical applications of methylation-driven genes. In this study, we used different statistical analyses and the LASSO penalized model obtaining 143 methylation-driven genes. Four out of them (BATF, PHYHIPL, PNPLA4, and RBP1) were identified as genes associated with CRC prognosis, which were selected to develop a prognostic score model and validated the model in external testing set. The results showed that the prognostic score was significantly associated with the OS of CRC patients, demonstrating that CRC patients in the high-risk group have significantly poorer survival than those in the low-risk group. The AUC value based on genes signature was 0.874 in predicting the 9-year of OS for CRC patients in the training set. 
We further revealed that the risk score of prognostic signature could serve as an independent indictor of patient survival without the effect of age, gender, and TNM stage. Besides, the nomogram was generated to predict the survival probability of individual patients’ models, thus evaluating the probability of outcome events. The calibration plots indicated that the predicted survival was close to the actual survival status (C-index: 0.747). These results revealed the obvious predictive capability of genes signature on the prognosis of CRC patients. Moreover, in the stratified analysis, our prognostic model performed well stability for predicting the survival of CRC patients in different age, female, and TNM stage groups in the training and testing sets. However, the males’ group in the training set could not distinguish between low- and high-risk groups. Since this is the first study of methylation-driven genes for CRC, large sample sizes may be necessary to further analyze in the future. Additionally, a comparison of our prognostic signature with other prognostic biomarkers revealed that it had a higher predictive performance with OS of CRC patients. After a series of analyses, our study provides four prognostic genes. Among these genes, three (BATF, PHYHIPL, and RBP1) have been reported as cancer-associated genes. BATF, a transcription factor, belongs to a highly conserved member of activator protein 1 (AP-1) and a family of the basic leucine zipper ATF-like transcription factor (BATF) (35). A series of studies suggest that BATF may influence the development of different types of cancer, including non-small cell lung cancer (NSCLC), lymphoma, and multiple myeloma (36, 37). Such as, BATF might active NSCLC cell proliferation and apoptosis in BATF-silenced A549 cells (38). In addition, BATF is a gene that inhibits T cell function, inhibitory receptors can cause T cell exhaustion by upregulating BATF (39). 
A recent study found that increased BATF expression, which correlated positively and significantly with PDCD1 expression, may suppress CD8+ T cell function and affect the development of colorectal cancer (40). The phytanoyl-CoA 2-hydroxylase-interacting protein-like gene (PHYHIPL), a protein-coding gene, may be associated with prostatic small cell carcinoma (41). Little is currently known about the function of PHYHIPL. An earlier analysis of the TCGA database reported that downregulation of PHYHIPL is associated with poor OS, suggesting that this gene is involved in the development of glioblastoma multiforme (GBM) (42). RBP1 (Retinol Binding Protein 1), also named Cellular Retinol Binding Protein 1 (CRBP1), is located in the cytogenetic region 3q23 (43). RBP1 is considered a chaperone-like molecule that regulates retinol signaling and affects the proliferation and differentiation of epithelial cells (44). RBP1 expression has been reported in many tumor types, including breast carcinoma (45), lung adenocarcinoma (46), tongue squamous cell carcinoma (47), and cervical cancer (48). Recent studies suggest that RBP1 hypermethylation and low expression are associated with a poor prognosis in various cancers. For example, in EBV-associated gastric carcinoma, hypermethylation of the RBP1 promoter region correlated with the downregulation of RBP1, and patients with CpG island methylator phenotype-high (CIMP-H) had poorer survival than those with CIMP-low (49). PNPLA4 (Patatin Like Phospholipase Domain Containing 4) is a member of the patatin-like family of phospholipases and may be involved in adipocyte triglyceride homeostasis in HeLa cells (50). Although the function of this gene is still not well known, we observed a significant negative correlation between the methylation and expression levels of PNPLA4.
Therefore, PNPLA4 may be a novel CRC biomarker, and further experiments are required to validate this finding.
To the best of our knowledge, this is the first predictive risk model of CRC based on methylation-driven genes. Neither the underlying mechanisms of these four genes nor their value as prognostic biomarkers in CRC patients has been reported previously. Our study provides a foundation for further exploration of the functions of the four genes. Another strength is that, compared with previous studies based on methylation-driven genes in other cancers, our study is the first to use different testing sets from multiple public datasets to validate the methylation-driven genes and the prognostic model separately.
We acknowledge several possible limitations of the present study. First, the development and evaluation of this prognostic model were based on publicly available datasets. To further confirm the model, large, multicenter, prospective clinical cohorts may be necessary in the future. Second, further studies are needed to verify the biological mechanisms behind the prognostic value of these genes in CRC. Regardless, our results showed a consistent and significant association of the signature with OS in different datasets, suggesting that it can serve as a potential prognostic biomarker for CRC.
In summary, we identified 143 methylation-driven genes by integrative analysis of both methylation and expression profiles and selected four of them (BATF, PHYHIPL, RBP1, and PNPLA4) to construct a prognostic risk model. This study shows that a four-gene methylation-driven prognostic signature predicts the OS of CRC patients and could be a promising marker for improving the clinical prognostic evaluation of CRC patients. DNA methylation-driven genes may be potentially useful novel biomarkers for predicting CRC prognosis.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author Contributions

HH and JF performed the research and drafted the manuscript. LZhang, JX, and DL collected and analyzed the data. TZ, CJ, and JO re-analyzed and interpreted the results. DZ, LYZ, and SS prepared the figures and edited the data. YZ, BC, and LZhu revised the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  50 in total

1.  Cancer statistics, 2018.

Authors:  Rebecca L Siegel; Kimberly D Miller; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2018-01-04       Impact factor: 508.702

2.  Safe Feature Screening for Generalized LASSO.

Authors:  Shaogang Ren; Shuai Huang; Jieping Ye; Xiaoning Qian
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-11-22       Impact factor: 6.226

Review 3.  Analysis of DNA methylation in cancer: location revisited.

Authors:  Alexander Koch; Sophie C Joosten; Zheng Feng; Tim C de Ruijter; Muriel X Draht; Veerle Melotte; Kim M Smits; Jurgen Veeck; James G Herman; Leander Van Neste; Wim Van Criekinge; Tim De Meyer; Manon van Engeland
Journal:  Nat Rev Clin Oncol       Date:  2018-07       Impact factor: 66.675

4.  Claudin gene expression profiles and clinical value in colorectal tumors classified according to their molecular subtype.

Authors:  Sara Cherradi; Pierre Martineau; Céline Gongora; Maguy Del Rio
Journal:  Cancer Manag Res       Date:  2019-02-13       Impact factor: 3.989

5.  High expression of cellular retinol binding protein-1 in lung adenocarcinoma is associated with poor prognosis.

Authors:  Elena Doldo; Gaetana Costanza; Amedeo Ferlosio; Eugenio Pompeo; Sara Agostinelli; Guido Bellezza; Donatella Mazzaglia; Alessandro Giunta; Angelo Sidoni; Augusto Orlandi
Journal:  Genes Cancer       Date:  2015-11

6.  Engagement of CD99 Reduces AP-1 Activity by Inducing BATF in the Human Multiple Myeloma Cell Line RPMI8226.

Authors:  Minchan Gil; Hyo-Kyung Pak; Seo-Jeong Park; A-Neum Lee; Young-Soo Park; Hyangsin Lee; Hyunji Lee; Kyung-Eun Kim; Kyung Jin Lee; Dok Hyun Yoon; Yoo-Sam Chung; Chan-Sik Park
Journal:  Immune Netw       Date:  2015-10-26       Impact factor: 6.303

7.  CRBP-1 over-expression is associated with poor prognosis in tongue squamous cell carcinoma.

Authors:  Yue Chen; Tian Tian; Min-Jie Mao; Wei-Ye Deng; Hao Li
Journal:  BMC Cancer       Date:  2018-05-02       Impact factor: 4.430

8.  The AP-1-BATF and -BATF3 module is essential for growth, survival and TH17/ILC3 skewing of anaplastic large cell lymphoma.

Authors:  Nikolai Schleussner; Olaf Merkel; Mariantonia Costanza; Huan-Chang Liang; Franziska Hummel; Chiara Romagnani; Pawel Durek; Ioannis Anagnostopoulos; Michael Hummel; Korinna Jöhrens; Antonia Niedobitek; Patrick R Griffin; Roberto Piva; Henrike L Sczakiel; Wilhelm Woessmann; Christine Damm-Welk; Christian Hinze; Dagmar Stoiber; Bernd Gillissen; Suzanne D Turner; Eva Kaergel; Linda von Hoff; Michael Grau; Georg Lenz; Bernd Dörken; Claus Scheidereit; Lukas Kenner; Martin Janz; Stephan Mathas
Journal:  Leukemia       Date:  2018-03-28       Impact factor: 11.528

9.  A merged lung cancer transcriptome dataset for clinical predictive modeling.

Authors:  Su Bin Lim; Swee Jin Tan; Wan-Teck Lim; Chwee Teck Lim
Journal:  Sci Data       Date:  2018-07-24       Impact factor: 6.444

10.  Distinct epigenetic features of tumor-reactive CD8+ T cells in colorectal cancer patients revealed by genome-wide DNA methylation analysis.

Authors:  Rui Yang; Sijin Cheng; Nan Luo; Ranran Gao; Kezhuo Yu; Boxi Kang; Li Wang; Qiming Zhang; Qiao Fang; Lei Zhang; Chen Li; Aibin He; Xueda Hu; Jirun Peng; Xianwen Ren; Zemin Zhang
Journal:  Genome Biol       Date:  2019-12-31       Impact factor: 13.583

