Ran Hu, Xianghong Jasmine Zhou, Wenyuan Li.
Abstract
Developing cancer prognostic models using multiomics data is a major goal of precision oncology. DNA methylation provides promising prognostic biomarkers, which have been used to predict survival and treatment response in solid tumor or plasma samples. This review article presents an overview of recently published computational analyses on DNA methylation for cancer prognosis. To address the challenges of survival analysis with high-dimensional methylation data, various feature selection methods have been applied to screen a subset of informative markers. Using candidate markers associated with survival, prognostic models either predict risk scores or stratify patients into subtypes. The model's discriminatory power can be assessed by multiple evaluation metrics. Finally, we discuss the limitations of existing studies and present the prospects of applying machine learning algorithms to fully exploit the prognostic value of DNA methylation.
Keywords: DNA methylation; cancer prognosis; feature selection; high dimensionality; prognostic model
Year: 2022 PMID: 35671506 PMCID: PMC9419965 DOI: 10.1089/cmb.2022.0002
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.549
Summary of Reviewed Studies
| Publication | Cancer type | Methylation data type | Sample size | Feature selection method | Candidate marker | Prognostic model |
|---|---|---|---|---|---|---|
| Guo et al. ( | Bladder cancer | Illumina 450K array | 18 paired tumor and adjacent normal tissue samples for feature selection; 357 tumor samples for prognosis analysis | Differentially expressed CpG sites, Cox regression, LASSO, SVM-RFE | 8 CpGs | Cox regression, nomogram |
| Zhang et al. ( | Breast cancer | Illumina 450K array | 776 tumor samples | Univariate and multivariate Cox regression | 6 CpGs | Cox regression |
| Ren et al. ( | Breast cancer | Illumina 450K and 27K array | 8000 normal samples for feature selection; 1076 tumor samples and 122 adjacent normal tissue samples for prognosis analysis | Penalized regression | 353 CpGs | Elastic net regression, Cox regression |
| Hao et al. ( | Breast cancer, lung cancer | Illumina 450K array | 520 tumor samples, 65 adjacent normal tissue samples from breast cancer patients; 585 tumor samples, 49 adjacent normal tissue samples from lung cancer patients | Univariable Cox regression, LASSO, boosting | 29 CpGs from LASSO and 11 CpGs from boosting for breast cancer; 75 CpGs from LASSO and 52 CpGs from boosting for lung cancer | Cox regression |
| Li et al. ( | Colorectal cancer | Illumina 450K array | 307 tumor samples, 38 adjacent normal tissue samples | DMCs located near TSS with significantly negative correlations with DEGs; SIS; stepwise regression | 10 CpGs | Cox regression |
| Guo et al. ( | Cutaneous melanoma | Illumina 450K array | 461 tumor samples | Univariate and multivariate Cox regression | 4 CpGs | Cox regression |
| Dai et al. ( | Gastric cancer | Illumina 450K array | 395 tumor samples | CpG sites significantly associated with OS, DSS, PFI, and DFI, univariate Cox regression | 7 CpGs | Cox regression |
| Peng et al. ( | Gastric cancer | Illumina 450K array | 363 tumor samples | CpG sites with significantly negative correlations with DEGs; univariate and multivariate Cox regression | 10 CpGs | Cox regression |
| Hu and Zhou ( | Ovarian cancer, breast cancer, and glioblastoma multiforme | Illumina 450K array | 605 ovarian tumor samples, 343 breast tumor samples, and 295 glioblastoma multiforme samples | Top 10% CpG sites with the largest degrees in the DNA methylation interaction network, univariable Cox regression | 76 CpGs for ovarian cancer, 69 CpGs for breast cancer, 88 CpGs for glioblastoma multiforme | Cox regression |
| Jiang et al. ( | ALL, AML | Illumina 450K array | 194 AML samples, 136 ALL samples, and 754 normal blood samples | DMCs, univariate Cox regression | 93 CpGs for ALL, 39 CpGs for AML | 2-means clustering |
| | | | | | 23 CpGs for ALL, 20 CpGs for AML | Nearest shrunken centroids |
| Yang et al. ( | Colon adenocarcinoma | Illumina 450K and 27K array | 424 tumor samples | CpGs in promoter regions, univariate and multivariate Cox regression | 356 CpGs | Consensus clustering |
| | | | | | 18 CpGs | Cox regression |
| Feng et al. ( | Ovarian cancer | Illumina 450K array | 108 tumor samples | DMCs, GO, and KEGG pathway enrichment analysis | 8 CpGs | Binary logistic regression |
| Yin, Zhang et al. ( | Ovarian cancer | Illumina 27K array | 571 tumor samples | Univariate and multivariate Cox regression | 250 CpGs | Consensus clustering |
| Yin, Kong et al. ( | Pancreatic cancer | Illumina 450K array | 178 tumor samples | Univariate and multivariate Cox regression | 4227 CpGs | Consensus clustering |
| Zuccato et al. ( | Chordoma | Illumina EPIC array | 68 tumor samples | Most variably methylated CpG sites | 15,000 CpGs | Consensus clustering, Cox regression |
| | | cfMeDIP-seq | 12 matched plasma samples from patients | Top DMRs between two clusters | 500 DMRs | Random forest |
| Luo et al. ( | Colorectal cancer | Targeted bisulfite sequencing | 801 plasma samples from patients | Univariable Cox regression, LASSO | 5 CpGs | Cox regression, nomogram |
| | | | | | 45 CpGs | Iterative consensus clustering |
| Xu et al. ( | Hepatocellular carcinoma | Targeted bisulfite sequencing | 1049 plasma samples from patients | Univariable Cox regression, LASSO | 8 CpGs | Cox regression |
ALL, acute lymphocytic leukemia; AML, acute myelogenous leukemia; cfMeDIP-seq, cell-free methylated DNA immunoprecipitation-sequencing; DEGs, differentially expressed genes; DFI, disease-free interval; DMCs, differentially methylated CpG sites; DSS, disease-specific survival; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; OS, overall survival; PFI, progression-free interval; RFE, recursive feature elimination; SIS, sure independence screening; SVM-RFE, support vector machine recursive feature elimination; TSS, transcription start site.
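Many of the Cox regression models summarized above reduce a small CpG panel to a single risk score per patient. A minimal sketch of that idea follows, assuming the score is the linear predictor (coefficient times methylation beta value, summed over the panel) and that patients are split at the cohort median. The CpG names, coefficients, and beta values are hypothetical and are not taken from any reviewed study; the median cutoff is one common choice, not necessarily the one each study used.

```python
from statistics import median


def risk_score(betas, coefs):
    """Linear predictor: sum of (coefficient * beta value) over the CpG panel."""
    return sum(coefs[cpg] * betas[cpg] for cpg in coefs)


def stratify_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients for a 3-CpG signature (illustrative only).
coefs = {"cg_A": 1.2, "cg_B": -0.8, "cg_C": 0.5}

# Hypothetical beta values (fraction methylated, between 0 and 1) per patient.
patients = {
    "P1": {"cg_A": 0.9, "cg_B": 0.1, "cg_C": 0.7},
    "P2": {"cg_A": 0.2, "cg_B": 0.8, "cg_C": 0.3},
    "P3": {"cg_A": 0.5, "cg_B": 0.5, "cg_C": 0.5},
    "P4": {"cg_A": 0.7, "cg_B": 0.3, "cg_C": 0.9},
}

scores = {pid: risk_score(b, coefs) for pid, b in patients.items()}
groups = stratify_by_median(scores)

for pid in sorted(scores):
    print(pid, round(scores[pid], 2), groups[pid])
```

In practice the coefficients would come from a Cox model fitted on a training cohort, and the cutoff would be fixed on that cohort before being applied to validation samples.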
FIG. 1. Overview of prognostic analysis of high-dimensional DNA methylation data. The left column provides the three-step procedure of prognostic analysis. First, feature selection methods identify a subset of informative markers that are associated with survival. Second, with the candidate markers, prognostic models either predict risk scores or stratify patients into subtypes. Finally, evaluation metrics are used to evaluate a model's discriminatory power. In each step, the computational methods are visualized and categorized in the right two columns. AUC, area under the ROC curve; C-index, concordance index; DEGs, differentially expressed genes; DMCs, differentially methylated CpG sites; EWAS, epigenome-wide association study; LASSO, least absolute shrinkage and selection operator; tROC, time-dependent receiver operating characteristic.
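The evaluation step can be illustrated with the concordance index named in the caption. The sketch below is a simplified, pure-Python version of Harrell's C-index under stated assumptions: a pair of patients is comparable only when the one with the shorter follow-up time had an observed event, tied risk scores count as half-concordant, and pairs with tied times are skipped. The follow-up times, event indicators, and risk scores are hypothetical.

```python
def concordance_index(times, events, scores):
    """Simplified Harrell's C-index.

    times  : follow-up time per patient
    events : 1 if the event (e.g., death) was observed, 0 if censored
    scores : predicted risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i had an observed event strictly
            # before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of five patients.
times = [5, 10, 12, 20, 25]
events = [1, 1, 0, 1, 0]
scores = [2.1, 1.5, 1.7, 0.4, 0.9]

c_index = concordance_index(times, events, scores)
print(c_index)  # 6 of 8 comparable pairs are concordant: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Established survival analysis packages handle tied times and large cohorts more carefully than this quadratic-time sketch.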