Fangfang Jian, FeiMing Huang, Yu-Hang Zhang, Tao Huang, Yu-Dong Cai.
Abstract
Cervical and anal carcinoma are neoplastic diseases that progress through several stages of intraepithelial neoplasia. The mechanisms underlying cancer initiation and progression have not been fully elucidated. DNA methylation is aberrantly regulated during tumorigenesis in anal and cervical carcinoma, which points to its potential as a clinical biomarker for distinguishing cancer stages. In this study, several machine learning methods were used to analyze the methylation profiles of anal and cervical carcinoma samples, which were divided into three classes representing different stages of tumor progression. Feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features that are highly correlated with cancer progression. Some methylation probes, including cg01550828 and its corresponding gene RNF168, have been reported to be associated with human papillomavirus-related anal cancer. Among the biomarkers for cervical carcinoma, cg27012396 and its functional gene HDAC4 have been confirmed to regulate glycolysis and the survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. The study identified methylation signals associated with the development of cervical and anal carcinoma at both qualitative and quantitative levels using advanced machine learning methods.
Keywords: DNA methylation; anal carcinoma; cervical carcinoma; classification rule; machine learning
Year: 2022 PMID: 36249027 PMCID: PMC9557006 DOI: 10.3389/fonc.2022.998032
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Flow chart of the entire analysis process. The 485,512 methylation probes in the anal or cervical carcinoma dataset are filtered by Boruta and then ranked by feature importance using three feature ranking algorithms: MCFS, LightGBM, and LASSO. Each of the three feature lists is then fed into the incremental feature selection (IFS) computational framework, which uses two classification algorithms (decision tree and random forest) to extract essential methylation features and to construct efficient classifiers and classification rules.
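The IFS step in the flow chart can be sketched as a loop over growing prefixes of a ranked feature list. This is a minimal illustration, not the authors' implementation: the function name `incremental_feature_selection`, the `step` parameter, and the `score_subset` callback (which would wrap, for example, cross-validated weighted F1 of a decision tree or random forest) are all hypothetical, and the source does not state the step size or the validation scheme.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], List[Tuple[int, float]]]:
    """Score the top-k features for k = step, 2*step, ... and keep the best k.

    ranked_features: features ordered from most to least important.
    score_subset:    hypothetical callback returning a performance score
                     (higher is better) for a given feature subset.
    Returns the best-scoring prefix and the (k, score) points of the IFS curve.
    """
    best_score = float("-inf")
    best_k = 0
    curve: List[Tuple[int, float]] = []
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = score_subset(subset)
        curve.append((k, score))
        # Strict ">" keeps the smallest subset among equally good ones.
        if score > best_score:
            best_score, best_k = score, k
    return list(ranked_features[:best_k]), curve


# Toy scorer (assumption, for illustration only): rewards informative
# features and applies a small penalty per feature used.
informative = {"f1", "f2"}


def toy_score(subset: List[str]) -> float:
    return len(informative & set(subset)) - 0.01 * len(subset)


best, curve = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
print(best)
```

Plotting the returned `curve` for each ranking algorithm and classifier would give one IFS curve per combination.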
Figure 2. IFS curves showing the performance (weighted F1) of decision tree (DT) and random forest (RF) under different feature subsets in the anal and cervical carcinoma datasets. (A) IFS curves for the anal carcinoma dataset. (B) IFS curves for the cervical carcinoma dataset.
Performance of the optimal classifiers on the anal carcinoma dataset.
| Feature ranking algorithm | Classification algorithm | Number of features | ACC | MCC | Macro F1 | Weighted F1 |
|---|---|---|---|---|---|---|
| MCFS | DT | 17 | 0.993 | 0.975 | 0.981 | 0.993 |
| MCFS | RF | 15 | 1.000 | 1.000 | 1.000 | 1.000 |
| LightGBM | DT | 6 | 0.993 | 0.975 | 0.981 | 0.993 |
| LightGBM | RF | 5 | 1.000 | 1.000 | 1.000 | 1.000 |
| LASSO | DT | 215 | 0.993 | 0.975 | 0.981 | 0.993 |
| LASSO | RF | 13 | 1.000 | 1.000 | 1.000 | 1.000 |
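The four metrics reported in the table (ACC, MCC, macro F1, weighted F1) can be computed from true and predicted labels. The sketch below assumes the standard multiclass definitions, including the usual multiclass generalization of MCC; the source does not state the exact formulas used. The function name `evaluate` and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Sequence


def evaluate(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute ACC, multiclass MCC, macro F1 and weighted F1."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    true_count = Counter(y_true)   # samples per true class (TP + FN)
    pred_count = Counter(y_pred)   # samples per predicted class (TP + FP)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (true + predicted).
    f1 = {}
    for k in labels:
        denom = true_count[k] + pred_count[k]
        f1[k] = 2 * correct[k] / denom if denom else 0.0

    c = sum(correct.values())
    # Multiclass MCC from the class totals of the confusion matrix.
    num = c * n - sum(pred_count[k] * true_count[k] for k in labels)
    den = sqrt(
        (n * n - sum(pred_count[k] ** 2 for k in labels))
        * (n * n - sum(true_count[k] ** 2 for k in labels))
    )
    return {
        "ACC": c / n,
        "MCC": num / den if den else 0.0,
        "Macro F1": sum(f1.values()) / len(labels),
        "Weighted F1": sum(f1[k] * true_count[k] for k in labels) / n,
    }


# Toy labels (assumption, not data from the study).
metrics = evaluate(["A", "A", "A", "B"], ["A", "A", "B", "B"])
print(metrics)
```

Macro F1 averages the per-class scores equally, while weighted F1 weights each class by its number of true samples, so the two differ whenever class sizes are unequal and per-class performance varies.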
Figure 3. Performance of the optimal classifiers on the three stages for the anal and cervical carcinoma datasets. (A) Performance on the anal carcinoma dataset. (B) Performance on the cervical carcinoma dataset.
Performance of the optimal classifiers on the cervical carcinoma dataset.
| Feature ranking algorithm | Classification algorithm | Number of features | ACC | MCC | Macro F1 | Weighted F1 |
|---|---|---|---|---|---|---|
| MCFS | DT | 4 | 1.000 | 1.000 | 1.000 | 1.000 |
| MCFS | RF | 4 | 1.000 | 1.000 | 1.000 | 1.000 |
| LightGBM | DT | 18 | 0.964 | 0.948 | 0.965 | 0.964 |
| LightGBM | RF | 19 | 1.000 | 1.000 | 1.000 | 1.000 |
| LASSO | DT | 19 | 0.964 | 0.948 | 0.965 | 0.964 |
| LASSO | RF | 5 | 1.000 | 1.000 | 1.000 | 1.000 |
Classification rules on anal carcinoma.
| Index | Condition | Result |
|---|---|---|
| Rule 1 | (cg01550828 > 0.0817) and (cg18954144 > 0.8291) | Tumor |
| Rule 2 | cg01550828 ≤ 0.0817 | AIN3 |
| Rule 3 | (cg01550828 > 0.0817) and (cg18954144 ≤ 0.8291) and (cg01550828 > 0.4363) | Normal |
| Rule 4 | (cg01550828 > 0.0817) and (cg18954144 ≤ 0.8291) and (cg01550828 ≤ 0.4363) | Tumor |
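The four rules above are the root-to-leaf paths of a small decision tree, so they can be written as nested threshold checks. The sketch below is only an illustration: the function name `classify_anal` is hypothetical, and the inputs are assumed to be methylation levels on the same scale as the thresholds in the table (presumably beta values between 0 and 1, which the source does not state explicitly).

```python
def classify_anal(cg01550828: float, cg18954144: float) -> str:
    """Apply the four anal carcinoma rules from the table.

    Inputs are assumed to be methylation levels on the same scale as
    the tabulated thresholds (an assumption, not stated in the source).
    """
    if cg01550828 <= 0.0817:
        return "AIN3"    # Rule 2
    if cg18954144 > 0.8291:
        return "Tumor"   # Rule 1
    if cg01550828 > 0.4363:
        return "Normal"  # Rule 3
    return "Tumor"       # Rule 4


# Hypothetical probe values chosen to hit each rule once.
for values in [(0.05, 0.90), (0.50, 0.90), (0.50, 0.50), (0.20, 0.50)]:
    print(values, classify_anal(*values))
```

Checking `cg01550828 <= 0.0817` first makes the repeated condition `cg01550828 > 0.0817` in Rules 1, 3, and 4 implicit, which is why it does not appear again in the code.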
Classification rules on cervical carcinoma.
| Index | Condition | Result |
|---|---|---|
| Rule 1 | cg10417457 ≤ 0.4173 | Normal |
| Rule 2 | (cg10417457 > 0.4173) and (cg02871554 > 0.6087) | CIN3 |
| Rule 3 | (cg10417457 > 0.4173) and (cg02871554 ≤ 0.6087) | Tumor |
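The cervical rules can also be kept as a data table and evaluated one by one, which makes it easy to confirm that exactly one rule fires for any sample. This is a minimal sketch under assumptions: the names `CERVICAL_RULES` and `classify_cervical` are hypothetical, and samples are assumed to be dictionaries mapping probe IDs to methylation levels on the same scale as the tabulated thresholds.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]

# Each entry: (rule name, condition on a sample, predicted class),
# transcribed from the table of cervical carcinoma rules.
CERVICAL_RULES: List[Tuple[str, Callable[[Sample], bool], str]] = [
    ("Rule 1", lambda s: s["cg10417457"] <= 0.4173, "Normal"),
    ("Rule 2",
     lambda s: s["cg10417457"] > 0.4173 and s["cg02871554"] > 0.6087,
     "CIN3"),
    ("Rule 3",
     lambda s: s["cg10417457"] > 0.4173 and s["cg02871554"] <= 0.6087,
     "Tumor"),
]


def classify_cervical(sample: Sample) -> Tuple[str, str]:
    """Return (rule name, predicted class) for the single matching rule."""
    matches = [(name, label) for name, cond, label in CERVICAL_RULES
               if cond(sample)]
    if len(matches) != 1:
        # The rules partition the value space, so this should not happen
        # for ordinary numeric inputs (it can for NaN values).
        raise ValueError(f"expected exactly one matching rule, got {matches}")
    return matches[0]


# Hypothetical sample values, not data from the study.
print(classify_cervical({"cg10417457": 0.30, "cg02871554": 0.90}))
print(classify_cervical({"cg10417457": 0.60, "cg02871554": 0.90}))
print(classify_cervical({"cg10417457": 0.60, "cg02871554": 0.20}))
```

Returning the rule name alongside the class keeps the prediction traceable to a specific row of the table.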