John Adeoye1, Chi Ching Joan Wan1, Li-Wu Zheng1, Peter Thomson2, Siu-Wai Choi1, Yu-Xiong Su1.
Abstract
This study aims to examine the feasibility of ML-assisted salivary-liquid-biopsy platforms using genome-wide methylation analysis at the base-pair and regional resolution for delineating oral squamous cell carcinoma (OSCC) and oral potentially malignant disorders (OPMDs). A nested cohort of patients with OSCC and OPMDs was randomly selected from among patients with oral mucosal diseases. Saliva samples were collected, and DNA extracted from cell pellets was processed for reduced-representation bisulfite sequencing. Reads with a minimum of 10× coverage were used to identify differentially methylated CpG sites (DMCs) and 100 bp regions (DMRs). The performance of eight ML models and three feature-selection methods (ANOVA, MRMR, and LASSO) was then compared to determine the optimal biomarker models based on DMCs and DMRs. A total of 1745 DMCs and 105 DMRs were identified for detecting OSCC. The proportions of hypomethylated and hypermethylated DMCs were similar (51% vs. 49%), while most DMRs were hypermethylated (62.9%). Furthermore, more DMRs than DMCs were annotated to promoter regions (36% vs. 16%), and more DMCs than DMRs were annotated to intergenic regions (50% vs. 36%). Of all the ML models compared, the linear SVM model based on 11 optimal DMRs selected by LASSO had a perfect AUC, recall, specificity, and calibration (1.00) for OSCC detection. Overall, genome-wide DNA methylation techniques can be applied directly to saliva samples for biomarker discovery, and ML-based platforms may be useful in stratifying OSCC during disease screening and monitoring.
Keywords: DNA methylation; biomarkers; diagnosis; epigenomics; oral cancer; oral potentially malignant disorders
Year: 2022 PMID: 36230858 PMCID: PMC9563273 DOI: 10.3390/cancers14194935
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
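The abstract mentions a 10× minimum coverage and two resolutions (single CpG sites and 100 bp regions). The following is a minimal sketch of what those two steps could look like, not the authors' pipeline. The input layout, the function names, the example counts, and the tile-boundary convention (1-based, non-overlapping windows starting at 1, 101, 201, ...) are all assumptions made for illustration. The differential-methylation testing itself is not shown.

```python
from collections import defaultdict

MIN_COVERAGE = 10  # minimum read depth per CpG, as stated in the abstract
TILE_SIZE = 100    # regional resolution in base pairs, as stated in the abstract


def filter_cpgs(calls, min_coverage=MIN_COVERAGE):
    """Keep CpG calls whose total read count is at least `min_coverage`."""
    return [call for call in calls if call[3] >= min_coverage]


def methylation_percent(methylated, total):
    """Percentage of reads supporting methylation."""
    return 100.0 * methylated / total


def tile_regions(calls, tile_size=TILE_SIZE):
    """Pool CpG read counts into non-overlapping tiles and return % methylation per tile."""
    pooled = defaultdict(lambda: [0, 0])
    for chrom, pos, methylated, total in calls:
        # Assumed convention: 1-based tiles covering 1-100, 101-200, ...
        start = ((pos - 1) // tile_size) * tile_size + 1
        pooled[(chrom, start)][0] += methylated
        pooled[(chrom, start)][1] += total
    return {key: methylation_percent(m, t) for key, (m, t) in pooled.items()}


# Hypothetical calls: (chromosome, 1-based position, methylated reads, total reads)
calls = [
    ("chr1", 105, 9, 12),
    ("chr1", 150, 3, 8),   # dropped: fewer than 10 reads
    ("chr1", 180, 15, 20),
    ("chr1", 230, 2, 10),
]

kept = filter_cpgs(calls)
site_level = {(c, p): methylation_percent(m, t) for c, p, m, t in kept}
region_level = tile_regions(kept)
print(site_level)
print(region_level)
```

Pooling read counts within a tile before taking the percentage weights each CpG by its coverage. Averaging per-site percentages would be an alternative design, and the source does not say which one was used.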
Characteristics of patients included for RRBS analysis.
| Variables | | OSCC (%) | OPMD (%) | Total | p-value |
|---|---|---|---|---|---|
| Age | Median (IQR) | 65 (57–72) | 65.5 (53.8–72.8) | 65 (57–72) | 0.986 a |
| Sex | Female | 9 (52.9) | 8 (50.0) | 17 (51.5) | 0.866 b |
| | Male | 8 (47.1) | 8 (50.0) | 16 (48.5) | |
| Site affected | Buccal | 3 (17.6) | 12 (75.0) | 15 (45.5) | 0.011 b |
| | Palate | 1 (5.9) | 1 (6.3) | 2 (6.1) | |
| | Tongue | 8 (47.1) | 3 (18.8) | 11 (33.3) | |
| | Gingiva | 5 (29.4) | 0 | 5 (15.2) | |
| Risk habit category | NSND | 11 (64.7) | 7 (43.8) | 18 (54.5) | 0.227 b |
| | SD | 6 (35.3) | 9 (56.3) | 15 (45.5) | |
| Charlson comorbidity index | Median (IQR) | 1 (0–2.5) | 0 | 0 (0–1) | 0.046 a |
| Family history of cancer | Yes | 4 (23.5) | 4 (25.0) | 8 (24.2) | 0.922 b |
| | No | 13 (76.5) | 12 (75.0) | 25 (75.8) | |
| Hypertension | Yes | 5 (29.4) | 2 (12.5) | 7 (21.2) | 0.235 b |
| | No | 12 (70.6) | 14 (87.5) | 26 (78.8) | |
| Tumor stage | Stage I/II | 6 (35.3) | | | |
| | Stage III/IV | 11 (64.7) | | | |
| Tumor grade | Well differentiated | 5 (29.4) | | | |
| | Moderately differentiated | 9 (52.9) | | | |
| | Poorly differentiated | 3 (17.6) | | | |
a Mann-Whitney U test; b Pearson’s Chi Square test/Fisher’s exact analysis.
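As a check on how the categorical p-values in the table can arise, the sketch below computes an uncorrected Pearson chi-square test for two of the 2×2 comparisons using only the Python standard library. The footnote does not say which rows used Pearson's test and which used Fisher's exact analysis, so applying the uncorrected Pearson test to these rows is an assumption, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts copied from the table; each tuple is (OSCC row, then OPMD row).
sex = (9, 8, 8, 8)          # OSCC female, OSCC male, OPMD female, OPMD male
risk_habit = (11, 6, 7, 9)  # OSCC NSND, OSCC SD, OPMD NSND, OPMD SD

p_sex = p_value_df1(chi_square_2x2(*sex))
p_risk = p_value_df1(chi_square_2x2(*risk_habit))
print(round(p_sex, 3), round(p_risk, 3))
```

Under this assumption the two values round to the 0.866 and 0.227 listed for sex and risk habit category. The 4×2 comparison for site affected has three degrees of freedom and small expected counts, so it is not covered by this shortcut.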
Figure 1. Base-pair resolution analysis for identification and description of DMCs. (A) Heatmap comprising the methylation percentages of differential CpG sites for all samples. (B) Volcano plot of weighted mean methylation difference for hypermethylated (red) and hypomethylated (blue) DMCs. (C) Autosomal annotation of DMCs. (D) Genomic annotation of DMCs. (E) CpG island annotation of DMCs.
Figure 2. Gene ontology (GO-BP) and KEGG pathway analysis enrichment of genes associated with DMCs. (A) List of biological processes based on the DMC count after GO-BP analysis. (B) List of significant biological processes associated with DMC genes after GO-BP analysis. (C) List of biological pathways enriched for DMC-associated genes.
Figure 3. Performance of machine learning models for predicting OSCC using DMCs as features. (A) Initial models comprising all 1745 DMCs. (B) Selected features by the three feature-selection methods and their concordance. (C) Machine learning models based on ANOVA-selected DMC sets for predicting OSCC. (D) Machine learning models based on MRMR-selected DMC sets for predicting OSCC. (E) Machine learning models based on LASSO-selected DMC sets for predicting OSCC. (F) Machine learning models based on six consensual DMCs for predicting OSCC.
Figure 4. 100 bp regional methylation analysis for description of DMRs. (A) Heatmap comprising the methylation percentages of aberrantly methylated regions in all samples. (B) Autosomal annotation of DMRs. (C) Genomic annotation of DMRs. (D) CpG island annotation of DMRs.
Figure 5. Performance of machine learning models for predicting OSCC using DMRs as features. (A) Initial models comprising all 105 DMRs. (B) Machine learning models based on ANOVA-selected DMR sets for predicting OSCC. (C) Machine learning models based on MRMR-selected DMR sets for predicting OSCC. (D) Machine learning models based on LASSO-selected DMR sets for predicting OSCC. (E) Selected DMR features by the three feature-selection methods and their concordance. (F) Machine learning models based on eight consensual DMRs for predicting OSCC.
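The model comparisons in Figures 3 and 5 are reported in terms of AUC, recall, and specificity. For readers unfamiliar with these metrics, here is a minimal standard-library sketch of how they are defined for a binary classifier. The labels, scores, and 0.5 decision threshold are hypothetical and do not reproduce any result from the study.

```python
def recall_and_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP). Label 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = OSCC, 0 = OPMD) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

recall, specificity = recall_and_specificity(y_true, y_pred)
print(recall, specificity, auc(y_true, scores))

# A model that ranks every positive above every negative reaches an AUC of 1.0.
perfect_auc = auc(y_true, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
```

An AUC of 1.00 means only that the scores separate the two classes perfectly in the evaluated samples. With a cohort of 33 patients, that separation would need confirmation in an independent set.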