Chenji Wang, Weilin Pu, Dunmei Zhao, Yinghui Zhou, Ting Lu, Sidi Chen, Zhenglei He, Xulong Feng, Ying Wang, Caihua Li, Shilin Li, Li Jin, Shicheng Guo, Jiucun Wang, Minghua Wang.
Abstract
DNA methylation-based biomarkers have been suggested to be promising for early cancer diagnosis. However, DNA methylation-based biomarkers for esophageal squamous cell carcinoma (ESCC), especially in Chinese Han populations, have not been identified and evaluated quantitatively. Candidate tumor suppressor genes (N = 65) were selected through a literature search, and four public high-throughput DNA methylation microarray datasets comprising 136 samples in total were collected for initial confirmation. Targeted bisulfite sequencing was applied to an independent cohort of 94 pairs of ESCC and normal tissues from a Chinese Han population for final validation. We applied nine different classification algorithms to evaluate prediction performance. ADHFE1, EOMES, SALL1 and TFPI2 were identified and validated in the ESCC samples from a Chinese Han population. All four candidate regions were validated as significantly hypermethylated in ESCC samples by the Wilcoxon rank-sum test (ADHFE1, P = 1.7 × 10⁻³; EOMES, P = 2.9 × 10⁻⁹; SALL1, P = 3.9 × 10⁻⁷; TFPI2, P = 3.4 × 10⁻⁶). A logistic regression-based prediction model showed moderate ESCC classification performance (sensitivity = 66%, specificity = 87%, AUC = 0.81). Moreover, more advanced classification methods (random forest and naive Bayes) performed better. Interestingly, diagnostic performance could be improved in the non-alcohol-use subgroup (AUC = 0.84). In conclusion, our data demonstrate that the methylation panel of ADHFE1, EOMES, SALL1 and TFPI2 could be an effective methylation-based diagnostic assay for ESCC.
Keywords: DNA methylation; biomarker; diagnosis; esophageal squamous cell carcinoma (ESCC); targeted bisulfite sequencing (TGS)
Year: 2018 PMID: 30233644 PMCID: PMC6133993 DOI: 10.3389/fgene.2018.00356
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
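
The abstract reports group differences using the Wilcoxon rank-sum test. As a rough illustration of what that test computes, the following is a minimal sketch using the normal approximation. The function names and the methylation values are hypothetical and are not taken from the study; tied values receive average ranks, but no tie or continuity correction is applied to the variance, so the p-value is only approximate.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    w = sum(ranks[:n1])                        # rank sum of the case group
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value
    return w, z, p

# Hypothetical methylation fractions, for illustration only.
cases = [0.31, 0.42, 0.28, 0.55, 0.39, 0.47]
controls = [0.12, 0.20, 0.15, 0.30, 0.18, 0.25]
w, z, p = rank_sum_test(cases, controls)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

For small samples or heavily tied data, an exact or tie-corrected implementation from a statistics library would be the safer choice.
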
The methylation status of the 6 CpG sites in the TCGA dataset and the validation dataset.
| Dataset | CpG site | Gene | Position (hg19) | Relation to CpG island | McaM | McoM |  | Sens | Spec | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| TCGA | cg20295442 | ADHFE1 | chr8:67344665 | Island | 0.26 | 0.15 | 0.18 | 0.42 | 0.85 | 0.61 |
| TCGA | cg20912169 | ADHFE1 | chr8:67344720 | Island | 0.26 | 0.14 | 0.22 | 0.46 | 0.85 | 0.60 |
| TCGA | cg22383888 | EOMES | chr3:27764816 | N_shore | 0.53 | 0.22 |  | 0.77 | 0.92 | 0.87 |
| TCGA | cg04550052 | SALL1 | chr16:51184355 | Island | 0.46 | 0.22 |  | 0.79 | 0.85 | 0.78 |
| TCGA | cg04698114 | SALL1 | chr16:51184379 | Island | 0.47 | 0.22 |  | 0.77 | 0.85 | 0.77 |
| TCGA | cg12973591 | TFPI2 | chr7:93519473 | Island | 0.33 | 0.15 | 0.06 | 0.63 | 0.88 | 0.65 |
| Validation | cg20295442 | ADHFE1 | chr8:67344665 | Island | 0.18 | 0.09 |  | 0.28 | 0.95 | 0.63 |
| Validation | cg20912169 | ADHFE1 | chr8:67344720 | Island | 0.17 | 0.07 |  | 0.30 | 0.94 | 0.64 |
| Validation | cg22383888 | EOMES | chr3:27764816 | N_shore | 0.31 | 0.11 |  | 0.55 | 0.94 | 0.77 |
| Validation | cg04550052 | SALL1 | chr16:51184355 | Island | 0.29 | 0.13 |  | 0.44 | 0.91 | 0.67 |
| Validation | cg04698114 | SALL1 | chr16:51184379 | Island | 0.34 | 0.16 |  | 0.47 | 0.96 | 0.72 |
| Validation | cg12973591 | TFPI2 | chr7:93519473 | Island | 0.25 | 0.08 |  | 0.49 | 0.89 | 0.69 |
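
The Sens, Spec and AUC columns above are standard diagnostic metrics. The sketch below shows one common way to compute them from per-sample methylation values. It assumes a simple rule (a sample is called positive when its methylation value is at or above a cutoff), which may differ from the procedure the authors used. The cutoff and all sample values are hypothetical.

```python
def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' means positive."""
    sensitivity = sum(v >= cutoff for v in cases) / len(cases)
    specificity = sum(v < cutoff for v in controls) / len(controls)
    return sensitivity, specificity

def auc(cases, controls):
    """AUC as the probability that a random case scores above a random
    control, counting ties as one half (Mann-Whitney U / (n1 * n2))."""
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical methylation fractions, for illustration only.
cases = [0.31, 0.42, 0.28, 0.55, 0.19, 0.47]
controls = [0.12, 0.20, 0.15, 0.30, 0.18, 0.25]

sens, spec = sens_spec(cases, controls, cutoff=0.27)
area = auc(cases, controls)
print(f"Sens = {sens:.2f}, Spec = {spec:.2f}, AUC = {area:.2f}")
```

The pairwise definition of AUC used here is the same rank-based quantity that underlies the Wilcoxon rank-sum test, which is why a CpG site with a small rank-sum p-value tends to have a high AUC.
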
The mean methylation status of the 4 genomic regions in the validation datasets.
| Genomic region | No. CpG sites | CpG sites included | Gene | McaM | McoM | log10(OR) | 95% CI | Sens | Spec | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| chr8:67344610-67344805 | 24 | cg20295442, cg20912169 | ADHFE1 | 0.24 | 0.15 | 2.20 | 1.00–3.72 | 0.29 | 0.94 | 0.64 |
| chr3:27764697-27764940 | 8 | cg22383888 | EOMES | 0.38 | 0.24 | 3.88 | 2.51–5.51 | 0.69 | 0.77 | 0.78 |
| chr16:51184268-51184468 | 18 | cg04550052, cg04698114 | SALL1 | 0.37 | 0.19 | 2.41 | 1.51–3.51 | 0.53 | 0.90 | 0.74 |
| chr7:93519367-93519503 | 13 | cg12973591 | TFPI2 | 0.28 | 0.13 | 3.82 | 2.26–5.89 | 0.50 | 0.91 | 0.71 |
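
The odds ratios in this table are given on a log10 scale. As a small worked conversion, the sketch below maps the first row's estimate and interval back to the plain odds-ratio scale. It assumes the column is a base-10 logarithm, as the header states, and that the 95% CI is on the same scale. The unit of the predictor is not stated in this record, so the converted values are not interpreted further. The helper name is hypothetical.

```python
def from_log10(x):
    """Convert a base-10 log odds ratio back to an odds ratio."""
    return 10 ** x

# Values copied from the chr8:67344610-67344805 row of the table above.
log10_or, ci_low, ci_high = 2.20, 1.00, 3.72

odds_ratio = from_log10(log10_or)
interval = (from_log10(ci_low), from_log10(ci_high))
print(f"OR = {odds_ratio:.1f}, 95% CI = {interval[0]:.1f} to {interval[1]:.1f}")
```
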
Diagnostic accuracy, sensitivity and specificity of different classification models with five-fold cross-validation.
| Methods | Train sensitivity | Train specificity | Train accuracy | Test sensitivity | Test specificity | Test accuracy |
|---|---|---|---|---|---|---|
| Logistic Regression | 0.683 | 0.873 | 0.773 | 0.645 | 0.830 | 0.732 |
| Random Forest | 0.726 | 0.739 | 0.732 | 0.741 | 0.734 | |
| Support Vector Machine | 0.635 | 0.907 | 0.764 | 0.599 | 0.881 | 0.731 |
| Naive Bayes | 0.539 | 0.718 | 0.532 | 0.709 | ||
| Neural Network | 0.701 | 0.841 | 0.768 | 0.667 | 0.794 | 0.726 |
| Linear Discriminant Analysis | 0.617 | 0.906 | 0.754 | 0.594 | 0.894 | |
| Mixture Discriminant Analysis | 0.618 | 0.868 | 0.736 | 0.564 | 0.843 | 0.695 |
| Flexible Discriminant Analysis | 0.616 | 0.907 | 0.754 | 0.594 | 0.894 | |
| Gradient Boosting Machine | 0.856 | 0.699 | 0.728 | 0.713 | ||
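
The train and test columns above come from five-fold cross-validation. The sketch below illustrates the general procedure with a plain logistic regression fitted by gradient descent. It is not the authors' pipeline: the stratified fold assignment, the learning rate, the 0.5 probability cutoff and the synthetic four-feature data are all assumptions made for illustration.

```python
import math
import random

def stratified_folds(labels, k, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def predict_proba(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p = sigmoid(w . x + b) by batch gradient descent on log loss."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def metrics(w, b, xs, ys, cutoff=0.5):
    """Return (sensitivity, specificity, accuracy) at a probability cutoff."""
    tp = fp = tn = fn = 0
    for x, y in zip(xs, ys):
        pred = 1 if predict_proba(w, b, x) >= cutoff else 0
        if y == 1:
            tp, fn = tp + pred, fn + (1 - pred)
        else:
            fp, tn = fp + pred, tn + (1 - pred)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(ys)

# Synthetic data: 20 "tumor" and 20 "normal" samples with 4 features each.
rng = random.Random(1)
X = [[rng.gauss(0.35, 0.1) for _ in range(4)] for _ in range(20)]
X += [[rng.gauss(0.18, 0.1) for _ in range(4)] for _ in range(20)]
y = [1] * 20 + [0] * 20

folds = stratified_folds(y, k=5)
results = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in held_out]
    w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
    train_m = metrics(w, b, [X[i] for i in train_idx], [y[i] for i in train_idx])
    test_m = metrics(w, b, [X[i] for i in test_idx], [y[i] for i in test_idx])
    results.append((train_m, test_m))

mean_test_acc = sum(te[2] for _, te in results) / len(results)
print(f"Mean test accuracy over 5 folds: {mean_test_acc:.3f}")
```

Each sample is held out exactly once, so the test metrics are always computed on data the model did not see during fitting. Averaging the per-fold values gives figures comparable in form to the table's Train and Test columns.
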