Weilin Pu, Chenji Wang, Sidi Chen, Dunmei Zhao, Yinghui Zhou, Yanyun Ma, Ying Wang, Caihua Li, Zebin Huang, Li Jin, Shicheng Guo, Jiucun Wang, Minghua Wang.
Abstract
Background: DNA methylation has been implicated as a promising biomarker for precise cancer diagnosis. However, few DNA methylation-based biomarkers have been described for esophageal squamous cell carcinoma (ESCC).
Keywords: Biomarker; DNA methylation; Diagnosis; Esophageal squamous cell carcinoma; Targeted bisulfite sequencing
Year: 2017 PMID: 29270239 PMCID: PMC5732523 DOI: 10.1186/s13148-017-0430-7
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Fig. 1 Flow chart of the study design. Candidate biomarkers were selected from high-throughput DNA methylation microarray data from the TCGA project and further validated with ESCC methylation data from a GEO dataset. Methylation datasets of peripheral blood mononuclear cells (PBMC) and peripheral blood leukocytes (PBL) from healthy controls were also used to filter candidate biomarkers. The candidate ESCC methylation biomarkers from this preliminary screening were then validated by targeted bisulfite sequencing in an independent cohort of Chinese Han ESCC patients.
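The screening logic in Fig. 1 can be pictured as a series of per-CpG checks. The sketch below is illustrative only: the caption gives no thresholds, so `min_delta`, `max_blood`, the `CpGSummary` fields, and the `validated_in_geo` flag are hypothetical. The rationale assumed for the blood filter (dropping CpGs that are already methylated in healthy blood cells) is an interpretation, not a criterion stated in the source.

```python
from dataclasses import dataclass


@dataclass
class CpGSummary:
    """Hypothetical per-CpG summary gathered from the screening datasets."""
    probe_id: str
    tumor_mean: float       # mean methylation in TCGA ESCC tumors (0-1)
    normal_mean: float      # mean methylation in TCGA adjacent normal tissue (0-1)
    blood_mean: float       # mean methylation in healthy PBMC/PBL datasets (0-1)
    validated_in_geo: bool  # difference replicated in the GEO ESCC dataset


def select_candidates(sites, min_delta=0.2, max_blood=0.1):
    """Keep CpGs hypermethylated in tumors, replicated in GEO, and low in healthy blood.

    min_delta and max_blood are illustrative thresholds, not values from the study.
    """
    return [
        s.probe_id
        for s in sites
        if s.tumor_mean - s.normal_mean >= min_delta
        and s.blood_mean <= max_blood
        and s.validated_in_geo
    ]


# Hypothetical inputs; site names and values are made up.
sites = [
    CpGSummary("site_1", 0.45, 0.15, 0.05, True),   # passes all filters
    CpGSummary("site_2", 0.45, 0.15, 0.40, True),   # too methylated in blood
    CpGSummary("site_3", 0.30, 0.25, 0.02, True),   # tumor-normal difference too small
    CpGSummary("site_4", 0.50, 0.10, 0.03, False),  # not replicated in GEO
]
print(select_candidates(sites))  # ['site_1']
```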
Characteristics of the ESCC patients included in this study
| Characteristics | Patient distribution |
|---|---|
| Age, median (IQR), years | 64 (57–70) |
| Sex | |
| Male | 69 |
| Female | 25 |
| Cigarette use^a | |
| Yes | 58 |
| No | 36 |
| Alcohol use^b | |
| Yes | 34 |
| No | 58 |
| T stage^c | |
| T2 | 14 |
| T3 | 72 |
| T4 | 5 |
| N stage^c | |
| N0 | 44 |
| N1 | 38 |
| N2 | 7 |
| N3 | 3 |
| M stage^c | |
| M0 | 90 |
| M1 | 1 |
ESCC, esophageal squamous cell carcinoma; IQR, interquartile range
^a "Yes" includes former and current smokers
^b "Yes" includes current and former consumers of alcoholic beverages
^c TNM stages were assessed according to the seventh edition of the TNM classification criteria
Fig. 3 Mean methylation status of each genomic region and bisulfite conversion efficiency in ESCC tumors and normal tissues, together with the overall ROC (receiver operating characteristic) curve. a Bisulfite conversion efficiency in ESCC and adjacent normal tissues, calculated as the number of C converted to T divided by the total number of C in each sample. b–f Mean methylation status of the genomic regions covering STK3, cg19396867, cg20655070, ZNF418, and ZNF542, respectively. Each point represents the mean methylation percentage of one genomic region in one sample; boxplots show the overall methylation percentage of each group in each region. g Overall ROC curve, calculated from a logistic regression model with the mean methylation percentages of the five genomic regions as variables and without adjustment for gender, age, smoking status, or alcohol status.
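The conversion-efficiency definition in panel a reduces to a simple ratio. A minimal Python sketch, assuming per-sample counts of cytosines read as T and of all assessed cytosines are already available; the function name and the example counts are hypothetical. The caption does not say which cytosines enter the denominator; pipelines commonly use non-CpG cytosines, which are expected to be unmethylated and therefore fully converted.

```python
def bisulfite_conversion_efficiency(converted_c_to_t: int, total_c: int) -> float:
    """Fraction of assessed cytosines that were read as T after bisulfite treatment."""
    if total_c <= 0:
        raise ValueError("total_c must be positive")
    if not 0 <= converted_c_to_t <= total_c:
        raise ValueError("converted_c_to_t must lie between 0 and total_c")
    return converted_c_to_t / total_c


# Hypothetical (C-to-T conversions, total assessed C) counts per sample.
samples = {"tumor_01": (985_000, 1_000_000), "normal_01": (992_500, 1_000_000)}
for name, (c_to_t, total) in samples.items():
    print(f"{name}: {bisulfite_conversion_efficiency(c_to_t, total):.4f}")
```

Validating the counts up front keeps an empty or corrupted sample from silently producing a misleading efficiency.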
Fig. 2 Methylation status of the CpG sites in the five genomic regions. a–e Methylation status of the CpG sites in the regions covering STK3, cg19396867, cg20655070, ZNF418, and ZNF542, respectively. The x-axis shows the genomic position of each CpG site in the targeted region; the y-axis shows the mean methylation percentage of each CpG site in ESCC tumor tissues and in normal tissues. Error bars represent the confidence interval of the methylation percentage of each CpG site in tumor and normal tissues.
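A sketch of how per-CpG and per-region summaries like those in Figs. 2 and 3 could be derived from read counts. None of the following is stated in the source, so all are assumptions: each CpG's level is methylated reads divided by total reads (a fraction; multiply by 100 for a percentage), a sample's region mean is the plain average of its CpG levels (pooling reads across the region is an equally plausible alternative), and the error bars are normal-approximation 95% confidence intervals. All counts are hypothetical.

```python
import math
from statistics import mean, stdev


def cpg_methylation(methylated: int, unmethylated: int) -> float:
    """Fraction of reads reporting a methylated C at one CpG site."""
    depth = methylated + unmethylated
    if depth == 0:
        raise ValueError("CpG site has no read coverage")
    return methylated / depth


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation CI: mean +/- z * SD / sqrt(n)."""
    m = mean(values)
    if len(values) < 2:
        return m, m, m
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical (methylated, unmethylated) read counts: rows are samples, columns are CpG sites.
counts = [
    [(30, 70), (45, 55), (20, 80)],
    [(40, 60), (50, 50), (30, 70)],
    [(35, 65), (40, 60), (25, 75)],
]
levels = [[cpg_methylation(m, u) for m, u in sample] for sample in counts]

# Per-CpG summary across samples: one point with an error bar per CpG (Fig. 2 style).
per_cpg = [mean_with_ci([sample[j] for sample in levels]) for j in range(len(levels[0]))]

# Per-sample region mean: average of that sample's CpG levels (Fig. 3b-f style).
region_means = [mean(sample) for sample in levels]
print(per_cpg)
print(region_means)
```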
The methylation status of the five CpG sites in the TCGA dataset and the validation dataset
| Dataset | CpG site | Gene | Position | Relation to CpG island | McaM^b | McoM^b | P value^c | log10(OR)^d | 95% CI^d | Sens | Spec | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA | cg15830431 | STK3 | chr8:99952591 | Island | 0.28 | 0.09 | 2.20E−04 | 4.11 | 1.91–7.43 | 0.65 | 0.94 | 0.82 |
| | cg19396867 | NA^a | chr19:40314862 | N_Shore | 0.45 | 0.20 | 3.60E−04 | 1.85 | 0.78–3.21 | 0.85 | 0.75 | 0.79 |
| | cg20655070 | NA^a | chr19:40315011 | Island | 0.44 | 0.19 | 1.71E−03 | 1.61 | 0.67–2.72 | 0.64 | 0.88 | 0.75 |
| | cg26671652 | ZNF418 | chr19:58446312 | N_Shore | 0.35 | 0.16 | 5.77E−04 | 1.95 | 0.67–3.61 | 0.86 | 0.75 | 0.78 |
| | cg27062795 | ZNF542 | chr19:56879613 | Island | 0.43 | 0.17 | 3.60E−04 | 2.93 | 1.65–4.44 | 0.86 | 0.81 | 0.80 |
| Validation | cg15830431 | STK3 | chr8:99952591 | Island | 0.20 | 0.07 | 1.25E−06 | 3.04 | 1.82–4.53 | 0.66 | 0.77 | 0.71 |
| | cg19396867 | NA | chr19:40314862 | N_Shore | 0.37 | 0.12 | 2.71E−11 | 2.83 | 1.93–3.91 | 0.65 | 0.88 | 0.80 |
| | cg20655070 | NA | chr19:40315011 | Island | 0.31 | 0.09 | 8.04E−10 | 3.01 | 2.02–4.22 | 0.62 | 0.89 | 0.77 |
| | cg26671652 | ZNF418 | chr19:58446312 | N_Shore | 0.32 | 0.11 | 4.82E−11 | 3.20 | 2.18–4.39 | 0.58 | 0.93 | 0.79 |
| | cg27062795 | ZNF542 | chr19:56879613 | Island | 0.43 | 0.12 | 1.23E−12 | 2.55 | 1.77–3.50 | 0.72 | 0.82 | 0.83 |
Sensitivity, specificity, and AUC were derived from a logistic regression prediction model without adjustment for gender, age, smoking status, or alcohol status
Sens, sensitivity; Spec, specificity; AUC, area under the curve; OR, odds ratio; CI, confidence interval
^a NA indicates that the CpG site lies outside the coding region of any gene
^b McaM, mean methylation level (0–1 scale) of the cases; McoM, mean methylation level (0–1 scale) of the controls
^c P values were calculated with the Wilcoxon rank-sum test followed by FDR (false discovery rate) adjustment for multiple testing
^d OR and 95% CI were estimated by logistic regression
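Footnote c combines a two-group rank test with false discovery rate control. The pure-Python sketch below assumes the normal approximation to the Wilcoxon rank-sum statistic (tied values receive average ranks; no tie or continuity correction) and the Benjamini-Hochberg procedure for the FDR step. The footnote names neither detail, so both are assumptions. For real analyses, `scipy.stats.ranksums` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` are tested implementations.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(cases, controls):
    """Two-sided rank-sum test via the normal approximation; returns (z, p)."""
    n1, n2 = len(cases), len(controls)
    ranks = rank_with_ties(list(cases) + list(controls))
    w = sum(ranks[:n1])                       # rank sum of the case group
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def benjamini_hochberg(p_values):
    """BH-adjusted p values, returned in the original order and capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical methylation levels at one CpG site.
z, p = wilcoxon_rank_sum([0.31, 0.42, 0.28, 0.39], [0.10, 0.12, 0.08, 0.15])
print(z, p, benjamini_hochberg([p, 0.20, 0.03]))
```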
The mean methylation status of the five genomic regions in the validation datasets
| Genomic region^a | No. CpG sites^b | CpG site included | Gene | McaM^c | McoM^c | P value^d | log10(OR)^e | 95% CI^e | Sens | Spec | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| chr8:99952469-99952722 | 19 | cg15830431 | STK3 | 0.35 | 0.16 | 4.20E−09 | 2.82 | 1.83–4.03 | 0.64 | 0.82 | 0.76 |
| chr19:40314817-40314928 | 6 | cg19396867 | NA | 0.36 | 0.12 | 9.60E−11 | 2.90 | 1.97–4.03 | 0.61 | 0.90 | 0.79 |
| chr19:40314939-40315133 | 17 | cg20655070 | NA | 0.31 | 0.12 | 1.80E−09 | 3.61 | 2.42–5.06 | 0.60 | 0.90 | 0.77 |
| chr19:58446187-58446437 | 19 | cg26671652 | ZNF418 | 0.50 | 0.26 | 1.10E−13 | 3.46 | 2.52–4.54 | 0.74 | 0.86 | 0.84 |
| chr19:56879517-56879735 | 25 | cg27062795 | ZNF542 | 0.41 | 0.14 | 5.20E−13 | 2.81 | 1.94–3.86 | 0.71 | 0.84 | 0.83 |
Sensitivity, specificity, and AUC were derived from a logistic regression prediction model without adjustment for gender, age, smoking status, or alcohol status
Sens, sensitivity; Spec, specificity; AUC, area under the curve; OR, odds ratio; CI, confidence interval
^a Genomic region is the genomic span covered by reads from targeted bisulfite sequencing; coordinates are based on the hg19 reference genome
^b No. CpG sites is the number of CpG sites in each region
^c McaM, mean methylation level of the cases in each region (each region contains several CpG sites); McoM, the corresponding mean methylation level of the controls
^d P values were calculated with the Wilcoxon rank-sum test followed by FDR (false discovery rate) adjustment for multiple testing
^e OR and 95% CI were estimated by logistic regression
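The Sens, Spec, AUC, and log10(OR) columns can be computed from model scores and a fitted coefficient as sketched below. Assumptions: the AUC uses the rank-based (Mann-Whitney) estimator; the 0.45 cut-off in the usage line is hypothetical, since the threshold behind the reported sensitivity and specificity is not given; and the interval assumes a Wald CI on the natural-log logistic coefficient β, so that log10(OR) = β / ln 10.

```python
import math


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample positive (ESCC) when its score is >= threshold."""
    tp = sum(s >= threshold for s in case_scores)
    tn = sum(s < threshold for s in control_scores)
    return tp / len(case_scores), tn / len(control_scores)


def log10_or_with_ci(beta, se, z=1.96):
    """Convert a natural-log coefficient and its SE to log10(OR) with a Wald CI."""
    ln10 = math.log(10)
    return beta / ln10, (beta - z * se) / ln10, (beta + z * se) / ln10


# Hypothetical predicted probabilities from a fitted model.
cases, controls = [0.4, 0.5, 0.6], [0.1, 0.2, 0.5]
print(auc(cases, controls), sensitivity_specificity(cases, controls, 0.45))
```

The quadratic pairwise loop is fine for cohorts of a few hundred samples; a sort-based rank computation scales better for larger data.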
Diagnostic accuracy, sensitivity, and specificity of different classification models with five-fold cross-validation
| Method | Train sensitivity | Train specificity | Train accuracy | Test sensitivity | Test specificity | Test accuracy |
|---|---|---|---|---|---|---|
| Logistic regression | 0.75 | 0.89 | 0.82 | 0.73 | 0.86 | 0.79 |
| Random forest | 0.73 | 0.77 | 0.75 | 0.73 | 0.78 | 0.75 |
| Support vector machine | 0.74 | 0.89 | 0.82 | 0.73 | 0.87 | 0.80 |
| Naïve Bayes | 0.63 | 0.89 | 0.76 | 0.63 | 0.88 | 0.75 |
| Neural network | 0.76 | 0.87 | 0.81 | 0.72 | 0.81 | 0.76 |
| Linear discriminant analysis | 0.73 | 0.88 | 0.80 | 0.71 | 0.87 | 0.79 |
| Mixture discriminant analysis | 0.74 | 0.89 | 0.81 | 0.71 | 0.84 | 0.77 |
| Flexible discriminant analysis | 0.73 | 0.88 | 0.80 | 0.71 | 0.87 | 0.79 |
Each model used the mean methylation percentages of the five genomic regions as its five independent variables, without adjustment for gender, age, smoking status, or alcohol status. Sensitivity, specificity, and classification accuracy are mean values over five-fold cross-validation repeated 1000 times.
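The evaluation scheme in the note above (five-fold cross-validation repeated 1000 times, averaging sensitivity, specificity, and accuracy over training and test folds) can be sketched for the logistic regression row. This is a stand-in, not the authors' pipeline: the logistic model is fit by plain gradient descent on standardized features, folds are stratified by class (an assumption), and the data are synthetic. Pure Python is slow, so the usage example runs 2 repetitions instead of 1000; scikit-learn (`RepeatedStratifiedKFold`, `LogisticRegression`) is the practical choice for a full run.

```python
import math
import random
from statistics import mean, pstdev


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Unpenalized logistic regression by batch gradient descent on standardized features."""
    n, d = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(d)]
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(d)]
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for zi, yi in zip(Z, y):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, zi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * zi[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return w, b, mu, sd


def predict(model, X):
    # Standardize with the training-fold statistics, then threshold at 0.5.
    w, b, mu, sd = model
    preds = []
    for row in X:
        z = [(row[j] - mu[j]) / sd[j] for j in range(len(row))]
        preds.append(1 if sigmoid(b + sum(wj * zj for wj, zj in zip(w, z))) >= 0.5 else 0)
    return preds


def sens_spec_acc(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    return tp / pos, tn / (len(y_true) - pos), (tp + tn) / len(y_true)


def stratified_folds(y, k, rng):
    # Shuffle each class separately and deal its indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def column_means(rows):
    return tuple(mean(col) for col in zip(*rows))


def repeated_cv(X, y, k=5, repeats=1000, seed=0):
    """Mean (sensitivity, specificity, accuracy) over all training and test folds."""
    rng = random.Random(seed)
    train_scores, test_scores = [], []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
            for idx, store in ((train_idx, train_scores), (test_idx, test_scores)):
                store.append(sens_spec_acc([y[i] for i in idx], predict(model, [X[i] for i in idx])))
    return column_means(train_scores), column_means(test_scores)


# Synthetic data: 5 region-level methylation means per sample; 1 = ESCC, 0 = normal.
data_rng = random.Random(42)
X = [[data_rng.uniform(0.35, 0.60) for _ in range(5)] for _ in range(20)]
X += [[data_rng.uniform(0.05, 0.25) for _ in range(5)] for _ in range(20)]
y = [1] * 20 + [0] * 20
train_metrics, test_metrics = repeated_cv(X, y, k=5, repeats=2)
print("train sens/spec/acc:", train_metrics)
print("test sens/spec/acc:", test_metrics)
```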