Shicheng Guo1, Fengyang Yan2, Jibin Xu3, Yang Bao4, Ji Zhu5, Xiaotian Wang2, Junjie Wu6, Yi Li2, Weilin Pu2, Yan Liu7, Zhengwen Jiang7, Yanyun Ma2, Xiaofeng Chen8, Momiao Xiong9, Li Jin1, Jiucun Wang1.
Abstract
BACKGROUND: DNA methylation has been suggested as a promising biomarker for lung cancer diagnosis. However, finding the optimal combination of methylation biomarkers that maximizes diagnostic performance remains a great challenge.
Keywords: Batch effect elimination; Biomarker; DNA methylation; Diagnosis; Non-small cell lung cancer
Year: 2015 PMID: 25657825 PMCID: PMC4318209 DOI: 10.1186/s13148-014-0035-3
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Figure 1. Sketch of the study design and pipeline. Candidate biomarkers were selected by meta-analysis of multiple high-throughput DNA methylation microarrays. The significant or best feature combination was then screened in an independent validation study of non-small cell lung cancer (NSCLC) using the methylation status determined single nucleotide primer extension technique (MSD-SNuPET).
Figure 2. treatment and methylation status determined single nucleotide primer extension technique (MSD-SNuPET). Principal component analysis was applied to show the efficiency of batch effect elimination by ComBat. A, B: A total of 120 probe sets with DNA methylation values after background and quantile normalization in a set of 352 non-small cell lung cancer (NSCLC) and 106 normal samples. The X and Y axes represent the first and second principal components (PC1 and PC2), respectively. C-I: Validation of the methylation status of the five candidate markers in independent samples. The Y axis represents the absolute DNA methylation percentage from MSD-SNuPET. LINE-1 and Reference were taken as the positive and negative controls for MSD-SNuPET.
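The caption mentions quantile normalization of the probe values before the principal component analysis. As a rough illustration of that one step (not of ComBat, and not the authors' pipeline), the sketch below quantile-normalizes a small probes-by-samples matrix in plain Python. The function name and the toy values are assumptions made for this example, and ties are broken by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto the same reference distribution:
    the mean of the sorted columns. Ties are broken by row order, which is
    a simplification of what dedicated packages do.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / n_cols
                 for k in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices ordered by the value they hold in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy methylation values: 4 probes (rows) x 3 samples (columns), hypothetical.
toy = [
    [0.10, 0.20, 0.15],
    [0.50, 0.70, 0.60],
    [0.30, 0.40, 0.45],
    [0.90, 0.80, 0.95],
]
normalized = quantile_normalize(toy)
```

After this step every sample shares the same set of values, so remaining differences between samples reflect probe rank rather than overall intensity.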
Characteristics of patients
| Characteristic | Value |
|---|---|
| Age | 40 (IQR = 15 to 65) |
| Sex | |
| Male | 120 |
| Female | 30 |
| Smoking statusᵃ | |
| Non-smokers (never) | 41 |
| Smokers (ever) | 96 |
| Histology | |
| Adenocarcinoma | 53 |
| Squamous cell carcinoma | 63 |
| Othersᵇ | 34 |
| Stageᶜ | |
| I (IA, IB) | 42 (10, 32) |
| II (IIA, IIB) | 48 (16, 32) |
| III (IIIA, IIIB) | 46 (41, 5) |
| IV | 2 |
| Differentiationᵈ | |
| Well/Moderate | 74 |
| Poor | 30 |

NSCLC, non-small cell lung cancer. ᵃSmokers include former and current smokers. ᵇOthers include adenosquamous carcinoma (ADSQ), bronchioloalveolar carcinoma, mucoepidermoid lung tumor and sarcomatoid carcinoma. ᶜTNM stages were assessed according to the seventh edition of the TNM classification criteria. ᵈQualitative assessment of tumor differentiation was based on the sum of the architecture score and the cytologic atypia score (2 = well differentiated, 3 = moderately differentiated, 4 = poorly differentiated).
Differential methylation in non-small cell lung cancers (NSCLCs)
| Marker | AMP (NSCLC)ᵃ | AMP (normal)ᵃ | P valueᵇ | log10(OR)ᶜ | P valueᶜ | Sensitivityᵈ | Specificityᵈ | AUCᵈ |
|---|---|---|---|---|---|---|---|---|
| | 12.88% | 4.48% | 1.06 × 10⁻⁷ | 3.49 (2.08, 4.91) | 1.30 × 10⁻⁶ | 59.73% | 79.59% | 0.71 |
| | 18.31% | 2.91% | 6.58 × 10⁻⁹ | 2.56 (1.5, 3.63) | 2.30 × 10⁻⁶ | 46.98% | 85.03% | 0.67 |
| | 9.37% | 0.56% | 1.09 × 10⁻⁹ | 9.02 (5.48, 12.55) | 5.90 × 10⁻⁷ | 44.30% | 94.56% | 0.70 |
| | 25.59% | 11.66% | 4.77 × 10⁻¹² | 3.80 (2.51, 5.09) | 7.80 × 10⁻⁹ | 52.35% | 88.44% | 0.67 |
| | 6.95% | 12.82% | 1.08 × 10⁻⁷ | -4.61 (-6.27, -2.95) | 5.20 × 10⁻⁸ | 73.15% | 92.52% | 0.80 |
| | 72.10% | 76.76% | 2.39 × 10⁻¹² | -10.3 (-13.5, -7.2) | 1.80 × 10⁻¹⁰ | - | - | - |
| Referenceᵉ | 1.78% | 1.83% | 2.85 × 10⁻¹ | -19.37 (-45.35, 6.62) | 0.14 | - | - | - |

ᵃDifferential methylation analysis was conducted between 150 NSCLC tissues and adjacent normal tissues. AMP, average methylation percentage. ᵇBonferroni-adjusted P value from a paired t-test comparing the intensity of the methylation signals between cases and controls. ᶜlog10(OR) and P value are the log-transformed odds ratio and the P value from logistic regression adjusted for sex, age and smoking status. ᵈSensitivity, specificity and area under the curve (AUC) were calculated with a logistic regression prediction model without adjustment for sex, age and smoking status. ᵉThe reference site was a cytosine that was not part of a CpG site; therefore, little or no methylation signal was expected, and no significant association between cancer and normal tissues should be detected.
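The footnotes report sensitivity, specificity and AUC from a logistic regression prediction model. As a minimal sketch of how these three quantities are conventionally computed from predicted probabilities and true labels, the Python below uses a 0.5 cutoff and the pairwise (Mann-Whitney) formulation of AUC. The cutoff, the function names and the toy scores are assumptions for illustration; they are not taken from the study.

```python
def sensitivity_specificity(scores, labels, cutoff=0.5):
    """Sensitivity and specificity when score >= cutoff is called a case."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney formulation).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities for 4 tumors (1) and 4 normals (0).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels)
area = auc(scores, labels)
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes the ranking over all cutoffs, which is why a marker can show a modest sensitivity alongside a higher AUC.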
Diagnostic accuracy, sensitivity and specificity of several classification methods with fivefold cross-validation
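| Method | Sensitivity (training) | Specificity (training) | Accuracy (training) | Sensitivity (test) | Specificity (test) | Accuracy (test) |
|---|---|---|---|---|---|---|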
| Logistic regression | 0.791 | 0.993 | 0.891 | 0.775 | 0.969 | 0.871 |
| SVMᵃ | 0.897 | 0.977 | 0.937 | 0.855 | 0.941 | 0.897 |
| Random forest | 0.934 | 0.928 | 0.931 | 0.890 | 0.886 | 0.886 |
| Bayes tree | 0.911 | 0.976 | 0.944 | 0.863 | 0.957 | 0.909 |

ᵃSVM, support vector machine (a kernel method). Sensitivity, specificity and classification accuracy are mean values over fivefold cross-validation with 1,000 replications. In the main body of the manuscript, sensitivity, specificity and accuracy were derived from the training results of the classification.
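The footnote describes averaging sensitivity, specificity and accuracy over fivefold cross-validation repeated many times. The sketch below illustrates only that resampling loop, using a deliberately trivial single-marker threshold classifier in place of the four models in the table. The simulated data, the function names, the unstratified fold assignment and the replication count of 100 are all assumptions for this example, not details from the study.

```python
import random


def fit_threshold(train):
    """Toy classifier: cutoff halfway between the two class means."""
    cases = [x for x, y in train if y == 1]
    controls = [x for x, y in train if y == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2


def repeated_kfold(data, k=5, replications=100, seed=0):
    """Mean test sensitivity, specificity and accuracy over repeated k-fold CV."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    n_folds = 0
    for _ in range(replications):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for f in range(k):
            test = shuffled[f::k]  # every k-th sample forms one fold
            train = [d for i, d in enumerate(shuffled) if i % k != f]
            n_pos = sum(1 for _, y in test if y == 1)
            n_neg = len(test) - n_pos
            if n_pos == 0 or n_neg == 0:
                continue  # skip folds missing a class (split is unstratified)
            cutoff = fit_threshold(train)
            tp = sum(1 for x, y in test if y == 1 and x >= cutoff)
            tn = sum(1 for x, y in test if y == 0 and x < cutoff)
            totals[0] += tp / n_pos
            totals[1] += tn / n_neg
            totals[2] += (tp + tn) / len(test)
            n_folds += 1
    return tuple(t / n_folds for t in totals)


# Simulated single-marker methylation fractions (hypothetical, not study data).
rng = random.Random(1)
data = [(rng.gauss(0.30, 0.10), 1) for _ in range(50)] + \
       [(rng.gauss(0.10, 0.10), 0) for _ in range(50)]
sens, spec, acc = repeated_kfold(data, k=5, replications=100)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With equal numbers of tumor and normal samples, accuracy is close to the mean of sensitivity and specificity, which is consistent with the tabulated values: for logistic regression, (0.791 + 0.993) / 2 ≈ 0.89.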