Chen Yao, Hongdong Li, Xiaopei Shen, Zheng He, Lang He, Zheng Guo.
Abstract
BACKGROUND: Hundreds of genes with differential DNA methylation of promoters have been identified for various cancers. However, the reproducibility of differential DNA methylation discoveries for cancer and the relationship between DNA methylation and aberrant gene expression have not been systematically analysed. METHODOLOGY/PRINCIPAL
Year: 2012 PMID: 22235325 PMCID: PMC3250460 DOI: 10.1371/journal.pone.0029686
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The datasets of nine cancer types for analyzing batch effects.
| Cancer type | Abbreviation | Number of batches | Number of laboratories | Number of tumour samples | Number of normal samples |
| Ovarian serous cystadenocarcinoma | OV | 13 | 17 | 520 | 35 |
| Colon adenocarcinoma | COAD | 9 | 5 | 168 | 23 |
| Lung adenocarcinoma | LUAD | 4 | 11 | 128 | 27 |
| Lung squamous cell carcinoma | LUSC | 5 | 12 | 115 | 31 |
| Stomach adenocarcinoma | STAD | 3 | 3 | 82 | 61 |
| Kidney renal clear cell carcinoma | KIRC | 6 | 10 | 219 | 205 |
| Glioblastoma multiforme | GBM | 9 | 13 | 264 | 5 |
| Breast invasive carcinoma | BRCA | 3 | 9 | 186 | 2 |
| Rectal adenocarcinoma | READ | 5 | 4 | 70 | 7 |
The methylation and expression datasets of five cancer types for concordance analysis.
| Cancer type | Methylation dataset | Methylation database | Expression dataset | Expression database |
| Colon adenocarcinoma | C22 | TCGA | c23 | GSE4183 |
| | C44 | GSE17648 | c64 | GSE8671 |
| Kidney renal clear cell carcinoma | K78 | TCGA | k20 | GSE6344 |
| | K100 | TCGA | k34 | GSE15641 |
| Stomach adenocarcinoma | S24 | TCGA | NA | NA |
| | S94 | TCGA | NA | NA |
| Lung adenocarcinoma | La8 | TCGA | la52 | GSE7670 |
| | La14 | TCGA | la107 | GSE10072 |
| Lung squamous cell carcinoma | Ls24 | TCGA | NA | NA |
| | Ls28 | TCGA | NA | NA |
| Platform | Illumina HumanMethylation27 BeadChip | | Affymetrix Human Genome U133 (GPL96, GPL570) | |
Each dataset is denoted by the following nomenclature: initial character of the cancer type followed by the total number of samples of the dataset; NA, not available.
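The labelling scheme in the note above can be unpacked programmatically. The following is a minimal Python sketch, assuming every label is a run of letters (the cancer-type prefix) followed by an integer (the sample count). The function name `parse_dataset_label` is hypothetical and not part of the original study.

```python
import re

# Assumed label shape: letters (cancer-type prefix) followed by digits (sample count).
_LABEL = re.compile(r"([A-Za-z]+)(\d+)")


def parse_dataset_label(label):
    """Split a dataset label such as 'La14' into ('La', 14)."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised dataset label: {label!r}")
    prefix, count = match.groups()
    return prefix, int(count)


# Labels taken from the table above.
for label in ["C22", "K100", "La14", "la107"]:
    print(label, parse_dataset_label(label))
```

In the table, methylation datasets carry an upper-case initial and expression datasets a lower-case one, so the case of the returned prefix can be used to tell the two apart.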
Figure 1. Batch effects on tumour samples for nine cancer types.
(a) Different batches and different laboratories; (b) the same laboratory but different batches; (c) the same batch but different laboratories; (d) hierarchical clustering of the tumour samples of ovarian serous cystadenocarcinoma in batch 9 and batch 12. For each cancer type on the x-axis in panels a, b and c, the box plot shows the percentage of probes significantly susceptible to the given batch condition. The percentage ranges from 0 (no susceptible probes) to 1 (100% susceptible probes). Each box stretches from the lower hinge (the 25th percentile) to the upper hinge (the 75th percentile), and the median is shown as a line across the box.
Figure 2. Batch effects on DM genes of six cancer types.
For each cancer type on the x-axis, the box plot shows the consistency score, defined as the proportion of DM genes with consistent methylation states among all overlapping DM genes detected in both groups (see ‘Methods’ section). The consistency score ranges from 0 (no consistent states) to 1 (100% consistent states). Each box stretches from the lower hinge (the 25th percentile) to the upper hinge (the 75th percentile), and the median is shown as a line across the box.
Consistency of DM genes across different datasets for each cancer.
| Dataset | DM-S* | DM-L** | Overlap | POG12 | POG21 | Consistency |
| C22–C44 | 2601 | 4001 | 2421 | 93.1% | 60.1% | 99.9% |
| K78–K100 | 3778 | 3966 | 3443 | 91.1% | 86.8% | 100% |
| La8–La14 | 752 | 1698 | 488 | 64.9% | 28.7% | 99.6% |
| S24–S94 | 2274 | 4867 | 2210 | 97.2% | 45.4% | 100% |
| Ls24–Ls28 | 2682 | 2909 | 2152 | 80.2% | 74.0% | 100% |
Each dataset was denoted by the following nomenclature: initial character of the cancer type followed by the total number of samples of the dataset.
*DM-S denotes DM genes from the shorter list;
**DM-L denotes DM genes from the longer list.
POG12 denotes the score from the shorter list to the longer list;
POG21 denotes the score from the longer list to the shorter list.
Consistency denotes the percentage of overlapping genes which showed the same methylation directions across the two datasets.
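The scores defined in these notes can be illustrated with a short Python sketch. It assumes that POG12 is the overlap divided by the length of the shorter list and POG21 the overlap divided by the length of the longer list; this reading is inferred from the table (it reproduces the K78–K100 row) and is not stated explicitly in this section. The function names and the toy gene labels are hypothetical.

```python
def pog(overlap, list_size):
    """Overlap as a fraction of one gene list (assumed POG definition)."""
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    return overlap / list_size


def compare_dm_lists(list_a, list_b):
    """Compare two DM gene lists given as dicts: gene -> 'hyper' or 'hypo'."""
    if not list_a or not list_b:
        raise ValueError("both gene lists must be non-empty")
    shorter, longer = sorted((list_a, list_b), key=len)
    shared = shorter.keys() & longer.keys()
    # Genes whose methylation direction agrees in both datasets.
    same = sum(1 for gene in shared if shorter[gene] == longer[gene])
    return {
        "overlap": len(shared),
        "pog12": pog(len(shared), len(shorter)),
        "pog21": pog(len(shared), len(longer)),
        "consistency": same / len(shared) if shared else float("nan"),
    }


# Counts from the K78–K100 row of the table above.
print(f"POG12 = {pog(3443, 3778):.1%}, POG21 = {pog(3443, 3966):.1%}")
# prints "POG12 = 91.1%, POG21 = 86.8%"

# Toy lists with made-up gene labels, for illustration only.
toy_a = {"G1": "hyper", "G2": "hypo", "G3": "hyper"}
toy_b = {"G1": "hyper", "G2": "hyper", "G3": "hyper", "G4": "hypo"}
print(compare_dm_lists(toy_a, toy_b))
```

Because the two POG scores share a numerator, they differ only through the list lengths, which is why a short list can score high against a long one while the reverse score stays low.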
Consistency of DE genes across different datasets for each cancer.
*DE-S denotes DE genes from the shorter list;
**DE-L denotes DE genes from the longer list.
Concordance between differential methylation and differential expression.
| Cancer types | Hypermethylation: gene number | Hypermethylation: concordance rate | Hypermethylation: P value | Hypomethylation: gene number | Hypomethylation: concordance rate | Hypomethylation: P value |
| Colon | 107 | 91.6% | 7.7 × 10⁻⁹ | 157 | 50.3% | 0.99 |
| Kidney | 254 | 86.6% | 1.5 × 10⁻¹² | 302 | 39.4% | 0.397 |
| Lung | 34 | 88.2% | 1.5 × 10⁻⁶ | 88 | 62.5% | 4.2 × 10⁻⁶ |
Gene number denotes the number of hypermethylated (or hypomethylated) genes which were determined to be differentially expressed in the expression data.
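A concordance rate of this kind can be sketched in Python as follows. The sketch assumes that hypermethylation counts as concordant with down-regulation and hypomethylation with up-regulation, and that genes without a differential-expression call are excluded from the gene number; both are assumptions made for illustration rather than details given in this section. The function name and the toy gene labels are hypothetical, and the P values in the table are not reproduced because the test behind them is not described here.

```python
# Assumption: which expression direction counts as concordant with each
# methylation state.
CONCORDANT = {"hyper": "down", "hypo": "up"}


def concordance_by_state(methylation, expression):
    """Return {state: (concordance rate, gene number)}.

    methylation: gene -> 'hyper' or 'hypo'
    expression:  gene -> 'up' or 'down' (differentially expressed genes only)
    """
    counts = {state: [0, 0] for state in CONCORDANT}  # [concordant, total]
    for gene, state in methylation.items():
        direction = expression.get(gene)
        if direction is None:
            continue  # not differentially expressed, so not counted
        counts[state][1] += 1
        if direction == CONCORDANT[state]:
            counts[state][0] += 1
    return {
        state: (hits / total if total else float("nan"), total)
        for state, (hits, total) in counts.items()
    }


# Toy input with made-up gene labels, for illustration only.
methylation = {"G1": "hyper", "G2": "hyper", "G3": "hypo", "G4": "hypo", "G5": "hyper"}
expression = {"G1": "down", "G2": "up", "G3": "up", "G4": "up"}
print(concordance_by_state(methylation, expression))
```

In the toy input, G5 has no expression call and is therefore left out of the hypermethylated gene number.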
Keratin associated protein genes hypomethylated in five cancers.
| GeneID | Gene Name | GeneID | Gene Name |
| 337880 | keratin associated protein 11-1 | 337972 | keratin associated protein 19-5 |
| 140258 | keratin associated protein 13-1 | 337976 | keratin associated protein 20-2 |
| 337960 | keratin associated protein 13-3 | 337977 | keratin associated protein 21-1 |
| 284827 | keratin associated protein 13-4 | 337978 | keratin associated protein 21-2 |
| 254950 | keratin associated protein 15-1 | 337979 | keratin associated protein 22-1 |
| 337882 | keratin associated protein 19-1 | 337879 | keratin associated protein 8-1 |