E Andres Houseman, Molly L Kile, David C Christiani, Tan A Ince, Karl T Kelsey, Carmen J Marsit.
Abstract
BACKGROUND: Recent interest in reference-free deconvolution of DNA methylation data has led to several supervised methods, but these methods do not easily permit the interpretation of underlying cell types.
Keywords: DNA methylation; Deconvolution; Epigenetics; Non-negative matrix factorization
Year: 2016; PMID: 27358049; PMCID: PMC4928286; DOI: 10.1186/s12859-016-1140-4
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Fig. 1 Overview of Proposed Methods. If associations between DNA methylation data Y and phenotypic metadata X factor through the decomposition Y = MΩ, and the data in M serve to distinguish cell types by their associations with relevant annotation data, then associations between X and Y are explained in whole or in part by differences in the distribution of constituent cell types. Numbers indicate steps in the analysis: (1) deconvolution; (2) determining discriminating loci; (3) gene-set analysis; (4) analysis of associations with phenotype
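
The reasoning in the Fig. 1 caption can be illustrated with a toy forward model. The sketch below is not the authors' deconvolution algorithm; it only shows, with made-up values for M and Ω (all names and numbers are assumptions for illustration), how a shift in cell-type proportions between two samples produces methylation differences at discriminating loci and none at a non-discriminating locus.

```python
# Toy forward model Y = M * Omega; every number here is hypothetical.
# Rows of M: CpG loci; columns of M: constituent cell types (K = 2).
M = [
    [0.90, 0.10],  # locus 1: methylated in type A, unmethylated in type B
    [0.50, 0.50],  # locus 2: identical in both types (not discriminating)
    [0.20, 0.80],  # locus 3: mostly methylated in type B
]
# Columns of Omega: samples; each column holds cell-type proportions.
Omega = [
    [0.75, 0.25],  # proportion of type A in sample 1 and sample 2
    [0.25, 0.75],  # proportion of type B in sample 1 and sample 2
]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


Y = matmul(M, Omega)
for locus, row in enumerate(Y, start=1):
    print(locus, [round(v, 3) for v in row])
```

Under these assumed values, loci 1 and 3 differ between the two samples solely because the mixture proportions differ, while locus 2 is unchanged.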
Summary of Datasets
| Code | Tissue | Source | Ref | Platform | Source description | Number | Covariate model |
|---|---|---|---|---|---|---|---|
| g[nt] | gastric tissue: tumor + normal | GEO: | [ | 27K | 203 gastric tumors and 94 matched gastric non-malignant samples. | 297 | Tumor[normal\|tumor] |
| g[n] | gastric tissue: normal | GEO: | [ | 27K | Same source as g[nt]. | 94 | – |
| g[t] | gastric tissue: tumor | GEO: | [ | 27K | Same source as g[nt]. | 203 | – |
| br-1[t] | breast: tumor | GEO: | [ | 27K | 119 breast tumor samples with histological information. Removed 29 samples with ambiguous or missing histology. | 119 | Histology[basal\|HER2\|LumA\|LumB] + Age[young\|old] + Size[small\|large] |
| br-2[t] | breast: tumor | GEO: | [ | 27K | 103 primary invasive breast tumors. | 90 | Histology[basal\|ER-\|ER+\|HER2\|LumA\|LumB] + Age |
| br-3[t] | breast: tumor | GEO: | [ | 27K | Breast tumor samples: 91 invasive ductal, 13 invasive lobular, 10 mucinous or medullary; 76 were ER+. | 114 | ER[ER-\|ER+] + Histology[duct\|lob\|muc or med] + Age |
| bl-ov | peripheral blood | GEO: | [ | 27K | Whole blood from 131 ovarian cancer cases (drawn pre-treatment) and 274 controls. | 402 | Case[control\|ovarian cancer case] + Age |
| bl-hn | peripheral blood | GEO: | [ | 27K | Peripheral blood from 92 head and neck squamous cell carcinoma (HNSCC) patients and 92 controls. Removed 2 outlier cases. | 182 | Case[control\|HNSCC case] + Age |
| BL-ra | peripheral blood | GEO: | [ | 450K | Peripheral blood from 354 rheumatoid arthritis patients and 335 controls. | 689 | Case[control\|arthritis case] |
| BL-as | cord blood | (not public) | [ | 450K* | Cord blood from 45 Bangladeshi neonates, with corresponding drinking water arsenic concentrations. | 45 | Log-arsenic + Sex[female\|male] |
| SP | sperm | GEO: | [ | 450K | 26 normal sperm samples. | 26 | Fraction[swim down\|swim up\|whole 1h\|whole 2h] |
| BV+LV | endothelial tissue | GEO: | [ | 450K | 16 vascular samples: 6 primary blood vessel endothelial cell samples and 10 primary lymphatic endothelial cell samples. | 16 | Source[BV\|LV] |
| BV | endothelial tissue: blood vessel | GEO: | [ | 450K | Same source as BV+LV. | 6 | – |
| LV | endothelial tissue: lymphatic vessel | GEO: | [ | 450K | Same source as BV+LV. | 10 | – |
| UV-as | umbilical vein endothelial tissue | (not public) | [ | 450K* | Umbilical vein endothelial tissues from 51 Bangladeshi neonates, with corresponding drinking water arsenic concentrations. | 51 | Log-arsenic + Sex[female\|male] |
| AR-as | placental artery | (not public) | [ | 450K* | Placental arteries from 46 Bangladeshi neonates, with corresponding drinking water arsenic concentrations. | 46 | Log-arsenic + Sex[female\|male] |
| AR[np] | arterial tissue: atherosclerotic + normal | GEO: | [ | 450K | 15 normal aortic tissues, 15 atherosclerotic aortic lesions, 19 carotid atherosclerotic samples. | 49 | Source[normal\|ath\|carotid ath] + Sex[female\|male] + Age |
| AR[n] | arterial tissue: normal aorta | GEO: | [ | 450K | Same source as AR[np]. | 15 | – |
| PL-as | placenta | (not public) | [ | 450K* | Placentas from 45 Bangladeshi neonates, with corresponding drinking water arsenic concentrations. | 45 | Log-arsenic + Sex[female\|male] |
| L[np] | liver tissue: cirrhotic + normal | GEO: | [ | 450K | 34 normal liver tissues, 21 cirrhotic tissues (due to alcoholism), 45 cirrhotic tissues [due to chronic hepatitis B (HBV) or C (HCV) viral | 100 | Source[normal\|CirrEtOH\|CirrV] |
| L[n] | liver tissue: normal | GEO: | [ | 450K | Same source as L[np]. | 34 | – |
| BR-tcga[n] | breast: normal | TCGA | [ | 450K* | 96 normal breast tissues (matched to tumor) from The Cancer Genome Atlas, downloaded Nov. 2014 | 96 | Age + Race[white\|other] |
| BR-tcga[t] | breast: tumor | TCGA | [ | 450K* | 725 breast tumors from The Cancer Genome Atlas, downloaded Nov. 2014 | 725 | Age + Race[white\|other] + Staging[II+\|III+\|IV/X\|?] + ER[ER+\|ER-] + HER2[HER2+\|HER2-\|HER2?] |
*Processed from IDAT files using the FunNorm algorithm (Bioconductor package minfi). See Methods for details.
Fig. 2 Selection of Number of Classes K. a Estimated number of classes for each data set. b Bootstrapped deviance profiles for four selected data sets, along with mean deviance, median deviance, and quartiles for each value of K
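
As a rough illustration of how a deviance profile like the one in Fig. 2b could be summarized, the sketch below computes the mean, median, and quartiles of bootstrapped deviances for each candidate K. The deviance values are hypothetical, and choosing the K with the smallest mean deviance is an assumption made here for illustration; the caption does not state the exact selection rule.

```python
from statistics import mean, median, quantiles

# Hypothetical bootstrapped deviances: K -> one deviance per bootstrap replicate.
boot_deviance = {
    1: [10.2, 9.8, 10.5, 10.1],
    2: [7.1, 6.9, 7.4, 7.0],
    3: [6.2, 6.0, 6.5, 6.1],
    4: [6.4, 6.9, 6.3, 7.2],
}


def summarize(values):
    """Return the summary statistics plotted for one value of K."""
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "median": median(values), "q1": q1, "q3": q3}


profile = {k: summarize(v) for k, v in boot_deviance.items()}

# Assumed selection rule: the K that minimizes the mean bootstrapped deviance.
k_hat = min(profile, key=lambda k: profile[k]["mean"])
print(k_hat, profile[k_hat])
```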
Inference with Phenotypic Metadata
| Data set | Permutation p-value |
|---|---|
| g[nt] | Tumor < 0.001 |
| br-1[t] | Histology < 0.001; Age = 0.059; Size = 0.016 |
| br-2[t] | Histology < 0.001; Age = 0.06 |
| br-3[t] | ER < 0.001; Histology = 0.295; Age = 0.008; BSC = 0.297 |
| bl-ov | Case < 0.001; Age = 0.999 |
| bl-hn | Case < 0.001; Age < 0.001 |
| BL-ra | Case < 0.001 |
| BL-as | Log-arsenic < 0.001; Sex = 0.263 |
| SP | Fraction = 0.994 |
| BV+LV | Source = 0.013 |
| UV-as | Log-arsenic = 0.515; Sex = 0.962 |
| AR-as | Log-arsenic = 0.285; Sex = 0.505 |
| AR[np] | Source < 0.001; Sex = 0.043; Age = 0.377 |
| PL-as | Log-arsenic = 0.006; Sex = 0.451 |
| L[np] | Source < 0.001 |
| BR-tcga[n] | Age = 0.089; Race = 0.153 |
| BR-tcga[t] | Age < 0.001; Race < 0.001; Staging = 0.013; ER < 0.001; HER2 < 0.001 |
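
The p-values in this table come from permutation tests. The sketch below shows the general idea on a single binary covariate, using a difference in group means as the test statistic; both the statistic and the data are assumptions for illustration and are not claimed to match the authors' procedure.

```python
import random


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""

    def mean_diff(lab):
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and values
        if mean_diff(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical estimated proportions of one cell type: 6 controls, then 6 cases.
proportion = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10,
              0.30, 0.28, 0.33, 0.31, 0.29, 0.32]
case = [0] * 6 + [1] * 6
p = permutation_p_value(proportion, case)
print(p)
```

With the add-one correction, the smallest attainable p-value is 1 / (n_perm + 1), so the resolution of the test depends on the number of permutations.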
Fig. 3 Cell Proportion Matrices. Clustering heatmaps of cell proportion matrix Ω for two data sets; purple intensity indicates cell proportion. a Blood from rheumatoid arthritis cases and controls (BL-ra); clustering heatmap obtained from untransformed coefficients and using Ward’s method of clustering (“ward.D” in R hclust function). b Sperm (SP)
Fig. 4 Comparison of Null Associations. Comparison of $\pi_0$ (proportion of null association CpGs) from the K = 1 model with $\pi_0$ from the K = K* model; only non-demographic variables are shown
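
For readers unfamiliar with the quantity compared in Fig. 4, the sketch below estimates the proportion of null associations from a list of p-values. It uses a Storey-type estimator with a fixed threshold, which is an assumption: the caption does not say which estimator the authors used, and the p-values are made up.

```python
def pi0_estimate(p_values, lam=0.5):
    """Storey-type estimate of the proportion of null hypotheses.

    Null p-values are roughly uniform, so the count above `lam`
    divided by m * (1 - lam) approximates the null fraction.
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical p-values: 8 small (non-null) and 12 spread evenly over (0, 1).
p_values = [0.001, 0.002, 0.004, 0.01, 0.012, 0.02, 0.03, 0.04] + [
    (i + 0.5) / 12 for i in range(12)
]
pi0 = pi0_estimate(p_values)
print(pi0)
```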
Fig. 5 Gene-Set Analysis (DMPs and PcG Targets). Gene-set odds ratios, showing the association of gene-set membership with the set of CpGs whose values are highly variable across fitted methylomes ($s^2 > q_{0.75}(s^2)$). a Blood DMRs. b CpGs mapped to polycomb group protein genes
Fig. 6 Gene-Set Analysis (Roadmap Epigenomics WGBS). Gene-set odds ratios for 450K data sets, showing association of sets of DMPs distinguishing various Roadmap Epigenomics WGBS specimens with the set of CpGs whose values are highly variable across fitted methylomes ($s^2 > q_{0.75}(s^2)$). Clustering heatmap obtained from log-odds-ratios and using Ward’s method of clustering (“ward.D” in R hclust function)
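
The high-variability criterion used in Figs. 5 and 6 (row variance above its 75th percentile across fitted methylomes) and the resulting gene-set odds ratio can be sketched as follows. The matrix M, the gene-set membership vector, and the use of a plain 2×2 cross-product odds ratio are all assumptions for illustration; the authors' exact estimator may differ.

```python
from statistics import quantiles, variance


def highly_variable(M):
    """Flag CpGs (rows of M) whose variance across methylomes exceeds the 75th percentile."""
    s2 = [variance(row) for row in M]
    q75 = quantiles(s2, n=4)[2]
    return [v > q75 for v in s2]


def odds_ratio(flag, in_set):
    """Cross-product odds ratio from the 2x2 table of flag versus set membership."""
    a = sum(f and g for f, g in zip(flag, in_set))          # flagged, in set
    b = sum(f and not g for f, g in zip(flag, in_set))      # flagged, not in set
    c = sum(not f and g for f, g in zip(flag, in_set))      # not flagged, in set
    d = sum(not f and not g for f, g in zip(flag, in_set))  # neither
    return (a * d) / (b * c)


# Hypothetical fitted methylomes: 8 CpGs (rows) by K = 3 cell types (columns).
M = [
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.5],
    [0.9, 0.1, 0.2],
    [0.5, 0.5, 0.5],
    [0.4, 0.5, 0.6],
    [0.3, 0.3, 0.4],
    [0.7, 0.7, 0.7],
    [0.6, 0.5, 0.6],
]
# Hypothetical membership of each CpG in some gene set.
in_set = [True, False, False, True, False, False, False, False]

flags = highly_variable(M)
gene_set_or = odds_ratio(flags, in_set)
print(flags, gene_set_or)
```

This sketch divides by zero if any off-diagonal cell of the 2×2 table is empty; a real analysis would need a continuity correction or a model-based estimate for sparse gene sets.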