Kristina Gervin, Lucas A Salas, Kelly M Bakulski, Menno C van Zelm, Devin C Koestler, John K Wiencke, Liesbeth Duijts, Henriëtte A Moll, Karl T Kelsey, Michael S Kobor, Robert Lyle, Brock C Christensen, Janine F Felix, Meaghan J Jones.
Abstract
BACKGROUND: Umbilical cord blood (UCB) is commonly used in epigenome-wide association studies of prenatal exposures. Accounting for cell type composition is critical in such studies as it reduces confounding due to the cell specificity of DNA methylation (DNAm). In the absence of cell sorting information, statistical methods can be applied to deconvolve heterogeneous cell mixtures. Among these methods, reference-based approaches leverage age-appropriate cell-specific DNAm profiles to estimate cellular composition. In UCB, four reference datasets comprising DNAm signatures profiled in purified cell populations have been published using the Illumina 450 K and EPIC arrays. These datasets are biologically and technically different, and currently, there is no consensus on how to best apply them. Here, we systematically evaluate and compare these datasets and provide recommendations for reference-based UCB deconvolution.
Keywords: Cell type heterogeneity; DNAm; Deconvolution; IDOL; Reference dataset; Umbilical cord blood; minfi; pickCompProbes
Year: 2019 PMID: 31455416 PMCID: PMC6712867 DOI: 10.1186/s13148-019-0717-y
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Descriptive overview of the UCB reference datasets
| | Bakulski | de Goede | Gervin | Lin |
|---|---|---|---|---|
| No. cell fractions | | | | |
| Bcell | 15 | 7 | 11 | 13 |
| CD4T | 15 | 7 | 11 | 14 |
| CD8T | 14 | 6 | 11 | 14 |
| Gran | 12 | 7 | 11 | 14 |
| Mono | 15 | 12 | 11 | 14 |
| NK | 14 | 6 | 11 | 14 |
| nRBC | 4 | 7 | NA | NA |
| Sex (M/F) | 8/7 | 5/2 | 6/5 | 10/4 |
| Array technology | 450 K | 450 K | 450 K | EPIC |
| Isolation method | MACS | FACS | FACS | MACS |
| Gestational age (range) | NA | NA | 38.4–40.6 | NA |
| Nationality | Americans | Canadians | Norwegians | Mixed |
| Purity estimates | NA | NA | 97.1–98.8 | NA |
NA: not applicable
Fig. 1 PCA scatterplot of cell type-specific DNAm in four UCB references as published (raw). The first two principal components are plotted, with the proportion of variance explained by each component indicated next to the axis labels. The plot shows distinct clustering by cell type, and most of the variance in DNAm can be attributed to cell type. Of note, nRBCs are not included in the Gervin and Lin references.
Fig. 2 Data filtering using a projection of adult cell types. Samples in the four UCB references showing < 70% of the adult cell type were removed, whereas samples with > 70% of a different cell type were reclassified to the corresponding cell type. Using a 70% cut-off resulted in removal of 24 samples (26.9%, indicated by red asterisk) and reclassification of three samples (indicated by green asterisk). Of note, the majority of the CD8T cell fractions in the Bakulski reference showed a large proportion of NK cells.
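The filtering rule in this caption can be sketched as a small decision function. This is a minimal illustration only, not the authors' code: the function name `filter_sample`, the dictionary layout, and the example proportions are hypothetical, and treating a sample at exactly 70% as retained is an assumption, since the caption does not say how ties are handled.

```python
def filter_sample(label, adult_projection, cutoff=0.70):
    """Decide what to do with one purified reference sample.

    label            -- cell type the sample was sorted as (e.g. "CD8T")
    adult_projection -- dict mapping cell type -> proportion (0..1) estimated
                        by projecting the sample onto adult cell types
    Returns a tuple (action, cell_type).
    """
    # Cell type with the largest projected proportion.
    top_type = max(adult_projection, key=adult_projection.get)

    # Assumption: a sample at or above the cut-off for its own label is kept.
    if adult_projection.get(label, 0.0) >= cutoff:
        return ("keep", label)

    # More than the cut-off of a *different* cell type: reclassify.
    if top_type != label and adult_projection[top_type] > cutoff:
        return ("reclassify", top_type)

    # Otherwise the sample is too mixed to serve as a reference.
    return ("remove", None)


# Hypothetical projections, for illustration only.
kept = filter_sample("CD8T", {"CD8T": 0.90, "NK": 0.10})
moved = filter_sample("CD8T", {"CD8T": 0.20, "NK": 0.80})
dropped = filter_sample("CD8T", {"CD8T": 0.50, "NK": 0.50})
```

With these made-up inputs, the first sample is kept, the second is reclassified as NK, and the third is removed.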
Fig. 3 Evaluation of libraries. The selected libraries from pickCompProbes and IDOL were evaluated by calculating the R² and RMSE comparing estimates and FACS counts from each cell type in the test dataset (n = 22) using individual and combined UCB references. Mean R² and RMSE are plotted on the y- and x-axes, respectively.
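The two evaluation metrics named in this caption can be computed as follows. This is a sketch under stated assumptions: R² is taken here to be the squared Pearson correlation between estimates and FACS counts (the caption does not define it), the function names are illustrative, and the numbers are invented.

```python
import math


def rmse(estimates, counts):
    """Root mean squared error between estimated and measured proportions."""
    n = len(estimates)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(estimates, counts)) / n)


def r_squared(estimates, counts):
    """Squared Pearson correlation (assumed definition of R2 here)."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_c = sum(counts) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(estimates, counts))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_c = sum((c - mean_c) ** 2 for c in counts)
    return cov ** 2 / (var_e * var_c)


# Invented values for one cell type (percent of cells), for illustration only.
estimated = [10.0, 20.0, 30.0]
facs = [12.0, 22.0, 32.0]

example_rmse = rmse(estimated, facs)
example_r2 = r_squared(estimated, facs)
```

In this toy case the estimates track the counts perfectly but sit 2 points too low, so R² is 1 while RMSE is 2. The two metrics capture different failure modes, which is one reason to report both.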
Fig. 4 Comparison of L-DMR libraries selected using automatic selection in pickCompProbes and the IDOL algorithm for optimization. a L-DMR libraries selected from the combined UCB reference using automatic selection in pickCompProbes (raw n = 666 and filtered n = 662) and IDOL (n = 517). b Overlap of probes between the three libraries.
Genomic and functional context of IDOL and pickCompProbes (raw and filtered) libraries. Values are number of probes (% of library).
| | pickCompProbes automatic selection: raw | pickCompProbes automatic selection: filtered | IDOL |
|---|---|---|---|
| Genomic context | | | |
| CpG island | 18 (2.7) | 17 (2.6) | 39 (7.5) |
| Shelves | 95 (14.2) | 101 (15.3) | 76 (14.7) |
| Shores | 100 (15) | 107 (16.2) | 119 (23) |
| Open sea | 453 (68.1) | 437 (66.1) | 282 (54.5) |
| Functional context | | | |
| TSS1500 | 72 (10.8) | 74 (11.2) | 71 (13.7) |
| TSS200 | 36 (5.4) | 40 (6) | 37 (7.2) |
| 5'UTR | 81 (12.2) | 80 (12.1) | 75 (14.5) |
| Exon1 | 18 (2.7) | 18 (2.7) | 19 (3.7) |
| Body | 335 (50.3) | 320 (48.3) | 246 (47.6) |
| 3'UTR | 46 (6.9) | 42 (6.3) | 29 (5.6) |
| Intergenic | 146 (21.9) | 153 (23.1) | 107 (20.1) |
| Enhancers (Phantom 5) | 38 (5.7) | 33 (5) | 19 (3.7) |
| DNase hypersensitive sites | 436 (65.5) | 436 (65.9) | 350 (67.7) |
Fig. 5 Comparison of estimated cell types and matched FACS cell counts. Scatter plots of deconvolution estimates using constrained projection/quadratic programming (CP/QP) and matched FACS cell counts in an individual birth cohort (Generation R, n = 191) using cleaned IDOL and pickCompProbes libraries and the combined UCB reference. Smoothing lines represent the linear model. R² and RMSE using the two methods are indicated for each cell type.
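The idea behind reference-based deconvolution can be illustrated with a toy non-negative least-squares fit: find non-negative cell proportions whose weighted sum of reference profiles best matches the mixed sample. The sketch below is a simplified stand-in, not the CP/QP solver used in the study. It uses plain projected gradient descent, enforces only non-negativity, and every name and number in it is hypothetical.

```python
def deconvolve(reference, mixture, n_iter=5000, step=0.1):
    """Estimate cell proportions by non-negative least squares.

    reference -- list of rows, one per CpG; each row holds the mean
                 methylation (beta value) of that CpG in each cell type
    mixture   -- beta values of the mixed sample at the same CpGs
    Uses projected gradient descent (a simplification, see lead-in).
    """
    n_types = len(reference[0])
    weights = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Residual of the current fit at every CpG: X w - y.
        resid = [
            sum(x * w for x, w in zip(row, weights)) - y
            for row, y in zip(reference, mixture)
        ]
        # Gradient of 0.5 * ||X w - y||^2 with respect to w: X^T (X w - y).
        grad = [
            sum(row[k] * r for row, r in zip(reference, resid))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grad)]
    return weights


# Hypothetical reference: 4 CpGs x 3 cell types (invented beta values).
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.8, 0.1],
]
true_props = [0.5, 0.3, 0.2]
# Simulated noise-free mixture built from the known proportions.
mixture = [sum(x * p for x, p in zip(row, true_props)) for row in reference]

estimated = deconvolve(reference, mixture)
```

On this noise-free toy mixture the estimates converge to the simulated proportions. With real array data the fit is only approximate, which is why the estimates are compared against FACS counts.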
Fig. 6 Measurements of accuracy and agreement between methods. a Box plots of FACS cell counts (red) and estimates generated using IDOL (blue) and pickCompProbes (green) and a combined UCB reference (raw and filtered). b Absolute errors (estimates minus FACS counts) by deconvolution method and the combined UCB reference (filtered and raw). c Bland-Altman plots (differences versus means) showing the agreement between IDOL and pickCompProbes using a filtered combined UCB reference. The mean difference per method (blue and green) and zero difference (red) are indicated by horizontal lines.
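The quantities behind a Bland-Altman plot (panel c) are simple to derive from paired estimates. The sketch below is illustrative only: the function name and input values are hypothetical, and the 1.96 SD limits of agreement are the conventional addition to such plots, which the caption itself does not mention.

```python
import math


def bland_altman(method_a, method_b):
    """Return the points and summary lines of a Bland-Altman plot.

    method_a, method_b -- paired estimates for the same samples
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]        # y-axis
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]  # x-axis
    n = len(diffs)
    bias = sum(diffs) / n  # mean difference (horizontal line)
    # Sample standard deviation of the differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        # Conventional 95% limits of agreement (assumption, see lead-in).
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Invented paired estimates (percent of cells) from two methods.
result = bland_altman([10.0, 20.0, 30.0], [8.0, 19.0, 27.0])
```

A mean difference close to zero with narrow limits indicates that the two methods can be used interchangeably for that cell type.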