Daniel W Kennedy, Nicole M White, Miles C Benton, Andrew Fox, Rodney J Scott, Lyn R Griffiths, Kerrie Mengersen, Rodney A Lea.
Abstract
Epigenome-wide association studies (EWASs) seek to identify DNA methylation sites associated with clinical outcomes. Differences in observed methylation between specific cell-subtypes are often of interest; however, available samples often comprise a mixture of cells. To date, cell-subtype estimates have been obtained from mixed-cell DNA data using linear regression models, but the accuracy of such estimates has not been critically assessed. We evaluated linear regression performance for cell-subtype specific methylation estimation using a 450K methylation array dataset of both mixed-cell and cell-subtype sorted samples from six healthy males. CpGs associated with each cell-subtype were first identified using t-tests between groups of cell-subtype sorted samples. Subsequent reduced panels of reliably accurate CpGs were identified from mixed-cell samples using an accuracy heuristic (D). Performance was assessed by comparing cell-subtype specific estimates from mixed-cell samples with the corresponding cell-sorted means using the mean absolute error (MAE) and the coefficient of determination (R²). At the cell-subtype level, methylation levels at 3272 CpGs could be estimated to within a MAE of 5% of the expected value. The cell-subtypes with the highest accuracy were CD56+ NK (R² = 0.56) and CD8+T (R² = 0.48), where 23% of sites were accurately estimated. Hierarchical clustering and pathway enrichment analysis confirmed the biological relevance of the panels. Our results suggest that linear regression for cell-subtype specific methylation estimation is accurate only for some cell-subtypes at a small fraction of cell-associated sites but may be applicable to EWASs of disease traits with a blood-based pathology. Although sample size was a limitation in this study, we suggest that alternative statistical methods will provide the greatest performance improvements.
Year: 2018 PMID: 30571772 PMCID: PMC6301777 DOI: 10.1371/journal.pone.0208915
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Comparison of LR estimate versus cell-sorted estimates.
Hexagon plots compare the LR estimate of the cell-type methylation difference, obtained from mixed-cell data (vertical axis), with the cell-sorted estimate of the cell-type methylation difference, obtained independently from cell-sorted data (horizontal axis), for all measured CpGs. The shading of each hexagonal bin indicates the density of CpGs in that bin. This comparison is made for each cell-subtype and lineage grouping.
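As a rough illustration of how an LR estimate of a cell-type methylation difference can be obtained from mixed-cell data, the sketch below fits an ordinary least-squares line for a single CpG. It assumes, as a simplification not stated in this record, that mixed-cell methylation is a proportion-weighted average of the target cell-subtype and all other cells, so that the slope estimates the methylation difference and the intercept estimates the methylation of the remaining cells. The function name `ols_slope_intercept` and all data values are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data for one CpG across six mixed-cell samples.
# proportions: fraction of the target cell-subtype in each sample.
proportions = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10]

# Assumed mixing model (illustrative only):
#   mixed = p * mu_target + (1 - p) * mu_rest
#         = mu_rest + p * (mu_target - mu_rest)
mu_rest = 0.30      # hypothetical methylation in all other cells
difference = 0.40   # hypothetical mu_target - mu_rest
mixed_methylation = [mu_rest + p * difference for p in proportions]

slope, intercept = ols_slope_intercept(proportions, mixed_methylation)
print(f"estimated difference: {slope:.3f}, estimated mu_rest: {intercept:.3f}")
```

Because the hypothetical data contain no noise, the fit recovers the assumed difference and baseline; with real array data and only six samples, the slope would carry substantial uncertainty.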
Estimation performance over the robust panels for cell-subtype and lineage groupings.
| Panel | Panel Size | MAE | MMCE | Observed versus Expected (R²) | Number (%) with AE < 0.05 | Mean Cell-subtype Prop. (%) |
|---|---|---|---|---|---|---|
| Neutrophil | 2151 | 0.13 | 0.23 | 0.41 | 620 (28.8) | 65.0 |
| CD4+T | 2902 | 0.62 | 0.21 | 0.09 | 158 (5.4) | 13.4 |
| CD8+T | 1871 | 0.14 | 0.14 | 0.48 | 479 (25.6) | 6.1 |
| Nat. Killer | 3301 | 0.13 | 0.21 | 0.56 | 882 (26.7) | 2.4 |
| CD19+B | 12973 | 0.78 | 0.2 | 0.03 | 635 (4.9) | 3.0 |
| Monocyte | 2772 | 0.91 | 0.29 | 0.02 | 117 (4.2) | 5.4 |
| Eosinophil | 7968 | 0.75 | 0.24 | 0.03 | 381 (4.8) | 3.8 |
| Lymphocyte-I | 103035 | 0.15 | 0.15 | 0.46 | 25920 (25.2) | 25.0 |
| Myeloid-I | 102934 | 0.12 | 0.15 | 0.56 | 28298 (27.5) | 74.2 |
| Lymphocyte-II | 69455 | 0.14 | 0.16 | 0.52 | 17790 (25.6) | 22 |
| Myeloid-II | 28891 | 0.18 | 0.2 | 0.44 | 6054 (21) | 68.8 |
| Pan-T | 2911 | 0.31 | 0.17 | 0.21 | 353 (12.1) | 19.6 |
Summary statistics are compiled here for each robust estimation panel. The observed value is the LR estimate; the expected value is the cell-sorted estimate. Mean cell-subtype proportions are calculated from FACS estimates. AE: Absolute Error (CpG-specific), MAE: Mean Absolute Error (averaged over panel), MMCE: Mean Mixed-cell Error, R²: Coefficient of determination.
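The panel-level statistics defined above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and data values are hypothetical, and R² is taken here to be the squared Pearson correlation between observed and expected values, which is an assumption because this record does not state which definition was used.

```python
from math import fsum


def absolute_errors(observed, expected):
    """CpG-specific absolute error (AE) between LR and cell-sorted estimates."""
    return [abs(o - e) for o, e in zip(observed, expected)]


def mean_absolute_error(observed, expected):
    """MAE: the AE averaged over all CpGs in a panel."""
    errors = absolute_errors(observed, expected)
    return fsum(errors) / len(errors)


def fraction_within(observed, expected, tol=0.05):
    """Fraction of CpGs whose AE is below the tolerance (AE < 0.05 by default)."""
    errors = absolute_errors(observed, expected)
    return sum(e < tol for e in errors) / len(errors)


def r_squared(observed, expected):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_o = fsum(observed) / n
    mean_e = fsum(expected) / n
    cov = fsum((o - mean_o) * (e - mean_e) for o, e in zip(observed, expected))
    var_o = fsum((o - mean_o) ** 2 for o in observed)
    var_e = fsum((e - mean_e) ** 2 for e in expected)
    return cov * cov / (var_o * var_e)


# Hypothetical methylation differences for a four-CpG panel.
lr_estimates = [0.10, -0.32, 0.45, 0.02]   # observed (LR, mixed-cell)
cell_sorted = [0.12, -0.20, 0.41, 0.20]    # expected (cell-sorted)

mae = mean_absolute_error(lr_estimates, cell_sorted)
share = fraction_within(lr_estimates, cell_sorted)
r2 = r_squared(lr_estimates, cell_sorted)
print(f"MAE={mae:.3f}, share with AE<0.05={share:.2f}, R2={r2:.3f}")
```

Reporting both MAE and the share of CpGs with AE < 0.05 is useful because a panel can have a moderate average error while still containing a subset of sites that are estimated accurately.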
Fig 2. Hierarchical clustering of cell-sorted data from robust panels.
The tree represents an unsupervised clustering of cell-sorted sample data, using only the CpGs found in the robust panels for the base cell-subtypes. Terminal nodes correspond to single samples. Each sample is labelled with the cell-subtype to which it corresponds.