Luo Qi¹, Andrew E Teschendorff²,³.
Abstract
Most studies aiming to identify epigenetic biomarkers do so in complex tissues composed of many different cell types. By definition, these cell types differ substantially in their epigenetic profiles. This cell-type-specific variation among healthy cells is completely independent of the variation associated with disease, yet it dominates the epigenetic variability landscape. Although the cell-type composition of tissues can change in disease, and such changes may themselves provide accurate and reproducible biomarkers, failing to adjust for the underlying cell-type heterogeneity can seriously limit the sensitivity and precision with which disease-relevant biomarkers are detected, or hamper our understanding of such biomarkers. Given that computational and experimental tools for tackling cell-type heterogeneity are available, we stress that future epigenetic biomarker studies should aim to provide estimates of the underlying cell-type fractions for all samples in the study, and to identify biomarkers both before and after adjustment for cell-type heterogeneity, in order to obtain a more complete and unbiased picture of the biomarker landscape. This is critical not only to improve reproducibility and to enable the eventual clinical application of such biomarkers but also, importantly, to improve our molecular understanding of disease itself.
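The recommendation above (estimate cell-type fractions for every sample, then look for biomarkers before and after adjustment) can be illustrated with a minimal NumPy sketch. It assumes a reference matrix of purified cell-type methylation profiles is available, approximates a constrained deconvolution by ordinary least squares followed by clipping and renormalisation, and fits a simple linear model with and without the estimated fractions as covariates. The function names (`estimate_fractions`, `exposure_tstat`) and the simulated data are hypothetical; this is not the method used in the paper, and dedicated packages (for example, EpiDISH on Bioconductor) offer more robust estimators.

```python
import numpy as np

def estimate_fractions(ref, bulk):
    """Approximate cell-type fractions for one bulk sample (hypothetical helper).

    ref:  (n_cpgs, n_celltypes) reference methylation profiles (assumed given)
    bulk: (n_cpgs,) methylation beta values of the bulk sample
    Unconstrained least squares, then negatives clipped to zero and the result
    rescaled to sum to one: a simplification of a properly constrained fit.
    """
    w, *_ = np.linalg.lstsq(ref, bulk, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum()

def exposure_tstat(y, exposure, covariates=None):
    """OLS t-statistic of the exposure term, optionally adjusting for covariates."""
    n = len(y)
    cols = [np.ones(n), exposure]
    if covariates is not None:
        cols.extend(covariates.T)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of coefficients
    return coef[1] / np.sqrt(cov[1, 1])

# Hypothetical simulation: 3 cell types, 200 CpGs, 100 samples, NO disease effect,
# but cases carry more of cell type 0 (composition confounded with exposure).
rng = np.random.default_rng(0)
n_cpgs, n_samples = 200, 100
ref = rng.uniform(0.0, 1.0, size=(n_cpgs, 3))
exposure = np.repeat([0.0, 1.0], n_samples // 2)
alphas = np.where(exposure[:, None] == 1.0, [6.0, 2.0, 2.0], [2.0, 2.0, 2.0])
frac = np.array([rng.dirichlet(a) for a in alphas])
bulk = frac @ ref.T + rng.normal(0.0, 0.01, size=(n_samples, n_cpgs))

# Step 1: estimate cell-type fractions for all samples
est = np.array([estimate_fractions(ref, b) for b in bulk])

# Step 2: test a CpG that is most specific to cell type 0, without and with adjustment
j = int(np.argmax(ref[:, 0] - ref[:, 1:].mean(axis=1)))
t_raw = exposure_tstat(bulk[:, j], exposure)
# Drop one fraction: the three fractions sum to one and would be collinear with the intercept
t_adj = exposure_tstat(bulk[:, j], exposure, est[:, 1:])
print(f"unadjusted t = {t_raw:.2f}, adjusted t = {t_adj:.2f}")
```

In this simulation no CpG carries a true disease effect, yet the cell-type-specific CpG is typically strongly associated with exposure before adjustment, simply because cases carry more of that cell type; after adjusting for the estimated fractions the association largely disappears.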
Keywords: Cell-type deconvolution; Cell-type heterogeneity; Classification; DNA methylation; Epigenetic biomarkers
Year: 2022 PMID: 35227298 PMCID: PMC8887190 DOI: 10.1186/s13148-022-01253-3
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Fig. 1 The need to adjust for CTH in epigenome studies. a Relative data variance explained by each of the top 15 principal components (PCs) (x-axis), expressed as a fraction of the total variance accounted for by the top 15 PCs (y-axis, fVAR), for 3 separate epigenome studies, with data points annotated by the main factor driving each PC. CTH = cell-type heterogeneity; Ethn = ethnicity; EADC = esophageal adenoma carcinoma; ER = estrogen receptor status. The tissue type and number of samples in each study are given above the plots. These plots derive from Illumina DNA methylation data from the following published works: Blood [51], Saliva [49] and Breast [52]. Briefly, the blood dataset is from healthy individuals, the saliva samples are from EADC patients and matched healthy controls, and the breast tissue data are from breast cancers and normal-adjacent tissue. In the case of blood, PC-1 correlates with CTH, PC-2 with ethnicity and PC-3 with age. b Sensitivity, false positive rate (FPR) and precision to detect 1000 simulated differentially methylated cytosines (DMCs) introduced into 139 monocyte samples from BLUEPRINT, with an exposure distinguishing 69 cases from 70 controls. In each panel, we display the metrics when inferring DMCs from realistic mixtures of 3 cell types (neutrophils, CD4+ T cells and monocytes) (Mix, red), when inferring DMCs from these same mixtures whilst adjusting for CTH (Mix CTH, blue), and when inferring DMCs from the purified monocyte samples (Mono, green). c For the same simulated data as in (b), the unsupervised hierarchical clustering obtained when clustering the 139 monocyte samples over the top 2 PCs correlating with the exposure (top panel), when clustering the 139 mixtures over the top 2 PCs correlating with the exposure without any adjustment for CTH (middle panel), and when clustering the 139 mixtures over the top 2 PCs correlating with the exposure after adjustment for CTH (lower panel). Note that in the second case, i.e. when clustering over the top 2 PCs derived from the mixtures without adjustment for CTH, these PCs exhibited only very marginal associations with the exposure, which is why the samples do not segregate by exposure.
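The fVAR quantity in panel a can be computed from a singular value decomposition of the column-centred data matrix: the variance of each PC is proportional to its squared singular value, so dividing by the sum over the top 15 PCs gives the fraction plotted on the y-axis. The sketch below assumes a samples-by-CpGs matrix and uses simulated, hypothetical factors (a cell-type fraction and age); it also shows one simple way to annotate a PC with its driving factor, via the absolute correlation of the PC scores with each known covariate. The function name `top_pcs` is illustrative only.

```python
import numpy as np

def top_pcs(data, k=15):
    """PCA via SVD of the column-centred data matrix (samples x features).

    Returns PC scores for the top k components and fVAR, the variance of each
    PC as a fraction of the total variance captured by those k PCs.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    var = s[:k] ** 2              # PC variances up to a common factor 1/(n-1)
    scores = u[:, :k] * s[:k]     # sample coordinates on each PC
    return scores, var / var.sum()

# Hypothetical data: 100 samples x 500 CpGs driven by a strong cell-type
# composition factor, a weaker age factor, and Gaussian noise.
rng = np.random.default_rng(1)
n, p = 100, 500
cth = rng.uniform(0.0, 1.0, n)    # e.g. a neutrophil fraction
age = rng.normal(0.0, 1.0, n)
data = (3.0 * np.outer(cth, rng.normal(0.0, 1.0, p))
        + np.outer(age, rng.normal(0.0, 0.3, p))
        + rng.normal(0.0, 0.5, size=(n, p)))

scores, fvar = top_pcs(data, k=15)

# Annotate the leading PCs by their association with each known factor
for i in range(3):
    r_cth = abs(np.corrcoef(scores[:, i], cth)[0, 1])
    r_age = abs(np.corrcoef(scores[:, i], age)[0, 1])
    print(f"PC-{i + 1}: fVAR={fvar[i]:.2f}, |r| CTH={r_cth:.2f}, age={r_age:.2f}")
```

Because the composition factor contributes far more variance than age in this toy setup, PC-1 tracks the cell-type fraction and PC-2 tracks age, mirroring the ordering described for the blood dataset.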
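The three metrics in panel b follow the standard confusion-matrix definitions: sensitivity = TP/(TP + FN), FPR = FP/(FP + TN) and precision = TP/(TP + FP), where the positives are the 1000 simulated DMCs. The small helper below is a hypothetical sketch of that bookkeeping; the total of 10,000 CpGs and the particular set of calls are made-up numbers for illustration, not the study's data.

```python
def detection_metrics(called, true_dmcs, n_cpgs):
    """Sensitivity, false positive rate and precision of a set of called DMCs.

    called:    CpG indices declared differentially methylated
    true_dmcs: indices of the simulated (ground-truth) DMCs
    n_cpgs:    total number of CpGs tested
    """
    called, truth = set(called), set(true_dmcs)
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = n_cpgs - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # TP / all true DMCs
    fpr = fp / (fp + tn) if fp + tn else 0.0           # FP / all null CpGs
    precision = tp / (tp + fp) if tp + fp else 0.0     # TP / all calls
    return sensitivity, fpr, precision

# Hypothetical example: 1000 true DMCs among 10,000 CpGs; the method recovers
# 800 of them and also calls 200 null CpGs.
true_dmcs = range(1000)
called = list(range(800)) + list(range(1000, 1200))
sens, fpr, prec = detection_metrics(called, true_dmcs, n_cpgs=10_000)
print(f"sensitivity={sens:.3f}, FPR={fpr:.4f}, precision={prec:.3f}")
```

Running it prints `sensitivity=0.800, FPR=0.0222, precision=0.800`. Because null CpGs usually vastly outnumber true DMCs, a low FPR can coexist with poor precision, which is why both are worth reporting.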