Abstract
BACKGROUND: DNA methylation is widely used as a biomarker in crucial medical applications and for highly accurate prediction of human age. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications, we adapted a versatile concept from evolutionary biology, the Universal PaceMaker (UPM), to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). Unlike other epigenetic clocks, the EPM is not confined to a specific pattern of aging, and the epigenetic age of each individual is inferred independently of other individuals. This allows explicit modeling of aging trends, in particular a nonlinear relationship between chronological and epigenetic age. In one of these recent works, we presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm that arrives at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface.
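The alternation between a site step and a time step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear model in which the methylation of site i in individual j equals s0_i + r_i * t_j plus equal-variance Gaussian noise, so that maximizing the likelihood amounts to minimizing the residual sum of squares (RSS). All function names and the array layout (sites in rows, individuals in columns) are hypothetical.

```python
import numpy as np

def site_step(meth, t):
    """With ages t fixed, fit each site's rate and intercept by least squares."""
    t_c = t - t.mean()
    centered = meth - meth.mean(axis=1, keepdims=True)
    rates = centered @ t_c / (t_c @ t_c)
    starts = meth.mean(axis=1) - rates * t.mean()
    return rates, starts

def time_step(meth, rates, starts):
    """With site parameters fixed, choose each individual's age to minimize RSS."""
    return rates @ (meth - starts[:, None]) / (rates @ rates)

def rss(meth, rates, starts, t):
    """Residual sum of squares of the assumed linear model."""
    return float(np.sum((meth - starts[:, None] - np.outer(rates, t)) ** 2))

def fit_epm(meth, ages, n_iter=50, tol=1e-10):
    """Alternate site and time steps, starting from chronological ages."""
    meth = np.asarray(meth, dtype=float)
    t = np.asarray(ages, dtype=float).copy()
    history = []
    for _ in range(n_iter):
        rates, starts = site_step(meth, t)      # assumed form of the site step
        t = time_step(meth, rates, starts)      # assumed form of the time step
        history.append(rss(meth, rates, starts, t))
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return rates, starts, t, history
```

Under these assumptions each step minimizes the RSS over one block of parameters while the other block is held fixed, so the RSS cannot increase from one iteration to the next.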
Keywords: Conditional Expectation Maximization; Epigenetics; Matrix Multiplication; Symbolic Algebra; Universal PaceMaker
Year: 2020 PMID: 32299339 PMCID: PMC7161103 DOI: 10.1186/s12864-020-6606-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Molecular Clock vs Universal PaceMaker: Solid (colored) lines represent different methylation sites. Vertical (dashed) lines represent time points. Hence, dots along a dashed line correspond to the (log) methylation rates of each methylation site at that time point. Under the Molecular Clock (MC) model (left), methylation rates differ among sites but are constant in time. By contrast, under the Universal PaceMaker (UPM) model (right), rates may vary with time, but the pairwise ratio between site rates remains constant (the difference between log rates is constant).
Fig. 2 The mn×2n design matrix X used in our closed-form solution to the MC case. Every row corresponds to a component in the RSS polynomial, and the corresponding entries (the ith and the (i+n)th) in that row are set to t and 1, respectively.
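A design matrix of this shape can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' closed-form derivation: it assumes that the 2n unknowns are one rate and one intercept per site (so n counts sites and m counts individuals here), that rows are grouped by site, and it solves the resulting least-squares problem with the generic `numpy.linalg.lstsq` routine. The function names are hypothetical.

```python
import numpy as np

def build_design_matrix(ages, n_sites):
    """Build the (m * n) x (2n) matrix: t in column i, 1 in column i + n."""
    ages = np.asarray(ages, dtype=float)
    m = ages.size
    X = np.zeros((m * n_sites, 2 * n_sites))
    for i in range(n_sites):
        rows = slice(i * m, (i + 1) * m)   # assumed ordering: rows grouped by site
        X[rows, i] = ages                  # coefficient of the rate of site i
        X[rows, i + n_sites] = 1.0         # coefficient of the intercept of site i
    return X

def fit_mc(meth, ages):
    """Least-squares fit of rates and intercepts with ages fixed (MC case).

    meth has shape (n_sites, m): one row per site, one column per individual.
    """
    meth = np.asarray(meth, dtype=float)
    n_sites, _ = meth.shape
    X = build_design_matrix(ages, n_sites)
    y = meth.reshape(-1)                   # site-major, matching the row ordering
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rates, starts = beta[:n_sites], beta[n_sites:]
    rss = float(np.sum((y - X @ beta) ** 2))
    return rates, starts, rss
```

Each row has only two nonzero entries, so for realistic sizes a sparse representation or a per-site regression would be cheaper than the dense matrix used here for clarity.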
Detailed experimental results. The columns, from left to right, are: data set ID, description (tissue, ages), number of individuals (n), running time in minutes under the closed form, T(CF), running time in minutes under the linear algebra operations, T(LA), residual sum of squares (RSS) under MC, RSS under EPM, χ², and degrees of freedom (DF) for χ². All p-values of χ² are below 10⁻⁶.
| Data Set | Description | n | T(CF) (min) | T(LA) (min) | RSS (MC) | RSS (EPM) | χ² | DF |
|---|---|---|---|---|---|---|---|---|
| GSE87571 | Adults, Blood | 366 | 2.4 | 745 | 29.9145 | 25.6283 | 2716.8 | 366 |
| GSE40279 | Adults, Blood | 656 | 5.63 | NA | 142.265 | 115.3552 | 13479.5 | 656 |
| GSE64495 | Human, All Ages, Blood | 113 | 0.7 | 254 | 258.333 | 203.851 | 26765.0 | 113 |
| GSE60132 | Human, All Ages, Blood | 192 | 1.8 | 346 | 19498.871 | 17122.519 | 24952.7 | 192 |
| GSE74193 | Human, All Ages, Brain development | 675 | 7.16 | NA | 1614.285 | 712.837 | 551740.8 | 675 |
| GSE36064 | Children, Blood | 78 | 0.52 | 193 | 166.774 | 148.41 | 9099.3 | 78 |
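The statement that all p-values fall below 10⁻⁶ can be checked directly from the χ² statistics and degrees of freedom in the table. The sketch below takes those two values for each data set at face value and uses the Chernoff upper bound on the upper tail of a chi-square distribution rather than an exact tail probability, because the exact values underflow double precision. The function name is hypothetical, and the bound is a conservative check, not the test procedure used in the study.

```python
import math

# (data set, chi-square statistic, degrees of freedom), copied from the table
ROWS = [
    ("GSE87571", 2716.8, 366),
    ("GSE40279", 13479.5, 656),
    ("GSE64495", 26765.0, 113),
    ("GSE60132", 24952.7, 192),
    ("GSE74193", 551740.8, 675),
    ("GSE36064", 9099.3, 78),
]

def log10_chi2_tail_bound(x, k):
    """Chernoff upper bound on log10 P(X >= x) for X ~ chi-square with k df.

    For x > k: P(X >= x) <= ((x / k) * exp(1 - x / k)) ** (k / 2).
    """
    if x <= k:
        raise ValueError("the bound only holds for x > k")
    z = x / k
    return (k / 2.0) * (math.log(z) + 1.0 - z) / math.log(10.0)

bounds = {name: log10_chi2_tail_bound(x, k) for name, x, k in ROWS}
for name, bound in bounds.items():
    print(f"{name}: log10(p) <= {bound:.1f}")
```

Because the bound is an upper limit on the p-value, any data set whose bound is below -6 on the log10 scale necessarily has a p-value below 10⁻⁶.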