Athina Vidaki, David Ballard, Anastasia Aliferi, Thomas H Miller, Leon P Barron, Denise Syndercombe Court.
Abstract
The ability to estimate the age of the donor from biological material recovered at a crime scene can be of substantial value in forensic investigations. Aging is complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime, including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2-90 years) analysed with Illumina's genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling, and multiple regression analysis carried out with these markers provided an accurate prediction of age (R²=0.92, mean absolute error (MAE)=4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, significantly improved the age prediction (R²=0.96), with a MAE of 3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top 3 predictors of age belonging to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE=7.1 years) and 1011 disease state individuals (MAE=7.2 years). Furthermore, we highlighted the age markers' potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R²=0.96), with a MAE of 3.2 years (training set) and 4.0 years (blind test).
In an attempt to create a sensitive and accurate age prediction test, a next generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq® platform. The method was validated using DNA standards of known methylation levels, and the age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the resulting prediction accuracy using the NGS data was lower than that of the original model (MAE=7.5 years), it is expected that future optimization of our strategy to account for technical variation, as well as an increased sample size, will improve both the prediction accuracy and reproducibility.
Keywords: Artificial neural networks; Chronological age prediction; DNA methylation; Forensic epigenetics; Next generation sequencing
Year: 2017 PMID: 28254385 PMCID: PMC5392537 DOI: 10.1016/j.fsigen.2017.02.009
Source DB: PubMed Journal: Forensic Sci Int Genet ISSN: 1872-4973 Impact factor: 4.882
Designed bisulfite PCR assays.
| CpG site | Gene | Direction (F/R) | Primer Sequence (5′-3′) | Amplicon Length (bp) |
|---|---|---|---|---|
| cg19761273 | CSNK1D | F | TGTTTAGTTTGAAGATTGAG | 150 |
| | | R | CCTTATTTCCTTTACAAAAA | |
| cg27544190 | C21orf63 | F | GGGTAGGATTAAAGTTGA | 106 |
| | | R | CTTAAAAATAACAATCCCC | |
| cg03286783 | CASC4 | F | GTTTTAGTTAGTGGGTG | 181 |
| | | R | CCCCTCCTCAAATCAAA | |
| cg01511567 | SSRP1 | F | TATTAGATTTAGTATAGGGG | 132 |
| | | R | CCCACAACTATTCAAATA | |
| cg07158339 | FXN | F | GGAATATGTTTTGTTTAAAA | 122 |
| | | R | TAATTAACCTCTCTATACCT | |
| cg05442902 | P2RXL1 | F | GTATGTTTTGGTTTTTGT | 109 |
| | | R | AATAACCTCTAAACTAACC | |
| cg24450312 | RASSF5 | F | GTTATTTATAGAGTTTGAG | 201 |
| | | R | TCTACTACAAACCAAA | |
| cg17274064 | ERG | F | AGGGAATAAGTATTTTTT | 139 |
| | | R | CTCACAATCAAACTTCTATATAC | |
| cg02085507 | TRIP10 | F | GTTAATGGATTTGGTTTTG | 186 |
| | | R | AACTCAAAAAATCCTTCCT | |
| cg20692569 | FZD9 | F | TTGTTGTTGTGGTAGT | 160 |
| | | R | AACCCAACAAATTAAA | |
| cg04528819 | KLF14 | F | AATAGGTTTTGGTGTAGTT | 138 |
| | | R | CAACCTCTAATAAATTCTCT | |
| cg08370996 | NR2F2 | F | GTGTTAAAGTTTATTATATAGA | 187 |
| | | R | AAAAAAAAAAACACACAC | |
| cg04084157 | VGF | F | GAGGGTGTTTGTTTTTTT | 111 |
| | | R | AACATTTCATTCATTCATTC | |
| cg22736354 | NHLRC1 | F | GTTGAGTTTAGGAGTTTTAT | 201 |
| | | R | CTTTAAAAAATTTAACCACC | |
| cg06493994 | SCGN | F | GGAGAGTAAGTTAAGAAATA | 150 |
| | | R | AACCTACCAAAAACCAAC | |
| cg02479575 | C19orf30 | F | GGAGGAGAATGTTATTTATT | 143 |
| | | R | CTATCCAAAATTCTAAAAAC | |
Fig. 1. Change of methylation levels over advancing age for the 16 CpG sites included in the eventual ANN model.
Fig. 2. Age prediction using multiple regression analysis (23 CpG sites). (a) Predicted vs. chronological age (years) for all 1156 individuals used in this study (linear correlation R² = 0.923, mean absolute error = 4.61 years, standard deviation = 4.36 years); (b) prediction error (years) over advancing age. As shown, most individuals were predicted within a ±5 year error range (proportion 0.61), while 1029 out of 1156 samples were predicted within a ±10 year error range (proportion 0.89).
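The summary statistics quoted in this caption (mean absolute error, squared linear correlation, and the share of samples inside a ±5 or ±10 year error band) can be computed as in the minimal sketch below. The function name `age_prediction_summary` and the toy ages are hypothetical and are not data from the study; R² is taken here as the squared Pearson correlation, which is one reading of "linear correlation R²" and is an assumption.

```python
import numpy as np

def age_prediction_summary(chronological, predicted):
    """Summarise age-prediction accuracy (hypothetical helper, not the authors' code)."""
    chronological = np.asarray(chronological, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(predicted - chronological)
    # Squared Pearson correlation between predicted and chronological age
    r = np.corrcoef(chronological, predicted)[0, 1]
    return {
        "mae": float(abs_err.mean()),              # mean absolute error in years
        "r2": float(r ** 2),
        "within_5": float((abs_err <= 5).mean()),  # share within +/- 5 years
        "within_10": float((abs_err <= 10).mean()) # share within +/- 10 years
    }

# Toy values for illustration only
toy_true = [10, 25, 40, 55, 70]
toy_pred = [12, 21, 46, 44, 71]
summary = age_prediction_summary(toy_true, toy_pred)
print(summary)
```

With the caption's own counts, 1029 of 1156 samples corresponds to a proportion of about 0.89, which matches the value given for the ±10 year range.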
Fig. 3. Summary of ANN model for age prediction analysis. (a) Predicted vs. chronological age for all 1156 individuals included in the study using the optimised 16-694-2-1 GRNN model, (b) Residual errors for the optimised model, (c) Prediction skewness for the blind test cases only using the optimised model, and (d) Sensitivity analysis and marker input consistency to age predictions across training, verification and blind test subsets. Error ratios are calculated as the ratio of the prediction inaccuracy by including all inputs to the prediction accuracy following systematic removal of each CpG site from 10 replicated GRNN networks. Boxes include data from the 25th–75th percentile as well as the median (thin line) and mean (thick line); error bars include the 5th and 95th percentile; numbers over boxes represent the rank order based on the mean.
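For readers unfamiliar with generalised regression neural networks, the sketch below shows the textbook form of a GRNN prediction: a Gaussian-kernel-weighted average of the training ages. It is not the authors' implementation. The function name `grnn_predict`, the smoothing parameter `sigma` and the toy data are all assumptions. Under the usual GRNN layout the pattern layer has one unit per training case and the summation layer has two units (numerator and denominator), which would fit the 16-694-2-1 label, though the caption does not spell this out.

```python
import numpy as np

def grnn_predict(train_x, train_y, query_x, sigma=0.1):
    """Textbook GRNN prediction (hypothetical sketch; sigma is an assumed value)."""
    train_x = np.asarray(train_x, dtype=float)   # shape (n_train, n_features)
    train_y = np.asarray(train_y, dtype=float)   # shape (n_train,)
    query_x = np.atleast_2d(np.asarray(query_x, dtype=float))
    # Pattern layer: squared Euclidean distance from each query to each training case
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum leaves the ratio unchanged and avoids underflow
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summation layer: weighted sum of targets (numerator) and sum of weights (denominator)
    numerator = weights @ train_y
    denominator = weights.sum(axis=1)
    # Output layer: their ratio is the predicted age
    return numerator / denominator

# Toy example with a single methylation-like feature; not data from the study
toy_x = [[0.0], [1.0]]
toy_age = [20.0, 60.0]
midpoint = grnn_predict(toy_x, toy_age, [[0.5]])  # equidistant from both cases
at_first = grnn_predict(toy_x, toy_age, [[0.0]])  # sits on the first training case
print(midpoint, at_first)
```

A query equidistant from both training cases returns their average age, while a query that coincides with a training case is dominated by that case when `sigma` is small. In a real setting the inputs would be the 16 methylation values per sample.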
Epigenetic aging signature consisting of 16 CpG sites. Information in this table includes the exact chromosomal location of the selected CpG sites (GRCh37/hg19) as well as the genes involved.
| CpG sites | Chromosomal location | Gene |
|---|---|---|
| cg19761273 | 17: 80,232,096 | CSNK1D − casein kinase 1; delta isoform 1 |
| cg27544190 | 21: 33,785,434 | C21orf63 − chromosome 21 open reading frame 63 |
| cg03286783 | 15: 44,580,973 | CASC4 − cancer susceptibility candidate 4 isoform a |
| cg01511567 | 11: 57,103,631 | SSRP1 − structure specific recognition protein 1 |
| cg07158339 | 9: 71,650,237 | FXN − frataxin, mitochondrial isoform 1 preproprotein |
| cg05442902 | 22: 21,369,010 | P2RXL1 − purinergic receptor P2X-like 1; orphan receptor |
| cg24450312 | 1: 206,681,158 | RASSF5 − Ras association domain family 5 isoform B |
| cg17274064 | 21: 40,033,892 | ERG − v-ets erythroblastosis virus E26 oncogene like isoform 2 |
| cg02085507 | 19: 6,739,192 | TRIP10 − thyroid hormone receptor interactor 10 |
| cg20692569 | 7: 72,848,481 | FZD9 − frizzled 9 |
| cg04528819 | 7: 130,418,315 | KLF14 − Kruppel-like factor 14 |
| cg08370996 | 15: 96,874,031 | NR2F2 − nuclear receptor subfamily 2; group F; member 2 |
| cg04084157 | 7: 100,809,049 | VGF − nerve growth factor inducible precursor |
| cg22736354 | 6: 18,122,719 | NHLRC1 − malin |
| cg06493994 | 6: 25,652,602 | SCGN − secretagogin precursor |
| cg02479575 | 19: 4,769,653 | C19orf30 − hypothetical protein LOC284424 |
Fig. 4. Age prediction in blood using the developed MiSeq method (n = 46).