Paul D Yousefi, Matthew Suderman, Ryan Langdon, Oliver Whitehurst, George Davey Smith, Caroline L Relton.
Abstract
DNA methylation data have become a valuable source of information for biomarker development, because, unlike static genetic risk estimates, DNA methylation varies dynamically in relation to diverse exogenous and endogenous factors, including environmental risk factors and complex disease pathology. Reliable methods for genome-wide measurement at scale have led to the proliferation of epigenome-wide association studies and subsequently to the development of DNA methylation-based predictors across a wide range of health-related applications, from the identification of risk factors or exposures, such as age and smoking, to early detection of disease or progression in cancer, cardiovascular and neurological disease. This Review evaluates the progress of existing DNA methylation-based predictors, including the contribution of machine learning techniques, and assesses the uptake of key statistical best practices needed to ensure their reliable performance, such as data-driven feature selection, elimination of data leakage in performance estimates and use of generalizable, adequately powered training samples.
Year: 2022 PMID: 35304597 DOI: 10.1038/s41576-022-00465-w
Source DB: PubMed Journal: Nat Rev Genet ISSN: 1471-0056 Impact factor: 53.242