Sailalitha Bollepalli (1, 2), Tellervo Korhonen (1, 3), Jaakko Kaprio (1, 2), Simon Anders (1, 4), Miina Ollikainen (1, 2).
1. Institute for Molecular Medicine Finland, University of Helsinki, 00290 Helsinki, Uusimaa, Finland.
2. Department of Public Health, University of Helsinki, 00290 Helsinki, Uusimaa, Finland.
3. National Institute for Health & Welfare, University of Helsinki, P.O. Box 30, FI-00271 Helsinki, Uusimaa, Finland.
4. Center for Molecular Biology of the University of Heidelberg, Im Neuenheimer Feld 282, 69120 Heidelberg, Baden-Württemberg, Germany.
Abstract
Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. Methods: To advance the practical applicability of smoking-associated methylation signals, we used machine learning to train a classifier that predicts smoking status. Results: We show the prediction performance of our classifier on three independent whole-blood datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Conclusion: The major contribution of our classifier is its global applicability: users do not need to determine a threshold value for each dataset in order to predict smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), which facilitates the use of our classifier to predict smoking status in future studies.
Keywords:
DNA methylation; epigenetic smoking status; multinomial LASSO; smoking status classifier; tobacco smoking
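To illustrate why a multinomial classifier needs no dataset-specific threshold, the following is a minimal sketch of the prediction step only: each class gets a linear score from sparse (LASSO-style) coefficients, the scores are turned into probabilities with a softmax, and the most probable class is reported. Everything concrete here is an assumption made for illustration: the probe names `cg_A` and `cg_B`, all coefficient values, and the three class labels are hypothetical and are not taken from EpiSmokEr or the underlying study.

```python
import math

# Hypothetical class labels (assumption, not taken from the source).
CLASSES = ["current", "former", "never"]

# Hypothetical sparse model: one intercept and a few nonzero probe weights
# per class. A LASSO penalty shrinks most weights to exactly zero, so only
# the nonzero ones need to be stored.
MODEL = {
    "current": {"intercept": 3.0, "weights": {"cg_A": -6.0, "cg_B": 2.0}},
    "former": {"intercept": 0.5, "weights": {"cg_A": -1.0}},
    "never": {"intercept": -2.0, "weights": {"cg_A": 5.0, "cg_B": -2.0}},
}


def class_probabilities(betas):
    """Return softmax class probabilities for one sample.

    betas maps probe name to methylation beta value in [0, 1].
    Probes missing from the sample are treated as 0.0 in this sketch.
    """
    scores = {}
    for label in CLASSES:
        params = MODEL[label]
        score = params["intercept"]
        for probe, weight in params["weights"].items():
            score += weight * betas.get(probe, 0.0)
        scores[label] = score

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_status(betas):
    """Pick the class with the highest probability; no cutoff is tuned."""
    probs = class_probabilities(betas)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    sample = {"cg_A": 0.2, "cg_B": 0.5}
    print(predict_status(sample), class_probabilities(sample))
```

Because the decision is an argmax over class probabilities, the same fitted coefficients can in principle be applied to a new dataset without choosing a cutoff on a continuous methylation score.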