Shicai Fan (1), Chengzhe Li (2), Rizi Ai (3), Mengchi Wang (3), Gary S Firestein (4), Wei Wang (3)
1. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Chemistry and Biochemistry.
2. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China.
3. Department of Chemistry and Biochemistry.
4. Division of Rheumatology, Allergy and Immunology, University of California San Diego, La Jolla, CA, USA.
Abstract
MOTIVATION: DNA methylation signatures in rheumatoid arthritis (RA) have been identified in fibroblast-like synoviocytes (FLS) using the Illumina HumanMethylation450 (450K) array. Because fewer than 2% of CpG sites are covered by the 450K array, and whole genome bisulfite sequencing is still too expensive for many samples, computationally predicting DNA methylation levels from 450K data would help to discover more RA-related genes.

RESULTS: We developed a computational model trained on 14 tissues for which both whole genome bisulfite sequencing and 450K array data are available. The model integrates three sources of information: the similarity of local methylation patterns between tissues, the methylation of flanking CpG sites, and the methylation tendency of flanking DNA sequences. In leave-one-tissue-out cross-validation, the predicted and measured methylation values were highly correlated, with a Pearson correlation coefficient of 0.9. Importantly, the majority (76%) of the top 10% differentially methylated loci among the 14 tissues were correctly detected using the predicted methylation values. Applying this model to 450K data from RA, osteoarthritis and normal FLS, we expanded the coverage of CpG sites 18.5-fold, so that the covered sites account for about 30% of all CpGs in the human genome. Through an integrative omics study, we identified genes and pathways tightly related to RA pathogenesis. Of these, 12 genes were supported by three lines of evidence: 6 are already known to play specific roles in RA, and 6 are new potential therapeutic targets.

AVAILABILITY AND IMPLEMENTATION: The source code, the data required for prediction, and demo data for testing are freely available at: http://wanglab.ucsd.edu/star/LR450K/

CONTACT: wei-wang@ucsd.edu or gfirestein@ucsd.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
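The leave-one-tissue-out evaluation mentioned in the RESULTS can be illustrated with a minimal sketch. It is not the authors' model: the predictor here is a placeholder (the per-site mean over the remaining tissues), and the tissue names, methylation values and function names are all hypothetical. The sketch only shows the evaluation loop, in which each tissue is held out in turn, predicted from the others, and scored by the Pearson correlation between predicted and measured values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_of_other_tissues(train):
    """Placeholder predictor (assumption, not the published model):
    predict each CpG site as its mean methylation over the training tissues."""
    tissues = list(train.values())
    n_sites = len(tissues[0])
    return [sum(t[i] for t in tissues) / len(tissues) for i in range(n_sites)]


def leave_one_tissue_out(data, predictor):
    """Hold out each tissue in turn, predict it from the remaining tissues,
    and return the Pearson r between predicted and measured values."""
    scores = {}
    for held_out, measured in data.items():
        train = {name: v for name, v in data.items() if name != held_out}
        predicted = predictor(train)
        scores[held_out] = pearson(predicted, measured)
    return scores


# Hypothetical toy data: methylation levels (0..1) at five CpG sites.
toy = {
    "tissue_a": [0.10, 0.85, 0.50, 0.95, 0.05],
    "tissue_b": [0.15, 0.80, 0.40, 0.90, 0.10],
    "tissue_c": [0.05, 0.90, 0.60, 0.85, 0.20],
}

scores = leave_one_tissue_out(toy, mean_of_other_tissues)
for tissue, r in scores.items():
    print(f"{tissue}: r = {r:.3f}")
```

Any real predictor can be passed in place of `mean_of_other_tissues`, provided it takes the training tissues and returns one predicted value per CpG site.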