Kathleen Oros Klein1, Stepan Grinek1, Sasha Bernatsky2, Luigi Bouchard3, Antonio Ciampi4, Ines Colmegna5, Jean-Philippe Fortin6, Long Gao4, Marie-France Hivert7, Marie Hudson8, Michael S Kobor9, Aurelie Labbe4, Julia L MacIsaac10, Michael J Meaney11, Alexander M Morin10, Kieran J O'Donnell12, Tomi Pastinen13, Marinus H Van Ijzendoorn14, Gregory Voisin1, Celia M T Greenwood15.
1. Lady Davis Institute, Jewish General Hospital, Montreal, QC H3T 1E2, Canada, Ludmer Center for Neuroinformatics and Mental Health.
2. Divisions of Rheumatology and Clinical Epidemiology, McGill University Health Centre, McGill University, Montreal, QC H4A 3J1, Canada.
3. ECOGENE-21, Centre intégré universitaire de santé et de service sociaux du Saguenay-Lac-Saint-Jean, QC G8H 3P7, Canada, Department of Biochemistry, Université de Sherbrooke, QC J1K 2R1, Canada.
4. Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2, Canada.
5. Division of Experimental Medicine, McGill University Health Centre, McGill University, Montreal, QC H3A 1A3, Canada.
6. Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA.
7. Department of Population Medicine, Harvard Medical School, Harvard Pilgrim Health Care Institute, Boston, MA 02215, USA, Department of Medicine, Division of Endocrinology, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada.
8. Lady Davis Institute, Jewish General Hospital, Montreal, QC H3T 1E2, Canada, Department of Medicine, McGill University Health Center, Montreal, QC H4A 3J1, Canada.
9. Canadian Institute for Advanced Research, Child, and Brain Development Program, Toronto, ON M5G 1Z8, Canada, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC V5Z 4H4, Canada, Department of Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada.
10. Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC V5Z 4H4, Canada, Department of Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada.
11. Ludmer Center for Neuroinformatics and Mental Health, Canadian Institute for Advanced Research, Child, and Brain Development Program, Toronto, ON M5G 1Z8, Canada, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC V5Z 4H4, Canada, Douglas Mental Health University Institute, McGill University, Montreal, QC H4H 1R3, Canada, Departments of Psychiatry, McGill University, Montreal, QC H3A 1A1, Canada, Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada.
12. Douglas Mental Health University Institute, McGill University, Montreal, QC H4H 1R3, Canada.
13. Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.
14. Centre for Child and Family Studies, Leiden University, Leiden 2300 RB, The Netherlands.
15. Lady Davis Institute, Jewish General Hospital, Montreal, QC H3T 1E2, Canada, Department of Biochemistry, Université de Sherbrooke, QC J1K 2R1, Canada, Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2, Canada, Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.
Abstract
MOTIVATION: DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium Human Methylation450 BeadChip (Illumina 450 K), built on the concepts of the recently published funNorm method and introducing cell-type or tissue-type flexibility. RESULTS: funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial, and is particularly striking on chromosome X, where methylation patterns have unique inter-tissue variability. AVAILABILITY AND IMPLEMENTATION: An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org.
1 Introduction
Recently, Fortin et al. introduced a functional normalization method, funNorm, specifically designed for the Illumina Infinium Human Methylation 450 BeadChip (Illumina 450 K) and implemented in Bioconductor's minfi package (Aryee et al., 2014). The percentile-specific adjustments in funNorm are its key feature, allowing batch effects and technical artefacts to have a non-constant influence across the range of signal strengths. However, methylation patterns may differ substantially across cell types or tissues, leading to cell- (or tissue-)type-specific quantiles, and optimal normalization adjustments should capture this. Here we present funtooNorm, an R package for normalization of Illumina 450 K data that extends the ideas in funNorm and is applicable to such heterogeneous data sets.
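The percentile-specific idea can be illustrated with a minimal Python sketch (funNorm and funtooNorm are themselves R code). Assuming the adjusted quantiles at a grid of percentiles are already available, each raw value in a sample is mapped by linear interpolation between the neighbouring percentiles, so the size of the correction can differ at low, middle and high intensities. The function name quantile_adjust and its inputs are hypothetical, and the regression that produces the adjusted quantiles is not shown.

```python
import numpy as np


def quantile_adjust(values, percentiles, target_quantiles):
    """Hypothetical helper: map raw intensities onto adjusted quantiles.

    values           : raw signal intensities for one sample and one probe type
    percentiles      : increasing grid of percentiles in [0, 100]
    target_quantiles : adjusted (e.g. model-predicted) quantiles at those percentiles
    """
    values = np.asarray(values, dtype=float)
    # The sample's own quantiles at the chosen percentiles.
    observed_quantiles = np.percentile(values, percentiles)
    # Each raw value is located between two observed quantiles and mapped to the
    # matching point between the two target quantiles (piecewise-linear), so the
    # shift applied to a value depends on where it falls in the distribution.
    return np.interp(values, observed_quantiles, target_quantiles)


raw = np.arange(101, dtype=float)
adjusted = quantile_adjust(raw, [0, 50, 100], [10.0, 40.0, 130.0])
```

With raw values 0 to 100 and target quantiles 10, 40 and 130 at the 0th, 50th and 100th percentiles, the sketch shifts the lowest value up by 10, the median down by 10 and the highest value up by 30; a value of 75 maps to 85.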
2 Methods
The key features of funtooNorm and funNorm are the same: normalization adjustments are estimated via regression models applied to a series of quantiles of the probe-type-specific signals in each sample. Covariates derived from the control probes capture variation not associated with the biological signals of interest. In funtooNorm, an augmented covariate matrix is constructed by including interactions between cell-type or tissue-type indicators and the average signal from each control probe type. Either principal component regression (PCR) or partial least squares regression (PLS) (Tenenhaus, 1998) can be fit (the type.fits option); as in funNorm, normalized methylation values are based on predictions from linear interpolations between the analyzed percentiles (see Supplemental Methods). The function funtoonorm operates in two distinct modes:
Normalization mode: When validate = FALSE, the data are normalized using a chosen number of components in the regressions. The model-fitting step requires only a set of quantiles for each sample, and is therefore efficient both computationally and in memory usage. Calculations can be performed in a modular fashion; intermediate results can be saved by setting appropriate flags.
Cross-validation mode: When validate = TRUE, a graphical display of root mean squared errors (RMSE) obtained by cross-validation facilitates the choice of an appropriate number of components (Fig. 1). Plots are provided for both PCR and PLS fits.
Three data sets are used to illustrate performance (Supplemental Table S1). In the Replication Data Set, methylation was measured in ten healthy individuals who each contributed 2–3 samples of each tissue (whole blood, buccal swab and dried blood spots), including a mixture of technical and biological replicates. In the Systemic Autoimmune Diseases Data (SARDS), monocytes and CD4+ T-cells from incident patients were separated from whole blood, with repeated samples drawn before and after 6 months of immunosuppressive treatment. For the Gestational Diabetes Data (GD), one technical replicate sample was available for each of the fetal placenta and cord blood tissues. Agreement within a tissue or cell type is measured by the average (over probes) of the squared differences within each replicate set, summed over distinct individuals.
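The augmented covariate matrix and the cross-validation mode can be sketched in Python as follows, assuming one matrix of per-sample control-probe averages and one matrix of quantiles for a single signal and probe type. All function names are hypothetical, leave-one-sample-out cross-validation is an assumption (the folds used by funtoonorm are not described here), only PCR is shown (not PLS), and this is not the package's R implementation.

```python
import numpy as np


def augmented_covariates(control, cell_type):
    """Hypothetical: control averages, cell-type indicators and their interactions.

    control   : (n_samples, n_controls) average signal per control probe type
    cell_type : length-n_samples array of cell- or tissue-type labels
    """
    control = np.asarray(control, dtype=float)
    labels = np.asarray(cell_type)
    levels = np.unique(labels)
    indicators = (labels[:, None] == levels[None, :]).astype(float)  # (n, K)
    # One block of interaction columns per cell type: indicator times control signal.
    interactions = [indicators[:, [k]] * control for k in range(len(levels))]
    return np.hstack([control, indicators] + interactions)


def pcr_fit_predict(X_train, Y_train, X_test, n_comp):
    """Principal component regression of quantiles Y on covariates X."""
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    Xc = X_train - x_mean
    # Leading principal directions of the centred covariates, via SVD.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_comp].T                       # (n_covariates, n_comp)
    scores = Xc @ V                         # component scores of training samples
    B, *_ = np.linalg.lstsq(scores, Y_train - y_mean, rcond=None)
    return y_mean + (X_test - x_mean) @ V @ B


def loo_cv_rmse(X, Y, max_comp):
    """Leave-one-sample-out RMSE, one row per number of components (1..max_comp),
    one column per percentile."""
    n = X.shape[0]
    rmse = np.zeros((max_comp, Y.shape[1]))
    for k in range(1, max_comp + 1):
        sq_err = np.zeros(Y.shape[1])
        for i in range(n):
            train = np.arange(n) != i
            pred = pcr_fit_predict(X[train], Y[train], X[[i]], k)
            sq_err += (pred[0] - Y[i]) ** 2
        rmse[k - 1] = np.sqrt(sq_err / n)
    return rmse
```

Plotting each row of the returned array against the percentiles, one curve per number of components, gives a display in the spirit of Fig. 1. The augmented matrix contains redundant columns (the indicators sum to one, and the interaction blocks sum to the control averages); PCR tolerates this as long as the number of components does not exceed the rank of the centred matrix.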
Fig. 1.
Root mean square error from cross-validation comparing different numbers of components in funtooNorm on the Replication Data Set. Separate model fits are implemented for A and B signals, and for different probe types
3 Results
Figure 1 displays the cross-validation RMSE plot for the Replication Data Set with PCR. The optimal number of components varies across percentiles and signals; clearly, there is a substantial improvement in mean squared error from 2 to 3 components.
Technical replicate agreement was improved with funtooNorm compared with funNorm (Supplemental Figs S1 and S2, Supplemental Tables S2 and S3). Agreement improved substantially for technical replicates of whole blood, blood spots and fetal placenta, although there was little difference between the methods for buccal swabs or cord blood. For biological replicates, we saw improvements of 10–20% in many tissues. Performance was particularly good for probes on the X chromosome: Supplemental Figure S3 shows that the distribution across probes of the differences between tissue types is distinct on the X chromosome, and this is captured by our augmented covariate matrix. A similar argument explains the enhanced performance for some probe annotations (Supplemental Fig. S4). Performance on the Y chromosome was poor because, with only 416 probes, a quantile-based model fit is overly complex; we recommend the simpler method implemented in funNorm for this chromosome.
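The replicate-agreement measure defined in the Methods, and the percentage improvements reported above, can be sketched in Python as follows. This assumes a probes-by-samples matrix of methylation values for one tissue or cell type and, for replicate sets with three samples, that every pair of replicates contributes a squared difference; the paper's exact convention for such sets may differ, and the function names are hypothetical.

```python
import itertools

import numpy as np


def replicate_disagreement(beta, individual):
    """Hypothetical: sum over individuals of the probe-averaged squared
    differences between replicate samples (lower means better agreement).

    beta       : (n_probes, n_samples) methylation values for one tissue or cell type
    individual : length-n_samples array of individual IDs
    """
    beta = np.asarray(beta, dtype=float)
    ids = np.asarray(individual)
    total = 0.0
    for person in np.unique(ids):
        cols = np.flatnonzero(ids == person)
        # Assumption: every pair within a replicate set contributes; an individual
        # with a single sample contributes nothing.
        for a, b in itertools.combinations(cols, 2):
            total += np.mean((beta[:, a] - beta[:, b]) ** 2)
    return total


def relative_improvement(baseline, candidate):
    """Fractional reduction in disagreement, e.g. 0.15 for a 15% improvement."""
    return (baseline - candidate) / baseline
```

Computing replicate_disagreement once on funNorm output and once on funtooNorm output, then passing both to relative_improvement, gives the kind of percentage comparison quoted above.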
4 Discussion
Most methylation studies today are designed to detect inter-individual differences rather than inter-tissue differences. Improved normalization of data sets containing multiple tissues can be expected to increase power to detect associations of interest, through a reduction in residual error; funNorm and its extension, funtooNorm, are designed with this goal in mind.
References
Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA (2014) Bioinformatics.