Flavie Perrier, Alexei Novoloaca, Srikant Ambatipudi, Laura Baglietto, Akram Ghantous, Vittorio Perduca, Myrto Barrdahl, Sophia Harlid, Ken K Ong, Alexia Cardona, Silvia Polidoro, Therese Haugdahl Nøst, Kim Overvad, Hanane Omichessan, Martijn Dollé, Christina Bamia, José Marìa Huerta, Paolo Vineis, Zdenko Herceg, Isabelle Romieu, Pietro Ferrari.
Abstract
Background: Methylation measures quantified by microarray techniques can be affected by systematic variation due to the technical processing of samples, which may compromise the accuracy of the measurement process and bias the estimate of the association under investigation. Quantifying the contribution of systematic sources of variation is challenging in datasets characterized by hundreds of thousands of features. In this study, we introduce a method previously developed for the analysis of metabolomics data to evaluate the performance of existing normalizing techniques in correcting for unwanted variation. Illumina Infinium HumanMethylation450K was used to acquire methylation levels at over 421,000 CpG sites for 902 participants in a case-control study on breast cancer nested within the EPIC cohort. The principal component partial R-square (PC-PR2) analysis was used to identify and quantify the variability attributable to potential systematic sources of variation. Three correcting techniques were applied: ComBat, surrogate variable analysis (SVA), and a linear regression model to compute residuals. The impact of each correcting method on the association between smoking status and DNA methylation levels was evaluated, and results were compared with findings from a large meta-analysis.
Keywords: Epigenetics; Methylation; Normalization; PC-PR2; Smoking status
Year: 2018 PMID: 29588806 PMCID: PMC5863487 DOI: 10.1186/s13148-018-0471-6
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Fig. 1 Description of laboratory variables. a Position of chips within batches; each batch was made of 8 chips. b Sample position within chips; each chip contains 12 samples
Fig. 2 Box plots of global methylation (β values) according to laboratory factors. a Batch. b Chip position within batches. c Sample position within chips. d Batches and sample position within chips
Values of weighted partial R2 (%) from PC-PR2 analysis indicating the proportion of variability of methylation levels, before and after normalization step, explained by a specific set of laboratory factors
| Values | Methods^a | Row sample position | Batch | Chip position | Total^b |
|---|---|---|---|---|---|
| β values | Raw | 11.4 | 9.5 | 6.5 | 30.4 |
| | Residuals | 0.2 | 1.3 | 5.9 | 17.9 |
| | ComBat | 0.2 | 1.3 | 6.0 | 17.1 |
| | SVA | 0.6 | 1.3 | 0.9 | 6.5 |
| M values | Raw | 12.3 | 9.7 | 6.8 | 30.7 |
| | Residuals | 0.2 | 1.2 | 5.8 | 16.5 |
| | ComBat | 0.2 | 1.3 | 6.2 | 17.0 |
| | SVA | 0.4 | 0.7 | 0.8 | 5.3 |
^a The residuals, ComBat and SVA methods were used to correct for effects of batch and row sample position (within the chips)
^b Total variability explained by laboratory factors and characteristics of the samples (recruitment centre, the five percentages of leukocyte subtypes, alcohol consumption, age, BMI, menopausal status, smoking, BC status and dietary folate)
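The "residuals" correction named in the footnote above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of each CpG site on dummy-coded batch and row position, and adds the per-site mean back to the residuals. The function names, the re-centring step and the simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def one_hot(labels):
    """Dummy-code a categorical factor, dropping the first level as reference."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)

def residual_correct(values, batch, row_position):
    """Regress every CpG site on batch and row position and keep the residuals.

    values: (n_samples, n_cpgs) array of beta or M values.
    Adding the per-site mean back is an assumption made here so that the
    corrected values stay on the original scale.
    """
    n = values.shape[0]
    design = np.column_stack([np.ones(n), one_hot(batch), one_hot(row_position)])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    residuals = values - design @ coef
    return residuals + values.mean(axis=0)

# Simulated data (hypothetical): 3 batches, 4 row positions, 5 CpG sites.
rng = np.random.default_rng(0)
batch = np.repeat(np.arange(3), 8)
row = np.tile(np.arange(4), 6)
values = rng.normal(size=(24, 5)) + 0.5 * batch[:, None]  # additive batch shift
corrected = residual_correct(values, batch, row)
```

After correction, the per-batch means of each site coincide with the overall site mean, which is what removing an additive batch effect amounts to in this simple setting.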
Values of weighted partial R2 (%) from PC-PR2 analysis indicating the proportion of variability of raw methylation levels explained by a specific set of covariates
| Characteristics of samples | β values | M values |
|---|---|---|
| Recruitment centre | 3.0 | 2.9 |
| Percentages of leukocyte subtypes | | |
| CD4T | 3.2 | 3.2 |
| CD8T | 3.7 | 3.1 |
| Natural killers | 5.2 | 4.7 |
| B cells | 1.7 | 1.1 |
| Monocytes | 0.4 | 0.4 |
| Alcohol intake at recruitment | 0.2 | 0.1 |
| Age at recruitment | 0.4 | 0.4 |
| BMI at recruitment | 0.1 | 0.1 |
| Menopausal status | 0.2 | 0.2 |
| Smoking status | 0.1 | 0.2 |
| Breast cancer status | 0.1 | 0.1 |
| Dietary folate | 0.1 | 0.1 |
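Weighted partial R2 values of the kind tabulated above can be sketched as follows. This is one common formulation of PC-PR2, written under stated assumptions: principal components are retained until they explain 80% of the total variance, each retained component is regressed on all covariates, the partial R2 of a covariate is the relative drop in residual sum of squares when it is added last, and components are weighted by their eigenvalues. The function names, the threshold and the simulated data are hypothetical and are not taken from this record.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def pc_pr2(values, covariates, var_threshold=0.8):
    """values: (n, p) array; covariates: dict of name -> numeric design columns."""
    n = values.shape[0]
    centred = values - values.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)  # PCA via SVD
    eigvals = s ** 2
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_pc = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
    scores = u[:, :n_pc] * s[:n_pc]
    weights = eigvals[:n_pc] / eigvals[:n_pc].sum()

    intercept = np.ones((n, 1))
    blocks = {k: np.asarray(v, dtype=float).reshape(n, -1) for k, v in covariates.items()}
    full = np.column_stack([intercept, *blocks.values()])
    result = {}
    for name in blocks:
        others = [b for k, b in blocks.items() if k != name]
        reduced = np.column_stack([intercept, *others])
        partial = []
        for j in range(n_pc):
            sse_full = sse(full, scores[:, j])
            sse_reduced = sse(reduced, scores[:, j])
            partial.append((sse_reduced - sse_full) / sse_reduced)
        result[name] = float(np.dot(weights, partial))  # eigenvalue-weighted mean
    return result

# Simulated example (hypothetical): "batch" drives variation, "age" does not.
rng = np.random.default_rng(1)
n = 60
batch = np.tile([0.0, 1.0], n // 2)
age = rng.normal(size=n)
values = 2 * np.outer(batch, rng.normal(size=50)) + rng.normal(size=(n, 50))
r2 = pc_pr2(values, {"batch": batch, "age": age})
```

In this simulation `r2["batch"]` should be much larger than `r2["age"]`. Multiplying the values by 100 gives percentages on the same scale as the table.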
Fig. 3 DNA methylation levels of the CpG site cg00000029 before and after normalization step. a β values. b M values
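The β and M scales referred to in the figure panels are related by the standard logit transform, M = log2(β / (1 − β)). The sketch below shows the conversion in both directions. The clipping constant `eps`, which keeps β away from 0 and 1, is an assumption added here to avoid infinite M values.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert beta values in (0, 1) to M values via the base-2 logit."""
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(beta / (1 - beta))

def m_to_beta(m):
    """Inverse transform: M values back to beta values."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1)

betas = np.array([0.1, 0.5, 0.8])
m_values = beta_to_m(betas)      # 0.5 maps to 0; values above 0.5 are positive
round_trip = m_to_beta(m_values)
```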
CpG site-specific regression models before and after normalization step
| Values | Methods | Significant sites^b | CHARGE^c | Sensitivity | 1-Specificity |
|---|---|---|---|---|---|
| β values | Standard adjustment^a | 444 | 357 (80%) | 1.9×10−2 | 2.2×10−4 |
| | Residuals | 427 | 365 (85%) | 1.9×10−2 | 1.5×10−4 |
| | ComBat | 600 | 411 (69%) | 2.2×10−2 | 4.7×10−4 |
| | SVA | 96 | 89 (92%) | 0.5×10−2 | 0.2×10−4 |
| M values | Standard adjustment^a | 322 | 274 (85%) | 1.5×10−2 | 1.2×10−4 |
| | Residuals | 332 | 299 (90%) | 1.6×10−2 | 0.8×10−4 |
| | ComBat | 387 | 335 (87%) | 1.8×10−2 | 1.3×10−4 |
| | SVA | 144 | 134 (93%) | 0.7×10−2 | 0.2×10−4 |
Models are adjusted for chip position, recruitment centre, the five percentages of leukocyte subtypes, age at recruitment, menopausal status and BC status
^a Also adjusted for batch and sample position
^b Number of sites significantly associated with smoking status after FDR correction of p values
^c Number (and percentage) of significant sites also identified by the CHARGE meta-analysis
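The sensitivity and 1-specificity columns can be read as agreement with a reference set of sites. The sketch below treats the CHARGE findings as that reference. The reference-set size of 18,760 sites and the total of 421,000 tested sites are assumptions made for illustration; this record states only "over 421,000 CpG sites" and gives no reference-set size. The function name is hypothetical.

```python
def reference_agreement(n_significant, n_in_reference, n_reference, n_tested):
    """Agreement of a list of significant sites with a reference list.

    sensitivity       = reference sites recovered / all reference sites
    1 - specificity   = non-reference sites flagged / all non-reference sites
    share_in_reference = reference sites recovered / all significant sites
    """
    false_positives = n_significant - n_in_reference
    sensitivity = n_in_reference / n_reference
    one_minus_specificity = false_positives / (n_tested - n_reference)
    share_in_reference = n_in_reference / n_significant
    return sensitivity, one_minus_specificity, share_in_reference

# First table row (standard adjustment): 444 significant sites, 357 in CHARGE.
# n_reference and n_tested below are ASSUMED values, not taken from this record.
sens, fpr, share = reference_agreement(444, 357, n_reference=18_760, n_tested=421_000)
```

Under these assumed totals the first row's counts give values close to the tabulated 1.9×10−2, 2.2×10−4 and 80%.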
Fig. 4 Venn diagram of CpG sites significantly associated with smoking status using each correcting method and CHARGE. a β values. b M values. p values were corrected for multiple testing with FDR
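The FDR correction mentioned in the caption can be illustrated with the Benjamini-Hochberg step-up adjustment. That this particular procedure was used is an assumption, since the record says only "FDR". The function name and the example p values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity, working down from the largest p value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

p_raw = [0.01, 0.04, 0.03, 0.005]          # hypothetical per-site p values
p_adj = benjamini_hochberg(p_raw)
significant = p_adj < 0.05                  # sites kept at a 5% FDR threshold
```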