Lauren A Vanderlinden1, Randi K Johnson2, Patrick M Carry2, Fran Dong2, Dawn L DeMeo3, Ivana V Yang4, Jill M Norris2, Katerina Kechris5.
Abstract
OBJECTIVE: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from the older 450K and newer EPIC platforms. The pre-processing pipeline for DNA methylation data is not trivial and influences downstream analyses. Incorporating different platforms adds a new level of technical variability that has not yet been taken into account by recommended pipelines. Our study evaluated the performance of various tools for harmonizing data across platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort.
Keywords: DNA methylation; Illumina 450K; Illumina EPIC; Platform harmonization
Year: 2021 PMID: 34496950 PMCID: PMC8424820 DOI: 10.1186/s13104-021-05741-2
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Pipeline Methods Considered. The four main pre-processing steps are: 1. Normalization and probe QC, 2. Batch effect adjustment, 3. Extra probe filtering, and 4. Genomic inflation adjustment. The methods considered for each step are listed along with the evaluation(s) used to assess them
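As a rough illustration of the genomic inflation step, the sketch below computes the textbook genomic-control estimate: the inflation factor lambda is the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.4549). This caption does not name the adjustment tool the authors evaluated, so this is an assumption-laden stand-in, not their method. The function names and the z-score input are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Estimate lambda from per-probe test statistics given as z-scores."""
    chi2_stats = [z * z for z in z_scores]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def adjust_z_scores(z_scores, lam):
    """Genomic-control style correction (assumed here, not taken from the paper):
    divide each z-score by sqrt(lambda), but only when lambda exceeds 1."""
    scale = max(lam, 1.0) ** 0.5
    return [z / scale for z in z_scores]


if __name__ == "__main__":
    z = [0.0, 1.0, 2.0]  # toy statistics, not real data
    lam = genomic_inflation(z)
    print(lam, genomic_inflation(adjust_z_scores(z, lam)))
```

A lambda near 1 suggests little inflation. Values well above 1 point to residual technical or biological confounding that earlier pipeline steps did not remove.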
Fig. 2 Platform Effect. The 1st and 2nd principal components (PCs) from the ssNoob normalization are plotted, with colors indicating both platform and sex. Red and blue dots denote the 450K platform, while purple and green dots denote the EPIC platform. Red and purple dots denote females, and blue and green dots denote males. The percent variance explained by each PC is noted in parentheses
Fig. 3 Correlation of Technical Replicates. Density plots of per-probe correlations across the cross-platform technical replicates (n = 12, green) and across a random subset of pairs for comparison (n = 12, purple), for data normalized using (A) SeSAMe and (B) SWAN. The median correlation coefficient among technical replicates is 0.41 for both SeSAMe and SWAN. The 1st and 3rd quartiles for technical replicates were (0.06, 0.72) for SeSAMe and (0.11, 0.67) for SWAN
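The evaluation summarized in Fig. 3 can be sketched as follows: for each probe, correlate the beta values of the same samples measured on both platforms, then repeat with randomly matched samples as a baseline. This is a minimal sketch under assumed data structures (nested dictionaries keyed by probe and sample ID). The function names, the choice of Pearson correlation, and the toy values are illustrative assumptions, not the authors' code.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def per_probe_correlations(beta_450k, beta_epic, pairs):
    """beta_*: {probe: {sample_id: beta}}; pairs: [(sample_450k, sample_epic), ...].
    Returns {probe: correlation across the given pairs} for probes on both arrays."""
    result = {}
    for probe in beta_450k.keys() & beta_epic.keys():
        x = [beta_450k[probe][a] for a, _ in pairs]
        y = [beta_epic[probe][b] for _, b in pairs]
        result[probe] = pearson(x, y)
    return result


def random_pairs(samples_450k, samples_epic, n, seed=0):
    """Draw n random cross-platform pairings as a comparison baseline.
    A draw may by chance include a true replicate pair; filter those if needed."""
    rng = random.Random(seed)
    return list(zip(rng.sample(samples_450k, n), rng.sample(samples_epic, n)))


if __name__ == "__main__":
    # Toy values only, not real data.
    b450 = {"cg1": {"a1": 0.1, "a2": 0.5, "a3": 0.9}}
    bepic = {"cg1": {"b1": 0.2, "b2": 0.6, "b3": 1.0}}
    replicates = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    print(per_probe_correlations(b450, bepic, replicates))
```

Summarizing the resulting per-probe values by median and quartiles gives numbers comparable in kind to those reported in the caption.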