Elior Rahmani, Liat Shenhav, Regev Schweiger, Paul Yousefi, Karen Huen, Brenda Eskenazi, Celeste Eng, Scott Huntsman, Donglei Hu, Joshua Galanter, Sam S Oh, Melanie Waldenberger, Konstantin Strauch, Harald Grallert, Thomas Meitinger, Christian Gieger, Nina Holland, Esteban G Burchard, Noah Zaitlen, Eran Halperin.
Abstract
BACKGROUND: Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data.
Keywords: Ancestry; DNA methylation; Epigenetics; Epigenome-wide association study (EWAS); Illumina 450K; Population structure
Year: 2017 PMID: 28149326 PMCID: PMC5267476 DOI: 10.1186/s13072-016-0108-y
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Fig. 1 Fraction of variance explained in the first two genotype-based PCs of the GALA II data using several methods. Presented are linear predictors using an increasing number of EPISTRUCTURE PCs (in blue), methylation-based PCs (in red) and methylation-based PCs after feature selection based on a previous study [21] (in yellow) for capturing (a) the first genotype-based PC and (b) the second genotype-based PC
Fig. 2 Capturing population structure in the GALA II data using an unsupervised approach. (a) The first two PCs of the genotypes, considered the gold standard, separate the samples into two subpopulations: Puerto Ricans (in blue) and Mexicans (in red); (b) the first two PCs of the methylation levels (methylation PCs) cannot reconstruct the separation found with the genotype data; (c) the first two PCs recalculated after applying a feature selection based on the proximity of CpGs to nearby SNPs, as proposed by Barfield et al. [21]; (d) the first two PCs of the methylation data after adjusting for cell-type composition (adjusted methylation PCs) reconstruct most of the separation found in the genotypes; (e) adjusted methylation PCs after excluding the 70,889 polymorphic sites from the data; (f) adjusted methylation PCs after excluding the 167,738 probes containing at least one common SNP
Fig. 3 Capturing population structure in the CHAMACOS data. Presented are linear predictors of the first genotype-based PC using (a) the first two methylation PCs of the data, (b) the first two PCs calculated after applying a feature selection based on the proximity of CpGs to nearby SNPs [21], (c) the first two PCs after adjusting the data for cell-type composition (adjusted methylation PCs), (d) the first two adjusted methylation PCs after excluding 167,738 probes containing SNPs from the data and (e) the first two EPISTRUCTURE PCs
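The comparison these captions describe (how much of a genotype-based PC is explained by a linear predictor built from methylation PCs, before and after adjusting for cell-type composition) can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the simulated matrices, the helper names `top_pcs`, `residualize` and `r_squared`, and the use of known cell-type proportions as covariates. The record does not say how the adjustment was carried out, and this is not an implementation of EPISTRUCTURE.

```python
import numpy as np

def top_pcs(X, k):
    """Scores of the top-k principal components of X (samples x features)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def residualize(X, C):
    """Remove the linear effect of covariates C (plus an intercept) from each column of X."""
    C1 = np.column_stack([np.ones(C.shape[0]), C])
    beta, *_ = np.linalg.lstsq(C1, X, rcond=None)
    return X - C1 @ beta

def r_squared(y, P):
    """Fraction of variance in y explained by a linear predictor on the columns of P."""
    P1 = np.column_stack([np.ones(P.shape[0]), P])
    beta, *_ = np.linalg.lstsq(P1, y, rcond=None)
    resid = y - P1 @ beta
    return 1.0 - resid.var() / y.var()

# Synthetic stand-ins (hypothetical, not the GALA II or CHAMACOS data).
rng = np.random.default_rng(0)
n, m = 200, 500                                  # samples, methylation sites
ancestry = rng.normal(size=n)                    # stand-in for genotype-based PC 1
cells = rng.normal(size=(n, 2))                  # stand-in for cell-type composition
cell_load = rng.normal(scale=2.0, size=(2, m))   # strong cell-type effect on all sites
anc_load = np.zeros(m)
anc_load[:100] = rng.normal(scale=0.7, size=100) # weaker ancestry effect on some sites
meth = cells @ cell_load + np.outer(ancestry, anc_load) + rng.normal(size=(n, m))

# Variance in the ancestry axis explained by the first two methylation PCs,
# computed on the raw data and on the cell-type-adjusted data.
r2_raw = r_squared(ancestry, top_pcs(meth, 2))
r2_adj = r_squared(ancestry, top_pcs(residualize(meth, cells), 2))
print(f"R^2 raw PCs: {r2_raw:.3f}, R^2 adjusted PCs: {r2_adj:.3f}")
```

In this toy setup the leading raw PCs are dominated by the simulated cell-type signal, so the adjusted PCs explain more of the ancestry axis. Passing a larger `k` to `top_pcs` gives the "increasing number of PCs" view described for Fig. 1.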