Victor Yuan, E Magda Price, Giulia Del Gobbo, Sara Mostafavi, Brian Cox, Alexandra M Binder, Karin B Michels, Carmen Marsit, Wendy P Robinson.
Abstract
BACKGROUND: The influence of genetics on variation in DNA methylation (DNAme) is well documented. Yet confounding from population stratification is often unaccounted for in DNAme association studies. Existing approaches that use DNAme data to address confounding by population stratification may not generalize to populations or tissues outside those in which they were developed. To aid future placental DNAme studies in assessing population stratification, we developed an ethnicity classifier, PlaNET (Placental DNAme Elastic Net Ethnicity Tool), using Infinium Human Methylation 450k BeadChip array (HM450k) data from placental samples in five cohorts. The classifier is also compatible with the newer EPIC platform.
Keywords: Association study; Confounding; DNA methylation; Epigenetics; Machine learning; Microarray; Placenta; Population stratification
Year: 2019 PMID: 31399127 PMCID: PMC6688210 DOI: 10.1186/s13072-019-0296-3
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Description of methods to infer self-reported ethnicity or genetic ancestry using HM450K data
| Name of method | Statistical approach | Input HM450K sites | Output | Sample tissue | Sample populations (a) | Cohort location |
|---|---|---|---|---|---|---|
| Barfield et al. [ | PCA | 7703 DNAme sites with a 1000 genomes project SNP at the CpG site | Genetic ancestry as PC scores | Blood | Caucasian-Americans, African–Americans | USA |
| EPISTRUCTURE [ | PCA | 4913 DNAme sites associated with local genetic variation (mQTLs) | Genetic ancestry as PC scores | Blood | Europeans, Puerto Ricans, Mexicans | Southern Germany; USA |
| Zhou et al. [ | Predictive-modeling | 59/65 SNP sites | Ethnicity | Multiple | White, Black or African American, Asian | Many |
| PlaNET; this study | Predictive-modeling | 15 SNPs; 1845 DNAme sites | Ethnicity and genetic ancestry | Placenta | Caucasians, Asians, Africans | Canada, USA |
(a) Ethnicity/ancestry as defined in the associated study
Description of HM450K DNAme datasets used to develop and test PlaNET
| Cohort (n) | GEO accession | Dataset summary | Location | Self-reported AFR (n) | Self-reported ASI (n) | Self-reported CAU (n) | Non-HM450K genetic data (n) |
|---|---|---|---|---|---|---|---|
| C1 (72) | GSE70453 | 36 controls, 36 gestational diabetes mellitus | Boston, MA, USA | 13 | 13 | 46 | N/A |
| C2 (24) | GSE73375 | 13 controls, 11 preeclampsia (PE) | Chapel Hill, NC, USA | 13 | 1 | 10 | N/A |
| C3 (289) | GSE75248 | 289 samples from infants with variable newborn neurobehavior | RI, USA; MA, USA | 23 | 9 | 257 | N/A |
| C4 (44) | GSE100197 | 17 controls, 27 PE | Toronto, CAN | 7 | 12 | 25 | 50 AIMs (41) |
| C5 (70) | GSE100197, GSE108567, GSE74738, unpublished | 35 controls, 35 fetal growth restriction, PE, and/or preterm birth | Vancouver, CAN | 1 | 18 | 51 | 50 AIMs (67); Omni2.5 (27) |
AFR African, ASI Asian, CAU Caucasian
Fig. 1 Evaluating PlaNET’s performance and characterizing ethnicity-predictive HM450K sites. We developed PlaNET (Placental elastic net ethnicity classifier), using placental HM450K data and evaluated its classification performance using leave-one-dataset-out cross-validation. a Each sample’s ethnicity classification from PlaNET is shown with respect to their self-reported ethnicity. Samples were called ‘ambiguous’ if their predicted probability fell below a ‘confidence’ threshold of 75%. b PlaNET utilizes a subset of ethnicity-predictive sites from the HM450K. To investigate whether genetic signal is present in the measurement for these sites, we cross-referenced ethnicity-predictive sites to an existing placental mQTL database [42] and determined whether any sites had SNPs present in either the probe body, CpG site of interrogation, or single base extension sites, based on dbSNP137.
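The two evaluation rules in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the example probabilities, and the reading of "predicted probability" as the highest class probability are all assumptions. The cohort labels C1 to C5 come from the dataset table.

```python
from typing import Dict, Iterator, List, Tuple

CONFIDENCE_THRESHOLD = 0.75  # the 75% threshold stated in the caption


def call_ethnicity(probabilities: Dict[str, float],
                   threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return the most probable class, or 'Ambiguous' when its
    probability falls below the threshold (assumed interpretation)."""
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    return label if prob >= threshold else "Ambiguous"


def leave_one_dataset_out(cohorts: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training cohorts, held-out cohort) pairs, one per cohort."""
    for held_out in cohorts:
        training = [c for c in cohorts if c != held_out]
        yield training, held_out


# Hypothetical class probabilities for two samples (not real data).
example_probabilities = [
    {"African": 0.05, "Asian": 0.90, "Caucasian": 0.05},
    {"African": 0.40, "Asian": 0.10, "Caucasian": 0.50},
]
calls = [call_ethnicity(p) for p in example_probabilities]
splits = list(leave_one_dataset_out(["C1", "C2", "C3", "C4", "C5"]))

print(calls)
for training, held_out in splits:
    print(f"train on {training}, test on {held_out}")
```

Holding out a whole cohort at a time, rather than random samples, means each test set comes from a site the model never saw during training.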
Fig. 2 Probabilities associated with PlaNET ethnicity predictions and genetic ancestry inferred from AIMs. Ethnicity classifications from PlaNET and associated confidence/probability scores were compared to genetic ancestry inferred from 50 AIMs (n = 109, cohorts C4, C5), represented by the first three coordinates from multidimensional scaling using 1000 genomes project samples as reference populations.
Fig. 3 Probabilities associated with PlaNET ethnicity predictions and genetic ancestry inferred from high-density genotyping data. PlaNET was tested in a subset of cohort C5 (n = 37). a PlaNET’s ethnicity classifications were compared with self-reported ethnicity. b Ethnicity probabilities generated by PlaNET were compared to c genetic ancestry coefficients determined from high-density genotyping data (Omni 2.5, > 2 million SNPs) using the function snmf() from the R package LEA. The two were highly correlated (R² = 0.95–0.96, p < 0.001), as determined by linear regression.
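For readers unfamiliar with the R² statistic quoted in the Fig. 3 caption, the following is a minimal sketch of how a coefficient of determination is obtained from a simple linear regression. The values are invented for illustration. The choice of regressing ancestry coefficients on PlaNET probabilities is also an assumption, since the caption does not say which variable was the response.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for one ethnicity class (not data from the study).
planet_probability = [0.05, 0.10, 0.50, 0.90, 0.95]
ancestry_coefficient = [0.02, 0.15, 0.45, 0.85, 0.99]

r2 = r_squared(planet_probability, ancestry_coefficient)
print(f"R^2 = {r2:.3f}")
```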
Fig. 4 Comparing PlaNET to existing methods to account for population stratification using HM450K data. For each cohort, principal components analysis was conducted on PlaNET using a model trained on all other cohorts. PlaNET’s principal components (PCs) were then compared to the PCs computed on sites from EPISTRUCTURE [9], Barfield’s method [8], and the 59 SNPs. a Amount of variance explained from a series of linear models where principal component “i” is a function of self-reported ethnicity encoded as a dummy variable. b This was then repeated using AIMs coordinates 1 and 2 instead of ethnicity as the independent variable (n = 109).
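The "variance explained" in panel a of Fig. 4 can be illustrated as follows. A linear model with an intercept and dummy-coded group membership predicts each group's mean, so its R² equals one minus the within-group sum of squares divided by the total sum of squares. The sketch below relies on that identity. The function name and the PC scores are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def variance_explained(pc_scores: Sequence[float], groups: Sequence[str]) -> float:
    """R^2 of a linear model of PC scores on dummy-coded group labels.

    The fitted value for each sample is its group mean, so
    R^2 = 1 - SS_within / SS_total.
    """
    by_group: Dict[str, List[float]] = defaultdict(list)
    for score, group in zip(pc_scores, groups):
        by_group[group].append(score)

    grand_mean = sum(pc_scores) / len(pc_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in pc_scores)
    ss_within = 0.0
    for scores in by_group.values():
        group_mean = sum(scores) / len(scores)
        ss_within += sum((s - group_mean) ** 2 for s in scores)
    return 1.0 - ss_within / ss_total


# Hypothetical scores on one principal component for six samples.
pc1 = [-2.1, -1.8, 0.1, 0.3, 2.0, 2.2]
ethnicity = ["AFR", "AFR", "ASI", "ASI", "CAU", "CAU"]

explained = variance_explained(pc1, ethnicity)
print(f"variance explained by ethnicity: {explained:.3f}")
```

A value near 1 means the component separates the groups almost completely, while a value near 0 means the component carries little ethnicity signal.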
Distribution of PlaNET ethnicity predictions across previously published placental EWAS datasets
| GEO accession | Primary groups | African | Asian | Caucasian | Ambiguous |
|---|---|---|---|---|---|
| GSE98224 | EOPET | 5 | 4 | 10 | 0 |
| GSE98224 | Preterm controls | 1 | 3 | 5 | 0 |
| GSE98224 | LOPET | 1 | 1 | 8 | 1 |
| GSE98224 | Term controls | 0 | 4 | 5 | 0 |
| GSE100197 | EOPET | 1 | 5 | 15 | 1 |
| GSE100197 | Preterm controls | 1 | 4 | 19 | 0 |
| GSE100197 | LOPET | 0 | 6 | 12 | 0 |
| GSE100197 | Term controls | 0 | 2 | 17 | 0 |
| GSE100197 | IUGR | 0 | 3 | 8 | 0 |
| GSE71678 | NA (a) | 0 | 0 | 342 | 1 |
EOPET early-onset preeclampsia, LOPET late-onset preeclampsia, IUGR intrauterine growth restriction
(a) Phenotype of interest is a continuous variable (arsenic concentration)