Carolina Medina-Gomez, Janine Frédérique Felix, Karol Estrada, Marjoline Josephine Peters, Lizbeth Herrera, Claudia Jeanette Kruithof, Liesbeth Duijts, Albert Hofman, Cornelia Marja van Duijn, Andreas Gerardus Uitterlinden, Vincent Wilfred Vishal Jaddoe, Fernando Rivadeneira.
Abstract
Genome-wide association studies (GWAS) have been successful in identifying loci associated with a wide range of complex human traits and diseases. To date, the majority of GWAS have focused on European populations. However, the inclusion of other ethnic groups, as well as admixed populations, in GWAS is rapidly rising, driven by the pressing need to extrapolate findings to non-European populations and to increase statistical power. In this paper, we describe the methodological steps surrounding genetic data generation, quality control, study design and analytical procedures needed to run GWAS in the multiethnic and highly admixed Generation R Study, a large prospective birth cohort in Rotterdam, the Netherlands. Furthermore, we highlight a number of practical considerations and alternatives pertinent to the quality control and analysis of admixed GWAS data.
Year: 2015 PMID: 25762173 PMCID: PMC4385148 DOI: 10.1007/s10654-015-9998-4
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 8.082
Fig. 1 Flowchart overview of the entire GWAS QC process. Quality control of all samples from Generation R-1 and Generation R-2 after the two projects were merged. Red font denotes SNPs or samples excluded from the dataset at the different QC steps. (Color figure online)
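Fig. 1 summarizes SNP- and sample-level QC exclusions. As an illustration only, the sketch below applies SNP-level filters of the kind commonly used in GWAS QC (call rate, minor allele frequency, Hardy–Weinberg equilibrium). The `SnpCounts` record, the function names and all three thresholds are hypothetical, not the study's settings, and a Pearson chi-square HWE test stands in for the exact test that tools such as PLINK usually apply. In an admixed cohort, departures from HWE can also reflect population structure rather than genotyping error, so such filters need care.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (hypothetical input format)."""
    name: str
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples without a genotype call


# Hypothetical thresholds for illustration only, not the study's settings.
MIN_CALL_RATE = 0.98
MIN_MAF = 0.01
MIN_HWE_P = 1e-6


def call_rate(s: SnpCounts) -> float:
    called = s.n_aa + s.n_ab + s.n_bb
    return called / (called + s.n_missing)


def minor_allele_freq(s: SnpCounts) -> float:
    n = s.n_aa + s.n_ab + s.n_bb
    q = (2 * s.n_bb + s.n_ab) / (2 * n)  # frequency of allele B
    return min(q, 1 - q)


def hwe_chi2_pvalue(s: SnpCounts) -> float:
    """Pearson chi-square test of HWE with 1 df (an exact test is often preferred)."""
    n = s.n_aa + s.n_ab + s.n_bb
    p = (2 * s.n_aa + s.n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (s.n_aa, s.n_ab, s.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(s: SnpCounts) -> bool:
    return (call_rate(s) >= MIN_CALL_RATE
            and minor_allele_freq(s) >= MIN_MAF
            and hwe_chi2_pvalue(s) >= MIN_HWE_P)
```

For example, a SNP with counts 360/480/160 and 5 missing calls passes, whereas one with 500/0/500 (no heterozygotes) fails the HWE filter.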
Fig. 2 Genetic substructure of the Generation R Study. Two-dimensional plots from multidimensional scaling (MDS) analyses of the Generation R Study together with the three initial panels from the HapMap Project. Left panel: first two components, which explain most of the variability in the data. Right panel: third and fourth components, which explain some of the remaining variability.
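Fig. 2 is based on multidimensional scaling of genetic distances between samples. The sketch below implements classical (Torgerson) MDS in pure Python to show the idea: double-center the squared distance matrix, then take its leading eigenvectors scaled by the square roots of their eigenvalues. It assumes a precomputed, approximately Euclidean pairwise distance matrix (for instance, one derived from identity-by-state sharing); the function names are hypothetical, and a genome-scale analysis would use dedicated software rather than this O(n²)-per-iteration loop.

```python
import math
import random


def double_center(d):
    """Return B = -1/2 * J D2 J, where D2 holds the squared distances."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpairs(b, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation.

    Assumes b is positive semidefinite (Euclidean distances); power iteration
    converges to the eigenvalue of largest magnitude.
    """
    n = len(b)
    rng = random.Random(seed)
    m = [row[:] for row in b]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def classical_mds(d, k=2):
    """Coordinates of each sample on the first k MDS components."""
    pairs = top_eigenpairs(double_center(d), k)
    coords = [[0.0] * k for _ in range(len(d))]
    for c, (lam, v) in enumerate(pairs):
        scale = math.sqrt(max(lam, 0.0))
        for i in range(len(d)):
            coords[i][c] = v[i] * scale
    return coords
```

Applied to distances between two well-separated groups of points, the first component places the groups on opposite sides of zero, which is the pattern used to read ancestry clusters off plots such as Fig. 2.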
Fig. 3 Evaluation of imputation quality metrics (HapMap). a Boxplots of the MACH Rsq as a function of the MAF of the imputed SNPs. b Distribution of imputation quality per MAF category. Blue and green denote poorly and well imputed SNPs, respectively, based on a quality-score threshold of 0.3. 88,625 of 3,021,329 imputed SNPs (2.93 %) are poorly imputed (Rsq < 0.3). (Color figure online)
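The MACH Rsq reported by the MaCH/minimac imputation software estimates the squared correlation between imputed and true genotypes. It is computed approximately as the observed variance of the imputed allele dosages divided by the variance expected if genotypes were known exactly, 2p(1 − p) under Hardy–Weinberg equilibrium. The sketch below reproduces that approximation for one SNP and applies the 0.3 threshold from the caption; the function names are hypothetical, and the software's internal estimator may differ in detail.

```python
from statistics import fmean, pvariance

RSQ_THRESHOLD = 0.3  # quality threshold quoted in the figure caption


def estimated_rsq(dosages):
    """Approximate MACH-style Rsq for one SNP.

    dosages: expected allele counts (0..2), one per sample.
    Returns the observed dosage variance divided by 2p(1-p), the variance
    expected under HWE if every genotype were known with certainty.
    """
    p = fmean(dosages) / 2
    expected_var = 2 * p * (1 - p)
    if expected_var == 0:
        return 0.0  # monomorphic SNP: ratio undefined, treated as uninformative here
    return pvariance(dosages) / expected_var


def is_well_imputed(dosages, threshold=RSQ_THRESHOLD):
    return estimated_rsq(dosages) >= threshold
```

Perfectly known genotypes give an Rsq near 1. Dosages shrunk halfway toward the mean 2p retain only a quarter of the variance (Rsq near 0.25) and fall below the 0.3 cut-off; a SNP whose dosages all equal 2p carries no information (Rsq 0).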
Fig. 4 Evaluation of imputation quality metrics (1000 Genomes, 1KG). a Boxplots of the MACH Rsq as a function of the MAF of the imputed SNPs. b Distribution of imputation quality per MAF category. Blue and green denote poorly and well imputed SNPs, respectively, based on a quality-score threshold of 0.3. 8,263,752 of 30,072,738 imputed SNPs (27.48 %) are poorly imputed (Rsq < 0.3).
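Panels b of Figs. 3 and 4 break imputation quality down by MAF category. The sketch below tabulates the number of poorly imputed SNPs (Rsq < 0.3) per MAF bin from (MAF, Rsq) pairs. The bin edges and function names are hypothetical, chosen for illustration; the figures' actual categories may differ.

```python
from bisect import bisect_right
from collections import Counter

RSQ_THRESHOLD = 0.3
# Hypothetical MAF bin edges; the figures' actual categories may differ.
MAF_EDGES = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40]


def maf_bin(maf):
    """Label of the left-closed MAF bin containing maf (last bin includes 0.5)."""
    i = bisect_right(MAF_EDGES, maf)
    lo = 0.0 if i == 0 else MAF_EDGES[i - 1]
    if i == len(MAF_EDGES):
        return f"[{lo:.2f}, 0.50]"
    return f"[{lo:.2f}, {MAF_EDGES[i]:.2f})"


def poor_counts_by_bin(snps):
    """snps: iterable of (maf, rsq). Returns {bin label: (n_poor, n_total)}."""
    poor, total = Counter(), Counter()
    for maf, rsq in snps:
        label = maf_bin(maf)
        total[label] += 1
        if rsq < RSQ_THRESHOLD:
            poor[label] += 1
    return {label: (poor[label], total[label]) for label in sorted(total)}


def percent(part, whole):
    return 100.0 * part / whole
```

The same `percent` helper recomputes the overall proportions from the caption counts: 88,625 of 3,021,329 is about 2.93 %, and 8,263,752 of 30,072,738 is about 27.48 %.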
Fig. 5 Genome-wide association study of red-hair pigmentation in the Generation R cohort. a Q–Q plot showing the inflation of the test statistics when no correction for population structure is applied (black dots), and the slightly lower power when correction for genomic components is applied (red dots) compared with the EMMAX linear mixed model (green dots). b Manhattan plot of the red-hair pigmentation GWAS in the Generation R Study with adjustment for genomic components. c Manhattan plot of the red-hair pigmentation GWAS in the Generation R Study using a linear mixed model as implemented in EMMAX. (Color figure online)
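The inflation visible in a Q–Q plot is often summarized by the genomic inflation factor λ_GC: the median observed 1-df chi-square statistic divided by its expected median under the null, (Φ⁻¹(0.75))² ≈ 0.455. The sketch below computes λ_GC from two-sided p-values and the expected −log10 p quantiles for the Q–Q plot's x-axis. The function names are hypothetical, and this shows a common summary statistic rather than the authors' own procedure.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def pvalue_to_chi2(p):
    """1-df chi-square statistic corresponding to a two-sided p-value in (0, 1]."""
    z = -_STD_NORMAL.inv_cdf(p / 2)  # using p/2 avoids precision loss for tiny p
    return z * z


def genomic_inflation(pvalues):
    """lambda_GC = median observed chi-square / median expected under the null."""
    return median(pvalue_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def expected_neglog10(n):
    """Expected -log10 p-values for n tests under the null (Q-Q plot x-axis)."""
    return [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
```

Uniformly distributed null p-values give λ_GC close to 1, whereas systematically deflated p-values, as produced by uncorrected population structure, push λ_GC above 1.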