Improving the power of GWAS and avoiding confounding from population stratification with PC-Select.

George Tucker¹, Alkes L Price², Bonnie Berger³.

Abstract

Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models.

Keywords: GWAS; mixed models; population stratification

Year: 2014 PMID: 24788602 PMCID: PMC4096359 DOI: 10.1534/genetics.114.164285

Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562

In recent years, there has been extensive research on linear mixed models (LMM) to calculate genome-wide association study (GWAS) statistics (Kang et al. 2010; Segura et al.; Svishcheva et al.; Zhou and Stephens 2012; Yang et al.). While linear mixed models implicitly assume that all SNPs have an effect on the phenotype (an infinitesimal genetic architecture), it is widely believed that disease phenotypes do not follow an infinitesimal model and that modeling a genetic architecture where most SNPs have negligible effect and some have modest effect (a noninfinitesimal genetic architecture) would increase power. As a step in that direction, Listgarten et al. and Lippert et al. recently developed the state-of-the-art FaST-LMM Select method, which constructs a genetic relationship matrix (GRM) from a subset of top associated SNPs that are more likely to be causal. However, as a recent Perspective article (Yang et al.) shows, limiting the GRM to a subset of SNPs can result in insufficient correction for population stratification, leading to significantly inflated statistics and false positive associations (Table 1, Table 2, Supporting Information, Figure S2, Figure S3, Figure S4, and File S1).

Table 1

Extent of null statistic inflation as measured by λ [median Wald statistic on test null SNPs divided by the theoretical median under the null distribution (Devlin and Roeder 1999)]

Mean λ_GC (SE)	Pop. strat., P = 0.05	Pop. strat., P = 0.005	P = 0.05	P = 0.005
Linear regression	3.8 (0.4)	4.5 (0.5)	1.01 (0.01)	1.01 (0.01)
Linear regression with PCs	1.02 (0.01)	1.03 (0.01)	1.01 (0.01)	1.02 (0.01)
LMM	1.01 (0.01)	1.02 (0.01)	1.01 (0.01)	1.01 (0.01)
FaST-LMM Select	1.04 (0.01)	1.26 (0.03)	1.01 (0.01)	0.99 (0.01)
PC-Select	1.01 (0.01)	1.01 (0.01)	1.01 (0.01)	0.99 (0.01)

Table 2

Extent of null statistic inflation measured by λ

Mean λ_GC (SE)	Pop. strat., P = 0.05	Pop. strat., P = 0.005	P = 0.05	P = 0.005
Linear regression	1.58 (0.02)	1.55 (0.02)	1.03 (0.01)	1.04 (0.01)
Linear regression with PCs	1.01 (0.01)	1.00 (0.01)	1.01 (0.01)	1.02 (0.01)
LMM	1.02 (0.01)	1.01 (0.01)	1.00 (0.01)	1.02 (0.01)
FaST-LMM Select	1.02 (0.01)	1.06 (0.01)	1.00 (0.01)	1.02 (0.01)
PC-Select	1.01 (0.01)	1.01 (0.01)	1.00 (0.01)	1.01 (0.01)

Note to Table 1. We tabulate λ for linear regression, linear regression with PCs, standard LMM, FaST-LMM Select, and PC-Select on simulated genotypes and phenotypes with and without population stratification as the fraction of causal SNPs (P = 0.05, 0.005) varies. Values shown are mean λ over 100 simulations with standard errors (SE) in parentheses. FaST-LMM Select inflates statistics in the presence of population stratification when few SNPs are causal (P = 0.005), which may result in false positives. Pop. strat., population stratification.

Note to Table 2. We tabulate λ for linear regression, linear regression with PCs, standard LMM, FaST-LMM Select, and PC-Select on real genotypes and simulated phenotypes with and without population stratification as the fraction of causal SNPs (P = 0.05, 0.005) varies. Values shown are mean λ over 200 simulations with standard errors (SE) in parentheses. FaST-LMM Select inflates statistics in the presence of population stratification when few SNPs are causal (P = 0.005), which may result in false positives.

As a solution to this problem, we propose PC-Select, a novel hybrid approach that includes the principal components (PCs) of the genotype matrix as fixed effects in FaST-LMM Select. PC-Select leverages the advantages of the FaST-LMM Select framework while correcting for population stratification. The two main steps of FaST-LMM Select are ranking SNPs by linear regression P-values to form the GRM with the top-ranked SNPs and then calculating association statistics in a mixed-model framework, using this GRM. We used the top five PCs as fixed effects in both of these steps (see Materials and Methods). [We follow the recommendations in the literature (Price et al.) and use a fixed number of PCs. We have found that five PCs are generally sufficient to correct for stratification in simulated and real data sets. Alternatively, the number of PCs may be selected through cross-validation or Tracy–Widom statistics (Patterson et al.).]
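As a rough illustration of how the top PCs could be obtained for use as fixed-effect covariates, the sketch below standardizes a genotype matrix and takes its leading left singular vectors, which are the leading eigenvectors of $WW^\top$. The function names `standardize_genotypes` and `top_pcs`, the 0/1/2 genotype coding, and the scaling by $\sqrt{2\hat{p}(1-\hat{p})}$ are assumptions made for the example and are not taken from the authors' software.

```python
import numpy as np


def standardize_genotypes(G):
    """Mean-center each SNP and scale it by sqrt(2 p (1 - p)).

    G is an (individuals x SNPs) array of 0/1/2 allele counts (assumed coding).
    Monomorphic SNPs would divide by zero and should be filtered beforehand.
    """
    p_hat = G.mean(axis=0) / 2.0                      # estimated allele frequency
    return (G - 2.0 * p_hat) / np.sqrt(2.0 * p_hat * (1.0 - p_hat))


def top_pcs(W, k=5):
    """Return the top-k eigenvectors of W W^T via a thin SVD of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)   # columns sorted by singular value
    return U[:, :k]


# Hypothetical toy data: 200 individuals, 300 SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=300)
G = rng.binomial(2, freqs, size=(200, 300)).astype(float)

W = standardize_genotypes(G)
pcs = top_pcs(W, k=5)          # (200, 5) matrix to append to the covariates X
print(pcs.shape)
```

Working from the SVD of $W$ avoids forming the $N \times N$ matrix explicitly; for very large cohorts a truncated or randomized SVD would be the more practical choice.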
As a result, PC-Select yields noninflated test statistics in the presence of population stratification and maintains high power to detect causal SNPs. Specifically, to examine inflation and power, we followed the simulation procedure in Yang et al. and generated data sets each containing 10,000 SNPs for 1000 individuals. To avoid a loss in power for LMM that can occur when candidate SNPs are included in the GRM (Listgarten et al.; Yang et al.), we separately simulated a set of candidate SNPs to compute test statistics. We sampled individuals from two populations with Fst = 0.05, ancestral minor allele frequencies uniform in [0.1, 0.5], and mean phenotypic difference 0.25 SD. To simulate causal SNPs in the GRM, we selected a fraction P = 0.05 or 0.005 of the SNPs at random and sampled Gaussian effect sizes (variance equal to 0.5 divided by the number of causal SNPs in the GRM) for these SNPs. We generated 500 candidate test null SNPs that were not causal, and to measure inflation we calculated λ, the median Wald statistic on these SNPs divided by the theoretical median under the null distribution (Devlin and Roeder 1999). To investigate power, we generated 50 causal candidate SNPs with normally distributed effect sizes (variance equal to 0.5 divided by the number of causal candidate SNPs) and measured the mean Wald statistic on these SNPs. We split the variability from causal SNPs evenly between the GRM and the causal candidate SNPs. We repeated all simulations 100 times and report the mean and standard error.

We found that when few SNPs were causal (P = 0.005), FaST-LMM Select inflated null statistics in the presence of population stratification (λ = 1.26 ± 0.03), whereas PC-Select was properly calibrated (λ = 1.01 ± 0.01) (Table 1). Moreover, FaST-LMM Select lost power in the presence of population stratification (measured by the mean Wald statistic on causal SNPs: 14.3 ± 0.2 with stratification vs. 16.4 ± 0.1 without), whereas PC-Select’s power in simulations with and without population stratification was not significantly different (16.3 ± 0.1 vs. 16.3 ± 0.1) (Figure 1). Thus, even though PC-Select corrected for stratification, this advantage did not come at the expense of power. This gain is likely because the PCs reduce noise in selecting subsets of SNPs for the GRM in the presence of population stratification. In addition, PC-Select chose fewer SNPs than FaST-LMM Select to include in the GRM (over 100 simulations, mean SNPs chosen: ∼20 vs. ∼240, Figure S1), yielding potential computational savings. When many SNPs were causal (P = 0.05), both methods used nearly all SNPs in the GRM (over 100 simulations, mean SNPs chosen: ∼9400 and ∼8800 of 10,000, respectively), achieving similar performance to standard LMM.
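The inflation measure λ used above can be sketched in a few lines. The example below simulates null SNPs in two populations, adds a mean phenotypic shift between them, and compares λ for plain linear regression with and without an ancestry covariate. It is a simplified illustration only: the Balding-Nichols beta model for population allele frequencies, the function names, and the use of the true population label in place of PCs are assumptions for the example, and no causal SNPs or GRM are simulated.

```python
from statistics import NormalDist

import numpy as np

# Theoretical median of a chi-square distribution with 1 d.f. (about 0.455).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def lambda_gc(wald_stats):
    """Median test statistic divided by the null chi-square(1) median."""
    return float(np.median(wald_stats) / CHI2_1_MEDIAN)


def simulate_genotypes(n_per_pop, n_snps, fst, rng):
    """Two populations drawn around shared ancestral frequencies (assumed model)."""
    p_anc = rng.uniform(0.1, 0.5, size=n_snps)
    a = p_anc * (1.0 - fst) / fst
    b = (1.0 - p_anc) * (1.0 - fst) / fst
    blocks = []
    for _ in range(2):
        p_pop = rng.beta(a, b)                        # population-specific frequencies
        blocks.append(rng.binomial(2, p_pop, size=(n_per_pop, n_snps)))
    pop = np.repeat([0.0, 1.0], n_per_pop)            # population label per individual
    return np.vstack(blocks).astype(float), pop


def linreg_wald(y, G, covars=None):
    """Per-SNP squared t statistic from linear regression with optional covariates."""
    n = len(y)
    C = np.ones((n, 1)) if covars is None else np.column_stack([np.ones(n), covars])
    Qc, _ = np.linalg.qr(C)
    y_r = y - Qc @ (Qc.T @ y)                         # residualize phenotype
    G_r = G - Qc @ (Qc.T @ G)                         # residualize every SNP
    sxx = (G_r ** 2).sum(axis=0)
    beta = G_r.T @ y_r / sxx
    rss = (y_r ** 2).sum() - beta ** 2 * sxx
    dof = n - C.shape[1] - 1
    return beta ** 2 * sxx / (rss / dof)


rng = np.random.default_rng(0)
G, pop = simulate_genotypes(n_per_pop=500, n_snps=500, fst=0.05, rng=rng)
y = 0.25 * pop + rng.standard_normal(pop.size)        # 0.25 SD mean shift, no causal SNPs

print("lambda, no correction:      ", lambda_gc(linreg_wald(y, G)))
print("lambda, ancestry covariate: ", lambda_gc(linreg_wald(y, G, pop)))
```

With only 500 null SNPs the median is a noisy estimate, which is one reason the values in Table 1 are averaged over many simulated data sets.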

Figure 1

(A and B) Comparison of power for linear regression, linear regression with PCs, standard LMM, FaST-LMM Select, and PC-Select on simulated genotypes and phenotypes (A) and real genotypes and simulated phenotypes (B) with and without population stratification as the fraction of causal SNPs (P = 0.05, 0.005) varies. To measure power, we plot the mean Wald statistic on test causal SNPs. In all cases, PC-Select has the highest power of the methods that do not inflate statistics.

We also investigated a recent extension of FaST-LMM Select, the genard method (Hoffman 2013) that fits a data-adaptive low-rank GRM; however, we found that it did not have increased power over LMM in our simulations (Figure S5), which is consistent with previous simulations in a similar context (Hoffman 2013).

Next, we evaluated inflation and power on real genotypes with simulated phenotypes in a similar manner. We analyzed 5000 individuals randomly subsampled from a multiple-sclerosis (MS) study genotyped on Illumina arrays (Sawcer et al.) made available via Wellcome Trust Case Control Consortium 2 (WTCCC2) (see Materials and Methods). As before, we separated GRM SNPs and candidate SNPs to avoid proximal contamination and provide a fair comparison of methods. We randomly sampled 50,000 SNPs for the GRM from chromosomes 3 to 22, 250 causal SNPs from chromosome 1, and 500 null SNPs from chromosome 2. To simulate environmental variance aligned with population structure, we added 0.25 times the first PC (after the PC had been normalized to variance 1) to each individual’s phenotype. Otherwise, we generated phenotypes as before and report simulations over 200 randomly generated phenotypes. We again found that when few SNPs were causal (P = 0.005), FaST-LMM Select inflated null statistics in the presence of population stratification (λ = 1.06 ± 0.01), whereas PC-Select was properly calibrated (λ = 1.01 ± 0.01) (Table 2).
Moreover, FaST-LMM Select lost power in the presence of population stratification (measured by the mean Wald statistic on causal SNPs: 14.64 ± 0.05 with stratification vs. 16.02 ± 0.05 without); in contrast, PC-Select’s power in simulations with and without population stratification was not significantly different (16.02 ± 0.05 vs. 16.08 ± 0.05) (Figure 1). In all of our simulations, PC-Select produced noninflated statistics and high power.

Finally, we analyzed data from 10,204 MS cases and 5429 controls genotyped on Illumina arrays (Sawcer et al.) made available via WTCCC2 (see Materials and Methods). The cases and controls were not matched for ancestry and thus exhibited substantial population stratification. Evaluated over all SNPs, PC-Select had λ = 1.24 and FaST-LMM Select had λ = 1.20. Due to polygenicity, we expect λ on all markers to be >1. On the same data, Yang et al. report λ = 1.23 and 1.20 for linear regression with PCs and LMM, respectively, which they show is consistent with polygenicity. To evaluate power, we considered Wald statistics at 75 known associated SNPs (see Materials and Methods and Table S1 for Wald statistics). PC-Select consistently gave larger Wald statistics than FaST-LMM Select (63 of 75 markers; P = 2 × 10⁻⁹, mean Wald statistic 12.07 vs. 11.30). Based on cross-validation, both PC-Select and FaST-LMM Select chose to use all markers. This may indicate that the disease is not caused by a small number of loci with large effects or that our sample size is too small to capture this effect. Although PC-Select and FaST-LMM Select chose to use all SNPs and thus neither method inflated statistics, we emphasize that without a priori knowledge about the genetic architecture, PC-Select automatically tunes the number of SNPs to include in the GRM to optimize power and simultaneously protects against population stratification at no cost to power.
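The text does not state which test produced the P-value for the 63-of-75 comparison. One natural candidate is a two-sided sign test, in which each marker is equally likely to favor either method under the null. The sketch below computes that binomial tail exactly; the choice of test and the function name `sign_test_two_sided` are assumptions, not something the source confirms. Under this assumption the result is on the order of 10⁻⁹, in line with the reported value.

```python
from math import comb


def sign_test_two_sided(wins, n):
    """Exact two-sided sign test P-value for `wins` successes out of `n` pairs."""
    k = max(wins, n - wins)                               # more extreme side
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)                           # double the one-sided tail


# 63 of 75 known associated markers had a larger Wald statistic under PC-Select.
p_value = sign_test_two_sided(63, 75)
print(f"two-sided sign test P = {p_value:.2e}")
```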
Janss et al. caution against using PCs as fixed effects in combination with a random effect derived from the GRM when estimating heritability. This may result in an ill-posed model because the PCs enter both as fixed effects and implicitly through the random effect. We avoid this issue when estimating variance components by using the PCs as fixed effects in a restricted maximum-likelihood (REML) approach, which projects the genotype matrix into a subspace orthogonal to the PCs, effectively removing them from the random effect. We also note that population structure and PCs have previously been used successfully as fixed effects (or separate random effects) in mixed-model settings to address confounding from population structure and from unusually differentiated markers (Yu et al.; Zhao et al.; Price et al. 2013; Sul and Eskin 2013).

Using PCs in a linear model does not correct for family relatedness and cryptic relatedness (Price et al.). As suggested by Yang et al., because segments shared identical-by-descent are long, using a subset of SNPs may correct for cryptic relatedness. Listgarten et al. show that using a subset of SNPs in the GRM does not inflate statistics on the WTCCC data, where inflation is likely primarily due to cryptic relatedness. We expect that PC-Select will not be inflated by cryptic relatedness for the same reasons. In most human data sets with unrelated individuals, family relatedness is not an issue; however, for data sets with strong family relatedness, we suspect there may be cases where both PC-Select and FaST-LMM Select inflate statistics.

PC-Select has the same asymptotic runtime as FaST-LMM Select, quadratic in the number of individuals and linear in the number of markers. In practice, the runtime for the additional step of computing the PCs for the genotype matrix is minimal because both methods require several spectral decompositions of matrices of nearly the same size for the cross-validation step.
It should be noted that while the asymptotic runtime of PC-Select and FaST-LMM Select is the same as that of previously published exact LMM methods (Lippert et al.; Zhou and Stephens 2012), the actual runtime of both methods is ostensibly longer by a factor of 10 due to the cross-validation step. The cross-validation step is parallelizable, so in practice this is not a significant limitation.

Including PCs as fixed effects allows PC-Select to infer ancestry from all SNPs simultaneously, while at the same time maintaining the benefits of using a statistically chosen subset of the SNPs to estimate the GRM (Listgarten et al.; Lippert et al.). As we have shown, using a combination of PCs and a subset of SNPs in the GRM gives the best of both worlds.

Materials and Methods

MS data set

We analyzed data from 10,204 MS cases and 5429 controls [the National Blood Service (NBS) and the 1958 Birth Cohort (1958BC)] genotyped on Illumina arrays made available to researchers via WTCCC2 (http://wtccc.org.uk/ccc2/). We follow the quality-control standards in Yang et al. Although Sawcer et al. analyzed United Kingdom (UK) and non-UK samples separately followed by meta-analysis in most of their analyses, the data made available to researchers include both UK and non-UK cases but only UK controls. We retained all samples to maximize sample size. We considered markers that were present in each of the MS, NBS, and 1958BC data sets and removed markers with >0.5% missing data, P < 0.01 for allele-frequency difference between NBS and 1958BC, P < 0.05 for deviation from Hardy–Weinberg equilibrium, P < 0.05 for differential missingness between cases and controls, or minor allele frequency <0.1% in any data set, leaving 360,557 markers. The 75 known associated markers were defined by including, for each MS-associated marker listed in the National Human Genome Research Institute (NHGRI) GWAS catalog (http://genome.gov/gwastudies/), a single best tag at r² > 0.4 from the set of 360,557 markers if available.

Statistical methods

PC-Select follows a similar framework to that of FaST-LMM Select (Lippert et al. 2013; Listgarten et al.). For completeness, we list the steps and equations we used. First, we describe a method for computing association statistics, and then in subsequent sections we describe the steps of PC-Select.

Association statistics:

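The phenotype $y$, covariates $X$, and genotypes $W$ are mean centered. Additionally, each genotype is divided by $\sqrt{2\hat{p}(1-\hat{p})}$, where $\hat{p}$ is the estimated minor allele frequency. Then the phenotype is modeled as

$$y \sim N\left(X\alpha,\ \sigma_g^2 K + \sigma_e^2 I\right),$$

where $\alpha$ is a vector of weights for the covariates, and $K$ is the GRM. This model naturally leads to an association statistic based on the Wald statistic. To calculate the association statistic for SNP $w$, we add $w$ as a fixed-effect covariate to the previous model and test whether its coefficient is significantly different from 0. Specifically, consider the model

$$y \sim N\left(w\beta + X\alpha,\ \sigma_g^2 K + \sigma_e^2 I\right),$$

where $\beta$ is the coefficient for the test SNP. We estimate $\sigma_g^2$ and $\sigma_e^2$ by REML. The fixed-effect coefficients $(\beta, \alpha)$ are estimated by maximum likelihood. It is straightforward to construct the Wald statistic to test whether $\beta \neq 0$. Let $\hat{\Sigma} = \hat{\sigma}_g^2 K + \hat{\sigma}_e^2 I$ and $Q = [w; X]$. Then $\hat{\beta}$ is equal to the first entry of $(Q^\top \hat{\Sigma}^{-1} Q)^{-1} Q^\top \hat{\Sigma}^{-1} y$ and $\widehat{\mathrm{var}}(\hat{\beta})$ is equal to the first diagonal entry of $(Q^\top \hat{\Sigma}^{-1} Q)^{-1}$. The test statistic is

$$\frac{\hat{\beta}^2}{\widehat{\mathrm{var}}(\hat{\beta})},$$

which is asymptotically $\chi^2$ distributed with 1 d.f.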
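The Wald statistic described above can be written as a short generalized least-squares computation. The sketch below takes the two variance components as given inputs, so the REML estimation step is deliberately not shown, and it forms the covariance matrix densely rather than using the spectral tricks of FaST-LMM. The function name `wald_statistic` and its argument names are hypothetical.

```python
import numpy as np


def wald_statistic(y, w, X, K, sigma_g2, sigma_e2):
    """Wald statistic for test SNP w with covariates X under y ~ N(w*beta + X*alpha, Sigma).

    Sigma = sigma_g2 * K + sigma_e2 * I, with the variance components assumed
    to have been estimated beforehand (for example by REML).
    """
    n = len(y)
    Sigma = sigma_g2 * K + sigma_e2 * np.eye(n)
    Q = np.column_stack([w, X])                  # test SNP first, then covariates
    Sinv_Q = np.linalg.solve(Sigma, Q)           # Sigma^{-1} Q
    Sinv_y = np.linalg.solve(Sigma, y)           # Sigma^{-1} y
    A = Q.T @ Sinv_Q                             # Q^T Sigma^{-1} Q
    coef = np.linalg.solve(A, Q.T @ Sinv_y)      # GLS estimates (beta, alpha)
    cov = np.linalg.inv(A)                       # covariance of the estimates
    return coef[0] ** 2 / cov[0, 0]


# Hypothetical toy data; all inputs are mean centered as in the text.
rng = np.random.default_rng(0)
n = 100
W = rng.standard_normal((n, 50))
K = W @ W.T
X = rng.standard_normal((n, 2))
w = rng.standard_normal(n)
y = 0.3 * w + rng.standard_normal(n)

print(wald_statistic(y, w, X, K, sigma_g2=0.01, sigma_e2=1.0))
```

Setting `sigma_g2` to 0 reduces the computation to ordinary linear regression with a known residual variance, which is the form used for ranking SNPs in PC-Select.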

PC-Select:

Now we describe the PC-Select method:

Step 1: Extracting PCs: We extract the top five PCs from a GRM formed using all of the genotype data, $WW^\top$, to use as fixed-effect covariates. We use $X$ to denote the matrix of user-specified covariates and the top five PCs.

Step 2: Ranking SNPs by linear regression: Second, we rank the SNPs by a linear regression test statistic. Linear regression test statistics are calculated by fixing the genetic variance component $\sigma_g^2$ to 0 and using the procedure described above to calculate Wald statistics.

Step 3: Determining the GRM: As in FaST-LMM Select, PC-Select uses a subset of the SNPs that are likely to be causal. In this step, we determine k, the number of top SNPs (as ranked in Step 2) to include in the GRM. We use 10-fold cross-validation on predictive log-likelihood to choose the number of top SNPs. We choose k from a list of user-defined possibilities (e.g., k ∈ {100, 1000, 3000, 10,000, 30,000, …}). First, we randomly divide individuals into 10 equal groups or folds. For each fold i, we form a test set from the individuals in fold i and use the rest of the individuals as a training set. For each choice of k, we consider a subset of the genotype matrix consisting only of the top k SNPs (the ranking of the SNPs is recomputed per fold, using the training data). For notational simplicity, we also refer to the reduced genotype matrix by $W$, and it will be clear from context if this refers to the full genotype matrix or a subset. Let $W_i$ denote the genotypes from fold i and $W_{-i}$ represent the genotypes from the rest of the folds (similarly for $y$ and $X$). We wish to evaluate the predictive log-likelihood of $y_i$ given the training information $(y_{-i}, X_{-i}, X_i)$ to assess the predictive power of using only the top k SNPs in the GRM. Specifically, to evaluate the predictive log-likelihood, we start by forming a GRM from the training set, $K_{-i,-i} = W_{-i} W_{-i}^\top$. Then we estimate the variance components $\sigma_g^2$ and $\sigma_e^2$ from the training set by REML. We estimate $\alpha$ by ML with these variance parameters fixed. Then under the model

$$\begin{bmatrix} y_i \\ y_{-i} \end{bmatrix} \sim N\left( \begin{bmatrix} X_i \\ X_{-i} \end{bmatrix} \hat{\alpha},\ \hat{\sigma}_g^2 \begin{bmatrix} K_{i,i} & K_{i,-i} \\ K_{-i,i} & K_{-i,-i} \end{bmatrix} + \hat{\sigma}_e^2 I \right),$$

where $K_{i,i} = W_i W_i^\top$ and $K_{i,-i} = K_{-i,i}^\top = W_i W_{-i}^\top$, the predictive distribution of the phenotypes given the training parameters, $p(y_i \mid y_{-i}, X_{-i}, X_i, \hat{\alpha}, \hat{\sigma}_g^2, \hat{\sigma}_e^2)$, is normally distributed with mean

$$X_i \hat{\alpha} + \hat{\sigma}_g^2 K_{i,-i} \left( \hat{\sigma}_g^2 K_{-i,-i} + \hat{\sigma}_e^2 I \right)^{-1} \left( y_{-i} - X_{-i} \hat{\alpha} \right)$$

and covariance

$$\hat{\sigma}_g^2 K_{i,i} + \hat{\sigma}_e^2 I - \hat{\sigma}_g^2 K_{i,-i} \left( \hat{\sigma}_g^2 K_{-i,-i} + \hat{\sigma}_e^2 I \right)^{-1} \hat{\sigma}_g^2 K_{-i,i}.$$

This can be evaluated efficiently, using the spectral decompositions computed in the REML step (Lippert et al.; Listgarten et al.). We average the predictive log-likelihood over each of the 10 folds and choose the k that gives the highest average log-likelihood.

Step 4: Calculating association statistics: Finally, with the number of top SNPs to use in the GRM fixed, we calculate association statistics for each SNP. Let $W$ be the genotype matrix using the top k SNPs chosen in the previous step. To avoid proximal contamination (Listgarten et al.), we use a leave-one-chromosome-out procedure (Yang et al.). For each test SNP $w$ (which is not necessarily in $W$), we exclude the chromosome including that SNP from the GRM and calculate the Wald statistic for $w$ with this GRM. We do this efficiently by precomputing and storing the GRM, excluding each chromosome in turn.
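The predictive log-likelihood used in the cross-validation step is the log density of a conditional Gaussian, and can be sketched directly from that definition. The version below assumes an unnormalized GRM of the form $WW^\top$, takes the fixed-effect weights and variance components as already estimated on the training fold, and uses dense linear algebra rather than the spectral-decomposition shortcut mentioned in the text. The function name `predictive_loglik` and its arguments are hypothetical.

```python
import numpy as np


def predictive_loglik(y_te, X_te, W_te, y_tr, X_tr, W_tr, alpha, sigma_g2, sigma_e2):
    """Log density of held-out phenotypes given the training fold.

    alpha, sigma_g2 and sigma_e2 are assumed to come from fitting the
    training fold (ML for alpha, REML for the variance components).
    """
    K_tr = W_tr @ W_tr.T                          # training GRM
    K_cross = W_te @ W_tr.T                       # test-by-training block
    K_te = W_te @ W_te.T                          # test GRM
    V_tr = sigma_g2 * K_tr + sigma_e2 * np.eye(len(y_tr))

    # Conditional mean: fixed effects plus the genetic prediction from training residuals.
    resid_tr = y_tr - X_tr @ alpha
    mean = X_te @ alpha + sigma_g2 * K_cross @ np.linalg.solve(V_tr, resid_tr)

    # Conditional covariance via the Schur complement.
    cov = (sigma_g2 * K_te + sigma_e2 * np.eye(len(y_te))
           - sigma_g2 ** 2 * K_cross @ np.linalg.solve(V_tr, K_cross.T))

    diff = y_te - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y_te) * np.log(2.0 * np.pi) + logdet + quad)


# Hypothetical toy fold: 10 held-out and 90 training individuals, top k = 40 SNPs.
rng = np.random.default_rng(0)
W = rng.standard_normal((100, 40))
X = rng.standard_normal((100, 2))
y = rng.standard_normal(100)
alpha_hat = np.array([0.1, -0.2])

ll = predictive_loglik(y[:10], X[:10], W[:10], y[10:], X[10:], W[10:],
                       alpha_hat, sigma_g2=0.01, sigma_e2=1.0)
print(ll)
```

In a full implementation this quantity would be computed for each fold and each candidate k, averaged over folds, and the k with the highest average retained.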

20 in total

1. Improved linear mixed models for genome-wide association studies.

Authors: Jennifer Listgarten; Christoph Lippert; Carl M Kadie; Robert I Davidson; Eleazar Eskin; David Heckerman
Journal: Nat Methods Date: 2012-05-30 Impact factor: 28.547

2. Variance component model to account for sample structure in genome-wide association studies.

Authors: Hyun Min Kang; Jae Hoon Sul; Susan K Service; Noah A Zaitlen; Sit-Yee Kong; Nelson B Freimer; Chiara Sabatti; Eleazar Eskin
Journal: Nat Genet Date: 2010-03-07 Impact factor: 38.330

3. Rapid variance components-based method for whole-genome association analysis.

Authors: Gulnara R Svishcheva; Tatiana I Axenovich; Nadezhda M Belonogova; Cornelia M van Duijn; Yurii S Aulchenko
Journal: Nat Genet Date: 2012-09-16 Impact factor: 38.330

4. Inferences from genomic models in stratified populations.

Authors: Luc Janss; Gustavo de Los Campos; Nuala Sheehan; Daniel Sorensen
Journal: Genetics Date: 2012-07-18 Impact factor: 4.562

5. Response to Sul and Eskin.

Authors: Alkes L Price; Noah A Zaitlen; David Reich; Nick Patterson
Journal: Nat Rev Genet Date: 2013-02-26 Impact factor: 53.242

Review 6. New approaches to population stratification in genome-wide association studies.

Authors: Alkes L Price; Noah A Zaitlen; David Reich; Nick Patterson
Journal: Nat Rev Genet Date: 2010-07 Impact factor: 53.242

7. Advantages and pitfalls in the application of mixed-model association methods.

Authors: Jian Yang; Noah A Zaitlen; Michael E Goddard; Peter M Visscher; Alkes L Price
Journal: Nat Genet Date: 2014-02 Impact factor: 38.330

8. The benefits of selecting phenotype-specific variants for applications of mixed models in genomics.

Authors: Christoph Lippert; Gerald Quon; Eun Yong Kang; Carl M Kadie; Jennifer Listgarten; David Heckerman
Journal: Sci Rep Date: 2013 Impact factor: 4.379

9. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis.

Authors: Stephen Sawcer; Garrett Hellenthal; Matti Pirinen; Chris C A Spencer; Nikolaos A Patsopoulos; Loukas Moutsianas; Alexander Dilthey; Zhan Su; Colin Freeman; Sarah E Hunt; Sarah Edkins; Emma Gray; David R Booth; Simon C Potter; An Goris; Gavin Band; Annette Bang Oturai; Amy Strange; Janna Saarela; Céline Bellenguez; Bertrand Fontaine; Matthew Gillman; Bernhard Hemmer; Rhian Gwilliam; Frauke Zipp; Alagurevathi Jayakumar; Roland Martin; Stephen Leslie; Stanley Hawkins; Eleni Giannoulatou; Sandra D'alfonso; Hannah Blackburn; Filippo Martinelli Boneschi; Jennifer Liddle; Hanne F Harbo; Marc L Perez; Anne Spurkland; Matthew J Waller; Marcin P Mycko; Michelle Ricketts; Manuel Comabella; Naomi Hammond; Ingrid Kockum; Owen T McCann; Maria Ban; Pamela Whittaker; Anu Kemppinen; Paul Weston; Clive Hawkins; Sara Widaa; John Zajicek; Serge Dronov; Neil Robertson; Suzannah J Bumpstead; Lisa F Barcellos; Rathi Ravindrarajah; Roby Abraham; Lars Alfredsson; Kristin Ardlie; Cristin Aubin; Amie Baker; Katharine Baker; Sergio E Baranzini; Laura Bergamaschi; Roberto Bergamaschi; Allan Bernstein; Achim Berthele; Mike Boggild; Jonathan P Bradfield; David Brassat; Simon A Broadley; Dorothea Buck; Helmut Butzkueven; Ruggero Capra; William M Carroll; Paola Cavalla; Elisabeth G Celius; Sabine Cepok; Rosetta Chiavacci; Françoise Clerget-Darpoux; Katleen Clysters; Giancarlo Comi; Mark Cossburn; Isabelle Cournu-Rebeix; Mathew B Cox; Wendy Cozen; Bruce A C Cree; Anne H Cross; Daniele Cusi; Mark J Daly; Emma Davis; Paul I W de Bakker; Marc Debouverie; Marie Beatrice D'hooghe; Katherine Dixon; Rita Dobosi; Bénédicte Dubois; David Ellinghaus; Irina Elovaara; Federica Esposito; Claire Fontenille; Simon Foote; Andre Franke; Daniela Galimberti; Angelo Ghezzi; Joseph Glessner; Refujia Gomez; Olivier Gout; Colin Graham; Struan F A Grant; Franca Rosa Guerini; Hakon Hakonarson; Per Hall; Anders Hamsten; Hans-Peter Hartung; Rob N Heard; Simon Heath; Jeremy Hobart; Muna Hoshi; Carmen Infante-Duarte; 
Gillian Ingram; Wendy Ingram; Talat Islam; Maja Jagodic; Michael Kabesch; Allan G Kermode; Trevor J Kilpatrick; Cecilia Kim; Norman Klopp; Keijo Koivisto; Malin Larsson; Mark Lathrop; Jeannette S Lechner-Scott; Maurizio A Leone; Virpi Leppä; Ulrika Liljedahl; Izaura Lima Bomfim; Robin R Lincoln; Jenny Link; Jianjun Liu; Aslaug R Lorentzen; Sara Lupoli; Fabio Macciardi; Thomas Mack; Mark Marriott; Vittorio Martinelli; Deborah Mason; Jacob L McCauley; Frank Mentch; Inger-Lise Mero; Tania Mihalova; Xavier Montalban; John Mottershead; Kjell-Morten Myhr; Paola Naldi; William Ollier; Alison Page; Aarno Palotie; Jean Pelletier; Laura Piccio; Trevor Pickersgill; Fredrik Piehl; Susan Pobywajlo; Hong L Quach; Patricia P Ramsay; Mauri Reunanen; Richard Reynolds; John D Rioux; Mariaemma Rodegher; Sabine Roesner; Justin P Rubio; Ina-Maria Rückert; Marco Salvetti; Erika Salvi; Adam Santaniello; Catherine A Schaefer; Stefan Schreiber; Christian Schulze; Rodney J Scott; Finn Sellebjerg; Krzysztof W Selmaj; David Sexton; Ling Shen; Brigid Simms-Acuna; Sheila Skidmore; Patrick M A Sleiman; Cathrine Smestad; Per Soelberg Sørensen; Helle Bach Søndergaard; Jim Stankovich; Richard C Strange; Anna-Maija Sulonen; Emilie Sundqvist; Ann-Christine Syvänen; Francesca Taddeo; Bruce Taylor; Jenefer M Blackwell; Pentti Tienari; Elvira Bramon; Ayman Tourbah; Matthew A Brown; Ewa Tronczynska; Juan P Casas; Niall Tubridy; Aiden Corvin; Jane Vickery; Janusz Jankowski; Pablo Villoslada; Hugh S Markus; Kai Wang; Christopher G Mathew; James Wason; Colin N A Palmer; H-Erich Wichmann; Robert Plomin; Ernest Willoughby; Anna Rautanen; Juliane Winkelmann; Michael Wittig; Richard C Trembath; Jacqueline Yaouanq; Ananth C Viswanathan; Haitao Zhang; Nicholas W Wood; Rebecca Zuvich; Panos Deloukas; Cordelia Langford; Audrey Duncanson; Jorge R Oksenberg; Margaret A Pericak-Vance; Jonathan L Haines; Tomas Olsson; Jan Hillert; Adrian J Ivinson; Philip L De Jager; Leena Peltonen; Graeme J Stewart; David A Hafler; 
Stephen L Hauser; Gil McVean; Peter Donnelly; Alastair Compston
Journal: Nature Date: 2011-08-10 Impact factor: 49.962

10. Population structure and eigenanalysis.

Authors: Nick Patterson; Alkes L Price; David Reich
Journal: PLoS Genet Date: 2006-12 Impact factor: 5.917
