Literature DB >> 35430887

Population differentiation of polygenic score predictions under stabilizing selection.

Abstract

Given the many small-effect loci uncovered by genome-wide association studies (GWAS), polygenic scores have become central to genomic medicine, and have found application in diverse settings including evolutionary studies of adaptation. Despite their promise, polygenic scores have been found to suffer from limited portability across human populations. This at first seems in conflict with the observation that most common genetic variation is shared among populations. We investigate one potential cause of this discrepancy: stabilizing selection on complex traits. Counterintuitively, while stabilizing selection constrains phenotypic evolution, it accelerates the loss and fixation of alleles underlying trait variation within populations (GWAS loci). Thus even when populations share an optimum phenotype, stabilizing selection erodes the variance contributed by their shared GWAS loci, such that predictions from GWAS in one population explain less of the phenotypic variation in another. We develop theory to quantify how stabilizing selection is expected to reduce the prediction accuracy of polygenic scores in populations not represented in GWAS samples. In addition, we find that polygenic scores can substantially overstate average genetic differences of phenotypes among populations. We emphasize stabilizing selection around a common optimum as a useful null model to connect patterns of allele frequency and polygenic score differentiation. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.

Entities: Chemical

Keywords: Lewontin 1972; genome-wide association studies; polygenic scores; portability; stabilizing selection

Mesh：

Year: 2022 PMID： 35430887 PMCID： PMC9014188 DOI： 10.1098/rstb.2020.0416

Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN： 0962-8436 Impact factor: 6.671

Introduction

Lewontin’s foundational early work found that most common genetic variation is shared among human populations [1]. Thus, it strongly refuted the view that genetic variation is partitioned among mostly invariable populations and has become a classic work discrediting discrete human races. Lewontin’s finding is reflected by low estimates of FST among populations [2], i.e. only a small proportion (approx. 10%) of the total allelic variance is attributable to differences in frequency between populations (a finding that has been replicated for many different marker types genome-wide; [2-7]). While some loci that are highly differentiated among human populations have been uncovered (e.g. underlying loci in skin pigmentation and infectious disease immunity; see [8] for a review), there are relatively few such strongly selected loci [9,10]. Lewontin’s results also have implications for our a priori expectations of the partitioning of phenotypic variation within and among human populations. For phenotypes evolving neutrally in diploids, we expect the proportion of additive genetic variance attributable to among-population differences to be ≈2 FST when all individuals are measured in a common set of environments [11-16]. Thus in humans, we expect only approximately 18% of additive genetic variance to be due to among-population differences when measured in a common set of environments. Compared to this null, phenotypes subject to divergent selection among populations are expected to be over-dispersed, while phenotypes subject to stabilizing selection with the same selective optimum are expected to be less differentiated among populations. To distinguish among potential contributors to genetic differences among populations, researchers often turn to settings like common gardens in an effort to eliminate among-population environmental variation. However, it is not feasible to measure human phenotypes in a common environment (a major drawback of studies that investigated population-level phenotypic variation; [17,18]). Thus for the majority of complex traits, we do not know the role of genetics, let alone natural selection, in explaining phenotypic differences among human populations. These questions have received renewed interest in human genetics due to genome-wide association studies (GWAS), which have found common variation associated with many phenotypes within populations. GWAS have revealed that most phenotypes are highly polygenic within populations [19-21] and confirmed that much of the genetic variance is additive (reviewed in [22]). These observations have motivated phenotype prediction through additive genetic values, the additive contribution of polymorphisms to phenotypic differences among individuals. One common approach to predict an individual’s additive genetic value is based on a polygenic score, the sum across trait-associated loci of genotypes weighted by their estimated effects. Predictions based on polygenic scores are being explored in a number of clinical settings and more generally as a tool for understanding the genetic basis of disease and phenotypic variability. However, the generalizability of polygenic scores across populations is concerning because GWAS samples are strongly biased towards European populations and studies of other populations are much smaller [23-26]. There is wide agreement that these portability issues must be addressed so that the future clinical use of polygenic scores does not further compound inequalities in healthcare [23,24,26]. Currently, for most complex traits, polygenic scores poorly predict additive genetic values, and therefore phenotypes. This issue stands even in samples closely related to the GWAS population, because polygenic scores aggregate across many loci with slightly mis-estimated effects. These problems increase as we move to populations that are genetically and environmentally more distant from the GWAS populations. For one, the associated loci are usually not the causal loci underlying trait variation; instead they tag the effects of linked causal sites. The interpretation of the effect size of an associated variant can be tricky because of: (i) linkage disequilibrium (LD), whereby it absorbs the effects at correlated causal sites [27-31]; (ii) population stratification, whereby it absorbs the effects of covarying environments [32-36]; and (iii) gene-by-environment (GxE) or gene-by-gene (GxG) interactions, whereby its estimate is averaged over the interacting environmental contexts or genetic backgrounds in the sample [28,37-47]. As all of these factors can and will vary across populations, the effect sizes of alleles will differ among them and so polygenic scores will have lower prediction accuracy, i.e. imperfect portability, across populations. Even with perfectly estimated effects at the causal loci with significant trait associations, the prediction accuracy of polygenic scores will be limited because GWAS only identify loci with common alleles that contribute enough variance to exceed some significance threshold determined by the sample size. Therefore, an allele that is rare in the GWAS sample but common elsewhere will not be discovered. This would lead to a greater reduction in the phenotypic variance accounted for, or prediction accuracy, in populations not represented in the GWAS sample (hereafter ‘unrepresented populations’; [30,40,44,48-52]). Indeed, many variants contributing to trait variation in European GWAS samples are not at a high enough frequency to be detected in other populations, suggesting that different sets of polymorphisms contribute to the trait variance in different populations [53,54]. Genetic differentiation likely contributes to the reduction in the prediction accuracy of polygenic scores in unrepresented populations, as groups with increasing genetic distance from GWAS samples experience a greater loss in prediction accuracy [55-58]. The factors that reduce the utility of polygenic scores for individual-level prediction may also complicate the interpretation of average polygenic score differences across populations. Such issues arise in studies of adaptation that use polygenic scores to assess the contribution of selection to the genetic basis of phenotypic differentiation among human populations. An early application of this approach found polygenic signals of selection on height within Europe [59,60], where polygenic scores were over-differentiated among populations compared to the neutral prediction based on FST. Importantly, the null distribution of this test of neutrality at the level of the phenotype does not rely on accurate polygenic scores and so changes in LD, GxG and GxE should not cause false signals. However, the results are very sensitive to slight biases in estimated effect sizes due to population structure, and indeed the signal of polygenic selection on height turned out to be almost entirely due to stratification [61-63]. More generally, the imperfect portability of polygenic scores and the fact that the mean environmental contribution to phenotypes can vary greatly between populations raise concerns about over-interpreting differences in mean polygenic scores as genetic differences in the average phenotype among populations [60,64-67]. The consequence of genetic differentiation on the low generalizability of GWAS results for phenotype prediction may appear to contradict Lewontin’s observation of minor allele frequency (MAF) differences between populations. However, allele frequency differentiation for complex traits under selection can occur at a rate faster than drift. This would lead to noisy estimates of additive genetic values, and so reduced prediction accuracy of polygenic scores, for unrepresented populations. In order to understand the impact of this turnover on portability, we need models of allele frequency differentiation that are informed by plausible forms of natural selection on complex traits. Stabilizing selection with a constant fitness optimum is a sensible null model for the evolution of complex traits. Indeed, studies of its influence on allelic dynamics have set a conceptual foundation for interpreting and designing GWAS within populations [68]. Under stabilizing selection, intermediate trait values have the highest fitness, with decreasing fitness with distance from that optimum (figure 1a). Many quantitative traits have been shown to experience stabilizing selection in humans (e.g. [69]) as well as across many other species (e.g. [70-72]). Stabilizing selection also contributes to the lack of variation in morphological traits within and between closely related species and the morphological constancy of traits in the fossil record [73-77].

Figure 1

Stabilizing selection to a constant phenotypic optimum constrains the evolution of mean phenotypes through underdominant selection at the locus-level. (a) Gaussian fitness function for phenotypic stabilizing selection (teal) and resulting density of simulated phenotypes in the population (coral). Lines correspond to the intermediate strength of stabilizing selection that we simulated (w = 5, where w determines the width of the fitness peak; see §2). Note how stabilizing selection keeps the genetic variance of the population (V) small compared to the width of the selection peak (w). (b) Change in mean phenotype from the ancestor (generation 0) under stabilizing selection (orange) and neutral phenotypic evolution (pink). Each line corresponds to a single simulation; here 10 simulations are shown for each strength of selection. (c) Deterministic allele frequency trajectories, assuming an infinite population size, based on the underdominant model, at a locus that contributes to the variance of a trait under stabilizing selection (here w = 5). The trajectory differs according to effect size and starting frequency. Note the symmetry for starting frequencies that are the same distance from their closest boundaries. Here, we show how stabilizing selection on complex traits reduces portability and increases the chance of false signals of directional polygenic adaptation, despite low overall genetic differentiation and no genetically based trait differentiation among populations. Under parameter ranges estimated from empirical studies, we combine simulations of stabilizing selection on complex traits with analytical models of its effect on genetic differentiation to investigate how stabilizing selection drives these results. We focus only on scenarios in which the trait optimum is shared among populations, leading to levels of trait differentiation among populations much smaller than neutrality. In our baseline scenario, stabilizing selection occurs on a single additive trait, where alleles have the same effect in different populations (reflecting no GxG or GxE interactions and no pleiotropy; we relax these assumptions later). We study differentiation between a pair of populations, either an ancestral and descendant population or a pair of contemporary populations, in which the results of a GWAS in one of those populations is used to make trait predictions in the other. In doing so we provide a polygenic score perspective on earlier investigations into the relationship between population structure, stabilizing selection and quantitative trait variation (e.g. [78-83]). We show how these factors reduce the prediction accuracy of polygenic scores and can readily lead to patterns of polygenic score differentiation rife with the potential for misinterpretation.

Model background

An individual’s additive genetic value G is the sum of the additive effects of all alleles they carry, where a represents the additive effect of an allele relative to another at locus l and g is the number of copies of that allele carried by the individual. All of these loci denoted by l are polymorphic within a specified population from which the individual was drawn. Note that the additive genetic value does not represent an absolute measure of an individual’s phenotype, and instead represents the additive contribution of the polymorphisms they carry to their deviation from their population’s mean. Under a constant selective environment, stabilizing selection keeps the population mean phenotype close to the optimum and decreases the phenotypic variance in the population because individuals on both tails of the distribution have lower fitness (figure 1a,b; [12,84,85]). To understand the process by which stabilizing selection reduces the phenotypic variance, we focus on the additive genetic variance in which the first component refers to the additive genic variance (V) and the second component accounts for the contribution of linkage disequilibrium (LD) among loci that contribute to the variance. In the short term, stabilizing selection reduces the phenotypic variance by generating negative LD between like-effect alleles, thereby limiting extreme phenotypes (known as the Bulmer effect; [86]). The additive genetic variance quickly reaches an equilibrium reflecting a balance between selection producing negative LD, and recombination and chromosome segregation breaking up that generated LD [87]. The long-term genetic response to stabilizing selection is driven by a reduction in the additive genic variance, in particular the variance in genotypes at a locus (Var(g); the expected heterozygosity). Yet if the genetic basis of trait variation was truly infinitesimal, i.e. made up of loci of infinitely small effect, only genetic drift would erode variation in the long term. However, while the loci discovered by GWAS contribute very small effects, these effects are not infinitesimally small and so they can be directly acted on by selection [68,88]. Selection at the individual loci underlying trait variation is in many cases well approximated by underdominant selection, in which the more common allele fixes and the minor allele is lost (figure 2a; [84,89]). Owing to underdominant selection at the locus level, stabilizing selection removes polymorphisms at a faster rate than neutrality, with selection coefficients proportional to their squared effect sizes (see [68,88,90] for recent applications of such models to understand GWAS variation within populations). Meanwhile, under moderate strengths of stabilizing selection and a constant environment, the population mean phenotype stays very close to the optimum through rapid, small fluctuations at many individual loci.

Figure 2

The effect of stabilizing selection, distribution of effect sizes and GWAS ascertainment scheme on the portability of a GWAS to another population. (a) Reduction in heterozygosity at loci that contributed to the variance 1500 generations ago in a population with N = 10 000. Each open point represents the midpoint of the effect size bin of width 0.01, within which we averaged heterozygosity from 200 simulations. Each filled point represents our analytical predictions for that midpoint. (b) Reduction over time in the total variance contributed by polymorphisms in an ancestral population. Lighter lines show results from 200 simulations and darker lines show analytical predictions. The axis at the top shows the expected neutral FST between the ancestral and descendant population for the time scales of divergence shown on the bottom axis. The inset on the left shows the distribution from which mutation effect sizes are drawn. For the results shown we used a normal distribution with standard deviation of 0.1. The inset on the right shows the equilibrium density of variance contributed from each effect size. As the width of the fitness peak increases, and thus stabilizing selection weakens, mutations with large effects can drift to higher frequencies and contribute a greater proportion of the trait’s variance. (c) The reduction over time in the total variance contributed by polymorphisms in an ancestral population for three different mutation effect size distributions, holding the strength of phenotypic stabilizing selection constant (w = 5). The inset shows the three mutation effect size distributions: two normal (green and blue) and a mixture of normals to produce a heavy-tailed distribution (yellow). The blue lines correspond to the same distribution used to produce sub-figure (b), thus replicating the orange lines in sub-figure (b). Lighter lines show results from simulations (100 each for yellow and green) and darker lines show analytical predictions. Predictions in (b) and (c) are based on the diffusion approximation; see equations (A 17) and (A 18). We note that the full analysis of models of stabilizing selection is challenging because changes in allele frequencies and covariances must be tracked over many loci (e.g. [87,91]). Indeed, when we simulated physically linked loci with low recombination, we found that selection on an allele was weaker than we predicted due to the persistence of selection-generated LD between alleles with opposing effects (the Bulmer effect; electronic supplementary material, figure S1). To understand the long-term turnover in the genetic basis of trait variation, from here on we focus on the additive genic variance, V, under the assumption that loci contributing to the variance are physically unlinked. We begin by thinking of an ancestral population at equilibrium and the loss of ancestral phenotypic variance over time. When the population is at mutation-drift-selection equilibrium, stabilizing selection and genetic drift remove variation such that as time goes on less and less of the variance in the descendant population is contributed by ancestral polymorphisms. Instead, the variance contributed by new mutations that are private to the descendant population will increase, replenishing what was removed by drift and selection. We assume a model of Gaussian stabilizing selection where the width of the fitness peak is determined by w (w2 is equivalent to V in other models of stabilizing selection, e.g. [92]), which will approximate any symmetric quadratic stabilizing selection model when the mean trait is close to its optimum. We also assume thousands of unlinked loci contributing to trait variation and explore dynamics resulting from three different mutation effect size distributions (figure 2c); see appendix A, §(a) for simulation details. We can predict the reduction in additive genic variance contributed by ancestral variants (anc) in the descendant population (desc) after t generations for alleles with effect size a using a common approximation for the per-generation loss (e.g. [93]) where N is the population size, V is the total population trait variance and V ≪ w2, S = 2Na2/w2 is the population-scaled selection coefficient of the allele, and FST ≈ t/2N for neutral polymorphisms. On the right-hand side of equation (2.3a), the first bracketed term is the per-generation reduction due to drift and the second term is the reduction due to stabilizing selection. Looking at the exponent in equation (2.3c), we see that the decay of the variance (heterozygosity) contributed by alleles depends on FST, but will be increased for alleles whose effect sizes are large enough such that their population-scaled selection is appreciable (S > 1). Equations (2.3) offer good intuition; however, in the remainder of the main text we show results from a diffusion approximation that we developed that is better for our purposes (extending from [68], see appendix A, §(c)). The reduction in additive genic variance contributed by a particular polymorphism with effect size a is the same as the reduction in heterozygosity at that site. Selection causes a stronger reduction when the fitness peak is narrower (i.e. when w is smaller) and when the allele’s effect is larger (figure 2a; see electronic supplementary material, figure S2 for the approximation in equation (2.3a)). We can average the reduction in heterozygosity across all sites, weighting by the distribution of effect sizes and genic variance contributed by a given effect size, to predict the total remaining variance and thus total reduction in V over time (figure 2b; equation (A 17)). In the example shown in figure 2, this total reduction tends to be weaker than what we see for the largest effect polymorphisms, because most sites that contribute to the variance are of small effect. This is because (i) under our mutational distribution most alleles have small effects, and (ii) under stabilizing selection, the equilibrium distribution of observed effects is narrower than the distribution of mutation effects. The form of the decay in ancestral variance strongly depends on the distribution of mutation effects. Consider a case in which a higher proportion of introduced mutations are strongly selected due to their large effects (S > 10). Some of these alleles are still capable of drifting to intermediate frequencies, and so a higher proportion of the ancestral variance will be contributed by larger effect polymorphisms [68]. Therefore, stabilizing selection will cause a steeper reduction in the ancestral variance (figure 2c). Moreover, if most of the other mutations are nearly neutral, the early and steep decline in ancestral variance will be followed by a decline more consistent with neutrality (yellow lines in figure 2c; see [68] for a discussion of selection regimes). The extent to which ancestral variance is depleted determines the amount of shared additive genetic variance between diverging populations and thus the portability of polygenic scores.

Accuracy of polygenic score predictions

To understand the effect of stabilizing selection and drift on the prediction accuracy of polygenic scores in isolation from other sources of bias, we make the simplifying assumption that GWAS identify associations between polymorphisms and trait variation only at causal loci. For a GWAS within a population to identify a locus as being associated with the trait, the locus has to be polymorphic in that population and its phenotypic association has to achieve some level of statistical significance (i.e. contribute above some level of variance to the trait). For those causal loci with significant associations, we also assume that their effects are estimated perfectly and for the moment that these true effects do not vary within the sample, the population the sample was drawn from, or across populations (i.e. populations experience the same set of environments). We also assume that their effects are strictly additive. At the trait-associated loci, one can sum the additive effects of all alleles that an individual carries at a predefined set of markers to form a polygenic score. We are interested in the reduction in prediction accuracy for a population not represented in the GWAS sample, relative to the prediction accuracy of those represented populations. To explore this we consider the genetic differentiation between a pair of populations, A and B, in which the GWAS sample is drawn from population A but not population B. When the effects of alleles do not vary between populations, we can quantify the reduction in phenotype prediction accuracy (r2) of polygenic scores, compared to using additive genetic values, constructed for any population as where S is an individual’s polygenic score and G is their additive genetic value. Since an individual’s polygenic score is only part of their additive genetic value when a GWAS estimates true effects, this reduction can be understood as the proportion of the total additive genetic variance, or proportion of the heritability, explained by GWAS-significant sites (appendix A, §(b)). The ratio of in population B to population A quantifies the reduction in prediction accuracy due to a lack of GWAS representation for population B. If these populations experience the same selective environment, we expect the same Var(G) for each population, so this reduction due to a lack of representation would simply be the ratio of the variances explained by polygenic scores. Note that our definition of the prediction accuracy in population B is the squared correlation of the deviation of an individual’s polygenic score with its deviation in phenotype, where both of these deviations are with respect to the mean phenotype in population B. This definition matches typical polygenic score practices where predictions are statements about the departure of an individual from their ancestry group’s mean genetic value, rather than a prediction of their departure from the mean genetic value of the GWAS population. However, this definition of prediction accuracy does not include discrepancies from the evolution of mean polygenic scores and phenotypes among populations, meaning it does not account for a systematic shift in polygenic scores between populations. We turn to this point in §4.

Ascertainment of all causal loci in the GWAS sample

We begin with the simplified case in which all causal loci that are polymorphic in population A have been identified, such that polygenic scores equal the additive genetic values in population A. As polygenic scores from ancient DNA are being used to investigate the phenotypic diversity of ancient human populations [94], we first consider the prediction accuracy in a population ancestral to population A of a polygenic score constructed using variants found in population A. The reduction in polygenic score prediction accuracy for this ancestral population is approximately the reduction in variance contributed by that ancestral population to the present (figure 2b,c; electronic supplementary material, figure S4; see appendix A, §c(ii) for an explanation). While ancient individuals were likely not drawn from populations directly ancestral to present-day populations, we should observe the same general patterns with genetic differentiation between the ancient and present-day population (see also [95] for approximations of this decline). For the remainder of this article, we consider A and B to be contemporary populations; we assume for simplicity that they split from a common ancestral population without subsequent gene flow. In figure 3a, we use simulations and analytical predictions to show how the prediction accuracy in population B decreases with increasing time since its common ancestor with population A. The span of neutral genetic differentiation was chosen to reflect a scale along which various human populations could fall. We see slightly weaker reductions in prediction accuracy with time when the variance of the mutation effect size distribution is quartered and stronger reductions when the mutation effect size distribution has a heavy tail (figure 3b; electronic supplementary material, figure S5). All variance-contributing loci were ascertained in population A, so the only reason the full genic variance in population B was not captured is because private polymorphisms contribute to the phenotypic variance in each population. At the time of divergence between the pair of populations, they entirely share their genetic basis of trait variation. Then as stabilizing selection and drift remove polymorphisms (at equilibrium), new mutations replenish them at different sites in each population, leading to the same total variance but different genetic bases of that variance (assuming a very large mutational target). Without gene flow between the pair of populations, the polymorphisms that arose since their common ancestor will remain private to each population and will thus not contribute to the variance in the other population. Therefore, the polymorphisms in population A that also contribute to the variance in population B must be ancestrally shared polymorphisms, specifically at those loci in which ancestral polymorphisms were not removed by drift or selection in either descendant population. Note that these shared polymorphisms on average contribute less variance in the descendant populations than in their ancestor. Thus we see a loss in the additive genetic variance shared between populations over time due to a reduction in both the number of shared ancestral polymorphisms and the variance they each contribute. In appendix A, §c(ii), we describe our analytical predictions for this process, which match well to simulations.

Figure 3

Reduction of polygenic score prediction accuracy in unrepresented population B relative to contemporary, represented population A increases with the strength of stabilizing selection, density of large-effect mutations and power-based GWAS discovery schemes. (a) Reduction of prediction accuracy in population B, when all variance-contributing polymorphisms in population A were ascertained, with increasing time since the common ancestor of populations A and B. Paler lines show results from simulations and darker lines show results from analytical predictions. (b) Distribution of the reduction in prediction accuracy for population B relative to population A from simulations with w = 5 when different sets of variance-contributing sites in population A were ascertained. Results are shown for two mutation effect size distributions. The Gaussian distribution corresponds to the blue distribution and the heavy tail to the yellow distribution in the inset of figure 1c. The same Gaussian distribution was used to produce results in sub-figure (a).

Ascertainment of a subset of causal loci in the GWAS sample

To more realistically explore the impact of GWAS ascertainment, we explore power- and frequency-based variant discovery. In the main text, we focus on a case in which a GWAS uncovered the top 5% of variance-contributing polymorphisms in population A. Under our choice of effect size distribution and strengths of stabilizing selection, these polymorphisms explain just under 50% of the additive genic variance in population A (similar to that of height in Europeans; [96-98]). We also consider a case in which a GWAS uncovered all causal loci with an MAF that exceeds 1%. As expected, under any ascertainment scheme we observe a substantial drop in the variance explained in population B compared to the case in which all polymorphisms in population A are ascertained (figure 3b). If this reduction in variance for each population is the same, then the decline in prediction accuracy in population B relative to population A over time will be similar to the case when all polymorphisms in population A are ascertained. We find that this is approximately the case for a Gaussian mutation effect size distribution but not for the heavy-tail effect size distribution that we simulate (figure 3c). These differences arise from differences in the strength of selection on and variance contributed by ancestrally shared polymorphisms; see electronic supplementary material, §1.2 for details. We present results under GxE, pleiotropy, and directional selection in electronic supplementary material, §1.3.

Difference in polygenic means among populations

In the previous section, we described how stabilizing selection and ascertainment in the GWAS sample can reduce the prediction accuracy of individual genetic values, but what are the consequences for the mean polygenic score of populations? The mean polygenic score of the population is twice the sum of population allele frequencies weighted by effect sizes. If a trait is neutrally evolving, the loci contributing to its variation are just like other neutrally evolving loci, and so differences among populations in their mean polygenic score just reflect a weighted sum of neutral allele frequency differences. Naively, as trait-increasing alleles underlying a neutral trait are equally likely to drift up or down, one might think that over many loci we expect only a small mean difference between populations. However, the polygenic score is a sum rather than a mean, and so each locus we add into the score is like an additional step in the random walk that two populations take away from each other [99]. We expect the variance among populations, i.e. the average squared difference between population means and the global mean, to be 2V FST [11,14]. We first explore the differentiation of mean polygenic scores under a constant optimum and constant environment, and then relax these assumptions.

Stabilizing selection to a constant optimum

Under a constant selective environment, stabilizing selection keeps population mean phenotypes close to their optimum in the face of genetic drift and mutation, such that the difference in mean phenotypes among populations with the same optimum should be minor relative to neutral expectations (even accounting for the lower genetic variance within populations under stabilizing selection; figure 4a). This reduction in the divergence of the mean phenotype between populations reflects the fact that if trait-increasing alleles accidentally drift up in frequency, thus pushing the population mean away from its selective optimum, trait-decreasing alleles are subject to directional selection in their favour (and vice versa).

Figure 4

Polygenic scores can overestimate differences in additive genetic values between populations for traits under stabilizing selection. (a,b) Absolute standardized mean polygenic score differences between populations A and B, , either when (a) all polymorphisms in both populations were ascertained, or when (b) the top 5% of variance-contributing polymorphisms in population A were ascertained. This measure has the same interpretation as Q; it equals . (c) Partitioning of mean polygenic score differences between the ascertained and non-ascertained sets of polymorphisms. Ascertained polymorphisms are from the top 5% of variance-contributing sites in population A. Points represent results for a single simulation. Ellipses denote the 95% confidence interval. The dashed grey line is the line of exactly opposing effects. To explore population differences in mean polygenic scores, we can use the polygenic score equivalent of QST/2FST [100,101], where QST measures the proportion of total variance in additive genetic values attributable to among-population differences. Under neutrality, for strictly additive traits we expect QST to equal 2FST estimated from neutral polymorphisms [14,15], and their ratio should be χ2-distributed with degrees of freedom equal to one fewer than the number of populations (an extension of the Lewontin–Krakauer test; [102]). The polygenic score analogue is Q [60], where additive genetic values are substituted by polygenic scores. For our pair of populations, our Q statistic is where Z is the mean polygenic score in the population denoted in the subscript. When population mean phenotypes are over-dispersed relative to neutral expectations, Q will be larger than 1, potentially resulting in a statistically significant p-value under the null distribution. When population means are under-dispersed relative to neutral expectations, as we would expect for traits under stabilizing selection with the same optimum across populations, Q will be much smaller than 1, and the p-values under the null will be large. Stabilizing selection to the same optimum tightly constrains the difference in mean additive genetic values between populations, i.e. the difference in mean polygenic scores using all of the variation, to be much lower than the difference for neutral traits, even after standardizing for the lower overall levels of variation (figure 4a). This leads to a distribution of Q that is skewed toward lower values than the neutral χ2-distribution (electronic supplementary material, figure S12). However, when we ascertain the top 5% of variance-contributing polymorphisms in population A, the level of standardized polygenic score differentiation under stabilizing selection becomes more similar to neutral levels (figure 4b), with a comparable level of false positive signals of adaptive differentiation as in the neutral case (electronic supplementary material, figure S12). While this result is more specific to our choice of mutation effect size distribution and strengths of stabilizing selection, we can generally conclude that for cases of stabilizing selection with limited ascertainment, estimates of standardized mean polygenic score differences will approach and perhaps exceed what we observe under neutrality. Note that under the neutral case, the distribution of standardized mean polygenic score differences stays about the same between ascertainment levels, whether ascertaining all polymorphisms from both populations or just the top 5% in population A. This is because, under neutrality, all alleles exhibit the same behaviour and do not evolve in coordination with one another. Thus stabilizing selection, combined with incomplete and asymmetric ascertainment, causes the inflation of estimated mean standardized population differences above that seen for the underlying genetic values. Since under stabilizing selection to the same optimum we expect only small differences between populations in their mean genetic value, the disparity between the (true) differences in mean genetic values and (estimated) differences in mean polygenic scores represents the mean difference contributed by non-ascertained polymorphisms. Intuitively this occurs because if population B has a larger value of an ascertained polygenic score than population A, then the ascertained trait-increasing alleles have by chance drifted up in population B (compared to A) and this imbalance will have induced directional selection for the rest of the trait-increasing alleles to decrease their frequency to keep the population close to the optimum (under high polygenicity). In line with our expectations, we find that the mean polygenic score differences calculated from ascertained sites and mean polygenic score differences calculated from non-ascertained sites are close to opposite one another (figure 4b). The countervailing effect of the non-ascertained loci is noisier when stabilizing selection is weaker because with weaker selection, population means can drift further from their optimum. These results confirm that mean polygenic scores calculated from all polymorphisms (i.e. additive genetic values) should closely match between populations experiencing stronger and similar selection pressures and highlight how incomplete ascertainment can by chance generate misleading differences between them.

Adapting to a changing optimum or environment

The combination of changes in the stabilizing selection regime and incomplete ascertainment of causal polymorphisms can also generate misleading differences in mean polygenic scores and signals of differential selection among populations. When the optimal phenotype shifts from its ancestral value equally in both descendant populations, this directional selection generates responses in the true mean genetic values in each population that track each other extremely closely. However, because our polygenic score based on ascertained polymorphisms explains less of the variance in population B than in population A, we capture less of its response to selection and thus artificially generate a difference between populations in their mean polygenic scores (figure 5a). This shift in the mean polygenic score between populations, as well as a decrease in the total variance explained, can push the distribution of Q towards larger values than the neutral case (figure 5b). The chances of getting a p-value below a significance threshold of 0.05 are highest when stabilizing selection is strongest (w = 4; up to 30% chance depending on time since the optimum shift) but tend to be greater than neutrality for the strengths of selection we investigated (electronic supplementary material, figure S6). These signals of polygenic selection are consistent with the idea that Q informs us about directional selection on polygenic scores, as selection has driven an increase in the polygenic score of A relative to B. However, they also highlight the very incomplete picture that we obtain about selection and genetic differences in the phenotype.

Figure 5

Polygenic scores can generate false signals of adaptive differentiation when populations adapt to the same optimum shift. (a) Change in mean phenotype or polygenic score over time since the optimum shifted by 2 standard deviations of the phenotype distribution. True phenotypes are different from additive genetic values in that they account for the contribution of substitutions. (b) Quantile–quantile plot of observed p-values against expected p-values under neutrality (uniform distribution). The dashed lines show equality; points that lie above these lines indicate that the observed distribution has a higher density of low p-values than the neutral distribution, and points that lie below them indicate the opposite.

Discussion

Our work builds on population genetics theory of stabilizing selection to investigate why polygenic scores perform poorly in populations not represented in GWAS samples (unrepresented populations) and how they can be misleading about average genetic differences among populations. Specifically, we provide a theoretical foundation to understand how stabilizing selection on complex traits impacts the population differentiation of polygenic scores without a difference in environments or trait optima. Negative selection has been invoked as an explanation for low portability, consistent with large effect alleles discovered by GWAS being rare in the population [30,54]. Here we explore these predictions under a model of stabilizing selection that has the advantage of offering clear-cut connections between an allele’s phenotypic and fitness effects. With this approach to modelling allele frequency dynamics, we obtained a detailed understanding of how a lack of population representation in GWAS can reduce the utility of polygenic scores in explaining phenotypic variation within and among populations. We discuss the dynamics contributing to low portability, considerations for those interested in explaining variation beyond the GWAS sample, and necessary discretion when using polygenic scores to understand group differences.

Dynamics contributing to low portability

While stabilizing selection around the same optimum among populations reduces phenotypic differentiation relative to neutrality, somewhat counterintuitively it increases genetic differentiation at trait-influencing loci compared to neutral polymorphisms. Genetic differentiation increases because stabilizing selection drives the turnover of polymorphisms that contribute to trait variation within populations; thus polymorphisms that are ancestrally shared between populations are eventually lost and replaced by private mutations. The consequences of this differentiation at large-effect QTLs and a few loci have been explored by various authors [82,83,103], and here we investigated its implications on GWAS results and downstream analyses. Over time, the sets of polymorphisms that contribute substantial variance in each population diverge, such that stabilizing selection reduces the additive genic variance in one population that can be explained by polymorphisms in another. Thus, polygenic scores constructed from polymorphisms ascertained in a GWAS will have reduced prediction accuracy in unrepresented populations compared to represented ones. The prediction accuracy for unrepresented populations decreases with increasing strengths of stabilizing selection and time since the common ancestor with represented populations. Occasional fluctuations in the fitness optima would negligibly influence our portability results, because such directional selection would cause very minor shifts in frequency over the short timescales we consider [90]. The distribution of mutation effect sizes is critical to understanding the effects of stabilizing selection on the turnover of genetic variation among populations. With increasing weight towards larger mutation effects, stabilizing selection leads to a faster reduction in the shared additive genetic variance, and thus polygenic score portability. In addition, the decline in prediction accuracy for unrepresented populations relative to represented ones will depend on frequency- and power-based ascertainment schemes; since polymorphisms shared between populations will more likely be of small effect and common, the discovery of polymorphisms based on MAF will lead to a weaker decline than discovery based on variance contributed. Alternatively, if the trait was truly infinitesimal, stabilizing selection would not contribute to the loss of heterozygosity, such that the sharing of genetic variance would be well predicted by genetic drift, or neutral FST. While the loci mapped by GWAS are often of small effect, and the traits highly polygenic, we know that their effects are not infinitesimal, as loci must make a reasonable contribution to the variance to be discovered by a GWAS (see [68] for a detailed population genetic model). While we do not consider migration between represented and unrepresented populations here, the general decline in portability with increasing genetic differentiation between groups should hold, though the exact prediction will differ. Pleiotropy and GxE complicate the predictions of stabilizing selection for portability, which we discuss in electronic supplementary material, §1.3. Differences in the environment among populations can lead to changes in the effect sizes of alleles (GxE), which reduces the portability because (i) effect sizes will be only partially correlated between populations and (ii) stabilizing selection will purge more of the ancestral variation shared between populations since GxE increases the overall trait variance when the interacting environment changes. Pleiotropy, in our implementation, leads to the opposite effect. Holding the average strength of selection on all alleles constant, when an allele can independently affect multiple traits under stabilizing selection, the correlation between its effect size on the trait of interest and its selection coefficient weakens. Thus stabilizing selection purges less of the shared variation and causes a weaker reduction in portability.

Considerations for applications of GWAS results across groups

Like much of the recent work on portability, our work emphasizes the reduction in prediction accuracy for populations not represented in GWAS [29,31,48,104,105]. A natural conclusion is that polygenic predictions that work well across populations will require GWAS across a range of diverse ancestries, in line with other calls to reduce Euro-centrism in GWAS [23,24,26]. In addition to changes in allele frequency and linkage disequilibrium, the relationship of polygenic scores to phenotypes will vary across populations due to variation in genetic effects, assortative mating, and differences in GxG and GxE. Indeed the prediction accuracy of polygenic scores for some traits was recently shown to be quite variable across different groups within an ancestry, suggesting that GxE and environmental variation are quite prevalent for some traits [45]. In addition, associated variants on the same haplotype were found to have differing effects in European–Americans and admixed African–Americans, suggesting that genetic or environmental interactions modify additive effect sizes across groups [47]. Thus, we caution that a better understanding of the portability of polygenic scores across populations also requires a stronger understanding of the causes of variation in prediction accuracy within populations. With the rise of ancient DNA sequencing, GWAS from contemporary populations have also been used to construct polygenic scores for ancient individuals (reviewed in [94]). These scores have been used to provide a window into the phenotypic diversity of past populations [106-108] and to disentangle genetic and environmental contributors to temporal phenotypic variation (e.g. at the Neolithic transition; [109-111]). Such studies are most convincing when there are relevant phenotypic measurements on at least some ancient individuals and polygenic prediction accuracies can be judged. However, investigators will often not have this luxury, leaving unclear the insight these approaches can provide. Some studies using ancient DNA have identified reduced rates of disease alleles in the past compared to present-day populations. While we have focused here on quantitative traits, rather than disease traits, we caution that purifying selection against risk alleles will lead modern day populations to systematically underrepresent the diversity of disease alleles in the past [107,112-114].

Misinterpretations of group differences based on polygenic scores

When using polygenic scores constructed from the ascertained set of polymorphisms, we increase the possibility of generating misleading signals of differentiation between populations. Stabilizing selection to a constant optimum alone does not generate more false signals of directional selection than what we expect under the neutral evolution of complex traits. However, the standardized difference between populations will be systematically over-estimated with ascertained polygenic scores. We see this result because with stabilizing selection, the unascertained portion of the variance tends to act exactly counter to the trend seen in the ascertained portion. Misleading signals of adaptive polygenic differentiation can also be generated when the stabilizing selection regime shifts in the same way in each population (such that there is still minor phenotypic differentiation between them) as the ascertained polymorphisms only capture a shift in the ascertainment population. This issue arises because stabilizing selection lowers the proportion of the additive genic variance explained in the unrepresented population and so we capture a lower proportion of that population’s response to directional selection. Thus this issue of missing the parallel adaptive response across ancestries can be expected in many situations with imperfect portability. Such signals of polygenic adaptation can be useful as Q is correctly detecting that directional selection has acted on the genetic variation along the branch leading to the represented population, but the signal is very open to the misinterpretation that the unrepresented population has not also responded to the same selection pressures. Many traits have likely experienced a mixture of stabilizing selection and bursts of directional selection across human history. Even if the populations share the same phenotypic optimum, if an environmental change systematically shifts one population away from this optimum, there would be directional polygenic adaptation to move that population back towards the optimum, resulting in a difference in polygenic scores but no difference in the mean phenotypes between populations [67]. Therefore, under current ascertainment schemes and pervasive stabilizing selection, the difference in mean polygenic scores among populations provides very unreliable information about the potential role of selection in generating phenotypic differences among populations. A polygenic score is a prediction of how an individual’s phenotype is expected to deviate away from the sample mean given their genotype at some (large) number of polymorphisms. Sometimes they are quoted as absolute values, but that is always based on an empirical phenotypic mean. The mean polygenic score in a population cannot inform us of the mean phenotype in a population, even if it was constructed from all polymorphisms and averaged across all individuals. This is because these scores are based on polymorphic genotypes alone whereas the absolute phenotype of an individual represents the end product of the entire genome and environment played out through a vast number of developmental processes. Yet it is easy to fall into the trap of believing that a difference in polygenic scores between groups is a strong statement about the difference in the mean phenotype of those groups, which also differ in a myriad of environmental and cultural factors (a related set of issues are present in epidemology; [115]). Thus while the field of human genetics is increasing its power to predict phenotypic variance among individuals within groups, it remains a poor guide to the causes of phenotypic variance among groups with greater environmental and genetic differentiation [116,117].

101 in total

1. Differentiation of allelic frequencies at quantitative trait loci affecting locally adaptive traits.

Authors: R G Latta
Journal: Am Nat Date: 1998-03 Impact factor: 3.926

2. An apportionment of human DNA diversity.

Authors: G Barbujani; A Magagni; E Minch; L L Cavalli-Sforza
Journal: Proc Natl Acad Sci U S A Date: 1997-04-29 Impact factor: 11.205

3. Negative selection on complex traits limits phenotype prediction accuracy between populations.

Authors: Arun Durvasula; Kirk E Lohmueller
Journal: Am J Hum Genet Date: 2021-03-09 Impact factor: 11.025

4. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia.

Authors: David Curtis
Journal: Psychiatr Genet Date: 2018-10 Impact factor: 2.458

5. Quantitative traits in relation to population structure: why and how are they used and what do they imply?

Authors: R Chakraborty
Journal: Hum Biol Date: 1990-02 Impact factor: 0.553

6. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

Authors: Marco Scutari; Ian Mackay; David Balding
Journal: PLoS Genet Date: 2016-09-02 Impact factor: 5.917

7. Modeling Heterogeneity in the Genetic Architecture of Ethnically Diverse Groups Using Random Effect Interaction Models.

Authors: Yogasudha Veturi; Gustavo de Los Campos; Nengjun Yi; Wen Huang; Ana I Vazquez; Brigitte Kühnel
Journal: Genetics Date: 2019-02-22 Impact factor: 4.562

8. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.

Authors: Jian Yang; Andrew Bakshi; Zhihong Zhu; Gibran Hemani; Anna A E Vinkhuyzen; Sang Hong Lee; Matthew R Robinson; John R B Perry; Ilja M Nolte; Jana V van Vliet-Ostaptchouk; Harold Snieder; Tonu Esko; Lili Milani; Reedik Mägi; Andres Metspalu; Anders Hamsten; Patrik K E Magnusson; Nancy L Pedersen; Erik Ingelsson; Nicole Soranzo; Matthew C Keller; Naomi R Wray; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2015-08-31 Impact factor: 38.330

9. Defining the role of common variation in the genomic and biological architecture of adult human height.

Authors: Andrew R Wood; Tonu Esko; Jian Yang; Sailaja Vedantam; Tune H Pers; Stefan Gustafsson; Audrey Y Chu; Karol Estrada; Jian'an Luan; Zoltán Kutalik; Najaf Amin; Martin L Buchkovich; Damien C Croteau-Chonka; Felix R Day; Yanan Duan; Tove Fall; Rudolf Fehrmann; Teresa Ferreira; Anne U Jackson; Juha Karjalainen; Ken Sin Lo; Adam E Locke; Reedik Mägi; Evelin Mihailov; Eleonora Porcu; Joshua C Randall; André Scherag; Anna A E Vinkhuyzen; Harm-Jan Westra; Thomas W Winkler; Tsegaselassie Workalemahu; Jing Hua Zhao; Devin Absher; Eva Albrecht; Denise Anderson; Jeffrey Baron; Marian Beekman; Ayse Demirkan; Georg B Ehret; Bjarke Feenstra; Mary F Feitosa; Krista Fischer; Ross M Fraser; Anuj Goel; Jian Gong; Anne E Justice; Stavroula Kanoni; Marcus E Kleber; Kati Kristiansson; Unhee Lim; Vaneet Lotay; Julian C Lui; Massimo Mangino; Irene Mateo Leach; Carolina Medina-Gomez; Michael A Nalls; Dale R Nyholt; Cameron D Palmer; Dorota Pasko; Sonali Pechlivanis; Inga Prokopenko; Janina S Ried; Stephan Ripke; Dmitry Shungin; Alena Stancáková; Rona J Strawbridge; Yun Ju Sung; Toshiko Tanaka; Alexander Teumer; Stella Trompet; Sander W van der Laan; Jessica van Setten; Jana V Van Vliet-Ostaptchouk; Zhaoming Wang; Loïc Yengo; Weihua Zhang; Uzma Afzal; Johan Arnlöv; Gillian M Arscott; Stefania Bandinelli; Amy Barrett; Claire Bellis; Amanda J Bennett; Christian Berne; Matthias Blüher; Jennifer L Bolton; Yvonne Böttcher; Heather A Boyd; Marcel Bruinenberg; Brendan M Buckley; Steven Buyske; Ida H Caspersen; Peter S Chines; Robert Clarke; Simone Claudi-Boehm; Matthew Cooper; E Warwick Daw; Pim A De Jong; Joris Deelen; Graciela Delgado; Josh C Denny; Rosalie Dhonukshe-Rutten; Maria Dimitriou; Alex S F Doney; Marcus Dörr; Niina Eklund; Elodie Eury; Lasse Folkersen; Melissa E Garcia; Frank Geller; Vilmantas Giedraitis; Alan S Go; Harald Grallert; Tanja B Grammer; Jürgen Gräßler; Henrik Grönberg; Lisette C P G M de Groot; Christopher J Groves; Jeffrey Haessler; Per Hall; Toomas Haller; Goran Hallmans; Anke Hannemann; Catharina A Hartman; Maija Hassinen; Caroline Hayward; Nancy L Heard-Costa; Quinta Helmer; Gibran Hemani; Anjali K Henders; Hans L Hillege; Mark A Hlatky; Wolfgang Hoffmann; Per Hoffmann; Oddgeir Holmen; Jeanine J Houwing-Duistermaat; Thomas Illig; Aaron Isaacs; Alan L James; Janina Jeff; Berit Johansen; Åsa Johansson; Jennifer Jolley; Thorhildur Juliusdottir; Juhani Junttila; Abel N Kho; Leena Kinnunen; Norman Klopp; Thomas Kocher; Wolfgang Kratzer; Peter Lichtner; Lars Lind; Jaana Lindström; Stéphane Lobbens; Mattias Lorentzon; Yingchang Lu; Valeriya Lyssenko; Patrik K E Magnusson; Anubha Mahajan; Marc Maillard; Wendy L McArdle; Colin A McKenzie; Stela McLachlan; Paul J McLaren; Cristina Menni; Sigrun Merger; Lili Milani; Alireza Moayyeri; Keri L Monda; Mario A Morken; Gabriele Müller; Martina Müller-Nurasyid; Arthur W Musk; Narisu Narisu; Matthias Nauck; Ilja M Nolte; Markus M Nöthen; Laticia Oozageer; Stefan Pilz; Nigel W Rayner; Frida Renstrom; Neil R Robertson; Lynda M Rose; Ronan Roussel; Serena Sanna; Hubert Scharnagl; Salome Scholtens; Fredrick R Schumacher; Heribert Schunkert; Robert A Scott; Joban Sehmi; Thomas Seufferlein; Jianxin Shi; Karri Silventoinen; Johannes H Smit; Albert Vernon Smith; Joanna Smolonska; Alice V Stanton; Kathleen Stirrups; David J Stott; Heather M Stringham; Johan Sundström; Morris A Swertz; Ann-Christine Syvänen; Bamidele O Tayo; Gudmar Thorleifsson; Jonathan P Tyrer; Suzanne van Dijk; Natasja M van Schoor; Nathalie van der Velde; Diana van Heemst; Floor V A van Oort; Sita H Vermeulen; Niek Verweij; Judith M Vonk; Lindsay L Waite; Melanie Waldenberger; Roman Wennauer; Lynne R Wilkens; Christina Willenborg; Tom Wilsgaard; Mary K Wojczynski; Andrew Wong; Alan F Wright; Qunyuan Zhang; Dominique Arveiler; Stephan J L Bakker; John Beilby; Richard N Bergman; Sven Bergmann; Reiner Biffar; John Blangero; Dorret I Boomsma; Stefan R Bornstein; Pascal Bovet; Paolo Brambilla; Morris J Brown; Harry Campbell; Mark J Caulfield; Aravinda Chakravarti; Rory Collins; Francis S Collins; Dana C Crawford; L Adrienne Cupples; John Danesh; Ulf de Faire; Hester M den Ruijter; Raimund Erbel; Jeanette Erdmann; Johan G Eriksson; Martin Farrall; Ele Ferrannini; Jean Ferrières; Ian Ford; Nita G Forouhi; Terrence Forrester; Ron T Gansevoort; Pablo V Gejman; Christian Gieger; Alain Golay; Omri Gottesman; Vilmundur Gudnason; Ulf Gyllensten; David W Haas; Alistair S Hall; Tamara B Harris; Andrew T Hattersley; Andrew C Heath; Christian Hengstenberg; Andrew A Hicks; Lucia A Hindorff; Aroon D Hingorani; Albert Hofman; G Kees Hovingh; Steve E Humphries; Steven C Hunt; Elina Hypponen; Kevin B Jacobs; Marjo-Riitta Jarvelin; Pekka Jousilahti; Antti M Jula; Jaakko Kaprio; John J P Kastelein; Manfred Kayser; Frank Kee; Sirkka M Keinanen-Kiukaanniemi; Lambertus A Kiemeney; Jaspal S Kooner; Charles Kooperberg; Seppo Koskinen; Peter Kovacs; Aldi T Kraja; Meena Kumari; Johanna Kuusisto; Timo A Lakka; Claudia Langenberg; Loic Le Marchand; Terho Lehtimäki; Sara Lupoli; Pamela A F Madden; Satu Männistö; Paolo Manunta; André Marette; Tara C Matise; Barbara McKnight; Thomas Meitinger; Frans L Moll; Grant W Montgomery; Andrew D Morris; Andrew P Morris; Jeffrey C Murray; Mari Nelis; Claes Ohlsson; Albertine J Oldehinkel; Ken K Ong; Willem H Ouwehand; Gerard Pasterkamp; Annette Peters; Peter P Pramstaller; Jackie F Price; Lu Qi; Olli T Raitakari; Tuomo Rankinen; D C Rao; Treva K Rice; Marylyn Ritchie; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Jouko Saramies; Mark A Sarzynski; Peter E H Schwarz; Sylvain Sebert; Peter Sever; Alan R Shuldiner; Juha Sinisalo; Valgerdur Steinthorsdottir; Ronald P Stolk; Jean-Claude Tardif; Anke Tönjes; Angelo Tremblay; Elena Tremoli; Jarmo Virtamo; Marie-Claude Vohl; Philippe Amouyel; Folkert W Asselbergs; Themistocles L Assimes; Murielle Bochud; Bernhard O Boehm; Eric Boerwinkle; Erwin P Bottinger; Claude Bouchard; Stéphane Cauchi; John C Chambers; Stephen J Chanock; Richard S Cooper; Paul I W de Bakker; George Dedoussis; Luigi Ferrucci; Paul W Franks; Philippe Froguel; Leif C Groop; Christopher A Haiman; Anders Hamsten; M Geoffrey Hayes; Jennie Hui; David J Hunter; Kristian Hveem; J Wouter Jukema; Robert C Kaplan; Mika Kivimaki; Diana Kuh; Markku Laakso; Yongmei Liu; Nicholas G Martin; Winfried März; Mads Melbye; Susanne Moebus; Patricia B Munroe; Inger Njølstad; Ben A Oostra; Colin N A Palmer; Nancy L Pedersen; Markus Perola; Louis Pérusse; Ulrike Peters; Joseph E Powell; Chris Power; Thomas Quertermous; Rainer Rauramaa; Eva Reinmaa; Paul M Ridker; Fernando Rivadeneira; Jerome I Rotter; Timo E Saaristo; Danish Saleheen; David Schlessinger; P Eline Slagboom; Harold Snieder; Tim D Spector; Konstantin Strauch; Michael Stumvoll; Jaakko Tuomilehto; Matti Uusitupa; Pim van der Harst; Henry Völzke; Mark Walker; Nicholas J Wareham; Hugh Watkins; H-Erich Wichmann; James F Wilson; Pieter Zanen; Panos Deloukas; Iris M Heid; Cecilia M Lindgren; Karen L Mohlke; Elizabeth K Speliotes; Unnur Thorsteinsdottir; Inês Barroso; Caroline S Fox; Kari E North; David P Strachan; Jacques S Beckmann; Sonja I Berndt; Michael Boehnke; Ingrid B Borecki; Mark I McCarthy; Andres Metspalu; Kari Stefansson; André G Uitterlinden; Cornelia M van Duijn; Lude Franke; Cristen J Willer; Alkes L Price; Guillaume Lettre; Ruth J F Loos; Michael N Weedon; Erik Ingelsson; Jeffrey R O'Connell; Goncalo R Abecasis; Daniel I Chasman; Michael E Goddard; Peter M Visscher; Joel N Hirschhorn; Timothy M Frayling
Journal: Nat Genet Date: 2014-10-05 Impact factor: 38.330

10. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations.

Authors: Ying Wang; Jing Guo; Guiyan Ni; Jian Yang; Peter M Visscher; Loic Yengo
Journal: Nat Commun Date: 2020-07-31 Impact factor: 14.919

7 in total

1. An integrative skeletal and paleogenomic analysis of stature variation suggests relatively reduced health for early European farmers.

Authors: Stephanie Marciniak; Christina M Bergey; Ana Maria Silva; Agata Hałuszko; Mirosław Furmanek; Barbara Veselka; Petr Velemínský; Giuseppe Vercellotti; Joachim Wahl; Gunita Zariņa; Cristina Longhi; Jan Kolář; Rafael Garrido-Pena; Raúl Flores-Fernández; Ana M Herrero-Corral; Angela Simalcsik; Werner Müller; Alison Sheridan; Žydrūnė Miliauskienė; Rimantas Jankauskas; Vyacheslav Moiseyev; Kitti Köhler; Ágnes Király; Beatriz Gamarra; Olivia Cheronet; Vajk Szeverényi; Viktória Kiss; Tamás Szeniczey; Krisztián Kiss; Zsuzsanna K Zoffmann; Judit Koós; Magdolna Hellebrandt; Robert M Maier; László Domboróczki; Cristian Virag; Mario Novak; David Reich; Tamás Hajdu; Noreen von Cramon-Taubadel; Ron Pinhasi; George H Perry
Journal: Proc Natl Acad Sci U S A Date: 2022-04-06 Impact factor: 12.779