Literature DB >> 29686100

Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data.

Thomas A DiPrete¹, Casper A P Burik², Philipp D Koellinger³.

Abstract

Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.

Entities: Chemical

Keywords: causal effects; genetic instrumental variables; genome-wide association studies; pleiotropy; polygenic scores

Mesh：

Year: 2018 PMID： 29686100 PMCID： PMC5984483 DOI： 10.1073/pnas.1707388115

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN： 0027-8424 Impact factor: 11.205

A major challenge in the social sciences and in epidemiology is the identification of causal effects in nonexperimental data. In these disciplines, ethical and legal considerations along with practical constraints often preclude the use of experiments to randomize the assignment of observations between treatment and control groups or to carry out such experiments in samples that represent the relevant population (1). Instead, many important questions are studied in field data which make it difficult to discern between causal effects and (spurious) correlations that are induced by unobserved factors (2). Obviously, confusing correlation with causation is not only a conceptual error; it can also lead to ineffective or even harmful recommendations, treatments, and policies, as well as a significant waste of resources (e.g., as in ref. 3). One important source of bias in field data stems from genetic effects: Twin studies (4) as well as methods based on molecular genetic data (5, 6) allow estimation of the proportion of variance in a trait that is due to linear genetic effects (so-called narrow-sense heritability). Using these and related methods, an overwhelming body of literature demonstrates that almost all important human characteristics, behaviors, and health outcomes are influenced both by genetic predisposition and by environmental factors (7–9). Most of these traits are “genetically complex,” which means that the observed heritability is due to the accumulation of effects from a very large number of genes that each have a small, often statistically insignificant, influence (10). Furthermore, genes often influence several seemingly unrelated traits, a phenomenon called direct or vertical pleiotropy (11, 12). For example, a mutation of a single gene that causes the disease phenylketonuria is responsible for mental retardation and also for abnormally light hair and skin color (13). Pleiotropy is not restricted to diseases. All genes involved in healthy cell metabolism and cell division can be expected to directly influence a broad range of traits such as body height, cognitive ability, and longevity, even if the effect on each of these traits may be tiny. Similarly, any gene involved in neurodevelopment and brain function is likely to contribute to human behavior and mental health in some way (14). In addition to direct pleiotropic effects, genes can also have indirect or horizontal pleiotropic effects, where a genetic variant influences one trait, which in turn influences another trait (11). The similarity of the genetic architecture of two traits is estimated by their genetic correlation (i.e., the correlation of the “true” effect sizes of all genetic variants on both traits) (15), which captures both direct and indirect pleiotropic effects (16–18). Genetic correlations exist between many traits and often exceed their phenotypic correlations (19), giving rise to the concern that direct pleiotropy may substantially bias studies that do not control for genetic effects (20). If an experimental design is not possible, the gold standard in the presence of genetic confounds is to compare outcomes for monozygotic (MZ) twins (21, 22), who are by definition genetically (almost) identical (23). In addition, this approach also controls for effects that arise from shared parental environment. However, the practical challenge is that such studies require very large sample sizes of MZ twin pairs because differences within MZ twin pairs tend to be small or nonexistent. Furthermore, unobserved environmental differences between the twins or reverse causation can still lead to wrong conclusions in this study design. Another popular strategy to isolate causal effects in nonexperimental data is to use instrumental variables (IVs) (24). Valid IVs are conceptually similar to natural experiments: They provide an exogenous “shock” on the exposure of interest to isolate the effect of that exposure on an outcome. Valid IVs need to satisfy two important conditions. First, they need to be correlated with the exposure conditional on the other control variables in the regression (i.e., IVs need to be “relevant”). Second, they need to be independent of the error term of the regression conditional on the other control variables and produce their correlation with the outcome solely through their effect on the exposure (the so-called exclusion restriction). In practice, finding valid IVs that satisfy both requirements is difficult. In particular, satisfying the exclusion restriction is challenging. [Two other conditions that valid IVs need to satisfy are monotonicity (everyone who is affected by the IV is affected in the same direction) and the stable unit treatment value assumption (SUTVA): The “treatment” of one unit does not affect the outcome variable for other units.] Epidemiologists have proposed to use genetic information to construct IVs and termed this approach Mendelian randomization (MR) (25–28). The idea is in principle appealing because genotypes are randomized in the production of gametes by the process of meiosis. Thus, conditional on the genotype of the parents, the genotype of the offspring is the result of a random draw. So if it could be known which genes affect the exposure, it may be possible to use them as IVs to identify the causal influence of the exposure on some outcome of interest. However, there are four challenges to this idea. First, we need to know which genes affect the exposure and isolate true genetic effects from environmental confounds that are correlated with ancestry. Second, if the exposure is a genetically complex trait, any gene by itself will capture only a very small part of the variance in the trait, which leads to the well-known problem of weak instruments (29, 30). Third, genotypes are randomly assigned only conditional on the genotype of the parents. Unless it is possible to control for the genotype of the parents, the genotype of the offspring is not random and correlates with everything that the genotypes of the parents correlate with (e.g., parental environment, personality, and habits) (31). Fourth, if direct pleiotropic effects of genes are the source of the confound, these genes could obviously not be used as IVs. One could try to isolate a subset of genes that influence only the exposure, but such attempts are still hindered by our limited knowledge of the function of most genes (27, 32, 33). Recent advances in complex trait genetics make it possible to address the first two challenges of MR. Array-based genotyping technologies have made the collection of genetic data fast and cheap. As a result, very large datasets are now available to study the genetic architecture of many human traits and a plethora of robust, replicable genetic associations have recently been reported in large-scale genome-wide association studies (GWASs) (34). These results begin to shed light on the genetic architecture that is driving the heritability of traits such as body height (35), body mass index (BMI) (36), schizophrenia (37), Alzheimer’s disease (38), depression (39), and educational attainment (EA) (15). High-quality GWASs use several strategies to control for genetic structure in the population, and empirical evidence suggests that the vast majority of the reported genetic associations for many traits is not confounded by ancestry (40–43). Polygenic scores (PGSs) have become the favored tool for summarizing the genetic predispositions for genetically complex traits (15, 39, 44, 45). PGSs are linear indexes that aggregate the estimated effects of all currently measured genetic variants [typically single-nucleotide polymorphisms (SNPs)]. The effects of each SNP on an outcome are estimated in large-scale GWASs that exclude the prediction sample. Recent studies demonstrate that this approach yields PGSs that begin to predict genetically complex outcomes such as height, BMI, schizophrenia, and EA (35–37, 39, 46). Although PGSs still capture substantially less of the variation in traits than suggested by their heritability (47) (an issue we return to below), PGSs capture a much larger share of the variance of genetically complex traits than individual genetic markers. The third challenge to MR in the above list could in principle be addressed if the genotypes of the parents and the offspring are observed (e.g., in a large sample of parent–offspring trios) or by using large samples of siblings or dizygotic twins where the genetic differences between siblings are random draws from the parent’s genotypes. However, the fourth challenge (i.e., pleiotropy) remains a serious obstacle despite recent efforts to relax the exogeneity assumptions in MR (48–50). Here, we address the implications of pleiotropy for modeling causal relationships using nonexperimental data. We demonstrate that pleiotropy is a serious source of bias in ordinary least-squares regression (OLS) and MR. We propose alternative estimation strategies that use PGSs for the outcome of interest to reduce bias arising from pleiotropy. In particular, we propose an approach that we call genetic instrumental variables (GIV) regression that can be implemented using widely available statistical software. GIV regression estimates practically useful upper and lower bounds for the causal effect of an exposure on an outcome even in the presence of substantial direct pleiotropy. We begin by providing intuition and laying out the assumptions of our approach. We go on to show that GIV regression produces accurate estimates for the effect of the PGSs on the outcome variable when the other covariates in the model are exogenous, when the true PGS is uncorrelated with the error term net of the included covariates, and when the GWAS sample sizes are sufficiently large relative to the number of SNPs. We then turn to the more complex case of when a regressor of interest is potentially correlated with unobserved variables in the error term because of pleiotropy, and we show with evidence from a comprehensive set of simulations that the bias under these assumptions with GIV regression is generally smaller than with OLS, MR, or what we term an enhanced version of MR (EMR). Next, we demonstrate the practical usefulness of our approach in empirical applications using the publicly available Health and Retirement Study (HRS) (51). First, we demonstrate that a consistent estimate of the so-called chip heritability (47) of EA can be obtained with our method. Then, we estimate the effects of body height on EA. As a “negative control,” we check whether our method finds a causal effect of EA on body height (it should not). (Note that a clean experimental design which randomizes people into groups based on body height or EA is not possible. Thus, any attempt to study the causal relationship between the two variables must rely on observational data and naturally occurring experiments like the genetic endowment of individuals, which we exploit here.) Formal derivations and technical details are contained in .

Theory

Intuition.

To build intuition for our approach, we introduce the concept of the true PGS for which would be constructed using the true effects of each SNP on . In theory, the true SNP effects could be estimated in a GWAS on in an infinitely large sample that is drawn from the same population as the prediction sample. The true PGS would capture the narrow-sense heritability of . Of course, the true PGS is unknown. All one can practically obtain is a PGS from a finite GWAS sample that will capture a part, but not all, of the genetic influence on because the effect of each SNP is estimated with noise. The attenuated predictive accuracy of practically available PGSs (45, 47, 52) is conceptually similar to the well-known problem of measurement error in regression analysis. It has long been understood that multiple indicators can, under certain conditions, provide a strategy to correct regression estimates for attenuation from measurement error (53–55). We show below that by splitting the GWAS sample into independent subsamples, one can obtain several PGSs (i.e., multiple indicators) in the prediction sample. Each will have even lower predictive accuracy than the original score due to the smaller GWAS subsamples used in their construction, but these multiple indicators can be used as instrumental variables for each other, and the instruments will satisfy the assumptions of IV regression to the extent that the measurement errors (the difference between the true and calculated PGSs) are uncorrelated. Standard two-stage least-squares (2SLS) regression (24) (readily available in statistics software packages) using at least one valid IV for the PGS of can then be used to back out an unbiased estimate of the heritability of . Next, presume the matter of interest is not heritability, but the causal effect of some treatment on , where is also heritable and some genes have direct pleiotropic effects on both. If these genes are not known and not controlled for, regressing on would result in omitted variable bias. (Unfortunately, that is the reality in most social scientific and epidemiological studies that use nonexperimental data.) Suppose the effects of all genes that influence through channels other than could be known. Theoretically, one could estimate these effects in a GWAS on that controls for in an infinitely large sample. That information could be used to construct a “true conditional” PGS in a prediction sample. Adding the true conditional PGS to a regression of on in the prediction sample would effectively eliminate bias arising from direct pleiotropy. However, the true conditional PGS is also unknown and all we can practically obtain is a noisy proxy of it from a finite GWAS sample. While it is not guaranteed, the general conclusion of the literature is that the use of proxy variables is an improvement over omitting the variable being proxied (56, 57). Furthermore, having a valid IV for the conditional score would potentially correct for its noise and get us closer to estimating the true causal effect of on . As before, a valid IV can be practically obtained by splitting the GWAS sample into independent parts and standard IV estimation techniques such as two-stage least squares can be used. We refer to this approach as conditional genetic IV regression (GIV-C). If conditional GWAS results are not available, one can still add the unconditional PGS for as a control variable and use IV regression with multiple indicators for this score to correct for measurement error. We refer to this as unconditional GIV regression (GIV-U). GIV-U still corrects for bias arising from direct pleiotropy, but this strategy will overcontrol and result in estimates for that are biased toward zero because the unconditional PGS also includes indirect pleiotropic effects of genes that affect only because they affect . However, extensive simulations show that the combination of GIV-C and GIV-U turns out to produce reasonable upper and lower bounds for the effect of on across a broad range of scenarios if the only sources of bias are pleiotropic genes. The GIV strategy starts to break down when bias arises from unobserved nongenetic factors as well as from pleiotropic effects. We show below that both GIV and MR produce biased estimates in this case. However, we demonstrate that the combination of GIV-C and GIV-U still outperforms OLS and MR. Furthermore, the GIV approach has additional utility because it can be combined with other strategies to reduce the effects of environmental endogeneity (e.g., additional control variables or family fixed effects). We demonstrate that these combined strategies can potentially provide accurate information about the effects of an exposure in situations with both genetic and nongenetic sources of endogeneity. In contrast, the problems for MR that are produced by pleiotropy bias are not fixable in a similar manner.

Assumptions.

GIV regression builds on the standard identifying assumptions of IV regression (24). In the context of our approach, this implies six specific conditions: Polygenicity: The outcome is a genetically complex trait that is influenced by many genetic variants, each with a very small effect. Complete genetic information: The available genetic data include all variants that influence the variable(s) of interest. Genetic effects are linear: All genetic variants influence the variable(s) of interest via additive linear effects. Thus, there are no genetic interactions (i.e., epistasis) or dominant alleles. Unbiased GWAS results: The available GWAS results are not systematically biased by omitted environmental variables. For example, failure to control for population structure can lead to spurious genetic associations (31). Nonoverlapping samples: It is possible to divide GWAS samples into nonoverlapping subsamples drawn from the same population. The genetic effects on are the same in the GWAS and the prediction samples; i.e., the genetic correlation between samples is one.

Estimating Narrow-Sense SNP Heritability from Polygenic Scores.

Under these assumptions, consistent estimates of the chip heritability of a trait (i.e., the proportion of variance in a trait that is due to linear effects of currently measurable SNPs) can be obtained from polygenic scores (for full details, see ). If is the outcome variable, is a vector of exogenous control variables, and is a summary measure of genetic tendency for in the presence of controls for , then one can writewhere is an matrix of genetic markers, and is the vector of SNP effect sizes, where the number of SNPs is typically in the millions. If the true effects of each SNP on the outcome were known, the entire genetic tendency for would be captured by the true unconditional score , and the marginal of in Eq. would be the chip heritability of the trait. We refer to the estimate of the PGS from actually available GWAS data in the presence of controls for as , whereand is the estimation error in and is substituted for in Eq. . The variance of a trait that is captured by its available PGS increases with the available GWAS sample size to estimate and converges to the SNP-based narrow-sense heritability of the trait at the limit if all relevant genetic markers were included in the GWAS and if the GWAS sample size were sufficiently large (45, 47, 52). Eq. contains what is called in econometrics a “generated regressor” in that is a function of a set of variables and coefficients () from another model. As previous work (58, 59) has established, OLS will provide consistent estimates of the parameters of Eq. (although corrections for standard errors are needed) if is substituted for under a set of reasonable assumptions that include convergence in probability of to as the sample size grows larger. However, the practical utility of this mathematical result is questionable in the current context, when the number of variables in is in the millions while the number of cases available to estimate is far smaller than that. The imposed ratio of coefficients to cases requires nonconventional estimation methods that use a combination of statistical assumptions to obtain estimates of . Empirical studies using PGSs for a variety of traits have consistently demonstrated substantial attenuation in the estimate of (45, 47, 52), and, while the bias diminishes with GWAS sample size, we are a long time away from having large enough sample sizes to bring this attenuation down to ignorable levels. This situation, therefore, calls for alternative strategies to address important questions with the datasets currently available. The most straightforward solution to the problem of attenuation bias is to obtain multiple indicators of the PGS by splitting the GWAS discovery sample for into two mutually exclusive subsamples with at least partially overlapping sets of SNPs. This produces noisier estimates of with lower predictive accuracy, but the multiple indicators can be used as IVs for each other. The 2SLS regression using as an instrument for will then recover a consistent estimate of in Eq. under standard IV assumptions (55, 60). As discussed more technically in , an additional important assumption for to be a valid instrument for is that be a complex trait, meaning that it is influenced by a large number of genetic markers, each of which has a very small effect. If is primarily influenced by a relatively small number of markers, then the method proposed here would not work well. However, there would also be no need for the proposed method, because the markers with large effects could be easily identified and their effects estimated with reasonable precision using discovery sample sizes that are already obtainable. Another required assumption is that the genetic markers are independent of each other. In general, genetic markers are correlated if they are located close to each other on the same chromosome. However, it is currently possible to isolate several hundred thousand markers from the total set of millions of SNPs that have sufficient spatial separation in the DNA to be essentially mutually independent, which means that this assumption can be satisfied to a sufficient level of accuracy. Assuming that the variables in Eq. are standardized to have mean zero and a SD of one, and further assuming that the variables contained in control for population stratification or are not correlated with genotype , a consistent estimate of the chip heritability of can now be obtained from , where is the correlation coefficient. The heritability estimate is not equal to simply because we regressed on instead of . Thus, we standardize with respect to the variance of instead of , which leads to a bias equal to . Multiplying with the correlation between and recovers a consistent estimate for (). (For an alternative approach to correcting attenuation bias based on the use of multiple indicators in a structural equation modeling framework, see ref. 61.)

Reducing Bias Arising from Genetic Correlation Between Exposure and Outcome.

Polygenic scores also play a potentially important role in situations where the question of interest is not the chip heritability of per se, but rather the effect of some nonrandomized exposure on (e.g., a behavioral or environmental variable or a nonrandomized treatment due to policy or medical interventions). We can rewrite Eq. by adding a treatment variable of interest , such thatwhereWe assume that the disturbance term is uncorrelated with genetic variables. (We drop this assumption later. Also, we drop the subscript on the coefficients for the exogenous control variables below when it would not lead to confusion.) We now use the true conditional score rather than in the equation. Given that is in the model, the effect of individual SNPs on will generally involve a direct net effect of and an indirect effect stemming from the combination of their effect on and the effect of on Having in the equation would effectively control for pleiotropic effects on and . In standard MR, a measure of genetic tendency () for a behavior of interest ( in Eq. ) is used as an IV in an effort to purge of bias that arises from correlation between and unobservable variables in the disturbance term under the assumption that is exogenous (60, 62). One such example would be the use of a PGS for height as an instrument for height in a regression of EA on height. The problem with this approach is that the PGS for height will fail to satisfy the exclusion restriction if (some of) the genes affecting height also have a direct effect on EA (e.g., via healthy cell growth and metabolism) or if they are correlated with unmeasured environmental factors that affect EA. (Classic MR typically does not use PGSs as instruments. Instead, the idea is to use single genetic variants that are known to affect the exposure via well-understood biological mechanisms that make it unlikely to violate the exclusion restriction. In practice, limited knowledge about the biological function of most genes makes it difficult to argue that direct pleiotropic effects of the gene on the exposure and the outcome of interest exist.) If the true conditional (net of ) genetic propensity for could be directly controlled in the regression, pleiotropy would not bias coefficient estimates. For example, fixed-effects regression where the same individual is observed multiple times would effectively control for pleiotropy (which does not vary over time), but this strategy is often not available (e.g., in a study of the effect of height on EA).* Direct control for the conditional genetic propensity for is, of course, not possible, because (more specifically, the coefficients, in Eq. ) is not known. What is obtainable instead is a proxy for , namely , which contains measurement error due to finite GWAS sample size and potential bias in the estimate of in the GWAS. We refer to the combined use of as a control and as an IV for as EMR. However, controlling for as a proxy for is not a perfect solution to pleiotropy because it leaves a component of in the error term which is correlated with due to pleiotropy. As a result, the EMR estimate for will be biased. The practical question, then, is whether alternative strategies that split the GWAS sample for to obtain multiple indicators of that can be used as IVs for each other (e.g., and ) are sufficient to rescue as a practically useful IV for . This is a practical question beyond the reach of formal mathematics and best answered by simulation analyses. Unfortunately, and as we show in , the pleiotropy-induced violation of the exclusion restriction when using genetic IVs for is sufficient to produce serious bias in the estimated effect of even if one attempts to control for pleiotropy using such strategies. The magnitude of bias clearly depends on the quality of the proxies for . However, we find that pleiotropy leads to considerable bias in MR in virtually all scenarios we investigated ( and Tables S2–S15). The problems posed by pleiotropy cannot be completely eliminated without knowledge of (). However, this situation does not mean that the estimation problems introduced by pleiotropy are intractable. When endogeneity bias is driven by genetic correlation, the PGS for can still be used to obtain more accurate estimates of the effect of than can be obtained with MR or, for that matter, with OLS that lacks controls for the direct effect of genetic markers on . To gain insight into the best strategy, we consider the reasons for the pleiotropy bias. Regardless of whether the estimation strategy is OLS or the second stage of IV regression involving OLS on predicted variables from the first stage, the coefficient bias comes from the extent to which the expected estimate of the OLS coefficients differs from the true coefficients; namely,In other words, the coefficient bias from OLS is the expected regression coefficient of the error on the included variables in the regression. If is the sum of an omitted variable, , which is correlated with the regressors, and additional variables that are uncorrelated with the regressors, then the bias for each coefficient in Eq. becomes the product of the regression coefficient for in the regression of on all of the omitted variables multiplied by the effect of on the outcome. For simplicity, we assume that the only variables in the regression are and a potential proxy for , which we call . For any given proxy, , the bias in the estimate of (the coefficient for in Eq. ) comes from the expected coefficient of from a regression of on and . We consider three alternative approaches for the proxy , which we call simple OLS, GIV-C, and GIV-U. First, we use as a proxy for in a simple OLS regression. Second, we observe that is correlated with ; its inclusion in the error (by virtue of its being controlled) may affect the bias in . So we construct an estimate for , namely , by using (the unconditional PGS from the second GWAS sample) as its IV. We call this approach, where is used as the regressor in the second stage, GIV-C. We also use a third estimator that uses the same IV as in GIV-C (i.e., ), but that substitutes the unconditional PGS for (i.e., substitutes for the conditional PGS ) in the structural model in Eq. . We then use to predict , obtaining as the regressor in the second stage. We call this third approach GIV-U. We generally expect the use of GIV-C to perform better than the use of the proxy in simple OLS. If the true effect of on is positive and positive pleiotropy is present, the estimated effect of on will have positive bias. This follows from the positive correlation between and and from the positive effect of on . The presence of the proxy in the first approach (simple OLS) adds a partially offsetting negative bias, because the correlation between and is positive and the effect is also positive, but is being subtracted, which causes the offsetting bias to be negative. The net bias is expected to be positive, but we would expect it to be smaller with the inclusion of the proxy than with no proxy at all, both because the correlation between and would be lower than between and and because we expect to be attenuated relative to . When GIV-C is used instead, the term in the error becomes . The presence in the first stage of GIV-C of , which is correlated with , prevents the IV strategy from obtaining a consistent estimate of . Nonetheless, we would generally expect , and therefore we expect the positive bias for the estimate of to be smaller when using GIV-C than when estimating using OLS and the proxy . We confirm this in the simulations in . With GIV-U, the problem term in the error is . As before, the presence of the first term produces a positive bias in the estimate of , while the second term produces an offsetting negative bias. The offset will be stronger when the unconditional PGS for is the regressor in the structural model, because the coefficients of the genetic markers in are , where is the effect of the genetic marker on . The presence of in the second endogenous term in the error (i.e., the second term in ) produces a stronger downward bias. This downward bias is made still stronger by the use of instead of as the coefficient, because we expect the first-stage regression to reduce the downward bias of . In other words, we expect these three proxies to behave differently in the simulations, and, as we will see, this expectation is met in practice. We establish via a comprehensive set of simulations that GIV-C and GIV-U provide upper and lower bounds for the effect of across a range of plausible scenarios for pleiotropy and for heritability (). We further establish through simulation analyses that GIV-C and GIV-U perform similarly in the case when endogeneity arises from pleiotropy and when it arises from pleiotropy in combination with genetic confounds for reasons other than pleiotropy [epistasis, effects from rare alleles, or genetic nurturing effects, where the environment of ego is shaped by genetically related individuals to ego (63)] ( and Tables S8–S10).

Simulations

We explored the performance of GIV regression in finite sample sizes using three sets of simulation scenarios (). The simulations generated genetic and phenotypic data at the individual level from a set of known models in a training sample and a holdout sample using parameters that are realistic for genetically complex traits. We then estimated genetic effects on and using GWAS in the training sample and constructed polygenic scores with the estimated parameters for each SNP in the holdout sample. Thus, the polygenic scores in our simulations have the realistic property that their predictive accuracy increases with the size of the training (i.e., GWAS) sample and the average effect size of each SNP (45, 52). Finally, we analyzed the extent to which various estimation strategies recover the effect of the PGS for on and the effect of on in the holdout sample. We produced these estimates using OLS, MR, EMR, proxy OLS, GIV-C, and GIV-U regression, and we compared these results with the true answer across a range of parameter values. We ran 20 simulations with different random seeds for each set of parameters to obtain a distribution of estimated effects. (The computer code for these simulations is available at https://github.com/cburik/GIVsim.) The simulations specify that the true PGS scores for and covary as a result of genetic correlation. We made the conservative assumption that the entire genetic correlation between and is due to direct pleiotropy; i.e., all genes that are associated with both phenotypes have direct effects on both. In practice, this is unlikely to be the case, but it is equally unlikely that one can put a credible upper bound on (or completely rule out) direct pleiotropy. In the first set of simulations, we assumed that the entire endogeneity problem arises from genetic confounds (). In the second set of simulations, we allowed endogeneity to arise from sources that are both correlated with genes and that cause the disturbance term in the structural equation for to be correlated with the disturbance term in the structural model for even if the true conditional PGS for were included in the structural equation (). This situation would occur if rare alleles were missing from the true PGS for and the true conditional PGS for based on known SNPs and if the effect of these alleles was correlated with the true PGS scores for and . It would also occur under conditions of epistasis where nonlinear effects of genes were in the error and were correlated with the linear effects of genes in the PGS for and the conditional PGS for Third, this situation would occur if the genetic factors that affect are correlated with the environmental factors in the disturbance term for that are caused or selected by parental genes, which are correlated with the genes of sample members and therefore also with variables (like ) that are affected by the genes of sample members, i.e., by genetic nurturing (63). In the third set of simulations, we specified the presence of a correlation between the error terms in the models for and that was not itself correlated with genetic variables (). This would occur in a situation where some environmental or behavioral factor that is unrelated to genetics produces both an effect on and an effect on . A summary of these results is in Table 1 for the case where the effect of is set to 1.0 (see for details on the standardized effect size). Scenarios A–D in Table 1 refer to situations where pleiotropy is the only source of bias, scenario E contains pleiotropy plus other sources of genetic confounds, while scenarios F and G also include endogeneity from nongenetic (i.e., environmental) sources. The results provide considerable reason to be skeptical of estimates from MR. When pleiotropy is present, the MR strategy is undermined by the violation of the exclusion restriction for genetic IVs. Our results find that MR performs poorly even when nongenetic endogeneity is present along with pleiotropy. In contrast, GIV regression provides reasonable upper and lower bounds of the true effect of on if the source of endogeneity is only from pleiotropy or other genetic confounds (i.e., unobserved genetic variants, epistasis, or genetic nurturing) and the heritability of and is not extreme. GIV-C generally overestimates the effect of but the overestimation is modest at low to moderate levels of pleiotropy and heritability and is more accurate than OLS without proxies, MR, or EMR. GIV-U generally underestimates the effect of on but provides an estimate that is reasonably close to the true answer under conditions of low to moderate levels of pleiotropy and heritability. Even at higher levels of pleiotropy and heritability, the combination of GIV-C and GIV-U provides useful information about whether actually has a causal effect on and what the upper bound of this effect is likely to be.

Table 1.

Illustrative results from simulations, estimated coefficient for T

Model	Parameter	OLS	MR	GIV-C	GIV-U
A	Pleiotropy only	1.1004	1.5040	1.0131	0.9419
	hy2=hT2=0.2	(0.0001)	(0.0012)	(0.0001)	(0.0001)
B	Pleiotropy only	1.2011	1.5024	1.0604	0.8575
	hy2=hT2=0.4	(0.0001)	(0.0004)	(0.0001)	(0.0001)
C	Pleiotropy only	1.3016	1.5020	1.1573	0.7263
	hy2=hT2=0.6	(0.0001)	(0.0002)	(0.0001)	(0.0002)
D	Pleiotropy only	1.5004	3.4922	1.0776	0.8689
	hy2=0.8, hT2=0.2	(0.0005)	(0.0093)	(0.0002)	(0.0002)
E	Genetic Confounds	1.3106	1.4609	1.1922	0.6422
	hy2=hT2=0.5, ρνy=ρνT=0.5	(0.0001)	(0.0002)	(0.0001)	(0.0002)
F	Nongenetic confounds	1.4520	1.5032	1.4259	1.1193
	hy2=hT2=0.5, ρe=0.4	(0.0001)	(0.0002)	(0.0001)	(0.0001)
G	Nongenetic confounds with control	1.3643	1.5064	1.3346	0.9587
	hy2=hT2=0.5, ρe=0.4, s=0.5	(0.0001)	(0.0002)	(0.0001)	(0.0001)

Shown are mean estimated coefficient for T and SE within parentheses of 20 simulations using different estimation methods for several models. For all models the genetic correlation () was 0.5 and the coefficient for T () was 1. and are the heritability parameters of y and T. is the correlation between and the genetic confound for . is the correlation between and the genetic confound for . is the correlation between the nongenetic confound and . is the share of the confound that is controlled for in terms of variance of the confound. These results are a selection from ; see for all results. A, Table S2; B, Table S3; C, Table S4; D, Table S7; E, Table S9; F, Table S11; and G, Table S14. See row 2 of each table. See for more details on the parameters, variance, and standardized effect size.

Illustrative results from simulations, estimated coefficient for T Shown are mean estimated coefficient for T and SE within parentheses of 20 simulations using different estimation methods for several models. For all models the genetic correlation () was 0.5 and the coefficient for T () was 1. and are the heritability parameters of y and T. is the correlation between and the genetic confound for . is the correlation between and the genetic confound for . is the correlation between the nongenetic confound and . is the share of the confound that is controlled for in terms of variance of the confound. These results are a selection from ; see for all results. A, Table S2; B, Table S3; C, Table S4; D, Table S7; E, Table S9; F, Table S11; and G, Table S14. See row 2 of each table. See for more details on the parameters, variance, and standardized effect size. In the case where the true value of is zero (i.e., where the true model is Eq. ), we expect that GIV-U will produce an estimate that is close to zero as long as the endogeneity comes either from pleiotropy or from other genetic confounds (see , for details). The simulations in show that when the endogeneity is only from genetic sources, GIV-U estimates the effect of to be close to zero regardless of the level of pleiotropy or inheritance that is specified in the simulations. When the source of endogeneity is nongenetic in origin, we find that neither MR nor proxy controls for pleiotropy provide a satisfactory method for determining the effect of on . In this scenario, the pleiotropy creates endogeneity bias for genetic IVs that defeats the ability of MR to solve the problem of nongenetic endogeneity via an IV strategy. Nongenetic endogeneity can cause even GIV-U to overpredict the effect of on although in our simulations it is clearly the most accurate of all of the estimators that we have surveyed when the effect of is zero. Indeed, GIV-U always provides the most conservative estimate across the entire range of scenarios that we have surveyed, both for the case where an effect of on exists and when the effect of is zero. Inference with the GIV can be further strengthened in cases where nongenetic endogeneity can be controlled either through observable variables or through strategies such as family fixed effects that reduce or eliminate the impact of nongenetic forms of environmental endogeneity. Indeed, environmental endogeneity is a concern in most applied-research questions that use nonexperimental data. Reassuringly, our simulations show that GIV regression is a good estimation strategy in the presence of both direct pleiotropy and environmental endogeneity if control variables are available that manage to absorb a substantial share of the nongenetic confounds (). Therefore, we recommend using GIV regression always in combination with control variables the capture possible environmental confounds, ideally in datasets that allow controlling for family fixed effects (e.g., using siblings or dizygotic twins). In , we provide additional practical guidelines for GIV regression. The simulations described in this paper certainly do not cover all conceivable data-generating processes, but they are nonetheless of considerable utility.

Empirical Applications

We illustrate the practical use of GIV regression in two empirical applications using data from the Health and Retirement Survey (HRS) for 2,751 unrelated individuals of northwest European descent who were born between 1935 and 1945 ().

The Narrow-Sense SNP Heritability of EA.

First, we demonstrate that GIV regression can recover the unbiased genomic–relatedness-matrix restricted maximum-likelihood (GREML) estimate of the chip heritability of EA. Specifically, we follow the common practice in GREML estimates of heritability and analyze the residual of EA from a regression of EA on birth year, birth year squared, gender, and the first 20 principal components from the genetic data (64). Next, we standardize the residual and regress it on a standardized PGS for EA using OLS or GIV. The results are displayed in Table 2. The OLS estimate of the PGS accounts for 6.8% of the variance in EA ( = = 6.8%), which is substantially lower than the 17.3% (95% CI ± 4%) estimate of chip heritability reported by ref. 64 in the same data using GREML. [GREML yields unbiased estimates of SNP-based heritability that are not affected by attenuation (65).] Instead, the GIV regression results in columns 2 and 3 of Table 2 imply a chip heritability of 13.4% (CI ± 3.9%) and 13.8% (±4.0%), respectively. Thus, the 95% CIs of the GREML estimate and the two GIV estimates overlap, demonstrating that GIV regression can recover the chip heritability of EA from polygenic scores.

Table 2.

Effects of the polygenic score for educational attainment (PGS EA) on (residualized) educational attainment in the Health and Retirement Study (HRS)

Variable	OLS	IV1	IV2
PGS EA UKB	0.259***	0.523***
	(0.0183)	(0.0385)
PGS EA SSGAC			0.530***
			(0.0389)
ĥ2	NA	0.134	0.138
	NA	(0.0197)	(0.0202)
N	2,751	2,751	2,751

*, **, ***. We regress the residual of EA on the different PGSs and calculate the implied heritability estimates. SEs are in parentheses. All variables have been standardized. EA is measured in years of schooling needed to obtain the highest achieved educational degree according to International Standard Classification of Education (ISCED) classifications. We use the residual of EA after a regression on birth year, birth year squared, gender, and the first 20 principal components in the genetic data. PGS EA SSGAC: PGS for EA using meta-analysis from ref. 15, excluding data from 23andMe, UK Biobank (UKB), and HRS; PGS EA UKB, PGS for EA using UKB data. IV1 uses PGS EA SSGAC as instrument and IV2 uses PGS EA UKB as instrument. NA, not applicable.

Effects of the polygenic score for educational attainment (PGS EA) on (residualized) educational attainment in the Health and Retirement Study (HRS) *, **, ***. We regress the residual of EA on the different PGSs and calculate the implied heritability estimates. SEs are in parentheses. All variables have been standardized. EA is measured in years of schooling needed to obtain the highest achieved educational degree according to International Standard Classification of Education (ISCED) classifications. We use the residual of EA after a regression on birth year, birth year squared, gender, and the first 20 principal components in the genetic data. PGS EA SSGAC: PGS for EA using meta-analysis from ref. 15, excluding data from 23andMe, UK Biobank (UKB), and HRS; PGS EA UKB, PGS for EA using UKB data. IV1 uses PGS EA SSGAC as instrument and IV2 uses PGS EA UKB as instrument. NA, not applicable.

The Relationship Between Body Height and Educational Attainment.

Previous studies using both OLS and sibling or twin fixed-effects methods have found that taller people generally have higher levels of EA (66–68). They are also more likely to perform well in various other life domains, including earnings, higher marriage rates for men (although with higher probabilities of divorce), and higher fertility (69–74). The question is what drives these results. Can they be attributed to genetic effects that jointly influence these outcomes? Are there social mechanisms that systematically favor taller or penalize shorter individuals? Or are there nongenetic factors (e.g., the uterine and postbirth environments especially related to nutrition or disease) that affect both height and these life-course outcomes? The literature on the relationship between height and EA has found evidence that the association arises largely through the relationship between height and cognitive ability, which may suggest that the height–EA association is driven largely by genetic association between height and cognitive ability. We use GIV regression with individual-level data from the HRS to clarify the influence of height on EA, and we compare these results with those obtained from OLS and from MR. In addition, we conduct a negative control experiment that estimates the causal effect of EA on body height (which should be zero). A complete description of the materials and methods is available in . GWAS summary statistics for height were obtained from the Genetic Investigation of Anthropometric Traits (GIANT) consortium (35) and by running a GWAS on height (conditional on EA and unconditional on EA) in the UKB (75). The UKB was not part of the GIANT sample. GWAS summary statistics for EA were obtained from the Social Science Genetic Association Consortium (SSGAC) for the unconditional PGSs. The most recent study of the SSGAC on EA used a meta-analysis of 64 cohorts for genetic discovery (15). We obtained meta-analysis results from this study with the HRS, UKB, and 23andMe cohorts excluded and we refer to the PGS constructed from these results as EA SSGAC. Furthermore, we obtained GWAS estimates for EA in the full UKB release () from ref. 76. We refer to this PGS as PGS EA unc. UKB. We also created a PGS for EA conditional on height by running a GWAS on EA in the same UKB sample (PGS EA cond. UKB). There is sample overlap between Height_GIANT and EA_SSGAC. Therefore, whenever one of the two was used as regressor, we excluded the other as instrument and used a PGS from UK Biobank data instead to ensure independence of measurement errors in the PGS. In Table 3 we report the estimated standardized effect of height on EA. The OLS results show that height appears to have a strong positive effect on EA, with 2.5 additional centimeters in height generating one additional month of schooling. MR appears to confirm the causal interpretation of the OLS result; indeed, the point estimate from MR is even slightly larger than from OLS. As discussed above, MR suffers from probable violations of the exclusion restriction due to pleiotropy. These violations could stem from the possibility that some genes have direct effects on both height and EA. (Results from refs. 15 and 18 suggest a genetic correlation between height and EA of about 0.15.) They could also stem from the possibility that the PGS for height by itself is correlated with the genetic tendency for parents to have higher EA and income, which enables higher parental investments into their children who may therefore be more likely to reach their full cognitive potential and have higher EA. Controlling for the PGS is an imperfect strategy for eliminating this source of endogeneity because the bias in the estimated effect of the PGS score also biases the estimated effect of height.

Table 3.

Estimates of the effect of height on educational attainment (EA)

Variable	OLS	MR	GIV-C	GIV-U
Height	0.136***	0.160***	0.168***	0.110***
	(0.0262)	(0.0481)	(0.0264)	(0.0262)
PGS EA cond. UKB			0.396***
			(0.0367)
PGS EA uncond. UKB				0.384***
				(0.0354)
N	2,751	2,751	2,751	2,751

*, **, ***. Standardized effect sizes and SEs are in parentheses. Birth year, birth year squared, gender, EA mother, EA father, and the first 20 principal components are included as control variables. For MR, a PGS for height from UK Biobank (UKB) data was used as instrument for height. For GIV-C and GIV-U PGS EA SSGAC was used as an instrument.

Estimates of the effect of height on educational attainment (EA) *, **, ***. Standardized effect sizes and SEs are in parentheses. Birth year, birth year squared, gender, EA mother, EA father, and the first 20 principal components are included as control variables. For MR, a PGS for height from UK Biobank (UKB) data was used as instrument for height. For GIV-C and GIV-U PGS EA SSGAC was used as an instrument. If all of the bias in EA came from positive pleiotropy, then we would expect GIV-C and GIV-U to provide upper and lower bounds for the true effect of height on EA, respectively. Thus, if the only source of endogeneity is pleiotropy, the results in Table 3 would suggest that the true standardized effect is between 0.11 (from GIV-U) and 0.17 (from GIV-C). However, the negative control regression results in Table 4 provide substantial evidence of endogeneity bias from environmental sources. Given that the true effect of EA on height should be zero, then GIV-U would accurately estimate this effect to be zero in the absence of environmental endogeneity. Instead, GIV-U reports a significant positive effect of EA on height. MR also reports a positive and statistically significant effect of EA on height. This upward bias in the MR estimate is strong evidence of pleiotropy bias that invalidates the IV in MR. The guidance from the simulations points to a true estimate of the effect of height on EA that is as small or smaller than the GIV-U estimate, which is 25% smaller than the estimate from MR. The extent of upward bias in the GIV-U estimate depends on the strength of environmental variables that simultaneously affected the height of HRS respondents and also affected their EA.

Table 4.

Estimates of the effect of educational attainment (EA) on height

Variable	OLS	MR	GIV-C	GIV-U
EA	0.072***	0.179**	0.050***	0.040***
	(0.0138)	(0.0543)	(0.0119)	(0.0120)
PGS height cond. UKB			0.448***
			(0.0174)
PGS height unc. UKB				0.446***
				(0.0175)
N	2,751	2,751	2,751	2,751

*, **, ***. Standardized effect sizes and SEs are in parentheses. Birth year, birth year squared, gender, EA mother, EA father, and the first 20 principal components are included as control variables. For MR, a PGS for EA from UK Biobank (UKB) data was used as an instrument for EA. For GIV-C and GIV-U PGS height GIANT was used as an instrument.

Estimates of the effect of educational attainment (EA) on height *, **, ***. Standardized effect sizes and SEs are in parentheses. Birth year, birth year squared, gender, EA mother, EA father, and the first 20 principal components are included as control variables. For MR, a PGS for EA from UK Biobank (UKB) data was used as an instrument for EA. For GIV-C and GIV-U PGS height GIANT was used as an instrument. These results also point toward a productive strategy for learning more about the true effect of height on EA. The demonstration that pleiotropy as well as environmental bias is affecting the estimates in Table 3 implies that there is no effective fix for MR; its genetic instruments are contaminated by pleiotropy and therefore cannot be used to adjust for environmental endogeneity. Comparisons between monozygotic twins would effectively control for pleiotropy, but unobserved environmental factors affecting height and EA could still bias the results from MZ twin data. The most productive strategy is arguably to use GIV-U and GIV-C as proxy strategies to address bias from pleiotropy while also correcting for environmental confounds either by controlling for the relevant environmental factors directly or by using data on siblings that allow controlling for shared environmental variables via family fixed-effects. With such data, the estimates from GIV-C and GIV-U would provide approximate bounds as long as the GIV-U estimate of the effect of EA on height was close to zero. If the negative control regression suggests bias from environmental sources, the GIV-U estimate will be more conservative than all of the other estimators considered in this paper, and it will underpredict the true effect unless positive bias from both pleiotropy and genetic-unrelated endogeneity is quite strong. These results do not provide as clean and neat a conclusion as might be desired, although uncertainty is inevitable in the absence of experimental data or a valid IV. At the same time, the empirical example provides considerable insight into the implications of the available estimates. Our results strongly imply that OLS provides an upwardly biased estimate of the effect of height on EA. They also strongly imply that the MR estimate suffers from pleiotropy bias and that MR is not an effective strategy for determining whether and to what extent height affects EA. Other studies suggest that pleiotropy between EA and height is not extremely high (18). Our simulation results therefore suggest that GIV-U either is a plausible estimate for the effect of height on EA if genetic-unrelated endogeneity is relatively strong or underpredicts this effect if the endogeneity is weaker. If the estimate of GIV-U continued to be positive and statistically significant in a sibling fixed-effects analysis, we would be rather confident that the effect is real and not an artifact of bias either from pleiotropy or from genetic-unrelated endogeneity. In other words, these results represent progress toward the goal of understanding how large the social advantage provided by height is in the process of EA given the existence of genetic confounds. (Obviously, none of the estimation strategies we discussed here address possible bias from nonrandom selection into samples.)

Conclusion

Accurate estimation of causal relationships with observational data is one of the biggest and most important challenges in epidemiology and the social sciences—two fields of inquiry where many questions of interest cannot be adequately addressed with properly designed experiments due to practical or ethical constraints. One important confound in nonexperimental data comes from direct pleiotropic effects of genes on the exposure and the outcome of interest. Both OLS and MR yield biased results in this case. We proposed GIV regression as an empirical strategy that controls for such pleiotropic effects using polygenic scores. GIV regression uses standard IV estimation algorithms such as two-stage least squares that are widely available in existing statistical software packages. Our approach provides reasonable upper and lower bounds of causal effects in situations when pleiotropic effects of genes are the only source of bias. We showed that OLS, MR, and GIV regression yield biased estimates if both genetic and environmental sources of endogeneity are present. However, GIV regression still outperforms OLS and MR in this scenario. Furthermore, GIV regression can (and should) be combined with additional strategies that allow controlling for bias from purely environmental or behavioral factors, such as using covariates or family-fixed effects. Together, these approaches can provide reasonable estimates of causal effects across a broad range of scenarios. GIV regression is called for whenever an experimental design, a valid IV strategy, or a large-enough sample of MZ twins is not available and when pleiotropy is a potential problem—a situation that is frequently encountered in practice. The main requirements for GIV regression are a prediction sample that has been comprehensively genotyped and large-scale GWAS results for the outcome of interest from two nonoverlapping samples. Due to rapidly falling genotyping costs that enable a growing availability of genetic data and large GWAS samples for many traits, these requirements have become increasingly feasible for many applications. Indeed, the combination of new estimation tools and continued rapid advancements in genetics should provide a significant improvement in our understanding of the effects of behavioral and environmental variables on important socioeconomic and medical outcomes.

52 in total

1. Genetics and general cognitive ability.

Authors: R Plomin
Journal: Nature Date: 1999-12-02 Impact factor: 49.962

2. Latent variables in psychology and the social sciences.

Authors: Kenneth A Bollen
Journal: Annu Rev Psychol Date: 2002 Impact factor: 24.137

3. Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology?

Authors: Debbie A Lawlor; George Davey Smith; Shah Ebrahim
Journal: Int J Epidemiol Date: 2004-05-27 Impact factor: 7.196

4. Correlation and causation: a comment.

Authors: Stephen Stigler
Journal: Perspect Biol Med Date: 2005 Impact factor: 1.416

5. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

6. Cohort Profile: the Health and Retirement Study (HRS).

Authors: Amanda Sonnega; Jessica D Faul; Mary Beth Ofstedal; Kenneth M Langa; John W R Phillips; David R Weir
Journal: Int J Epidemiol Date: 2014-03-25 Impact factor: 7.196

7. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

8. Maternal expressed emotion predicts children's antisocial behavior problems: using monozygotic-twin differences to identify environmental effects on behavioral development.

Authors: Avshalom Caspi; Terrie E Moffitt; Julia Morgan; Michael Rutter; Alan Taylor; Louise Arseneault; Lucy Tully; Catherine Jacobs; Julia Kim-Cohen; Monica Polo-Tomas
Journal: Dev Psychol Date: 2004-03

9. De novo and inherited CNVs in MZ twin pairs selected for discordance and concordance on Attention Problems.

Authors: Erik A Ehli; Abdel Abdellaoui; Yueshan Hu; Jouke Jan Hottenga; Mathijs Kattenberg; Toos van Beijsterveldt; Meike Bartels; Robert R Althoff; Xiangjun Xiao; Paul Scheet; Eco J de Geus; James J Hudziak; Dorret I Boomsma; Gareth E Davies
Journal: Eur J Hum Genet Date: 2012-04-11 Impact factor: 4.246

10. Causal associations between risk factors and common diseases inferred from GWAS summary data.

Authors: Zhihong Zhu; Zhili Zheng; Futao Zhang; Yang Wu; Maciej Trzaskowski; Robert Maier; Matthew R Robinson; John J McGrath; Peter M Visscher; Naomi R Wray; Jian Yang
Journal: Nat Commun Date: 2018-01-15 Impact factor: 14.919

22 in total

1. The promise of genes for understanding cause and effect.

Authors: Dalton Conley; Simone Zhang
Journal: Proc Natl Acad Sci U S A Date: 2018-05-21 Impact factor: 11.205

2. Genetic Associations Between Executive Functions and a General Factor of Psychopathology.

Authors: K Paige Harden; Laura E Engelhardt; Frank D Mann; Megan W Patterson; Andrew D Grotzinger; Stephanie L Savicki; Megan L Thibodeaux; Samantha M Freis; Jennifer L Tackett; Jessica A Church; Elliot M Tucker-Drob
Journal: J Am Acad Child Adolesc Psychiatry Date: 2019-05-16 Impact factor: 8.829

Review 3. Using genetics for social science.

Authors: K Paige Harden; Philipp D Koellinger
Journal: Nat Hum Behav Date: 2020-05-11

4. Assessing the contribution of genetic nurture to refractive error.

Authors: Jeremy A Guggenheim; Rosie Clark; Tetyana Zayats; Cathy Williams
Journal: Eur J Hum Genet Date: 2022-05-27 Impact factor: 4.246

5. The future of human behaviour research.

Authors: Janet M Box-Steffensmeier; Jean Burgess; Maurizio Corbetta; Kate Crawford; Esther Duflo; Laurel Fogarty; Alison Gopnik; Sari Hanafi; Mario Herrero; Ying-Yi Hong; Yasuko Kameyama; Tatia M C Lee; Gabriel M Leung; Daniel S Nagin; Anna C Nobre; Merete Nordentoft; Aysu Okbay; Andrew Perfors; Laura M Rival; Cassidy R Sugimoto; Bertil Tungodden; Claudia Wagner
Journal: Nat Hum Behav Date: 2022-01

6. Social and genetic associations with educational performance in a Scandinavian welfare state.

Authors: Martin A Isungset; Dalton Conley; Henrik D Zachrisson; Eivind Ystrom; Alexandra Havdahl; Pål R Njølstad; Torkild Hovde Lyngstad
Journal: Proc Natl Acad Sci U S A Date: 2022-06-16 Impact factor: 12.779

7. The association of serum magnesium and chronic kidney disease: a two-sample mendelian randomization study of European descent.

Authors: Chenyang Hou; Yun Wang; Xinxia Sui; Wei Xin; Qingzhi Hou; Jihu Yi; Huichen Yao; Weihua Liu; Zhiyuan Yu; Lichuan Xia; Qing Guo
Journal: Eur J Clin Nutr Date: 2022-03-08 Impact factor: 4.884

8. Resource profile and user guide of the Polygenic Index Repository.

Authors: Joel Becker; Casper A P Burik; Grant Goldman; Nancy Wang; Hariharan Jayashankar; Michael Bennett; Daniel W Belsky; Richard Karlsson Linnér; Rafael Ahlskog; Aaron Kleinman; David A Hinds; Avshalom Caspi; David L Corcoran; Terrie E Moffitt; Richie Poulton; Karen Sugden; Benjamin S Williams; Kathleen Mullan Harris; Andrew Steptoe; Olesya Ajnakina; Lili Milani; Tõnu Esko; William G Iacono; Matt McGue; Patrik K E Magnusson; Travis T Mallard; K Paige Harden; Elliot M Tucker-Drob; Pamela Herd; Jeremy Freese; Alexander Young; Jonathan P Beauchamp; Philipp D Koellinger; Sven Oskarsson; Magnus Johannesson; Peter M Visscher; Michelle N Meyer; David Laibson; David Cesarini; Daniel J Benjamin; Patrick Turley; Aysu Okbay
Journal: Nat Hum Behav Date: 2021-06-17

9. The impact of BMI on mental health: Further evidence from genetic markers.

Authors: Vikesh Amin; Carlos A Flores; Alfonso Flores-Lagunes
Journal: Econ Hum Biol Date: 2020-06-13 Impact factor: 2.774

10. Avoiding dynastic, assortative mating, and population stratification biases in Mendelian randomization through within-family analyses.

Authors: George Davey Smith; Bjørn Olav Åsvold; Gibran Hemani; Neil M Davies; Ben Brumpton; Eleanor Sanderson; Karl Heilbron; Fernando Pires Hartwig; Sean Harrison; Gunnhild Åberge Vie; Yoonsu Cho; Laura D Howe; Amanda Hughes; Dorret I Boomsma; Alexandra Havdahl; John Hopper; Michael Neale; Michel G Nivard; Nancy L Pedersen; Chandra A Reynolds; Elliot M Tucker-Drob; Andrew Grotzinger; Laurence Howe; Tim Morris; Shuai Li; Adam Auton; Frank Windmeijer; Wei-Min Chen; Johan Håkon Bjørngaard; Kristian Hveem; Cristen Willer; David M Evans; Jaakko Kaprio
Journal: Nat Commun Date: 2020-07-14 Impact factor: 14.919