Literature DB >> 30356716

Including Phenotypic Causal Networks in Genome-Wide Association Studies Using Mixed Effects Structural Equation Models.

Mehdi Momen¹, Ahmad Ayatollahi Mehrgardi¹, Mahmoud Amiri Roudbar¹, Andreas Kranis², Renan Mercuri Pinto^3,4, Bruno D Valente⁴, Gota Morota⁵, Guilherme J M Rosa^4,6, Daniel Gianola^4,6,7.

Abstract

Network based statistical models accounting for putative causal relationships among multiple phenotypes can be used to infer single-nucleotide polymorphism (SNP) effect which transmitting through a given causal path in genome-wide association studies (GWAS). In GWAS with multiple phenotypes, reconstructing underlying causal structures among traits and SNPs using a single statistical framework is essential for understanding the entirety of genotype-phenotype maps. A structural equation model (SEM) can be used for such purposes. We applied SEM to GWAS (SEM-GWAS) in chickens, taking into account putative causal relationships among breast meat (BM), body weight (BW), hen-house production (HHP), and SNPs. We assessed the performance of SEM-GWAS by comparing the model results with those obtained from traditional multi-trait association analyses (MTM-GWAS). Three different putative causal path diagrams were inferred from highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 using the inductive causation algorithm. A positive path coefficient was estimated for BM → BW, and negative values were obtained for BM → HHP and BW → HHP in all implemented scenarios. Further, the application of SEM-GWAS enabled the decomposition of SNP effects into direct, indirect, and total effects, identifying whether a SNP effect is acting directly or indirectly on a given trait. In contrast, MTM-GWAS only captured overall genetic effects on traits, which is equivalent to combining the direct and indirect SNP effects from SEM-GWAS. Although MTM-GWAS and SEM-GWAS use the similar probabilistic models, we provide evidence that SEM-GWAS captures complex relationships in terms of causal meaning and mediation and delivers a more comprehensive understanding of SNP effects compared to MTM-GWAS. Our results showed that SEM-GWAS provides important insight regarding the mechanism by which identified SNPs control traits by partitioning them into direct, indirect, and total SNP effects.

Entities: Chemical Disease Gene Mutation Species

Keywords: GWAS; SEM; SNP effect; causal structure; multiple traits; path analysis

Year: 2018 PMID： 30356716 PMCID： PMC6189326 DOI： 10.3389/fgene.2018.00455

Source DB: PubMed Journal: Front Genet ISSN： 1664-8021 Impact factor: 4.599

Introduction

Genome-wide association studies (GWAS) have become a standard approach for investigating relationships between common genetic variants in the genome (e.g., single-nucleotide polymorphisms, SNPs) and phenotypes of interest in human, plant, and animal genetics (Hayes and Goddard, 2010; Brachi et al., 2011; Wang et al., 2012). A typical GWAS is based on univariate linear or logistic regression of phenotypes on genotypes for each SNP individually while often adjusting for the presence of nuisance covariates (Hayes and Goddard, 2010; Sikorska et al., 2013). A statistically significant association indicates that SNPs may be in strong linkage disequilibrium (LD) with quantitative trait loci (QTL) that contribute to the trait etiology. Alternatively, multi-trait model GWAS (MTM-GWAS) can be used to test for genetic associations among a set of traits (Korte et al., 2012; O'Reilly et al., 2012; Zhou and Stephens, 2012). It has been established that MTM-GWAS reduces false positives and increases the statistical power of association tests, explaining the recent popularity of this method. MTM-GWAS can be used to study genetic associations among a set of traits. However, it does not consider various cryptic biological signals that may affect a trait of interest, either directly or indirectly through other intermediate traits. Complex traits are the product of various cryptic biological signals that may affect a trait of interest either directly or indirectly through other intermediate traits (Falconer and Mackay, 1996). A standard regression cannot describe such complex relationships between traits and QTLs properly. For instance, some traits may simultaneously act as both dependent and independent variables. Structural equation modeling (SEM) is an extended version of Wright's path analysis (Wright, 1921; Gianola and Sorensen, 2004) that offers a powerful technique for modeling causal networks. In a complex genotype-phenotype setting involving many traits, a given trait can be influenced not only by genetic and systematic factors but also by other traits (as covariates). Here, QTLs may not affect the target trait directly; instead, the effects may be mediated by upstream traits in a causal network. Indirect effects may therefore constitute a proportion of perceived pleiotropy, and these concepts apply to sets of heritable traits, organized as networks, that are common in biological systems. An example from dairy cattle production systems, described by Gianola and Sorensen (2004), is that higher milk yield increases the risk of a particular disease, such as mastitis, while the prevalence of the disease may negatively affect milk yield As another example, Varona et al. (2007) explored a causal link from litter size to average piglet weight in two pig breeds. In humans, obesity is a key factor influencing insulin resistance, which subsequently causes type 2 diabetes. Lists of causal networks across human diseases and candidate genes are described in Kumar and Agrawal (2013) and Schadt (2016). Although MTM-GWAS is a valuable approach, it only captures correlations or associations among traits and does not provide information about causal relationships. Knowledge of the causal structures underlying complex traits is essential, as correlation does not imply causation. For example, a correlation between two traits, T1 and T2, could be attributed to a direct effect of T1 on T2 or T2 on T1, or to additional variables that jointly influence both traits (Rosa et al., 2011). Likewise, if we know a “causal” SNP is linked to a QTL, we can imagine three possible scenarios with respect to T11: (1) causal (SNP → T1 → T2), (2) reactive (SNP → T2 → T1), or (3) independent (T1 ← SNP → T2). Scenarios (1) and (2) do not cause pleiotropy but produce association. A SEM methodology has the ability to handle complex genotype-phenotype maps in GWAS, placing an emphasis on causal networks (Li et al., 2006). Therefore, SEM-based GWAS (SEM-GWAS) may provide a better understanding of biological mechanisms and of relationships among a set of traits than MTM-GWAS. SEM can potentially decompose the total SNP effect on a trait into direct and indirect (i.e., mediated) contributions. However, SEM-derived GWAS has yet not been discussed or applied fully in quantitative genetic studies yet. Our objective was to illustrate the potential utility of SEM-GWAS by using three production traits in broiler chickens genotyped for a battery of SNPs as a case example.

Materials and methods

Data set

The analysis included records for 1,351 broiler chickens provided by Aviagen Ltd. (Newbridge, Scotland) for three phenotypic traits: ultrasound of breast muscle (BM) at 35 days of age, body weight (BW), and hen-house egg production (HHP), defined as the total number of eggs laid between weeks 28 and 54 per bird. The sample consisted of 274 full-sib families, 326 sires, and 592 dams. More details regarding population and family structure were provided by Momen et al. (2017). A pre-correction procedure was performed on the phenotypes to account for systematic effects such as sex, hatch week, pen, and contemporary group for BM and BW. HHP was corrected for random hatch effects, with a general mean as the sole fixed effect. Each bird was genotyped for 580,954 SNP markers with a 600k Affymetrix SNP (Kranis et al., 2013) chip (Affymetrix, Inc., Santa Clara, CA, USA). The Beagle software program (Browning and Browning, 2007) was used to impute missing SNP genotypes, and quality control was performed using PLINK version 1.9 (Purcell et al., 2007). Markers with minor allele frequencies (MAF) < 1%, call rate < 95%, and Hardy–Weinberg equilibrium (Chi-square test p-value threshold was 10−6) were removed. The main reason for conducting the HWE test was to remove SNPs with potential genotyping error. Finally, 354,364 autosomal SNP markers were included in the analysis.

Multiple-trait model for gwas

MTM-GWAS is a single-trait GWAS model extended to multi-dimensional responses. When only considering additive effects of SNPs, the phenotype of a quantitative trait using the single-trait model can be described as: where y is the phenotypic trait of individual i, w = (w1, …, w) is the number of A alleles (i.e., w∈{0, 1, 2}) in the genotype of SNP marker j, and s is the allele substitution effect for SNP marker j. Strong LD between markers and QTLs coupled with an adequate marker density increases the chance of detecting marker and phenotype associations. Hypothesis testing is typically used to evaluate the strength of the evidence of a putative association. Typically, a t-test is applied to obtain p-values, and the statistic is, where ŝ is the point estimate of the j-th SNP effect and se(ŝ) is its standard error. The single locus model described above is naive for a complex trait because the data typically contain hidden population structure and individuals have varying degrees of genetic similarity (Listgarten et al., 2012; Gianola et al., 2016). Therefore, accounting for covariance structure induced by genetic similarity is expected to produce better inferences (Kennedy et al., 1992). Ignoring effects that reveal genetic relatedness inflates the residual terms and compromises the ability to detect association. A random effect g, including a covariance matrix reflecting pairwise similarities between additive genetic effects of individuals, can be included to control population stratification. The similarity metrics can be derived from pedigree information or from whole-genome marker genotypes. This model, extended for analysis of t traits, is given by: where is the pre-adjusted phenotypic value measured on each birds, as previously defined, represent the incidence matrix of genotype codes, is the vector of additive marker effect, is the vector of random polygenic effect, , and ε represents the residual vector, . Here ⊗ denotes the Kronecker product. The covariance matrices were: The positive definite matrix K may be a genomic relationship matrix (G) computed from marker data, or a pedigree-based matrix (A) computed from genealogical information. The A matrix describes the expected additive similarity among individuals, while G measures the realized fraction of alleles shared. Genomic relationship matrices can be derived in several ways (VanRaden, 2008; Yang et al., 2010; Forni et al., 2011). Here, we used the form proposed by VanRaden (2008): where M is an n × p matrix of centered SNP genotypes and p and q = 1 − p are the allele frequencies at marker locus j. We evaluated both A and G in the present study.

Structural equation model association analysis

A SEM consists of two essential parts: a measurement model and a structural model. The measurement model depicts the connections between observable variables and their corresponding latent variables (Anderson and Gerbing, 1988). The measurement model is also known as confirmatory factor analysis. The critical part of a SEM is the structural model, which can have three forms (Raykov and Marcoulides, 2012). The first consists of observable exogenous and endogenous variables. This model is a restricted version of a SEM known as path analysis (Wright, 1921). The second form explains the relationship between exogenous and endogenous variables that are only latent. The third type is a model consisting of both manifest and latent variables. SEM can be applied to GWAS as an alternative to MTM-GWAS to study how different causal paths mediate SNP effects on each trait. The following SEM model was considered: where Λ is a t×t matrix of regression coefficients or structural coefficients (typically lower-triangular) according to the learned causal structure from the residuals and the diagonal matrix filled with zeros: The vectors g and ε are assumed to have a joint distribution , and the residual covariance matrix is a diagonal as . The remaining terms are as presented earlier with one important difference: the SNP effects are not interpreted as overall effects on trait t but instead represent direct effects on trait t. Additional indirect effects from the same SNP may be mediated by phenotypic traits in C. Each marker is entered into Equation (4) separately, and its significance is tested. For a discussion of how SEM represents genetic signals on each trait through multiple causal paths, see Wu et al. (2010) and Jamrozik and Schaeffer (2011). Despite the difference in interpretation, the distribution of the vector of polygenic effects is assumed to be the same as in the MTM-GWAS model. The same applies to residual terms within a trait. We also consider trait-specific residuals to be independent within an individual. This restriction is required to render structural coefficients likelihood-identifiable. In addition, the interpretation of inferences as having a causal meaning requires imposing the restriction that the residuals' joint distribution be interpreted as the causal sufficiency assumption (Pearl, 2009). In the present study, all exogenous and endogenous variables were observable, and there was no latent variable. Hence, causal structure was assumed between the endogenous variables BM, BW, and HHP. We considered the following GWAS models with their causal structures were recovered by the inductive causation (IC) algorithm (Pearl, 2009): (1) MTM-GWAS with pedigree-based kinship A (MTM-A) or marker-based kinship G (MTM-G), and (2) SEM-GWAS with A (SEM-A) or G (SEM-G). Although nuisance covariates such as environmental factors can be omitted in the graph, they may be incorporated into the models as exogenous variables. The SEM representation allowed us to decompose SNP effects into direct, indirect, and total effects. A direct SNP effect is the path coefficient between a SNP as an exogenous variable and a dependent variable without any causal mediation by any other variable. The indirect effects of a SNP are those mediated by at least one other intervening endogenous variable. Indirect effects are calculated by multiplying path coefficients for each path linking the SNP to an associated variable, and then summing over all such paths (Mi et al., 2010a; Jiang et al., 2013). The overall effect is the sum of all direct and indirect effects. By explicitly accounting for complex relationship structure among traits in such a way, SEM provides a better understanding of a genome-wide SNP analysis by allowing us to decompose effects into direct, indirect, and overall effects within a predefined casual framework (Nock and Zhang, 2011). MTM-GWAS and SEM-GWAS were compared with the logarithm of the likelihood function (log L), Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The model providing the lowest values for these information criteria is considered to fit the data better. MTM-GWAS and SEM-GWAS were fitted using the SNP Snappy strategy (Meyer and Tier, 2012), which is implemented in the Wombat software program (Meyer, 2007). The outputs were a vector of multiple SNP effect estimates, ŝ = [ŝ, ŝ, ŝ], with corresponding standard errors and respective t-values.

Searching for a phenotypic causal network in a mixed model

In the SEM-GWAS formulation described earlier, the structure of the underlying causal phenotypic network needs to be known. Because this is not so in practice, we used a causal inference algorithm to infer the structure. Residuals are assumed to be independent in all SEM analyses, so associations between observed traits are viewed as due to causal links between traits and by correlations among genetic values (i.e., g1, g2, and g3). Thus, to eliminate confounding problems when inferring the underlying network among traits, we used the approach of Valente et al. (2010) to search for acyclic causal structures through conditional independencies on the distribution of the phenotypes, given the genetic effects. A causal phenotypic network was inferred in two stages: (1) an MTM model (Henderson and Quaas, 1976) was employed to estimate covariance matrices of additive genetic effects and of residuals, and (2) the causal structure among phenotypes from the covariance matrix between traits, conditionally on additive genetic effects, was inferred by the IC algorithm. The residual (co)variance matrix was inferred using Bayesian Markov-chain Monte Carlo (Valente et al., 2010; Wu et al., 2010), with samples drawn from the posterior distribution. The reason for our use of the residual (co)covariances is that the residual structure could bear information from the joint distribution of all phenotypic traits conditional on their polygenic effects, such that they correct the confounding issues caused by such effects when the traits are genetically correlated (Pearl, 2009). For each query testing statistical independence between traits y and , the posterior distribution of the residual partial correlation was obtained, where h is a set of variables (traits) that are independent. Three highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 were used to make statistical decisions for SEM-GWAS. We thus considered SEM-A75 (HPD > 0.75), SEM-A85 (HPD > 0.85), SEM-A95 (HPD > 0.95), and SEM-G75 (HPD > 0.75). An HPD interval that does not contain zero declares y and to be conditionally dependent.

Results

Figure 1 shows phenotypic relationship structures recovered by the IC algorithm for the three different HPD intervals. Edges connecting two traits represent non-null partial correlations as indicated by HPD intervals. We compared the two MTM-GWAS and four SEM-GWAS by using the three chicken traits (BW, BM, and HHP). Fully recursive (there is at least one incoming OR outgoing edge for each node) SEM-A75 and SEM-G75 graphs revealed direct effects of BM on BW and HHP, and those of BW on HHP, as well as an indirect effect of BM on HHP. In addition, SEM-A85 detected a direct effect of BM on BW, the direct effect of BW on HHP, and the indirect effect of BM on HHP mediated by BW. Finally, SEM-A95 only identified a direct effect of BM on BW because of a statistically stringent HPD cutoff imposed. SEM-G85 and SEM-G95 were not explored further because they produced the same results as SEM-A85 and SEM-A95.

Figure 1

Causal graphs inferred using the IC algorithm among three traits: breast meat (BM), body weight (BW), and hen-house production (HHP) in the chicken data. SEM-A75 and SEM-G75 were the inferred fully recursive causal structures with HPD > 0.75 and corrected for genetic confounder using A (pedigree-based) and G (marker-based) matrices. SEM-A85 and SEM-A95 were obtained with HPD > 0.85 and HPD > 0.95, respectively, corrected with A. Arrows indicate direction of causal relationships. Dashed lines indicate negative coefficients, and the continuous arrows indicate positive coefficients. Given the causal structures inferred from the IC algorithm, the following SEM was fitted: Note that only a small number of the entries in the structural coefficient matrix (λ in Equation 5) are non-zero due to sparsity. These non-zero entries specify the effect of one phenotype on other phenotypes. The corresponding directed acyclic graph is shown in Figure 2 assuming the causal relationships among the three traits, where y1, y2, and y3 represent BM, BW, and HHP, respectively; SNPj is the genotype of the j-th SNP; sjl is the direct SNP effect on trait l; and the remaining variables are as presented earlier. This diagram depicts a fully recursive structure in which all recursive relationships among the three phenotypic traits are shown. Arrows represent causal connections, whereas double-headed arrows between polygenic effects are correlations.

Figure 2

A diagram for causal path analysis of SNP effects in a fully recursive structural equation model for three traits, p exogenous independent SNP variables, and three correlated polygenic effects. Arrows indicate the direction of causal effects and dashed lines represent associations among the three phenotypes. Genetic correlation between traits (r), polygenic effects (g), environmental effect on trait t (e), effects of j th SNP on t th trait (S), and recursive effect of phenotype t′ on phenotype l (). Dashed lines indicate negative coefficients and the continuous arrows indicate positive coefficients. We examined the fit of each model implemented to assess how well it describes the data (Table 1). Varona et al. (2007) and recently Valente et al. (2013) showed that re-parametrization and reduction of a SEM mixed model yield the same joint probability distribution of observation as in MTM, suggesting that the expected likelihood of SEM and MTM should be similar. As expected, SEM-GWAS and MTM-GWAS showed very similar results (e.g., SEM-A75 vs. MTM-A and SEM-G75 vs. MTM-G). Among the models considered, those involving G exhibited slightly better fits. SEM-A85 and SEM-A95, sharing a subset of the SEM-A75 structure, presented almost identical AIC and BIC values. Since these results imply that the recursive model and standard mixed model for GWAS are statistically equivalent in terms of the fitting criteria, the focus of the remainder of the analysis will be on the modeling of SNP (or QTL) effects in the SEM context (SEM-A75 or SEM-G75) as an extension of MTM, which accounts for recursive links among the three measured traits.

Table 1

Model comparison criteria: logarithm of the restricted maximum likelihood function (log L), Akaike's information criteria (AIC), Schwarz Bayesian information criteria (BIC) were used evaluate model fit for two multiple trait models (MTM) and four structural equation models (SEM).

Model	Maximum log L	-1/2 AIC	-1/2 BIC
MTM-A	−7093.480	−7105.48	−7142.436
SEM-A75	−7098.370	−7110.415	−7147.321
SEM-A85	−7095.188	−7107.188	−7144.143
SEM-A95	−7097.517	−7109.517	−7146.470
MTM-G	−6529.270	−6541.276	−6578.232
SEM-G75	−6537.391	−6549.391	−6586.34

Structural coefficients

Table 2 presents the causal structural path coefficients for endogenous variables (BM, BW, and HHP). All models have positive effects for BM → BW, whereas the BM → HHP and BW → HHP relationships have negative path coefficients. The latter confirmed the fact that chicken breeding is divided into broiler and layer sections due to the negative genetic correlation between BW and HHP.

Table 2

Estimates of three causal structural coefficients (λ) derived from four different structural models.

Path	Structural models
	SEM-A75	SEM-G75	SEM-A85	SEM-A95
λ_{BM → BW}(λ₂₁)	2.13	2.19	2.14	2.14
λ_{BM → HHP}(λ₃₁)	−0.17	−0.280	***	***
_{λ_{BW → HHP}(λ32})	−0.27	−0.096	−0.31	***

BM, breast meat; BW, body weight; HHP, hen-house production. SEM-75: HPD > 0.75. SEM-G75: HPD > 0.75. SEM-A85: HPD > 0.85. SEM-A95: HPD > 0.95.

Represents path coefficient was not estimated because there was no corresponding path in the inferred structure.

Estimates of three causal structural coefficients (λ) derived from four different structural models. BM, breast meat; BW, body weight; HHP, hen-house production. SEM-75: HPD > 0.75. SEM-G75: HPD > 0.75. SEM-A85: HPD > 0.85. SEM-A95: HPD > 0.95. Represents path coefficient was not estimated because there was no corresponding path in the inferred structure. Also shown in Table 2 are the magnitudes of the SEM structural coefficient reflecting the intensity of the causality. The positive coefficient λ21 quantifies the (direct) causal effect of BM on BW. This suggests that a 1-unit increase in BM results in a λ21-unit increase in BW. Likewise, the negative causal effects λ31 and λ32 offer the same interpretation.

Decomposition of snp effect paths using a fully recursive model

We can decompose SNP effects into direct and indirect effects using Figure 2. The direct effect of the SNP j on y3 (HHP) is given by dSN: Ŝj(y3), where d denotes the direct effect. Note there are only one direct and many indirect paths. We find three indirect paths from SNPj to y3 mediated by y1 and y2 (i.e., the nodes formed by other traits). The first indirect effect is ind(1)SNP: λ32(λ21Ŝj(y1)) in the path mediated by y1 and y2, where ind denotes the indirect effect. The second indirect effect ind(2)SN: λ32Ŝj(y2), is mediated by y2. The last indirect effect, is ind(3)SNP: λ31Ŝj(y1), mediated by y1. Therefore, the overall effect is given by summing all four paths, TSN: λ32(λ21Ŝj(y1)) +λ32Ŝj(y2)+λ31Ŝj(y1)+Ŝj(y3). The fully recursive model of the overall SNP effect is then: For y1 (BM), there is only one effect, so the overall effect is equal to the direct effect. For y2 (BW) and y3 (HHP), direct and indirect SNP effects are involved. There are two paths for y2: one indirect, indS:Ŝj(y1) → y1 → y2, and one direct, dS:Ŝj(y2) → y2. Here, the SNP effect is direct and mediated thorough other phenotypes according to causal networks in SEM-GWAS (Figures 1, 2). For instance, the overall SNP effect for y3 into four direct and indirect paths is TŜ:λ32λ21Ŝj(y1)+λ32Ŝj(y1)+λ31Ŝj(y1)+Ŝj(y3). The scatter plots in Figure 3 compare the estimated total effects for HHP (TŜ) obtained from SEM-GWAS and those from MTM-GWAS. We observed good agreement between SEM-GWAS and MTM-GWAS. The total SNP signals derived from SEM and MTM are the same but SEM provides biologically relevant additional information.

Figure 3

Comparison of multiple trait (MTM) and fully recursive overall SNP effects obtained with A (pedigree-based) and G (marker-based) from structural equation modeling (SEM)-based GWAS. Overall effects in SEM are the sum of all direct and indirect effects. HHP, hen-house egg production. Figures S1–S4 present scatter plots of MTM-GWAS and SEM-GWAS signals (SEM-A75, SEM-G75, SEM-A85, and SEM-A95) for the BM → BW path, which was a common path across all SEM-GWAS considered. These two traits have a genetic correlation of 0.5 (results not shown). We partitioned the SEM causal link into direct, indirect, and overall effects based on directed links inferred from the IC algorithm with HPD > 0.85, whereas MTM-GWAS captures an overall SNP effect on BW. Scatter plots of the overall effects from SEM-GWAS and those of the total effects from MTM-GWAS indicated almost perfect agreement (top left plots, Figures S1–S4). We also observed concomitance between estimated overall and direct effects (top right plots, Figures S1–S4). In contrast, there was less agreement in the magnitude of the SNP effects when comparing overall vs. indirect effects (bottom left plots, Figures S1–S4). There was no linear relationship between the indirect and direct SNP effects (bottom right plots, Figures S1–S4). In short, genetic signals detected in SEM-GWAS were close to those of MTM-GWAS for overall effects because both models are based on a multivariate approach with the same covariance matrix. In all SEM-GWAS, results showed that direct effects contributed to overall effects more than the indirect effects.

Manhattan plot of direct, indirect, and overall snp effects

Figure 4 depicts a Manhattan plot summarizing the magnitude of direct (SEM-75A), indirect (SEM-75A), and overall SNP effects (MTM-75A). We plotted the decomposed SNP effects on BW along chromosomes to visualize estimated marker effects from SEM-GWAS and MTM-GWAS. The indirect and direct effects provide a view of SNP effects from a perspective that is not available for the total effect of MTM-GWAS. For instance, there were two estimated SNP effects on chromosomes 1 and 2 that deserve particular attention. These two SNPs are highlighted with black circles and red ovals. The overall effect of the first SNP consisted of large indirect and small direct effects on BM, whereas the opposite pattern was observed for the second SNP, which showed large direct and small indirect effects. Although the overall effects of these SNPs were similar (top Manhattan plot, Figure 4), use of decomposition allowed us to determine that the trait of interest is affected in different manners: the second SNP effect acted directly on BW without any mediation by BM, whereas the first SNP reflected a large effect mediated by BM on BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

Figure 4

Manhattan plot showing overall, direct and indirect SNP effects using a full recursive model based on A matrix for body weight (BW).

Manhattan plot showing overall, direct and indirect SNP effects using a full recursive model based on A matrix for body weight (BW). The corresponding Manhattan plot based on –log10 (p-values) is shown in Figure S5. As with the magnitude of effect sizes, the results showed that –log10 (p-values) of estimated overall effects from SEM-A75 and those from MTM-A75 yielded the same significant peaks. We found that some significant indirect SNP effects reached genome-wide significance after correction for multiple-testing using a 5% FDR threshold level (2.752). The most significant SNPs were on chromosomes 1 and 4 (GGA1 and GGA4). As an illustration, the six most significant SNPs with the highest –log10 (p-values) for each type of decomposed SNP effect are presented in Table 3. Seven candidate genes were identified near the significant SNPs derived from the SNP effects decomposition, with two on GGA7 (OLA1 and ZNF385B), one on GGA3 (EPHA7), three on GGA4 (LOC422264, LOC422265, and MAEA), and one on GGA14 (GRIN2A). We found that only genes on GGA4 and GGA1 are linked to significant indirect SNP effects that impact HHP. Some studies reported QTLs for BM on GGA1 and for BW on GGA4, stating that these genomic regions contain QTLs related to abdominal fat and growth traits that were detected across diverse chicken populations (Sun et al., 2013; Van Goor et al., 2015). One of the two detected genes on GGA14, i.e., GRIN2A, which was linked to the SNP Gga_rs313620413, showed significant direct and overall SNPs effects using SEM as well as MTM. Collectively, Gga_rs15390496, Gga_rs16591372, and Gga_rs313620413 SNPs on GGA3, GGA7, and GGA14, which were linked to EPHA7, OLA1, and GRIN2A, respectively, represent candidate genes identified from overall effects of both SEM and MTM (Table 3).

Table 3

Six most significant SNPs selected according –log10 (p-values) and their effects, using the full recursive SEM (SEM-A75) and MTM (MTM-A75).

				–log10 (p-values)				Type of SNP effect
	CHR	SNP name	Candidate genes	d_{S_j → y(_HHP)}	ind_{S_j → y(_HHP)}	T_{S_{j → _y(_HHP)}}	MTM_{S_{j → _y(_HHP)}}	d_{S_j → y(_HHP)}	ind_{S_j → y(_HHP)}	T_{S_{j → _y(_HHP)}}	MTM_{S_{j → _y(_HHP)}}
Top SNPs for direct effects	14	Gga_rs313620413	GRIN2A	7.4242	0.1499	9.6599	7.4525	−5.7827	−0.0498	−5.8326	−5.78511
	7	Gga_rs16591372	OLA1	7.0868	0.2220	9.0119	6.9783	−22.5681	0.2983	−22.2698	−22.3520
	3	Gga_rs15390496	EPHA7	7.0209	0.2214	8.6122	7.0297	−22.4233	−0.2149	−22.6382	−22.4098
	1	Gga_rs314001234	–	7.0147	1.1067	9.0710	7.1653	−26.6538	−0.9018	−27.5556	−26.9360
	7	Gga_rs315626061	–	6.8300	0.3360	8.9974	6.9529	5.1767	0.0910	5.26783	5.22295
	7	Gga_rs316509306	–	6.8241	0.3442	8.9952	6.9485	5.1742	0.0928	5.267116	5.22105
Top SNPs for indirect effects	4	Gga_rs316082590	LOC422264	0.7137	3.6868	0.4754	0.5696	−1.2913	0.4505	−0.84073	−1.07339
	4	Gga_rs313358833	LOC422265	0.6449	3.2345	0.4310	0.5202	−1.2067	0.4235	−0.78322	−1.01618
	4	Gga_rs314615897	MAEA	0.1170	2.9505	0.0474	0.0387	−0.2799	0.3853	0.105456	−0.09807
	1	Gga_rs15301842	–	0.0393	2.9408	0.1436	0.0149	−0.1301	0.5053	0.375199	0.050463
	1	Gga_rs314551852	–	0.0632	2.8858	0.1100	0.0065	−0.2038	0.4994	0.295514	−0.02218
	1	Gga_rs317379325	–	0.1599	2.8473	0.0070	0.0931	−0.4789	0.5000	0.021148	−0.29321
Overall effects	14	Gga_rs313620413	GRIN2A	7.4242	0.1499	9.6599	7.4525	−5.7827	−0.0498	−5.83262	−5.7851
	1	Gga_rs314001234	–	7.0147	1.1067	9.0710	7.1653	−26.653	−0.9018	−27.5556	−26.9360
	7	Gga_rs315626061	–	7.0868	0.2220	9.0119	6.9783	−22.5681	0.2983	−22.2698	−22.3520
	7	Gga_rs315626061	–	6.8300	0.3360	8.9974	6.9529	5.1767	0.0910	5.26783	5.2229
	7	Gga_rs316509306	–	6.8241	0.3442	8.9952	6.9485	5.1742	0.0928	5.267116	5.2210
	7	Gga_rs15850017	ZNF385B	6.6582	0.0499	8.6397	6.6176	−20.8591	−0.0718	−20.9310	−20.7681
MTM	14	Gga_rs313620413	GRIN2A	7.4242	0.1499	9.6599	7.4525	−5.7827	−0.0498	−5.8326	−5.7851
	1	Gga_rs314001234	–	7.0147	1.1067	9.0710	7.1653	−26.6538	−0.9018	−27.5556	−26.936
	3	Gga_rs15390496	EPHA7	7.0209	0.2214	8.6122	7.0297	−22.4233	−0.2149	−22.6382	−22.4098
	7	Gga_rs16591372	OLA1	7.0868	0.2220	9.0119	6.9783	−22.5681	0.2983	−22.2698	−22.352
	7	Gga_rs315626061	–	6.8300	0.3360	8.9974	6.9529	5.1767	0.0910	5.26780	5.2229
	7	Gga_rs316509306	–	6.8241	0.3442	8.9952	6.9485	5.1742	0.0928	5.2671	5.2210

Six most significant SNPs selected according –log10 (p-values) and their effects, using the full recursive SEM (SEM-A75) and MTM (MTM-A75). d. We noted that the six SNPs selected according to the –log10 (p-values) from the direct effect on HHP (i.e., dSN) had small indirect effects ranging from -0.9018 to 0.2983. These indirect effects were negligible compared with their corresponding direct and total effects. Also, exploring the indirect effect sizes of the six most significant SNPs showed that indirect effects that are transmitted through inferred causal networks have the ability to change the magnitude of overall SNP effects, even changing them to the opposite direction (i.e., from positive to negative or vice versa). It should also be noted that the estimated additive SNP effects obtained from the four SEM-GWAS can be used for inferring pleiotropy. For instance, a pleiotropic QTL may have a large positive direct effect on BW but may exhibit a negative indirect effect coming from BM, which in turn reduces the total QTL effect on BW. Arguably, the methodology employed here would be most effective when the direct and indirect effects of a QTL are in opposite directions. If the direct and indirect QTL effects are in the same direction, the power of SEM-GWAS may be the same as the overall power of MTM-GWAS. The overall effect (TŜ) of a given SNP consisted of large indirect (indŜ) and small direct (dŜ) effects on HHP, as observed for the top most significant indirect SNPs localized on GGA4 and GAA1, whereas the opposite pattern was observed for the most significant direct SNPs on GAA3, GGA7, and GGA14, which showed large direct and small indirect effects. Although the overall effects of these SNPs from SEM-GWAS and MTM-GWAS were similar, the use of decomposition allowed us to determine that the trait of interest is affected in different manners. For instance, a given SNP effect may largely act directly on HHP without any mediation by BM and BW, whereas another SNP may be transmitting a large effect through a causal path mediated by BM and BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

Discussion

It is becoming increasingly common to analyze a set of traits simultaneously in GWAS by leveraging genetic correlations between traits (Gao et al., 2014; Wu and Pankow, 2017). In the present study, we illustrated the potential utility of a SEM-based GWAS approach for causal inference and mediation analysis of SNP effects, which has the potential advantage of embedding a pre-inferred causal structure across phenotypic traits (Valente et al., 2010). SEM-GWAS, as an extension of standard MTM, accounts for recursive linking of mediating variables that could be either dependent or independent with restriction on a residual covariance. This is a useful approach when multiple mediators influence the final outcomes via either common or distinct biological pathways (Barfield et al., 2017; Bellavia and Valeri, 2017). SEM-GWAS is achieved by first inferring the structure of networks between phenotypic traits. For this purpose, we used a modified version of the IC algorithm described by Pearl (2009) and modified for implementing in quantitative genetics by Valente et al. (2010). The IC algorithm was used to explore putative causal links among phenotypes obtained from a residual covariance matrix, in a model that accounted for systematic and genetic confounding factors such as polygenic additive effects. It then produced a posterior distribution of partial residual correlations between any possible pairs of variables. Three different causal path diagrams were inferred from HPD intervals of 0.75, 0.85, and 0.95. We observed that the number of identified paths decreased with an increase in the HPD interval value. Only a path connecting BM and BW was present in all HPD intervals considered. Moreover, we found that the partial residual correlation between BM and HHP was weaker than that between BM and BW. This may explain why the path between BM and HHP was not detected with HPD intervals larger than 0.75. The primary purpose of estimating the goodness of fit criterions was to determine whether full recursive SEM and MTM models with different assumptions yield the same or nearly the same BIC and AIC scores. Because our results showed that SEM and MTM produced nearly the same goodness of fit criterions, we conclude that the essential difference between these models cannot be articulated in terms of an expressive power of joint distributions or goodness of fit (Valente et al., 2013). Estimated path coefficients reflect the strength of each causal link, quantifying the proportion of direct and indirect effects of a given SNP or genes on the outcome of interest via the mediator phenotypic traits or the predefined causal pathway between a set of mediators and the target outcome. For instance, a positive path coefficient from BM to BW suggests that a unit increase in BM directly results in an increase in BW. Our results showed that MTM-GWAS and SEM-GWAS were similar in terms of the goodness of fit as per the AIC and BIC criteria. This finding is in agreement with theoretical work of Gianola and Sorensen (2004) and Varona et al. (2007) showing the equivalence between models. Thus, MTM-GWAS and SEM-GWAS produced the same marginal phenotypic distributions and goodness of fit values. A similar approach has been proposed by Li et al. (2006), Mi et al. (2010b), and Wang and van Eeuwijk (2014). The main difference between our approach and theirs is that they used SEM in the context of standard QTL mapping, whereas our SEM-GWAS is developed for GWAS based on a linear mixed model. The results obtained in this study using the three economic traits in chickens suggest that causal inference and the SEM framework can be used for a set of phenotypes by considering both the raw and partial correlation relationships among traits in breeding programs. For example, in model SEM-A85, BM and HHP are unconditionally independent. However, conditioning on BW results in a non-zero partial correlation. Conditioning on BW breaks the causal chain from BM to HHP as observed in the case of full recursive models (SEM-A75 and SEM-G75) and their partial correlation becomes non-zero. This indicates that when all three variables are causally connected, both raw and partial correlations will all be non-zero, but they will change the magnitude depending on the signs of the path coefficients. The advantage of SEM-GWAS over MTM-GWAS is that the former decomposes SNP effects by tracing inferred causal networks. Our results showed that by partitioning SNP effects into direct, indirect, and total components, an alternative perspective of SNP effects can be obtained. As shown in Table 3 and Figure 4, direct and indirect effects may differ in magnitude and sign, acting in the same direction or in an antagonistic manner. Note that the total SNP effects inferred from SEM-GWAS were the same as the estimated SNP effects from MT-GWAS (Table 3). However, knowledge derived from the decomposition of SNP effects may be critical for animal and plant breeders in breaking unfavorable indirect QTL effects by reducing the frequency of undesired alleles or obtaining better SNP effect estimates than those from MTM-GWAS (e.g., Mi et al., 2010b).

Conclusion

SEM offers insights into how phenotypic traits relate to each other. We illustrated potential advantages of SEM-GWAS relative to the commonly used standard MTM-GWAS by using three chicken traits as an example. SNP effects pertaining to SEM-GWAS have a different meaning than those in MTM-GWAS. Our results showed that SEM-GWAS enabled the identification of whether a SNP effect is acting directly or indirectly, i.e., mediated, on given trait. In contrast, MTM-GWAS only captures overall genetic effects on traits, which is equivalent to combining direct and indirect SNP effects from SEM-GWAS together. Thus, SEM-GWAS offers more information and provides an alternative view of putative causal networks, enabling a better understanding of the genetic quiddity of traits at the genomic level.

Author contributions

MM carried out the study and wrote the first draft of the manuscript. GR and DG designed the experiment, supervised the study, and critically contributed to the final version of manuscript. GM contributed to the interpretation of results, provided critical insights, and revised the manuscript. BV and AA participated in discussion and reviewed the manuscript. MA, AK, and RM contributed materials and revised the manuscript. All authors read and approved the final manuscript.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

38 in total

1. Improved linear mixed models for genome-wide association studies.

Authors: Jennifer Listgarten; Christoph Lippert; Carl M Kadie; Robert I Davidson; Eleazar Eskin; David Heckerman
Journal: Nat Methods Date: 2012-05-30 Impact factor: 28.547

2. Genome-wide association and genomic selection in animal breeding.

Authors: Ben Hayes; Mike Goddard
Journal: Genome Date: 2010-11 Impact factor: 2.166

3. Estimation of effects of single genes on quantitative traits.

Authors: B W Kennedy; M Quinton; J A van Arendonk
Journal: J Anim Sci Date: 1992-07 Impact factor: 3.159

4. Efficient methods to compute genomic predictions.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2008-11 Impact factor: 4.034

Review 5. Bayesian structural equation models for inferring relationships between phenotypes: a review of methodology, identifiability, and applications.

Authors: Xiao-Lin Wu; Bjørg Heringstad; Daniel Gianola
Journal: J Anim Breed Genet Date: 2010-02 Impact factor: 2.380

6. Alternative parameterizations of the multiple-trait random regression model for milk yield and somatic cell score via recursive links between phenotypes.

Authors: J Jamrozik; L R Schaeffer
Journal: J Anim Breed Genet Date: 2011-03-28 Impact factor: 2.380

7. Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix.

Authors: H Gao; Y Wu; T Zhang; Y Wu; L Jiang; J Zhan; J Li; R Yang
Journal: Heredity (Edinb) Date: 2014-07-02 Impact factor: 3.821

8. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

9. Inferring causal phenotype networks using structural equation models.

Authors: Guilherme J M Rosa; Bruno D Valente; Gustavo de los Campos; Xiao-Lin Wu; Daniel Gianola; Martinho A Silva
Journal: Genet Sel Evol Date: 2011-02-10 Impact factor: 4.297

10. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

Authors: Paul F O'Reilly; Clive J Hoggart; Yotsawat Pomyen; Federico C F Calboli; Paul Elliott; Marjo-Riitta Jarvelin; Lachlan J M Coin
Journal: PLoS One Date: 2012-05-02 Impact factor: 3.240

11 in total

1. Joint eQTL mapping and Inference of Gene Regulatory Network Improves Power of Detecting both cis- and trans-eQTLs.

Authors: Xin Zhou; Xiaodong Cai
Journal: Bioinformatics Date: 2021-09-06 Impact factor: 6.931

2. A network modeling approach provides insights into the environment-specific yield architecture of wheat.

Authors: Noah DeWitt; Mohammed Guedira; Joseph Paul Murphy; David Marshall; Mohamed Mergoum; Christian Maltecca; Gina Brown-Guedira
Journal: Genetics Date: 2022-07-04 Impact factor: 4.402

Review 3. Application of Bayesian genomic prediction methods to genome-wide association analyses.

Authors: Anna Wolc; Jack C M Dekkers
Journal: Genet Sel Evol Date: 2022-05-13 Impact factor: 5.100

4. Genomic structural equation modelling provides a whole-system approach for the future crop breeding.

Authors: Tianhua He; Tefera Tolera Angessa; Camilla Beate Hill; Xiao-Qi Zhang; Kefei Chen; Hao Luo; Yonggang Wang; Sakura D Karunarathne; Gaofeng Zhou; Cong Tan; Penghao Wang; Sharon Westcott; Chengdao Li
Journal: Theor Appl Genet Date: 2021-05-31 Impact factor: 5.699

5. Structural equation modeling for investigating multi-trait genetic architecture of udder health in dairy cattle.

Authors: Sara Pegolo; Mehdi Momen; Gota Morota; Guilherme J M Rosa; Daniel Gianola; Giovanni Bittante; Alessio Cecchinato
Journal: Sci Rep Date: 2020-05-08 Impact factor: 4.379

6. Gut microbiome mediates host genomic effects on phenotypes: a case study with fat deposition in pigs.

Authors: Francesco Tiezzi; Justin Fix; Clint Schwab; Caleb Shull; Christian Maltecca
Journal: Comput Struct Biotechnol J Date: 2020-12-30 Impact factor: 7.271

7. A web-based survey on various symptoms of computer vision syndrome and the genetic understanding based on a multi-trait genome-wide association study.

Authors: Keito Yoshimura; Yuji Morita; Kenji Konomi; Sachiko Ishida; Daisuke Fujiwara; Keisuke Kobayashi; Masami Tanaka
Journal: Sci Rep Date: 2021-05-03 Impact factor: 4.379

8. Perspectives on Applications of Hierarchical Gene-To-Phenotype (G2P) Maps to Capture Non-stationary Effects of Alleles in Genomic Prediction.

Authors: Owen M Powell; Kai P Voss-Fels; David R Jordan; Graeme Hammer; Mark Cooper
Journal: Front Plant Sci Date: 2021-06-04 Impact factor: 5.753

9. Utilizing trait networks and structural equation models as tools to interpret multi-trait genome-wide association studies.

Authors: Mehdi Momen; Malachy T Campbell; Harkamal Walia; Gota Morota
Journal: Plant Methods Date: 2019-09-18 Impact factor: 4.993

10. A Multiple-Trait Bayesian Variable Selection Regression Method for Integrating Phenotypic Causal Networks in Genome-Wide Association Studies.

Authors: Zigui Wang; Deborah Chapman; Gota Morota; Hao Cheng
Journal: G3 (Bethesda) Date: 2020-12-03 Impact factor: 3.154