Sarah Nadeau, Christian W Thorball, Roger Kouyos, Huldrych F Günthard, Jürg Böni, Sabine Yerly, Matthieu Perreau, Thomas Klimkait, Andri Rauch, Hans H Hirsch, Matthias Cavassini, Pietro Vernazza, Enos Bernasconi, Jacques Fellay, Venelin Mitov, Tanja Stadler.
Abstract
Infectious diseases are particularly challenging for genome-wide association studies (GWAS) because genetic effects from two organisms (pathogen and host) can influence a trait. Traditional GWAS assume individual samples are independent observations. However, pathogen effects on a trait can be heritable from donor to recipient in transmission chains. Thus, residuals in GWAS association tests for host genetic effects may not be independent due to shared pathogen ancestry. We propose a new method to estimate and remove heritable pathogen effects on a trait based on the pathogen phylogeny prior to host GWAS, thus restoring independence of samples. In simulations, we show this additional step can increase GWAS power to detect truly associated host variants when pathogen effects are highly heritable, with strong phylogenetic correlations. We applied our framework to data from two different host-pathogen systems, HIV in humans and X. arboricola in A. thaliana. In both systems, the heritability and thus the phylogenetic correlations turn out to be low enough that the qualitative results of GWAS do not change when a correction step accounts for shared pathogen ancestry. This means that previous GWAS results for these two systems should not be biased due to shared pathogen ancestry. In summary, our framework provides additional information on the evolutionary dynamics of traits in pathogen populations and may improve GWAS if pathogen effects are highly phylogenetically correlated amongst individuals in a cohort.
Keywords: genome-wide association study; heritability; infectious disease; phylogenetic mixed model
Year: 2022 PMID: 35921544 PMCID: PMC9366186 DOI: 10.1093/molbev/msac163
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 8.800
Fig. 1. A high-level schematic of our phylogenetic Ornstein–Uhlenbeck mixed model (POUMM)-based simulation framework in the context of HIV-1 spVL. (A) shows how the viral effects on spVL evolve along the viral phylogeny according to an Ornstein–Uhlenbeck process. (B) shows how human host genetic effects are the sum of independent effects from several causal variants. Each variant can be present in 0, 1, or 2 copies. Half the variants have a positive effect and half have a negative effect. (C) shows how other environmental effects are independently drawn from a Gaussian distribution centered at 0. These three effects sum to the trait value for each simulated individual.
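The three components in this schematic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function names, the parent-array tree encoding, the allele frequency, and the assumption that positive and negative host effects share one magnitude are all hypothetical choices made for the example. The pathogen part uses the standard exact transition distribution of an Ornstein–Uhlenbeck process along each branch.

```python
import math
import random


def simulate_ou_on_tree(parent, branch_length, alpha, theta, sigma, g_root, rng):
    """Simulate a pathogen effect at every node of a rooted tree.

    parent[i] is the index of node i's parent (None for the root), and
    parents must be listed before their children. alpha must be > 0.
    """
    g = [0.0] * len(parent)
    for i, p in enumerate(parent):
        if p is None:
            g[i] = g_root
            continue
        decay = math.exp(-alpha * branch_length[i])
        # Exact OU transition: mean reverts toward theta, variance saturates.
        mean = theta + (g[p] - theta) * decay
        var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * alpha)
        g[i] = rng.gauss(mean, math.sqrt(var))
    return g


def simulate_host_effects(n_individuals, n_variants, allele_freq, effect_size, rng):
    """Sum of independent variant effects; each variant has 0, 1, or 2 copies.

    Assumption: the first half of the variants has effect +effect_size and
    the second half -effect_size.
    """
    effects = [effect_size if j < n_variants // 2 else -effect_size
               for j in range(n_variants)]
    host = []
    for _ in range(n_individuals):
        total = 0.0
        for beta in effects:
            copies = (rng.random() < allele_freq) + (rng.random() < allele_freq)
            total += copies * beta
        host.append(total)
    return host


def simulate_traits(tips, g, host, env_sd, rng):
    """Trait = pathogen effect at the tip + host effect + Gaussian noise."""
    return [g[tip] + h + rng.gauss(0.0, env_sd) for tip, h in zip(tips, host)]


# Toy tree: node 0 is the root, nodes 1-2 are internal, nodes 3-6 are tips.
parent = [None, 0, 0, 1, 1, 2, 2]
branch_length = [0.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]
tips = [3, 4, 5, 6]

rng = random.Random(1)
g = simulate_ou_on_tree(parent, branch_length, alpha=0.1, theta=4.5,
                        sigma=0.5, g_root=4.5, rng=rng)
host = simulate_host_effects(len(tips), n_variants=10, allele_freq=0.5,
                             effect_size=0.2, rng=rng)
traits = simulate_traits(tips, g, host, env_sd=0.3, rng=rng)
```

Tips that share a recent ancestor (for example nodes 3 and 4) inherit similar pathogen effects, which is the source of the non-independence the method is designed to remove.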
Fig. 2. Results from the simulation study. We simulated host, pathogen, and environmental effects on a trait under the POUMM with different heritability and selection strength parameters (the two plot axes). For each simulated dataset, we applied our method to estimate the non-pathogen effects and performed GWAS with these values. (A) shows the RMSE of our estimator (left) compared with uncorrected trait values, scaled by their mean (right), under each simulated evolutionary scenario. The RMSE is with reference to the true (simulated) host part of the trait values. Thus, more accurate estimates (lower RMSE) mean the trait value used for GWAS will be closer to the true host part of the trait value. (B) shows how GWAS power can improve given the true, simulated non-pathogen effect on spVL (left) and using our estimate for this value (middle) compared with using the scaled trait value (right). Each tile’s color corresponds to the average value across 20 simulated datasets of 500 samples. The points highlight specific heritability and selection strength values from the A. thaliana–X. arboricola QDR analysis, the HIV-1 spVL analysis, and four simulated scenarios that are presented in more detail in figure 4.
Fig. 3. Simulated data from two evolutionary scenarios: one where a phylogenetic correction to trait values improves GWAS power (right side) and one where it does not (left side). These examples correspond to two of the unfilled points in figure 2. (A,B) show total trait values for 12 randomly selected tips from the simulated phylogeny with pathogen heritability of 15 and 75%, respectively. Depending on the pathogen heritability, trait values are more or less correlated at clustered tips. (C) compares our method’s estimate for the non-pathogen part of trait values with the true simulated host trait values, with pathogen heritability of 15 and 75%. The solid line is the identity line. Selection strength was fixed to 0.1 per unit time for both scenarios and all other parameters were fixed as in the full simulation study.
Fig. 4. Correlations between trait values in pairs of tips in four simulated scenarios. These examples correspond to the four unfilled points in figure 2. Correlations are calculated for pairs of tips binned by phylogenetic distance (into deciles) across the 20 replicate simulations for each of the four evolutionary scenarios. Trait values are only noticeably correlated for closely clustered tips under the scenario with high pathogen heritability and low selection strength/low stochastic fluctuations (upper left facet).
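The binned correlation described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: `pearson` and `correlation_by_distance_bin` are hypothetical helper names, each pair of tips is assumed to be supplied as a `(distance, trait_a, trait_b)` tuple, and bins are formed as equal-count quantiles of the sorted distances.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_distance_bin(pairs, n_bins=10):
    """Correlate trait values within equal-count bins of phylogenetic distance.

    pairs: iterable of (distance, trait_a, trait_b), one entry per pair of tips.
    Returns one correlation (or None) per bin, from closest to most distant.
    """
    ordered = sorted(pairs, key=lambda p: p[0])
    n = len(ordered)
    result = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if len(chunk) < 2:
            result.append(None)  # not enough pairs to estimate a correlation
            continue
        result.append(pearson([p[1] for p in chunk], [p[2] for p in chunk]))
    return result
```

With `n_bins=10` the bins are deciles, as in the figure. A correlation that is high only in the first bin would match the pattern the caption describes for the high-heritability, low-selection-strength scenario.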
Fig. 5. A high-level schematic of the experimental setup for the two application datasets. For (A) HIV-1 spVL in the Swiss HIV Cohort Study, data are paired viral and human genotypes and associated spVL measurements. We fit the POUMM to the viral phylogeny and the spVL values associated with each infected individual. For (B) A. thaliana–X. arboricola quantitative disease resistance (QDR) from Wang et al. (2018), data are bacterial and plant genotypes with QDR measurements for all possible combinations of pathogen and host plant strains. We fit the POUMM to the bacterial phylogeny and the mean QDR calculated for each pathogen strain across all host plant types.
Fig. 6. Results from comparative GWAS on HIV-1 set-point viral load (spVL) data. (A) shows association values for the same host variants from the Swiss HIV cohort in GWAS with two different response variables. On the left, we used unmodified (total) spVL values. On the right, we used our estimates for the non-pathogen effects on spVL. The alternating shades correspond to different chromosomes. (B) compares the strength of association for variants in the CCR5 and MHC regions between the two GWAS (positions 45.4–47 Mb on chromosome 3 and 29.5–33.5 Mb on chromosome 6 for the CCR5 and MHC, respectively). Base positions are with reference to genome build GRCh37. The color of each point represents the difference in -log10(P) between the two GWAS. Red means taking phylogenetic information into account decreased the strength of association and blue means it increased it. The dashed lines show the genome-wide significance threshold.
Top Association Results from McLaren et al. (2015) Compared with Results from this Study.
| Region | Variant | McLaren et al. | Standard Trait Value: Effect Size | Estimated Non-pathogen Part of Trait: Effect Size |
|---|---|---|---|---|
| MHC | rs59440261 | | | |
| | rs1015164 | | 0.15 | 0.078 |
Results from this study are for host variants from the SHCS in GWAS with two different response variables. “Standard Trait Value” means we used the unmodified (total) spVL value and “Estimated Non-pathogen Part of Trait” means we used our estimates for the non-pathogen effects on spVL.
Fig. 7. Results from comparative GWAS on A. thaliana QDR to X. arboricola. The two facets show association values for the same host A. thaliana variants in GWAS with two different response variables. On the left, we used unmodified (total) QDR values for each of the 22 selected host–pathogen pairings on which these results are based. On the right, we used our estimates for the non-pathogen effects on QDR for these samples. In this case, the estimated non-pathogen effect is the specific QDR for each selected host–pathogen pairing minus the mean QDR for the respective pathogen strain, calculated across all host A. thaliana types. The alternating shades correspond to different chromosomes. The dashed lines show significance at level 0.05 with a Bonferroni correction for multiple testing.
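The correction described in this caption, subtracting the pathogen strain's mean QDR from each pairing's QDR, together with the Bonferroni threshold, can be illustrated with a short sketch. The function names, the dictionary layout keyed by `(host, pathogen_strain)`, and the toy numbers are assumptions made for illustration; they are not taken from the study's data or code.

```python
from collections import defaultdict


def non_pathogen_effects(qdr):
    """Subtract each pathogen strain's mean QDR from its pairings.

    qdr maps (host, pathogen_strain) -> QDR measurement. The strain mean is
    taken over every host present in qdr for that strain.
    """
    by_strain = defaultdict(list)
    for (_host, strain), value in qdr.items():
        by_strain[strain].append(value)
    strain_mean = {s: sum(v) / len(v) for s, v in by_strain.items()}
    return {(h, s): v - strain_mean[s] for (h, s), v in qdr.items()}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


# Toy data (hypothetical values): two hosts crossed with two pathogen strains.
qdr = {
    ("host1", "strainA"): 1.0,
    ("host2", "strainA"): 3.0,
    ("host1", "strainB"): 2.0,
    ("host2", "strainB"): 2.0,
}
corrected = non_pathogen_effects(qdr)
threshold = bonferroni_threshold(0.05, n_tests=1000)
```

In the study the strain mean is computed across all host types, while the GWAS itself uses only the 22 selected pairings, so in practice the means would be computed on the full cross before the result is subset.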