Michel G Nivard, Elliot M Tucker-Drob, Andrew D Grotzinger, Mijke Rhemtulla, Ronald de Vlaming, Stuart J Ritchie, Travis T Mallard, W David Hill, Hill F Ip, Riccardo E Marioni, Andrew M McIntosh, Ian J Deary, Philipp D Koellinger, K Paige Harden.
Abstract
Genetic correlations estimated from genome-wide association studies (GWASs) reveal pervasive pleiotropy across a wide variety of phenotypes. We introduce genomic structural equation modelling (genomic SEM): a multivariate method for analysing the joint genetic architecture of complex traits. Genomic SEM synthesizes genetic correlations and single-nucleotide polymorphism heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to model multivariate genetic associations among phenotypes, identify variants with effects on general dimensions of cross-trait liability, calculate more predictive polygenic scores and identify loci that cause divergence between traits. We demonstrate several applications of genomic SEM, including a joint analysis of summary statistics from five psychiatric traits. We identify 27 independent single-nucleotide polymorphisms not previously identified in the contributing univariate GWASs. Polygenic scores from genomic SEM consistently outperform those from univariate GWASs. Genomic SEM is flexible and open ended, and allows for continuous innovation in multivariate genetic analysis.
Year: 2019 PMID: 30962613 PMCID: PMC6520146 DOI: 10.1038/s41562-019-0566-x
Source DB: PubMed Journal: Nat Hum Behav ISSN: 2397-3374
Figure 1. Genomic SEM solutions for p-factor and neuroticism factor models with SNP effect.
Standardized results from using Genomic SEM (with WLS estimation) to construct a genetically defined p-factor of psychopathology (panel a) and a genetic neuroticism factor (panel b), each with a lead independent SNP predicting the factor. SEs are shown in parentheses. For a model that was standardized with respect to the outcomes only, the effect of the SNP was −.093 (SE = .017; SNP variance = .252) for the p-factor and −.042 (SE = .007; SNP variance = .432) for neuroticism; this can be interpreted as the expected standard deviation unit difference in the latent factor per effect allele. SCZ = schizophrenia; BIP = bipolar disorder; DEP = major depressive disorder; PTSD = post-traumatic stress disorder; ANX = anxiety; Irr = irritability; Feel = sensitivity/hurt feelings; fed-up = fed-up feelings; emb = worry too long after embarrassment.
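As a worked illustration of the numbers in this caption, the sketch below computes a Z statistic from each reported effect and SE, and rescales the outcome-standardized effect to a fully standardized one. The rescaling (multiplying by the SNP standard deviation) is the usual regression conversion and is an assumption here, not something the caption states; the function names are hypothetical.

```python
import math


def z_statistic(effect: float, se: float) -> float:
    """Wald Z statistic: estimate divided by its standard error."""
    return effect / se


def fully_standardized(effect_std_outcome: float, snp_variance: float) -> float:
    """Rescale an outcome-standardized effect to a fully standardized one.

    Assumption: the outcome already has unit variance, so the fully
    standardized coefficient is effect * sd(SNP).
    """
    return effect_std_outcome * math.sqrt(snp_variance)


# Values copied from the Figure 1 caption.
reported = {
    "p-factor": {"effect": -0.093, "se": 0.017, "snp_variance": 0.252},
    "neuroticism": {"effect": -0.042, "se": 0.007, "snp_variance": 0.432},
}

for factor, r in reported.items():
    z = z_statistic(r["effect"], r["se"])
    beta = fully_standardized(r["effect"], r["snp_variance"])
    print(f"{factor}: Z = {z:.2f}, fully standardized effect = {beta:.4f}")
```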
Figure 2. Manhattan plots of unique, independent hits from Genomic SEM.
Genomic SEM (with WLS estimation) was used to conduct multivariate GWASs of the p-factor (panels a and c) and neuroticism (panels b and d). Manhattan plots are shown for SNP effects (top panels) and for Q_SNP (bottom panels). The grey dashed line marks the threshold for genome-wide significance (p < 5 × 10⁻⁸). In all four panels, black triangles denote independent hits for SNP effects from the GWAS of the general factor that were not in LD with independent hits for the univariate GWAS or hits for Q_SNP. In all four panels, purple diamonds denote independent hits for the SNP effects from univariate GWASs that were not in LD with independent hits from the GWAS of the general factor. Grey stars denote independent hits for Q_SNP.
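The genome-wide significance threshold in this caption can be applied to a Z statistic as in the minimal sketch below. It assumes a two-sided normal test and the conventional −log10(p) scaling for the vertical axis of a Manhattan plot; neither detail is spelled out in the caption, and the function names are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the caption


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_genome_wide_significant(z: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """True when the two-sided p-value falls below the threshold."""
    return two_sided_p(z) < alpha


def minus_log10_p(z: float) -> float:
    """Height of a point on a conventional Manhattan plot."""
    p = two_sided_p(z)
    # erfc underflows to 0.0 for very large |z|; report infinity in that case.
    return math.inf if p == 0.0 else -math.log10(p)


for z in (5.4, 5.5, -6.0):
    print(z, is_genome_wide_significant(z), round(minus_log10_p(z), 2))
```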
Summary of multivariate (Genomic SEM) and univariate GWAS results.
| | Lead SNPs | Q_SNP hits | Unique | No. of | No. | No. | Mean |
|---|---|---|---|---|---|---|---|
| Genomic SEM (WLS): p-factor | 128 | 1 (1) | 27 | 71 | 37 | 24 | 1.88 |
| Schizophrenia | 127 | - | 34 (0) | 2 | 25 | 21 | 1.82 |
| Bipolar | 4 | - | 4 (0) | 0 | 0 | 0 | 1.15 |
| MDD | 5 | - | 5 (0) | 0 | 0 | 0 | 1.31 |
| PTSD | 0 | - | 0 (0) | 0 | 0 | 0 | 1.01 |
| Anxiety | 1 | - | 1 (0) | 0 | 0 | 0 | 1.03 |
| Genomic SEM (WLS): neuroticism | 118 | 69 (5) | 38 | 1 | 19 | 20 | 1.64 |
| Mood | 43 | - | 19 (5) | 0 | 0 | 15 | 1.37 |
| Misery | 31 | - | 6 (4) | 0 | 0 | 0 | 1.32 |
| Irritability | 36 | - | 17 (4) | 0 | 0 | 0 | 1.37 |
| Hurt Feelings | 24 | - | 11 (0) | 0 | 0 | 0 | 1.33 |
| Fed-up | 38 | - | 21 (6) | 0 | 0 | 0 | 1.36 |
| Nervous | 41 | - | 25 (12) | 0 | 0 | 0 | 1.36 |
| Worry | 56 | - | 26 (6) | 0 | 13 | 0 | 1.46 |
| Tense | 19 | - | 10 (3) | 0 | 0 | 0 | 1.32 |
| Embarrass | 17 | - | 6 (2) | 0 | 0 | 0 | 1.33 |
| Nerves | 12 | - | 7 (3) | 0 | 0 | 0 | 1.26 |
| Lonely | 6 | - | 4 (3) | 0 | 0 | 0 | 1.19 |
| Guilt | 21 | - | 8 (1) | 0 | 0 | 0 | 1.28 |
Note. For Q_SNP, values in parentheses report how many Q_SNP hits were in LD with hits identified as significant for the common factor. Unique hits for the common factor are lead SNPs that were not in LD with hits for the individual indicators. Unique hits for the individual indicators are hits for the respective indicator that were not in LD with hits for the common factor. Unique hits for the common factor excluded hits in LD with Q_SNP hits. For unique hits for indicators, values in parentheses give the number of these hits that were identified as significant for Q_SNP. Unique hits for the common factor also excluded hits in LD with previously reported indicator hits that had been removed because of missing values across the other phenotypes. The single Q_SNP hit for WLS estimation of the p-factor was significant for both the common factor and schizophrenia. For the common factor and the indicators, independent hits were defined using a pruning window of 500 kb and r² > 0.1. For chromosomes 6 and 8, an additional pruning filter of 1 Mb and r² > 0.1 was used to account for long-range LD due to the MHC region and a pericentric inversion, respectively. For univariate statistics, we used only the SNPs present across all indicators in order to facilitate a direct comparison to Genomic SEM results.
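The pruning rule in this note (a 500 kb window with r² > 0.1, widened to 1 Mb on chromosomes 6 and 8) can be illustrated with a greedy clumping sketch. This is not the authors' pipeline: the `Snp` record, the `clump` function and the pairwise `r2` lookup are hypothetical, and treating the "additional" 1 Mb filter as a single wider window is a simplifying assumption.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chrom: int
    pos: int    # base-pair position
    p: float    # association p-value


def window_bp(chrom: int) -> int:
    """Pruning window: 1 Mb on chromosomes 6 and 8, 500 kb elsewhere."""
    return 1_000_000 if chrom in (6, 8) else 500_000


def clump(snps: List[Snp], r2: Dict[FrozenSet[str], float],
          r2_threshold: float = 0.1) -> List[Snp]:
    """Greedy clumping: keep the strongest SNP, drop neighbours in LD with it."""
    ordered = sorted(snps, key=lambda s: s.p)  # most significant first
    removed = set()
    leads = []
    for lead in ordered:
        if lead.rsid in removed:
            continue
        leads.append(lead)
        for other in ordered:
            if other.rsid == lead.rsid or other.rsid in removed:
                continue
            in_window = (other.chrom == lead.chrom
                         and abs(other.pos - lead.pos) <= window_bp(lead.chrom))
            # Pairs missing from the lookup are treated as r2 = 0 (assumption).
            if in_window and r2.get(frozenset((lead.rsid, other.rsid)), 0.0) > r2_threshold:
                removed.add(other.rsid)
    return leads


# Toy data, invented purely for illustration.
toy_snps = [
    Snp("rs1", 1, 1_000, 1e-10),
    Snp("rs2", 1, 200_000, 1e-9),   # within 500 kb of rs1 and in LD with it
    Snp("rs3", 1, 900_000, 1e-9),   # in LD with rs1 but outside the window
    Snp("rs4", 6, 1_000, 1e-12),
    Snp("rs5", 6, 800_000, 1e-9),   # inside the 1 Mb window used on chromosome 6
]
toy_r2 = {
    frozenset(("rs1", "rs2")): 0.5,
    frozenset(("rs1", "rs3")): 0.5,
    frozenset(("rs4", "rs5")): 0.3,
}
independent_hits = [s.rsid for s in clump(toy_snps, toy_r2)]
print(independent_hits)
```

In the toy data, rs3 survives only because it lies outside the 500 kb window on chromosome 1, whereas rs5 is dropped because chromosome 6 uses the wider window.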
Figure 3. Out-of-sample prediction using Genomic SEM-based and univariate-based polygenic scores for psychiatric traits.
Polygenic scores (PGSs) were constructed using the same set of SNPs for all predictors. R² (%) on the y-axis indicates the percentage of variance in the outcome (possible range: 0–100) explained by the PGS over and above the covariates. The summary statistics for Genomic SEM were estimated using WLS. The Genomic SEM-based PGS was derived from a model estimating SNP effects on a common “p”-factor, constructed from SCZ, BIP, MDD, PTSD, and ANX (as in Fig. 1a). To prevent bias, the Genomic SEM summary statistics were produced using SCZ and MDD GWAS summary statistics that did not include UKB participants. Error bars indicate 95% confidence intervals estimated using the delta method. Phenotypes were constructed for European participants in the UKB for five symptom domains and for a general p factor spanning all five symptom domains.
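One common way to quantify variance explained beyond covariates is incremental R²: the R² of a model with covariates plus the PGS, minus the R² of the covariates-only model. The sketch below illustrates that idea under the assumption that this is what the y-axis reports; the caption does not give the exact computation. It uses a small dependency-free least-squares solver, and all function names and data are hypothetical.

```python
from typing import List


def ols_r2(columns: List[List[float]], y: List[float]) -> float:
    """R^2 from an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2_percent(pgs: List[float], covariates: List[List[float]],
                           y: List[float]) -> float:
    """Percentage of outcome variance explained by the PGS beyond the covariates."""
    return 100.0 * (ols_r2(covariates + [pgs], y) - ols_r2(covariates, y))


# Toy data: the covariate is orthogonal to the outcome, and the PGS equals it.
covariate = [1.0, 1.0, -1.0, -1.0]
pgs = [1.0, -1.0, 1.0, -1.0]
outcome = [1.0, -1.0, 1.0, -1.0]
print(incremental_r2_percent(pgs, [covariate], outcome))
```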