Nicole M Warrington, Liang-Dar Hwang, Michel G Nivard, David M Evans.
Abstract
Estimation of direct and indirect (i.e. parental and/or sibling) genetic effects on phenotypes is becoming increasingly important. We compare several multivariate methods that use summary results from genome-wide association studies (GWAS) to determine how well they estimate direct and indirect genetic effects. Using data from the UK Biobank, we compare point estimates and standard errors at individual loci with those obtained using individual-level data. We show that genomic structural equation modelling (Genomic SEM) outperforms the other methods in accurately estimating conditional genetic effects and their standard errors. We apply Genomic SEM to fertility data in the UK Biobank and partition the genetic effect into female fertility, male fertility and a sibling-specific effect. We identify a novel locus for fertility, as well as genetic correlations between fertility and educational attainment, risk-taking behaviour, autism and subjective well-being. We recommend Genomic SEM for partitioning genetic effects into direct and indirect components when using summary results from GWAS.
Year: 2021 PMID: 34521848 PMCID: PMC8440517 DOI: 10.1038/s41467-021-25723-z
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Summary of each of the methods used to derive maternal and offspring specific genetic effects using summary statistics from a GWAS of own birth weight and a GWAS of offspring birth weight.
| Method | Major assumptions | Data used | Variant exclusions |
|---|---|---|---|
| SEM using summary statistics | • Multivariate normal outcomes • Allele frequency, beta coefficient and SNP variance are consistent across the three groups (individuals with own birth weight only, with offspring birth weight only or with both own and offspring birth weight) • LD reference sample and GWAS samples all drawn from the same population for LD score regression analysis • The effect sizes for each genotype are identically normally distributed with mean zero and the same variance for LD score regression analysis | • Covariance matrices derived from: • GWAS summary results data for own and offspring birth weight • Estimated sample overlap using bivariate LD score regression with a phenotypic correlation between own and offspring birth weight of 0.24 • European reference panel from LD score regression | Minor allele frequency <0.5% |
| Linear approximation of SEM | • No sample overlap between GWAS of offspring birth weight and GWAS of own birth weight | • GWAS summary results data for own and offspring birth weight | None |
| MTAG | • LD reference sample and GWAS samples all drawn from the same population for LD score regression analysis • The effect sizes for each genotype are identically normally distributed with mean zero and the same variance for LD score regression analysis • All SNPs share the same variance-covariance matrix of effect sizes across traits | • GWAS summary results data for own and offspring birth weight • European reference panel from LD score regression | Variants with missing values, that are not SNPs, with duplicated rs numbers or that are strand ambiguous |
| mtCOJO | • LD reference sample and GWAS samples all drawn from the same population for LD score regression analysis • The effect sizes for each genotype are identically normally distributed with mean zero and the same variance for LD score regression analysis | • GWAS summary results data for own and offspring birth weight • Reference panel of 50,000 randomly sampled individuals from the UK Biobank • European reference panel from LD score regression | Multi-allelic variants |
| Genomic SEM | • LD reference sample and GWAS samples all drawn from the same population for LD score regression analysis • The effect sizes for each genotype are identically normally distributed with mean zero and the same variance for LD score regression analysis • All SNPs share the same variance-covariance matrix of effect sizes across traits | • European reference panel from LD score regression • GWAS summary results data for own and offspring birth weight | None |
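The "Linear approximation of SEM" row can be illustrated with a minimal sketch. It assumes the usual parent–offspring model: a SNP's effect on own birth weight is β_own = β_O + ½β_M and its effect on offspring birth weight (regressed on the mother's genotype) is β_off = ½β_O + β_M, because mother and child genotypes are correlated at 0.5. Solving gives β_O = (4/3)β_own − (2/3)β_off and β_M = (4/3)β_off − (2/3)β_own. The function name and the `cov` parameter are illustrative assumptions, not the authors' code; leaving `cov` at 0 encodes the no-sample-overlap assumption listed in the table.

```python
import math


def linear_approx_effects(beta_own, se_own, beta_off, se_off, cov=0.0):
    """Offspring and maternal effects from two birth weight GWAS (sketch).

    beta_own, se_own: SNP effect on own birth weight (own genotype).
    beta_off, se_off: SNP effect on offspring birth weight (mother's genotype).
    cov: sampling covariance of beta_own and beta_off; 0.0 assumes no overlap.
    Returns (offspring_beta, offspring_se, maternal_beta, maternal_se).
    """
    # Invert beta_own = b_O + b_M / 2 and beta_off = b_O / 2 + b_M.
    offspring = (4 / 3) * beta_own - (2 / 3) * beta_off
    maternal = (4 / 3) * beta_off - (2 / 3) * beta_own
    # Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y); here 2ab = -16/9.
    offspring_var = (16 / 9) * se_own**2 + (4 / 9) * se_off**2 - (16 / 9) * cov
    maternal_var = (4 / 9) * se_own**2 + (16 / 9) * se_off**2 - (16 / 9) * cov
    return offspring, math.sqrt(offspring_var), maternal, math.sqrt(maternal_var)


# Round trip: an offspring effect of 0.3 and a maternal effect of 0.1 imply
# beta_own = 0.3 + 0.1 / 2 = 0.35 and beta_off = 0.3 / 2 + 0.1 = 0.25.
print(linear_approx_effects(0.35, 0.03, 0.25, 0.03))
```

With overlapping samples the two input estimates have a positive sampling covariance, which lowers the true variance of each derived effect; ignoring it (`cov=0`) therefore overstates the standard errors, in line with the "Inflated" entries for this method in the results table.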
Fig. 1 Comparison of the effect size estimates from the SEM using individual-level data (x-axis) and the various methods using summary statistics from the GWAS of own and offspring birth weight (y-axis) for the 300 autosomal genome-wide significant SNPs from Warrington et al.[3].
The first two columns show the offspring and maternal effects, respectively, from the analysis with unique individuals in the GWAS of own and offspring birth weight (no sample overlap); the last two columns show the offspring and maternal effects from the analysis with overlapping samples in the two GWAS.
Summary of the results from each of the methods used to derive maternal and offspring specific genetic effects using summary statistics from a GWAS of own birth weight and a GWAS of offspring birth weight.
| Method | Genetic effect | No overlap: effect estimate compared with individual-level SEM | No overlap: standard error compared with individual-level SEM | No overlap: computational time (min)ᵃ | No overlap: LD score regression intercept (standard error) | No overlap: number of SNPs with … (loci)ᵇ [false positives]ᶜ | Sample overlap: effect estimate compared with individual-level SEM | Sample overlap: standard error compared with individual-level SEM | Sample overlap: computational time (min)ᵃ | Sample overlap: LD score regression intercept (standard error) | Sample overlap: number of SNPs with … (loci)ᵇ [false positives]ᶜ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SEM using summary statisticsᵈ | Offspring effect | Accurately estimated | Deflated | 4641 | 1.754 (0.012) | 2143 (546) [486] | Accurately estimated | Comparable | 5753 | 1.066 (0.009) | 792 (38) [19] |
| SEM using summary statistics | Maternal effect | Accurately estimated | Deflated | as above | 1.197 (0.009) | 728 (76) [51] | Accurately estimated | Comparable | as above | 1.058 (0.008) | 656 (43) [18] |
| Linear approximation of SEM | Offspring effect | Accurately estimated | Comparable | 42 | 1.012 (0.007) | 37 (6) [1] | Accurately estimated | Inflated | 34 | 0.939 (0.008) | 320 (13) [2] |
| Linear approximation of SEM | Maternal effect | Accurately estimated | Comparable | as above | 1.016 (0.007) | 423 (18) [0] | Accurately estimated | Inflated | as above | 0.941 (0.007) | 496 (16) [0] |
| MTAG | Offspring effect | No correlation with effect estimate from SEM | Deflated | 30 | 0.988 (0.011) | 3595 (80) [5] | No correlation with effect estimate from SEM | Deflated | 30 | 1.001 (0.012) | 5461 (128) [16] |
| MTAG | Maternal effect | No correlation with effect estimate from SEM | Deflated | as above | 1.005 (0.011) | 3880 (89) [11] | No correlation with effect estimate from SEM | Deflated | as above | 1.005 (0.012) | 5351 (119) [18] |
| mtCOJO | Offspring effect | Consistently underestimated | Deflated | 48 | 1.022 (0.007) | 62 (7) [1] | Consistently underestimated | Deflated | 62 | 1.049 (0.009) | 896 (22) [2] |
| mtCOJO | Maternal effect | Consistently underestimated | Deflated | 36 | 1.031 (0.008) | 386 (17) [0] | Consistently underestimated | Deflated | 63 | 1.043 (0.008) | 524 (18) [0] |
| Genomic SEMᵈ | Offspring effect | Accurately estimated | Comparable | 103 | 0.985 (0.007) | 32 (16) [1] | Accurately estimated | Comparable | 224 | 0.971 (0.008) | 394 (13) [2] |
| Genomic SEM | Maternal effect | Accurately estimated | Comparable | as above | 0.978 (0.007) | 372 (16) [0] | Accurately estimated | Comparable | as above | 0.976 (0.007) | 532 (18) [0] |
Computational time is intended only as a guide for comparing methods; it will differ depending on the computing resources available for each analysis.
ᵃ Analyses were conducted on a Dell PowerEdge R840 server that is part of a Rocks 7 open-source Linux cluster based on CentOS 7.4. Specifications: CPU: 4 × Intel Xeon Gold 5117 2.0 GHz, 14 C/28 T, 10.4 GT/s, 19.25 MB cache, Turbo, HT (105 W); disk: 480 GB SSD SATA Read Intensive 6 Gbps 512 2.5 in; memory: 24 × 64 GB LRDIMM, 2666 MT/s, quad rank. The number of cores and amount of memory assigned to each job were chosen to optimize the performance of each method.
ᵇ A locus was defined as the region within 500 kb of the sentinel SNP.
ᶜ A false positive was defined as a locus more than 500 kb from any already known birth weight-associated sentinel SNP.
ᵈ For the SEM using summary statistics and for Genomic SEM, the 22 autosomes were run in parallel, so the computational time reported is the time taken to run chromosome 2, which was the slowest chromosome to complete.
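Footnotes b and c define a locus and a false positive by distance from a sentinel SNP but do not spell out how significant SNPs were grouped into loci. A minimal sketch, assuming greedy distance-based clumping (the most significant remaining SNP becomes the sentinel and absorbs every SNP on the same chromosome within 500 kb) and reading "within 500 kb" as at most 500,000 bp on either side; the function names and tuple layout are hypothetical.

```python
WINDOW_BP = 500_000  # 500 kb, as in footnotes b and c


def define_loci(snps, window=WINDOW_BP):
    """Greedy distance-based clumping of genome-wide significant SNPs.

    snps: list of (chrom, pos, p) tuples. Returns one sentinel tuple per locus.
    """
    remaining = sorted(snps, key=lambda s: s[2])  # most significant first
    sentinels = []
    while remaining:
        lead = remaining[0]
        sentinels.append(lead)
        # Drop the sentinel and every SNP on its chromosome within the window.
        remaining = [
            s for s in remaining
            if not (s[0] == lead[0] and abs(s[1] - lead[1]) <= window)
        ]
    return sentinels


def count_false_positives(sentinels, known, window=WINDOW_BP):
    """Count loci whose sentinel is more than `window` bp from every known sentinel.

    known: list of (chrom, pos) tuples for previously reported sentinel SNPs.
    """
    def near_known(s):
        return any(s[0] == k[0] and abs(s[1] - k[1]) <= window for k in known)

    return sum(1 for s in sentinels if not near_known(s))


# Hypothetical positions (bp) and P-values for illustration.
hits = [("1", 1_000_000, 1e-10), ("1", 1_300_000, 1e-9),
        ("1", 3_000_000, 1e-8), ("2", 1_000_000, 2e-8)]
loci = define_loci(hits)
print(len(hits), len(loci), count_false_positives(loci, [("1", 1_100_000)]))
```

The three printed counts mirror the "SNPs (loci) [false positives]" format used in the results table.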
Fig. 2 Comparison of the standard errors from the SEM using individual-level data (x-axis) and the various methods using summary statistics from the GWAS of own and offspring birth weight (y-axis) for the 300 autosomal genome-wide significant SNPs from Warrington et al.[3].
The first two columns show the offspring and maternal genetic effects, respectively, from the analysis with unique individuals in the GWAS of own and offspring birth weight (no sample overlap); the last two columns show the offspring and maternal genetic effects from the analysis with overlapping samples in the two GWAS.
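The results table labels each method's standard errors as "Deflated", "Comparable" or "Inflated" relative to the individual-level SEM. A minimal sketch of one way to make that call, using the median per-SNP ratio of summary-based to individual-level standard errors; the 10% tolerance and the function name are hypothetical choices, not the authors' criterion.

```python
import statistics


def classify_se(se_individual, se_summary, tolerance=0.10):
    """Label summary-based SEs relative to individual-level SEM SEs.

    Uses the median of the per-SNP ratios se_summary / se_individual.
    tolerance is a hypothetical cut-off for calling the two 'Comparable'.
    """
    ratios = [s / i for i, s in zip(se_individual, se_summary)]
    median_ratio = statistics.median(ratios)
    if median_ratio < 1 - tolerance:
        label = "Deflated"
    elif median_ratio > 1 + tolerance:
        label = "Inflated"
    else:
        label = "Comparable"
    return median_ratio, label


# Hypothetical standard errors for three SNPs.
print(classify_se([0.010, 0.020, 0.030], [0.005, 0.010, 0.015]))
```

Deflated standard errors make test statistics too large, which can push SNPs past the genome-wide significance threshold spuriously.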
Fig. 3 Manhattan plot and quantile–quantile (Q–Q) plot for the fertility GWAS estimating male, female and sibling-specific genetic effects using Genomic SEM.
237,768 women from the UK Biobank contributed to the unconditional GWAS of the number of children mothered, 199,570 men contributed to the GWAS of the number of children fathered and 430,466 individuals contributed to the GWAS of the number of siblings (see Supplementary Fig. 26 for the Manhattan plots of the unconditional GWAS). Point estimates for the male, female and sibling effects and their standard errors were estimated using diagonally weighted least squares as implemented in Genomic SEM, and two-sided P-values were obtained from Z tests on these estimates. The two-sided association P-value, on the −log10 scale, obtained from Genomic SEM for each SNP (y-axis) was plotted against genomic position (NCBI Build 37; x-axis). Association signals that reached genome-wide significance (P < 5 × 10⁻⁸) are shown in red. In the Q–Q plots, the black dots represent observed two-sided P-values and the grey line represents expected two-sided P-values under the null distribution. The SNP heritability, estimated using LD score regression, was 0.033 (SE = 0.003) for male fertility, 0.042 (SE = 0.003) for female fertility and 0.012 (SE = 0.001) for sibling-specific effects. P-values are not adjusted for multiple comparisons.
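The caption states that P-values come from two-sided Z tests on the DWLS estimates and are shown on the −log10 scale against null expectations. A minimal sketch of those three calculations, assuming a standard Wald Z statistic (estimate divided by its standard error) referred to a standard normal distribution; the function names are hypothetical and Genomic SEM's own implementation is not reproduced here.

```python
import math

GWS_THRESHOLD = 5e-8  # genome-wide significance threshold used in the caption


def two_sided_p(estimate, se):
    """Two-sided Wald Z test P-value: 2 * (1 - Phi(|z|)) with z = estimate / se."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


def manhattan_y(p):
    """Manhattan-plot y coordinate, -log10(P)."""
    return -math.log10(p)


def qq_expected(n):
    """Expected -log10(P) for n null tests, using uniform quantiles i / (n + 1).

    Pair these with the observed P-values sorted in ascending order.
    """
    return [-math.log10(i / (n + 1)) for i in range(1, n + 1)]


p = two_sided_p(0.05, 0.008)  # hypothetical estimate and standard error
print(p, manhattan_y(p), p < GWS_THRESHOLD)
```

For extremely large |z| the P-value underflows to 0 in double precision, so production pipelines typically compute −log10 P in log space instead.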
Fig. 4 Schematic of the study design for comparing methods using self-reported birth weight data from the UK Biobank.
We conducted two sets of analyses, one with and one without sample overlap between the genome-wide association studies (GWAS), to investigate the effect of sample overlap on each of the methods.
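Sample overlap matters because it correlates the estimation errors of the two GWAS. A standard result, and the quantity the cross-trait LD score regression intercept estimates, is that at null SNPs the two Z statistics have a correlation of roughly N_s·ρ/√(N₁N₂), where N₁ and N₂ are the GWAS sample sizes, N_s is the number of shared individuals and ρ is the phenotypic correlation between the traits in those individuals (the methods table uses ρ = 0.24 for own and offspring birth weight). A minimal sketch of that quantity and the implied covariance of effect estimates; the function names are hypothetical and the sample sizes are illustrative, not the study's.

```python
import math


def overlap_z_correlation(n1, n2, n_shared, pheno_corr):
    """Expected correlation of null-SNP Z statistics for two GWAS sharing samples.

    n1, n2: GWAS sample sizes; n_shared: individuals in both GWAS;
    pheno_corr: phenotypic correlation of the two traits in shared individuals.
    """
    return n_shared * pheno_corr / math.sqrt(n1 * n2)


def beta_covariance(se1, se2, z_corr):
    """Approximate sampling covariance of two SNP effect estimates."""
    return z_corr * se1 * se2


# Illustrative sample sizes only (not the study's): full overlap vs. none.
print(overlap_z_correlation(100_000, 100_000, 100_000, 0.24))
print(overlap_z_correlation(100_000, 100_000, 0, 0.24))
```

Methods that assume no overlap implicitly set this covariance to zero, which is why the two designs in Fig. 4 can give different standard errors for the same method.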