Yue Wu, Kathryn S Burch, Andrea Ganna, Päivi Pajukanta, Bogdan Pasaniuc, Sriram Sankararaman.
Abstract
Genetic correlation is an important parameter in efforts to understand the relationships among complex traits. Current methods that analyze individual genotype data to estimate genetic correlation are challenging to scale to large datasets. Methods that analyze summary data are computationally efficient but tend to yield less precise estimates of genetic correlation. We propose SCORE (scalable genetic correlation estimator), a randomized method-of-moments estimator of genetic correlation that is both scalable and accurate. SCORE obtains more precise estimates of genetic correlation than summary-statistic methods that can be applied at scale: averaged over all simulations, it achieves a 44% reduction in standard error relative to LD-score regression (LDSC) and a 20% reduction relative to high-definition likelihood (HDL). The efficiency of SCORE enables computation of genetic correlations on the UK Biobank dataset, consisting of ≈300K individuals and ≈500K SNPs, in a few hours (orders of magnitude faster than methods that analyze individual data, such as GCTA). Across 780 pairs of traits in 291,273 unrelated white British individuals in the UK Biobank, SCORE identifies significant genetic correlations for 200 additional pairs of traits compared with LDSC (beyond the 245 pairs identified by both methods).
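To make "randomized method of moments" concrete, here is a minimal sketch of a generic Haseman-Elston-style moment estimator with a randomized trace. It is not SCORE's implementation. It assumes a single set of N individuals measured for both traits, standardized genotypes X (N × M), centered traits, no covariates, and the model Cov(y1, y2) = σg12·K + σe12·I with K = XXᵀ/M. Taking expectations gives two moment equations, E[y1ᵀKy2] = σg12·tr(K²) + σe12·tr(K) and E[y1ᵀy2] = σg12·tr(K) + σe12·N, which are solved as a 2 × 2 system. The randomized step approximates tr(K²) = E[‖Kz‖²] with Hutchinson's estimator using random ±1 vectors z, and each product Kz is computed as X(Xᵀz)/M so the N × N matrix K is never formed. All function names (`mom_cov`, `genetic_correlation`, `simulate`, and so on) are hypothetical.

```python
import math
import random


def standardize(columns):
    """Center and scale each SNP column (list of N genotypes) to mean 0, variance 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd for v in col])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grm_times(cols, v):
    """Return K v with K = X X^T / M, via X (X^T v) / M instead of forming K."""
    xtv = [dot(col, v) for col in cols]  # X^T v, length M
    n, m = len(v), len(cols)
    return [sum(cols[j][i] * xtv[j] for j in range(m)) / m for i in range(n)]


def trace_k2(cols, n, n_probes, seed):
    """Hutchinson estimate of tr(K^2) = E[||K z||^2] with Rademacher probes z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        kz = grm_times(cols, z)
        total += dot(kz, kz)
    return total / n_probes


def mom_cov(cols, y1, y2, n_probes=20, seed=0):
    """Solve the 2x2 moment equations for (genetic cov, residual cov) of y1, y2."""
    n = len(y1)
    t2 = trace_k2(cols, n, n_probes, seed)
    t1 = float(n)                    # tr(K) = N exactly for standardized X
    a = dot(y1, grm_times(cols, y2))  # y1' K y2
    b = dot(y1, y2)                   # y1' y2
    det = t2 * n - t1 * t1
    # Cramer's rule on [[t2, t1], [t1, n]] [g; e] = [a; b]
    return (n * a - t1 * b) / det, (t2 * b - t1 * a) / det


def genetic_correlation(cols, y1, y2, **kw):
    """r_g = sigma_g12 / sqrt(sigma_g1^2 * sigma_g2^2) from three moment fits."""
    g11, _ = mom_cov(cols, y1, y1, **kw)
    g22, _ = mom_cov(cols, y2, y2, **kw)
    g12, _ = mom_cov(cols, y1, y2, **kw)
    if g11 <= 0 or g22 <= 0:
        raise ValueError("non-positive genetic variance estimate")
    return g12 / math.sqrt(g11 * g22)


def simulate(n=200, m=50, h2=0.8, seed=1):
    """Toy data: binomial genotypes and one centered trait with heritability h2."""
    rng = random.Random(seed)
    raw = []
    for _ in range(m):
        p = rng.uniform(0.1, 0.5)
        raw.append([(rng.random() < p) + (rng.random() < p) for _ in range(n)])
    cols = standardize(raw)
    beta = [rng.gauss(0.0, math.sqrt(h2 / m)) for _ in range(m)]
    y = [sum(cols[j][i] * beta[j] for j in range(m)) + rng.gauss(0.0, math.sqrt(1 - h2))
         for i in range(n)]
    mean = sum(y) / n
    return cols, [v - mean for v in y]
```

For example, `cols, y = simulate()` followed by `mom_cov(cols, y, y)` returns a genetic-variance estimate near the simulated 0.8 (up to sampling noise in this tiny dataset). Each probe costs O(NM) and never touches an N × N matrix, which is the property that lets moment estimators of this kind scale to biobank-sized N.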
Keywords: biobank; complex traits; genetic correlation; method of moments; pleiotropy
Year: 2021 PMID: 34861179 PMCID: PMC8764132 DOI: 10.1016/j.ajhg.2021.11.015
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
Figure 1. Comparison of the estimates of genetic correlation from SCORE with GCTA-GREML, GCTA-HE, LDSC, and HDL (unrelated individuals, SNPs)
(A–D) We simulated pairs of traits under 48 genetic architectures (varying heritability, genetic correlation, and polygenicity). (A), (B), and (C) show the standard error (SE) of each method relative to GCTA-GREML as a function of heritability, genetic correlation, and polygenicity, respectively, while (D) summarizes the relative SE across all architectures (see the simulations to assess accuracy section of material and methods). We ran LDSC with in-sample LD and HDL with eigenvectors that preserve variance (see the data processing section of material and methods). The standard error of the relative SE is estimated with the jackknife (error bars denote 1 standard error).
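The caption does not spell out how the relative SE or its jackknife error bar is computed. A minimal sketch, assuming that the relative SE is the empirical standard deviation of a method's estimates across simulation replicates divided by that of GCTA-GREML on the same replicates, and that the delete-one jackknife drops one replicate at a time. The function names are hypothetical.

```python
import math
import statistics


def relative_se(est_method, est_ref):
    """Empirical SE of a method's estimates divided by the reference method's SE."""
    return statistics.stdev(est_method) / statistics.stdev(est_ref)


def jackknife_se(est_method, est_ref):
    """Delete-one jackknife SE of relative_se, leaving out one replicate at a time.

    est_method[i] and est_ref[i] must come from the same simulated replicate.
    """
    r = len(est_method)
    loo = []
    for i in range(r):
        m = est_method[:i] + est_method[i + 1:]
        g = est_ref[:i] + est_ref[i + 1:]
        loo.append(relative_se(m, g))
    mean = sum(loo) / r
    # Standard jackknife variance: (r - 1) / r * sum of squared deviations
    return math.sqrt((r - 1) / r * sum((v - mean) ** 2 for v in loo))
```

Pairing the estimates by replicate keeps the shared simulation noise in both numerator and denominator, so the error bar reflects uncertainty in the ratio rather than in each SE separately.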
Figure 2. Comparison of the runtime of SCORE with GCTA-GREML and GCTA-HE as a function of the number of samples
The samples were obtained as subsets of unrelated white British individuals in the UK Biobank. We plot the runtime of both SCORE (which can handle any degree of sample overlap) and its variant SCORE-OVERLAP (designed for sample overlap). SCORE runs in a few hours on the largest dataset of individuals and SNPs.
Figure 3. Genetic correlation estimates in the UK Biobank
We plot the genetic correlation estimates from SCORE (lower triangle) and LDSC (upper triangle) across pairs of 28 phenotypes. Larger filled squares correspond to pairs that are significant after Bonferroni correction at a significance level, while smaller squares correspond to pairs that are significant at a significance level but not after accounting for multiple testing. Stars indicate pairs that are significant by SCORE but not by LDSC.
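The marker rules in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' plotting code: p-values are assumed to come from a two-sided Wald z-test of r_g = 0, the significance level `alpha` and the Bonferroni count `n_tests` are left as parameters, and the star rule (Bonferroni-significant for SCORE but not for LDSC) is a hypothetical reading of "significant". For 28 phenotypes there are 28·27/2 = 378 distinct pairs; whether the correction counts those pairs or all 780 analyzed pairs is also an assumption.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value for H0: r_g = 0, assuming estimate/se ~ N(0, 1) under H0."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2.0))


def square_size(p, alpha, n_tests):
    """Marker size for one cell: 'large' if Bonferroni-significant,
    'small' if only nominally significant, otherwise 'none'."""
    if p < alpha / n_tests:
        return "large"
    if p < alpha:
        return "small"
    return "none"


def has_star(p_score, p_ldsc, alpha, n_tests):
    """Hypothetical star rule: Bonferroni-significant for SCORE but not for LDSC."""
    threshold = alpha / n_tests
    return p_score < threshold and p_ldsc >= threshold


n_pairs = math.comb(28, 2)  # 378 distinct pairs among 28 phenotypes
```

With `alpha = 0.05` (used here only as an example) and 378 tests, the Bonferroni threshold is about 1.3 × 10⁻⁴, so a pair with p = 0.01 would get a small square and a pair with p = 10⁻⁵ a large one.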