Omer Weissbrod1, Farhad Hormozdiari2, Christian Benner3, Ran Cui4, Jacob Ulirsch4,5, Steven Gazal2, Armin P Schoech2, Bryce van de Geijn2, Yakir Reshef2, Carla Márquez-Luna6, Luke O'Connor4, Matti Pirinen3,7,8, Hilary K Finucane4,9, Alkes L Price10,11. 1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. oweissbrod@hsph.harvard.edu. 2. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. 3. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland. 4. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 5. Program in Biological and Biomedical Sciences, Harvard Medical School, Cambridge, MA, USA. 6. The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA. 7. Department of Public Health, University of Helsinki, Helsinki, Finland. 8. Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland. 9. Department of Medicine, Massachusetts General Hospital, Boston, MA, USA. 10. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu. 11. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. aprice@hsph.harvard.edu.
Abstract
Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome (not just genome-wide-significant loci) to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than identified in their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.
Genome-wide association studies of complex traits have been extremely successful in identifying loci harboring causal variants but less successful in identifying the underlying causal variants, making the development of fine-mapping methods a key priority[1,2]. The power of fine-mapping methods[3-12] is limited due to strong linkage disequilibrium (LD), but it can be increased by prioritizing variants in functional annotations that are enriched for complex trait heritability[7,8,10,13-17]. However, previous functionally-informed fine-mapping methods[18-20] have computational limitations and can only use genome-wide significant loci to estimate functional enrichment (or can only incorporate a small number of functional annotations[10]), severely limiting the benefit of functional data.

We propose PolyFun, a computationally scalable framework for functionally-informed fine-mapping that makes full use of genome-wide data by specifying prior causal probabilities for fine-mapping methods such as SuSiE[21] or FINEMAP[22,23]. PolyFun estimates functional enrichment using a broad set of coding, conserved, regulatory, MAF and LD-related annotations from the baseline-LF model[24-26].

We show in simulations with in-sample LD that PolyFun is well-calibrated and is more powerful than previous fine-mapping methods, with a >20% power increase over non-functionally informed fine-mapping methods. In simulations with mismatched reference LD, PolyFun remains well-calibrated when reducing the maximum number of assumed causal SNPs per locus. We apply PolyFun to 49 complex traits from the UK Biobank[27] (average N=318K) with in-sample LD and identify 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, spanning 2,225 unique variants. Of these variants, 223 were fine-mapped for multiple genetically uncorrelated traits, indicating pervasive pleiotropy.
We further used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, finding sets of common SNPs causally explaining 50% of common SNP heritability that range in size across many orders of magnitude, from dozens to millions of SNPs.
Results
Overview of methods
PolyFun prioritizes variants in enriched functional annotations by specifying prior causal probabilities in proportion to predicted per-SNP heritabilities and providing them as input to fine-mapping methods such as SuSiE[21] or FINEMAP[22,23]. For each target locus, PolyFun robustly specifies prior causal probabilities for all SNPs on the corresponding odd (resp. even) target chromosome by (1) estimating functional enrichments for a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LF 2.2.UKB model[25] (187 annotations; Methods, Supplementary Table 1) using an L2-regularized extension of S-LDSC[17], restricted to even (resp. odd) chromosomes; (2) estimating per-SNP heritabilities for SNPs on odd (resp. even) chromosomes using the functional enrichment estimates from step 1; (3) partitioning all SNPs into 20 bins of similar estimated per-SNP heritabilities from step 2; (4) re-estimating per-SNP heritabilities for all SNPs on the target chromosome by applying S-LDSC to the 20 bins, restricted to odd (resp. even) chromosomes excluding the target chromosome; and (5) setting prior causal probabilities for SNPs on the target chromosome proportional to per-SNP heritabilities from step 4. The L2 regularization in step 1 improves the accuracy of per-SNP heritability estimation; the partitioning into odd and even chromosomes in steps 1–2 and the exclusion of the target chromosome in step 4 prevent winner’s curse; and the re-estimation of per-SNP heritabilities in step 4 ensures robustness to model misspecification.

PolyFun specifies prior causal probabilities in proportion to per-SNP heritability estimates:

$$P(\beta_i \neq 0 \mid \mathbf{a}_i) \propto \mathrm{var}[\beta_i \mid \mathbf{a}_i]$$

where $\beta_i$ is the causal effect size of SNP i in standardized units (the number of standard deviations increase in phenotype per 1 standard deviation increase in genotype), $\mathbf{a}_i$ is the vector of functional annotations of SNP i, and $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ is the estimated per-SNP heritability of SNP i from step 4 (Methods).

A key distinction between PolyFun and previous functionally-informed fine-mapping methods[10,18-20] is the use of the entire genome and a large number of functional annotations to estimate prior causal probabilities. We exploited the computational scalability of PolyFun (together with SuSiE[21]) to fine-map up to 2,763 overlapping 3Mb loci spanning the entire genome (Methods). We subsequently used our fine-mapping results to perform polygenic localization, identifying minimal sets of common SNPs causally explaining a given proportion of common SNP heritability. Details of the PolyFun method are provided in the Methods section; we have released open-source software implementing PolyFun in conjunction with SuSiE[21] and FINEMAP[22]. In all main simulations and analyses of real traits, we applied PolyFun using summary LD information estimated directly from the target samples (both for running S-LDSC and for running SuSiE or FINEMAP), as previously recommended for fine-mapping methods[12,28].
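As a minimal sketch of step 5 (not the released PolyFun implementation), per-SNP heritability estimates can be turned into prior causal probabilities by normalizing them within a locus. The function name, the normalization to sum to one, and the small positive floor applied to non-positive estimates are assumptions made for illustration.

```python
import numpy as np


def prior_causal_probabilities(per_snp_h2, floor=1e-12):
    """Return prior causal probabilities proportional to per-SNP heritability.

    per_snp_h2: estimated per-SNP heritabilities for the SNPs in one locus.
    floor: hypothetical small positive value replacing non-positive estimates,
           so that every SNP keeps a non-zero prior (an assumption).
    """
    h2 = np.asarray(per_snp_h2, dtype=float)
    h2 = np.clip(h2, floor, None)   # guard against zero or negative estimates
    return h2 / h2.sum()            # proportional to h2, normalized to sum to 1


priors = prior_causal_probabilities([1e-6, 3e-6, 0.0, -1e-7])
```

A SNP with three times the estimated per-SNP heritability of another receives three times its prior causal probability, which is the proportionality stated above.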
Main simulations
We evaluated PolyFun via simulations using real genotypes from 337,491 unrelated UK Biobank British samples[27]. We analyzed 10 3Mb loci on chromosome 1, each containing 1,468–27,784 imputed MAF≥0.001 SNPs (including short indels; Supplementary Table 2). We estimated prior causal probabilities using 18,212,157 genome-wide imputed MAF≥0.001 SNPs with INFO score≥0.6. We simulated traits with heritability equal to 25% and genome-wide proportion of causal SNPs equal to 0.5%, with each target locus including 10 causal SNPs jointly explaining heritability of 0.05%. We specified prior causal probabilities using the baseline-LF model[25] with meta-analyzed functional enrichments from real data analyses (Supplementary Table 3). We generated summary statistics using N=320K samples. Further details are provided in the Methods section.

We evaluated 10 fine-mapping methods (Methods, Table 1). We assessed calibration via the proportion of false positives among SNPs with posterior causal probability (posterior inclusion probability; PIP) above a given threshold (e.g. PIP>0.95), aggregating the results across all simulations; we refer to this quantity as the false discovery rate (FDR). For each PIP threshold, we estimated the FDR as one minus the PIP threshold, which is more conservative than an exact estimate (Figure 1a–b, Supplementary Note, Supplementary Table 4). No method except CAVIARBF2− and CAVIARBF2 had significantly inflated false discovery rates, although fastPAINTOR and CAVIARBF1 had suggestive evidence of inflated false discovery rates. We assessed power via the proportion of true causal SNPs with PIP above a given threshold, aggregating the results across all simulations.
PolyFun + FINEMAP was the most powerful method, identifying >5% more PIP>0.95 causal SNPs than PolyFun + SuSiE and >20% more PIP>0.95 causal SNPs than FINEMAP; PolyFun + SuSiE was the second most powerful method, identifying >25% more PIP>0.95 causal SNPs than SuSiE (Figure 1c–d, Supplementary Table 4). These results demonstrate the benefits of prioritizing SNPs using functional annotations.
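The calibration and power metrics described above can be sketched as follows. This is an illustrative computation of the empirical proportion of false positives and of power at a PIP threshold, assuming that the true causal status of every SNP is known (as in simulations); the function and argument names are hypothetical.

```python
import numpy as np


def fdr_and_power(pip, is_causal, threshold=0.95):
    """Empirical FDR and power among SNPs with PIP above a threshold.

    pip: posterior inclusion probabilities, aggregated across simulations.
    is_causal: boolean flags marking the simulated causal SNPs.
    """
    pip = np.asarray(pip, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    selected = pip > threshold
    n_selected = int(selected.sum())
    n_causal = int(is_causal.sum())
    true_pos = int((selected & is_causal).sum())
    # FDR: proportion of selected SNPs that are not causal
    fdr = (n_selected - true_pos) / n_selected if n_selected else 0.0
    # Power: proportion of causal SNPs that are selected
    power = true_pos / n_causal if n_causal else float("nan")
    return fdr, power


fdr, power = fdr_and_power(
    pip=[0.99, 0.97, 0.60, 0.20, 0.96],
    is_causal=[True, False, True, False, True],
)
```

In this toy input, three SNPs exceed the threshold and two of them are causal, so the empirical FDR is 1/3 and the power is 2/3.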
Table 1:
Summary of methods evaluated in main simulations.
For each method we report whether it incorporates functional data, the maximum number of functional annotations that we specified under default simulation settings (for fastPAINTOR we selected the number of annotations that maximized power while maintaining correct calibration; Methods), the maximum number of causal SNPs modeled per locus (or the exact number for SuSiE and PolyFun + SuSiE), and the corresponding reference. For fastPAINTOR and CAVIARBF, − denotes the exclusion of functional data. For CAVIARBF, 1 or 2 denotes the maximum number of causal variants. PolyFun + FINEMAP uses a new version of FINEMAP that we introduce here that incorporates prior causal probabilities.
| Method | Functional data | Max #annotations | Max #causal SNPs | Ref. |
|---|---|---|---|---|
| fastPAINTOR− | No | N/A | Unlimited | [19] |
| fastPAINTOR | Yes | 10 | Unlimited | [19] |
| CAVIARBF1− | No | N/A | 1 | [6] |
| CAVIARBF1 | Yes | Unlimited | 1 | [20] |
| CAVIARBF2− | No | N/A | 2 | [6] |
| CAVIARBF2 | Yes | Unlimited | 2 | [20] |
| FINEMAP | No | N/A | 10 | [22,23] |
| PolyFun + FINEMAP | Yes | Unlimited | 10 | This paper |
| SuSiE | No | N/A | 10 | [21] |
| PolyFun + SuSiE | Yes | Unlimited | 10 | This paper |
Figure 1:
Calibration, power and computational cost of fine-mapping methods in main simulations.
(a-b) FDR at PIP=0.95 (a) and PIP=0.5 (b). Upper dashed horizontal lines denote conservative FDR estimates. Lower dotted horizontal lines denote anti-conservative FDR estimates, which are not recommended (Supplementary Note). (c-d) Power at PIP=0.95 (c) and PIP=0.5 (d). The first bar of each method uses non-functionally informed fine-mapping (denoted −), and the second uses functionally informed fine-mapping (denoted +). (e) The average runtime required to fine-map a 3Mb locus in a genome-wide analysis (log scale). The first bar of each method uses non-functionally informed fine-mapping (denoted −), and the second uses functionally informed fine-mapping (denoted +). (f) The total runtime required to fine-map different numbers of loci, for functionally informed fine-mapping methods only (log scale). The runtimes of PolyFun + SuSiE and PolyFun + FINEMAP are sub-linear because they include the fixed preprocessing cost of computing prior causal probabilities (630 minutes). Error bars denote standard errors. Numerical results, including results for CAVIARBF2− and CAVIARBF2, and including panel (f) results for non-functionally informed methods, are reported in Supplementary Table 4.
We evaluated the computational cost of each method. SuSiE and PolyFun + SuSiE were much faster than the other methods, fine-mapping a 3Mb locus in 5 minutes on average (excluding fixed preprocessing time; see below) (Figure 1e, Supplementary Table 4). CAVIARBF methods allowing >2 causal SNPs per locus were not evaluated due to prohibitively slow computation time. PolyFun also requires fixed preprocessing time (steps 1–4; see Overview of methods) of 630 minutes on average; when restricting analyses to subsets of loci, PolyFun + SuSiE was still faster than all other functionally-informed methods when analyzing >23 loci (Figure 1f).

We performed additional experiments to assess the robustness of PolyFun to model misspecification of functional architectures, to assess the individual impact of each of steps 1–5 of PolyFun on fine-mapping performance, and to explore additional simulation settings (Supplementary Note, Extended Data Figures 1–5, Supplementary Tables 4–6).
Extended Data Fig. 1:
Assessing the individual impact of step 1 of PolyFun (estimating functional enrichment) via perturbation analysis, by randomly shuffling different proportions of annotation coefficient estimates.
For each evaluated value of the proportion of shuffled annotation coefficient estimates, we report the number of experiments having each obtained FDR level >0 (left panel) and the number of experiments having each obtained power level >0 (right panel), out of 1000 experiments. FDR and power are reported with respect to identifying PIP≥0.95 SNPs. Experiments with FDR=0 (resp. power=0) are not reported in the left panel (resp. right panel) to improve clarity. Numerical reports are provided in Supplementary Table 6.
Extended Data Fig. 5:
Assessing the individual impact of step 5 of PolyFun (specifying prior causal probabilities in proportion to the re-estimated per-SNP heritabilities) via perturbation analysis, by randomly permuting estimated prior causal probabilities.
The figure is similar to Extended Data Figure 1 but applies a different perturbation (randomly permuting estimated prior causal probabilities). Numerical reports are provided in Supplementary Table 6.
We conclude from these experiments that PolyFun + FINEMAP and PolyFun + SuSiE outperformed all other methods, with a 3.4x faster runtime for PolyFun + SuSiE. Thus, we restricted our analyses in the remainder of this manuscript to SuSiE and PolyFun + SuSiE.
Simulations with mismatched reference LD
Our main simulations used in-sample LD computed directly from the target samples. Although we have publicly released summary LD information for British-ancestry UK Biobank samples as part of this study, there are many settings in which researchers conducting fine-mapping cannot obtain in-sample LD, and instead use LD information from an external LD reference panel[29]. We performed extensive simulations to assess how fine-mapping performance is impacted by LD mismatch between the target sample and the LD reference panel. We specifically considered (1) non-overlapping target and reference samples; (2) sample sizes of the target sample and reference panel; (3) differences in ancestry; (4) presence of related individuals in the target sample; and (5) SNPs available for analysis in the target sample and reference panel.

We performed 19 experiments, described in detail in Table 3, in the Supplementary Note and in Supplementary Table 7. We quantified how mismatched reference LD impacts fine-mapping performance via the maximum number of assumed causal SNPs per locus (denoted as L) that maintains FDR<0.05 at a PIP=0.95 threshold. Based on these experiments we provide fine-mapping best-practice recommendations: (1) PolyFun + SuSiE should ideally use in-sample LD from the GWAS target sample, with L=10; (2) PolyFun + SuSiE can alternatively use a non-overlapping LD reference panel from the target population spanning ≥10% of the target sample size, with L=10; (3) PolyFun + SuSiE can be used without an LD reference panel by specifying L=1 (we caution that using an LD reference panel with even subtle population differences with L>1 may lead to false-positive results); (4) PolyFun + SuSiE can be used in the presence of related individuals in the target sample (although these results pertain to the typical levels of relatedness observed in UK Biobank); and (5) PolyFun + SuSiE should include as many well-imputed SNPs from the target locus as possible to minimize the risk of omitting causal SNPs. The real-world implications of these best-practice recommendations are discussed in the Discussion.
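Recommendations (1)–(3) can be read as a simple decision rule for choosing L. The sketch below is one possible encoding, not part of the PolyFun software: the function name, argument names and the conservative fallback to L=1 for every case not covered by recommendations (1) and (2) are assumptions.

```python
def recommended_max_causal(ld_source, same_population=True, ref_n=0, target_n=1):
    """Suggest the maximum number of causal SNPs per locus (L).

    ld_source: 'in_sample', 'reference' or 'none' (hypothetical labels).
    same_population: whether the reference panel comes from the target population.
    ref_n, target_n: reference panel and target sample sizes.
    """
    if ld_source not in ("in_sample", "reference", "none"):
        raise ValueError(f"unknown ld_source: {ld_source!r}")
    if ld_source == "in_sample":
        return 10  # recommendation (1)
    if ld_source == "reference" and same_population and ref_n >= 0.1 * target_n:
        return 10  # recommendation (2): panel spans >=10% of the target sample
    return 1       # recommendation (3) and conservative fallback (assumption)


print(recommended_max_causal("reference", ref_n=44_000, target_n=293_000))
```

The fallback is deliberately stricter than some entries of Table 3 (which allow L=2 or L=3 in specific settings), reflecting the caution about using L>1 with mismatched reference LD.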
Table 3:
Summary of mismatched reference LD simulations.
For each experiment (exp) we report: (GWAS) The sample size and population of the target sample (UK denotes British-ancestry individuals from UK Biobank; EUR denotes non-British European-ancestry individuals from UK Biobank, REL indicates that pairs of related individuals are included in the sample); (LD) the sample size and population of the LD reference panel (UK denotes British-ancestry individuals from UK Biobank; UK10K denotes individuals from the UK10K cohort; numbers in parentheses indicate how many individuals overlap the target sample, if any; “none” indicates that there is no LD reference panel); (Generative SNPs) The set of SNPs from which we sampled causal SNPs (UKB: the set of UK Biobank imputed SNPs with INFO score >0.6 and UKB MAF>0.1%; UK10K: the set of UK10K SNPs; INF: the set of UKB imputed SNPs with INFO score >0.9; COM: the set of UKB imputed SNPs with MAF >1% in British-ancestry individuals); (SNPs analyzed) the set of SNPs that was used for fine-mapping; and (max. L) The maximum number of causal SNPs per locus assumed by PolyFun + SuSiE that maintains FDR<0.05 at a PIP=0.95 threshold (selected from the options 1,2,3,10; - indicates that none of these options maintains FDR<0.05). Horizontal lines indicate the partitioning into types of experiments described in the Supplementary Note. Numerical results are reported in Supplementary Table 7.
| exp | GWAS | LD | Generative SNPs | SNPs analyzed | max. L |
|---|---|---|---|---|---|
| a | 44K UK | 44K UK (44K overlap) | UKB | UKB | 10 |
| b | 44K UK | 44K UK | UKB | UKB | 10 |
| c | 44K UK | 4K UK | UKB | UKB | 10 |
| d | 44K UK | 400 UK | UKB | UKB | 1 |
| e | 44K UK | none | UKB | UKB | 1 |
| f | 293K UK | 44K UK | UKB | UKB | 10 |
| g | 293K UK | 4K UK | UKB | UKB | 2 |
| h | 293K UK | 4K UK (4K overlap) | UKB | UKB | 2 |
| i | 44K EUR | 44K UK | UKB | UKB | 3 |
| j | 44K EUR | 4K UK | UKB | UKB | 2 |
| k | 44K EUR | 400 UK | UKB | UKB | 1 |
| l | 22K EUR+22K UK | 44K UK (22K overlap) | UKB | UKB | 3 |
| m | 44K UK-REL | 44K UK | UKB | UKB | 10 |
| n | 44K EUR-REL | 44K UK | UKB | UKB | 3 |
| o | 44K UK | 3.6K UK10K | UKB | UK10K∩UKB | - |
| p | 44K UK | 3.6K UK10K | UK10K∩UKB | UK10K∩UKB | 2 |
| q | 44K UK | 3.6K UK10K | UK10K∩UKB∩INF | UK10K∩UKB∩INF | 10 |
| r | 44K UK | 3.6K UK10K | UK10K∩UKB∩COM | UK10K∩UKB∩COM | 1 |
| s | 44K UK | 4K UK | UK10K∩UKB | UK10K∩UKB | 10 |
Functionally informed fine-mapping of 49 complex traits
We applied PolyFun + SuSiE to fine-map 49 traits in the UK Biobank, including 33 traits analyzed in refs. [30,31], 9 blood cell traits analyzed in ref. [12], and 7 metabolic traits (average N=318K; Supplementary Table 8). For each trait we fine-mapped up to 2,763 overlapping 3Mb loci spanning M=18,212,157 imputed MAF≥0.001 SNPs with INFO score≥0.6 (including short indels; excluding three long-range LD regions and loci with close to zero heritability; Methods). We assigned to each SNP its PIP computed using the locus in which it was most central. We have publicly released the PIPs and the prior and posterior means and variances of the causal effect sizes for all SNPs and traits analyzed.

PolyFun + SuSiE identified 3,025 PIP>0.95 fine-mapped SNP-trait pairs, a >32% improvement vs. SuSiE; 9,684 PIP>0.5 SNP-trait pairs, a >59% improvement vs. SuSiE; and 225,153 PIP>0.05 SNP-trait pairs, a >84% improvement vs. SuSiE (Supplementary Table 9). The number of PIP>0.95 SNPs per trait ranged from 0 (number of children) to 407 (height) (Figure 2a, Supplementary Table 9). The 3,025 PIP>0.95 SNP-trait pairs spanned 2,225 unique SNPs, including 532 low-frequency SNPs (0.005 [...] 0.95 SNPs were also lead GWAS SNPs (defined as MAF>0.001 SNPs with P<5×10−8 and no MAF>0.001 SNP with a smaller p-value within 1Mb) (Supplementary Table 10), demonstrating the importance of using fine-mapped SNPs rather than lead GWAS SNPs for downstream analysis. Of the PIP>0.95 SNPs, 31% resided in coding regions and 22% were non-synonymous (broadly consistent with previous fine-mapping studies[8,12]) (Supplementary Table 10). When restricting the analysis to 16 genetically uncorrelated traits (|r|<0.2; Methods and Supplementary Tables 11–12) we identified 1,626 PIP>0.95 SNP-trait pairs spanning 1,496 unique SNPs, with a median distance of 9kb between a PIP>0.95 SNP and the nearest lead GWAS SNP for the same trait (Supplementary Table 10).
The 16 genetically uncorrelated traits included 5,314 genome-wide significant locus-trait pairs (defined by 1Mb windows around lead GWAS SNPs) harboring 0.28 PIP>0.95 SNPs per locus on average (Supplementary Table 13); 1,080 of the 5,314 locus-trait pairs (20%) harbored ≥1 PIP>0.95 SNP(s), harboring 1.37 PIP>0.95 SNPs on average (Supplementary Table 13). 150 of the 1,626 SNP-trait pairs identified by PolyFun + SuSiE PIP>0.95 (9.2%) did not lie within genome-wide significant loci, and 161 of the 1,626 SNP-trait pairs (9.9%) had P>5×10−8 (Supplementary Table 10).
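Because the 3Mb loci overlap, a SNP can receive a PIP from more than one locus; the rule of keeping the PIP from the locus in which the SNP is most central can be sketched as follows. This is an illustrative helper for a single chromosome, with a hypothetical data layout (each locus given as a start, an end and a position-to-PIP mapping), not the code used in the analyses.

```python
def pip_from_most_central_locus(snp_pos, loci):
    """Return the PIP of a SNP from the locus whose center is closest to it.

    snp_pos: base-pair position of the SNP.
    loci: list of (start, end, pips) tuples on one chromosome, where pips
          maps SNP position -> PIP estimated within that locus.
    Returns None if no locus contains the SNP.
    """
    best_pip, best_dist = None, None
    for start, end, pips in loci:
        if snp_pos not in pips:
            continue
        dist = abs(snp_pos - (start + end) / 2)  # distance to the locus center
        if best_dist is None or dist < best_dist:
            best_pip, best_dist = pips[snp_pos], dist
    return best_pip


loci = [
    (0, 3_000_000, {1_900_000: 0.40}),          # center at 1.5Mb
    (1_000_000, 4_000_000, {1_900_000: 0.90}),  # center at 2.5Mb
]
pip = pip_from_most_central_locus(1_900_000, loci)
```

In this example the SNP at 1.9Mb is 0.4Mb from the center of the first locus and 0.6Mb from the center of the second, so the PIP from the first locus (0.40) is retained.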
Figure 2:
Summary of fine-mapping results for UK Biobank traits.
(a) The number of SNPs with PIP>0.95 identified by SuSiE (black bars) and PolyFun + SuSiE (gray bars) across 16 genetically uncorrelated traits in the UK Biobank. Traits are ordered by PolyFun + SuSiE results. The numbers in the legend refer to the sum of all 49 traits analyzed. (b) The proportion of MAF>0.001 SNP-heritability tagged by lead GWAS SNPs (gray bars) and by PolyFun + SuSiE PIP>0.95 SNPs (black bars). Traits are ordered as in panel (a). For hair color, the SNP-heritability tagged by PIP>0.95 SNPs is greater than that tagged by lead GWAS SNPs. MPV: mean platelet volume; BMD: bone mineral density; MCH: mean corpuscular hemoglobin; MC: monocyte count; HLSRC: high light scatter reticulocyte count; FEV1/FVC: ratio of forced expiratory volume to forced vital capacity; DBP: diastolic blood pressure; FVC: forced vital capacity. Numerical results are reported in Supplementary Tables 9,14.
We estimated the SNP-heritability tagged by PIP>0.95 fine-mapped SNPs (which is likely to be close to the heritability causally explained by these SNPs, if most of the tagged SNP-heritability originates from PIP>0.95 SNPs). The SNP-heritability tagged by PIP>0.95 SNPs captured a large proportion of the SNP-heritability tagged by lead GWAS SNPs (median proportion=42%; Figure 2b, Methods, Supplementary Table 14). This proportion was substantially larger than the proportion of GWAS loci harboring PIP>0.95 SNPs (20%; see above), as fine-mapping power is higher at loci with larger causal effects (Supplementary Table 4). However, fine-mapped SNPs tagged a smaller proportion of total MAF>0.001 SNP-heritability (median proportion=19%; Figure 2b, Methods, Supplementary Table 14), indicating that substantially larger sample sizes are required to comprehensively fine-map all heritable SNP effects.

Among the 2,225 unique PIP>0.95 SNPs fine-mapped for at least one trait, 223 SNPs were fine-mapped for multiple genetically uncorrelated traits (selecting a different subset of genetically uncorrelated traits for each SNP; Methods), including 55 SNPs fine-mapped for ≥3 genetically uncorrelated traits, indicating pervasive pleiotropy (Extended Data Figure 6, Supplementary Table 15). Of the pleiotropic SNPs, 118 resided in coding regions and 93 were non-synonymous (Supplementary Table 15). The 17 SNPs fine-mapped for at least 4 traits are reported in Table 2. Previous studies have reported that genetically uncorrelated traits often share association signals at the same loci[32], but did not fine-map those signals to individual SNPs as performed here.
Extended Data Fig. 6:
Visualization of fine-mapping results for UK Biobank traits.
We display an ideogram of all 2,225 PIP>0.95 fine-mapped SNPs identified by PolyFun + SuSiE across 49 UK Biobank traits. Traits are color-coded into groups (see legend and Supplementary Table 8). White circles indicate SNPs that are pleiotropic for ≥2 genetically uncorrelated traits, with circles to the right of a white circle denoting the genetically uncorrelated traits (max of 5 colored circles due to space limitations). Numerical results are reported in Supplementary Table 10.
Table 2:
Pleiotropic fine-mapped SNPs for UK Biobank traits.
We report SNPs fine-mapped (PIP>0.95) for ≥4 genetically uncorrelated traits (|r|<0.2). For each SNP we report its name (SNP), position (hg19), MAF in the UK Biobank, closest gene(s) (using data from the GWAS catalog[64]), top annotation (Methods) and fine-mapped traits (and the number of fine-mapped traits). SNPs are ordered first by the number of fine-mapped traits and then by genomic position. HDL: HDL cholesterol; MC: monocyte count; MPV: mean platelet volume; HLSRC: high light scatter reticulocyte count; Cholesterol: total cholesterol; RBCDW: red blood cell distribution width; FEV1/FVC: ratio of forced expiratory volume to forced vital capacity; MCH: mean corpuscular hemoglobin; SBP: systolic blood pressure; DBP: diastolic blood pressure; FVC: forced vital capacity; Cardiovascular: cardiovascular-related disease; RBC: red blood cell count; LC: lymphocyte count; HbA1c: Hemoglobin A1c; WHR: waist-hip ratio (adjusted for BMI). Results for all 223 pleiotropic fine-mapped SNPs are reported in Supplementary Table 15.
| SNP | Position (hg19) | MAF | Closest gene(s) | Top annotation | Fine-mapped traits (number) |
|---|---|---|---|---|---|
| | | | | | Age Menarche, Cardiovascular, Height, Platelet Count (4) |
| rs3918226 | chr7:150690176 | 0.08 | NOS3 | Conserved | Eczema, Height, High Cholesterol, MPV (4) |
| rs150813342 | chr9:135864513 | 0.01 | GFI1B | Conserved | Eosinophil Count, HLSRC, MCH, Platelet Count (4) |
| rs964184 | chr11:116648917 | 0.13 | ZPR1 | DHS | Cholesterol, MPV, RBCDW, Vitamin D (4) |
| rs35979828 | chr12:54685880 | 0.07 | NFE2 | Conserved | Eosinophil Count, Platelet Count, RBC, RBCDW (4) |
| rs2277339 | chr12:57146069 | 0.1 | PRIM1 | non-synonymous | Height, LC, RBC, RBCDW (4) |
| rs72681869 | chr14:50655357 | 0.01 | SOS2 | non-synonymous | FVC, Hair Color, HbA1c, SBP (4) |
| rs61745086 | chr16:88782050 | 0.01 | PIEZO1,CTU2 | non-synonymous | HLSRC, HbA1c, Height, RBC (4) |
| rs34557412 | chr17:16852187 | 0.01 | TNFRSF13B | non-synonymous | HbA1c, MC, MPV, RBC (4) |
| rs77542162 | chr17:67081278 | 0.02 | ABCA6 | non-synonymous | HbA1c, Height, LDL, Platelet Count (4) |
To better understand the improvement of PolyFun + SuSiE over SuSiE, we examined the 121 loci where PolyFun + SuSiE identified a fine-mapped common SNP (PIP>0.95) but SuSiE did not (PIP<0.5 for all SNPs within 1Mb) (Figure 3 and Supplementary Table 16). In each case, functional annotations prioritized one SNP out of several candidates, greatly improving fine-mapping resolution.
Figure 3:
Examples of the advantages of functionally-informed fine-mapping for UK Biobank traits.
We report four examples where PolyFun + SuSiE identified a fine-mapped common SNP (PIP>0.95) but SuSiE did not (PIP<0.5 for all SNPs within 1Mb). Circles denote PolyFun + SuSiE PIPs and squares denote SuSiE PIPs. SNPs are shaded according to their prior causal probabilities as estimated by PolyFun. The top PolyFun + SuSiE SNP is labeled (next to its PolyFun + SuSiE PIP and its SuSiE PIP). The annotation of each top PolyFun + SuSiE SNP that is most enriched among SuSiE PIP>0.95 SNPs (Methods) is reported in parentheses below its label. Asterisks denote lead GWAS SNPs. Numerical results are reported in Supplementary Table 16.
We validated the motivation for performing functionally-informed fine-mapping by verifying that fine-mapped SNPs are enriched for functional annotations, as previously shown for autoimmune diseases[7,8,10] and blood traits[12] (using non-functionally-informed SuSiE to avoid biasing the results). For each of 50 main binary annotations from the baseline-LF model[24], for various PIP ranges, we computed the functional enrichment of fine-mapped common SNPs in the PIP range, defined as the proportion of common SNPs in the PIP range lying in the annotation divided by the proportion of genome-wide common SNPs lying in the annotation, and meta-analyzed the results across genetically uncorrelated traits (Methods, Figure 4, Supplementary Table 17). PIP>0.95 SNPs were strongly and significantly enriched for non-synonymous SNPs (51x enrichment, P=6.8×10−185) and SNPs in conserved regions (16x enrichment, P<10−300), significantly enriched for SNPs in various regulatory annotations (e.g. promoter-ExAC and H3K4me3), and significantly depleted for SNPs in repressed regions, consistent with previous literature on functional enrichment of fine-mapped SNPs[7,8,10-12] and disease heritability[17,24,25,33]. We observed qualitatively similar but weaker enrichments at lower PIP ranges (Figure 4, Supplementary Table 17).
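The enrichment statistic defined above (the proportion of SNPs in a PIP range lying in an annotation, divided by the genome-wide proportion of SNPs in that annotation) can be sketched for a single trait and a single binary annotation as follows. The function name and the treatment of the PIP range as (low, high] are assumptions; the meta-analysis across traits is not shown.

```python
import numpy as np


def functional_enrichment(pip, in_annotation, pip_low, pip_high=1.0):
    """Enrichment of an annotation among SNPs with pip_low < PIP <= pip_high.

    pip: PIPs of all (common) SNPs for one trait.
    in_annotation: boolean flags marking SNPs that lie in the annotation.
    Returns NaN when the PIP range or the annotation is empty.
    """
    pip = np.asarray(pip, dtype=float)
    ann = np.asarray(in_annotation, dtype=bool)
    in_range = (pip > pip_low) & (pip <= pip_high)
    if not in_range.any() or not ann.any():
        return float("nan")
    # proportion in annotation within the PIP range / genome-wide proportion
    return ann[in_range].mean() / ann.mean()


enrichment = functional_enrichment(
    pip=[0.99, 0.98, 0.10, 0.20, 0.30, 0.05, 0.01, 0.02],
    in_annotation=[True, True, False, False, True, False, False, False],
    pip_low=0.95,
)
```

In this toy input, both PIP>0.95 SNPs lie in the annotation while 3 of 8 SNPs do genome-wide, giving an enrichment of 1/(3/8) = 8/3.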
Figure 4:
Functional enrichment of SuSiE fine-mapped common SNPs for UK Biobank traits.
We report the functional enrichment of fine-mapped common SNPs (defined as the proportion of common SNPs in a PIP range lying in an annotation divided by the proportion of genome-wide common SNPs lying in the annotation) for 5 selected binary annotations, meta-analyzed across 14 genetically uncorrelated UK Biobank traits with ≥10 PIP>0.95 SNPs (log scale). The proportion of common SNPs lying in each binary annotation is reported above its name. The horizontal dashed line denotes no enrichment. Error bars denote standard errors. Numerical results for all 50 main binary annotations and all traits are reported in Supplementary Table 17.
We compared our fine-mapping results to two previous studies. First, we compared our results to ref. [12], which performed non-functionally informed fine-mapping for 9 blood cell traits using approximately 115K of the individuals included in our analyses. PolyFun + SuSiE identified 4.4× more SNPs than ref. [12], including all four SNPs that were functionally validated via luciferase reporter assays in ref. [12] (PIP>0.999 for all four SNPs; Methods, Supplementary Tables 18–20). Second, we compared our results to ref. [7], which fine-mapped 7 of our traits using a non-functionally informed method (PICS) and independent smaller data sets. PolyFun + SuSiE identified 35× more SNPs than ref. [7] (Supplementary Tables 21–22). Further details of the comparison are provided in the Supplementary Note.

We further performed 6 secondary analyses, described in the Supplementary Note, in Extended Data Figures 7–9, and in Supplementary Tables 10 and 23–28.
Extended Data Fig. 7:
Functional enrichment of PolyFun + SuSiE fine-mapped common SNPs for UK Biobank traits.
The figure is analogous to Figure 4 but uses PIPs computed by PolyFun + SuSiE instead of SuSiE. Numerical results are reported in Supplementary Table 26.
Extended Data Fig. 9:
Functional enrichment of SuSiE fine-mapped low-frequency and rare SNPs for UK Biobank traits.
The figure is analogous to Figure 4 but uses only low-frequency and rare SNPs (0.05>MAF>0.001) instead of common (MAF>0.05) SNPs. Numerical results are reported in Supplementary Table 28.
In summary, we leveraged the improved power of PolyFun + SuSiE to robustly identify thousands of fine-mapped SNPs, providing a rich set of potential candidates for functional follow-up. Our results further indicate pervasive pleiotropy, with many SNPs fine-mapped for two or more genetically uncorrelated traits.
Polygenic localization of 49 complex traits
PIP>0.95 SNPs tag a large proportion of the SNP-heritability tagged by lead GWAS SNPs (median proportion=42%) but a small proportion of total genome-wide SNP-heritability (median proportion=19%) (Figure 2b), implying that they causally explain a small proportion of SNP-heritability. We thus propose polygenic localization, whose aim is to identify a minimal set of common SNPs causally explaining a specified proportion of common SNP heritability. A key difference between polygenic localization and previous studies of polygenicity[34-38] is that polygenic localization aims to identify (not just characterize) such SNPs.
Given a ranking of SNPs by posterior per-SNP heritability (i.e., the posterior mean of their squared effect size; see Methods), we define M50% as the size of the smallest set of top-ranked common SNPs causally explaining 50% of common SNP heritability (resp. $M_p$ for proportion p of common SNP heritability). We estimate M50% (resp. $M_p$) by (1) partitioning SNPs into 50 ranked bins of similar posterior per-SNP heritability estimates from PolyFun + SuSiE and stratifying the lowest-heritability bin into 10 equally-sized MAF bins, yielding 59 bins; (2) running S-LDSC using a different set of samples to re-estimate the average per-SNP heritability in each bin; and (3) computing the number of top-ranked common SNPs (with respect to the original ranking) whose estimated per-SNP heritabilities (from step 2) sum up to 50% (resp. the proportion p) of the total estimated SNP-heritability. We refer to this method as PolyLoc. The analysis of new samples in step 2 of PolyLoc prevents winner’s curse; although PolyFun + SuSiE is robust to winner’s curse, PolyLoc would be susceptible to winner’s curse if it reused the data analyzed by PolyFun + SuSiE. We note that M50% relies on an empirical ranking and is thus at least as large as the size of the smallest set of SNPs causally explaining 50% of common SNP heritability, denoted as $M^*_{50\%}$. We performed extensive simulations to confirm that PolyLoc produced robust upper bounds of $M^*_p$ (Supplementary Note, Supplementary Tables 29–30). Further details of PolyLoc are provided in the Methods section; we have released open source software implementing PolyLoc.
We applied PolyLoc to the 49 complex traits from the UK Biobank (Supplementary Table 8). We ranked SNPs using N=337K unrelated British ancestry samples (step 1) and re-estimated average per-SNP heritabilities in each of 59 SNP bins using S-LDSC applied to N=122K European-ancestry UK Biobank samples that were not included in the N=337K set, to avoid winner’s curse (step 2). Estimates of M50% ranged widely from 28 (hair color) to 3.4K (height) to 2 million (number of children) (Figure 5, Supplementary Table 31). The median estimate of M50% across 16 genetically uncorrelated traits was 8.9K; the median estimate of M5% was 8; and the median estimate of M95% was 4.4 million (of 7.0 million total common SNPs) (Supplementary Table 31). Pigmentation traits were the least polygenic traits while number of children was the most polygenic trait, having M50% 3.7x larger than the second most polygenic of the 16 independent traits (chronotype, having M50%=553K), consistent with ref. [34]. We performed 7 secondary analyses, described in the Supplementary Note and in Supplementary Tables 32–33. We note that far fewer than 2 million SNPs may causally explain 50% of the common SNP heritability of number of children, because M50% is a (possibly loose) upper bound.
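Step 3 of PolyLoc can be sketched in a few lines. The snippet below is a minimal illustration, not the released PolyLoc implementation: the function name `smallest_set_size` and its inputs are hypothetical, bins are assumed to be ordered from top-ranked to bottom-ranked, and every SNP in a bin is assumed to carry that bin's re-estimated average per-SNP heritability (only relative values matter, so the units are arbitrary).

```python
import numpy as np

def smallest_set_size(bin_sizes, per_snp_h2, p):
    """Number of top-ranked SNPs whose re-estimated heritability sums to a fraction p.

    bin_sizes[b]  : number of SNPs in ranked bin b (bin 0 holds the top-ranked SNPs)
    per_snp_h2[b] : re-estimated average per-SNP heritability in bin b (arbitrary units)
    p             : target proportion of total heritability, with 0 < p <= 1
    """
    bin_sizes = np.asarray(bin_sizes, dtype=float)
    per_snp_h2 = np.asarray(per_snp_h2, dtype=float)
    bin_h2 = bin_sizes * per_snp_h2           # heritability explained by each bin
    target = p * bin_h2.sum()
    cum = np.cumsum(bin_h2)
    b = int(np.searchsorted(cum, target))     # first bin where the running total reaches the target
    before = cum[b - 1] if b > 0 else 0.0     # heritability explained by all earlier bins
    snps_before = int(bin_sizes[:b].sum())
    # Inside bin b each additional SNP adds per_snp_h2[b]
    extra = int(np.ceil((target - before) / per_snp_h2[b]))
    return snps_before + extra

# Toy example with three ranked bins
m50 = smallest_set_size([10, 100, 1000], [20.0, 2.0, 0.5], p=0.5)
```

In this toy example the first two bins explain 400 of 900 units, so 100 further SNPs from the third bin are needed and `m50` is 210.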
Figure 5:
Polygenic localization results for UK Biobank traits.
(a) M50% estimates across 16 genetically uncorrelated traits. For each trait, we report the number of top-ranked common SNPs (using PolyFun + SuSiE posterior per-SNP heritability estimates for ranking) causally explaining 50% of common SNP heritability, and its standard error (log scale). The horizontal dashed line denotes the total number of common SNPs in the analysis (7.0 million). (b-d) The proportion of common SNP heritability of (b) hair color, (c) height, and (d) number of children explained by different numbers of top-ranked SNPs, for all 7.0 million common SNPs (left) and the 5,000 top-ranked common SNPs (right). Gray shading denotes standard errors. Dashed black lines denote a null model with a constant per-SNP heritability. We also report the number of top-ranked SNPs causally explaining 50% of common SNP heritability, denoted M50%. Discontinuities in the slope indicate transitions between SNP bins. Numerical results for all 49 UK Biobank traits are reported in Supplementary Table 31.
Our results demonstrate that half of the common SNP heritability of complex traits is causally explained by typically thousands of SNPs (median M50%=8.9K), and the remaining heritability is spread across an extremely large number of extremely weak-effect SNPs (median M95%=4.4 million), consistent with extremely polygenic but heavy-tailed trait architectures[1,34-36,39-43].
Discussion
We have introduced PolyFun, a framework that improves fine-mapping by prioritizing variants that are a priori more likely to be causal based on their functional annotations. Across 49 UK Biobank traits, PolyFun + SuSiE confidently fine-mapped 3,025 SNP-trait pairs (PIP >0.95), a 32% increase over non-functionally informed SuSiE. 223 of the fine-mapped SNPs were fine-mapped for multiple genetically uncorrelated traits, indicating pervasive pleiotropy. We further leveraged the results of PolyFun to perform polygenic localization by constructing minimal SNP sets causally explaining a given proportion of common SNP heritability, demonstrating that 50% of common SNP heritability can be explained by sets ranging in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). We note that these set sizes impose a (possibly loose) upper bound on the size of the smallest sets causally explaining 50% of common SNP heritability. We have publicly released the PIPs and the prior and posterior means and variances of effect sizes for all SNPs and traits analyzed.
We recommend applying PolyFun using in-sample LD from the GWAS target sample (i.e., using exactly the same samples in both the target and reference samples), assuming 10 causal SNPs per locus; we have facilitated this option for UK Biobank researchers by publicly releasing summary LD information for N=337K British-ancestry UK Biobank samples. As a second-best option we recommend applying PolyFun using an LD reference panel from the target sample population spanning at least 10% of the target sample size, while assuming 10 causal SNPs per locus. However, we caution that even subtle population differences may lead to false positive results. Hence, our published summary LD information files are unsuitable for analysis of summary statistics involving non-British UK Biobank individuals, or data from other cohorts or consortia[44-46]. However, researchers may use larger subsets of UK Biobank data to identify genome-wide significant loci, which they can fine-map using summary statistics and LD reference data based on N=337K British-ancestry individuals. In the absence of a reference panel from the target sample population spanning >10% of the target sample size, we recommend applying PolyFun without using an LD reference panel by restricting it to assume a single causal SNP per locus.
Our fine-mapping analysis differs from several previous fine-mapping studies in two aspects. First, we applied PolyFun genome-wide. However, we envision that the PolyFun software will primarily be used to fine-map genome-wide significant loci, which harbor most PIP>0.95 SNPs. We discuss possible reasons for identifying PIP>0.95 SNPs with P>5×10⁻⁸ in the Supplementary Note. Second, PolyFun fine-maps all signals in a locus jointly to maximize power[5,28]. Researchers wishing to use PolyFun for a partitioned analysis[47] may still do so by first partitioning a locus into multiple signals using a separate tool (e.g. GCTA-COJO[47]) and then applying PolyFun to each signal separately, restricting PolyFun to assume a single causal SNP per signal.
Our results provide several opportunities for future work. First, the fine-mapped SNPs that we have identified can be prioritized for functional follow-up. Second, fine-mapping results (posterior mean effect sizes) can be used to compute trans-ethnic polygenic risk scores[48] which may be less sensitive to LD differences between populations than existing methods[49,50]. Third, the proximal pairs of coding and non-coding fine-mapped SNPs that we identified (Supplementary Table 25) may aid efforts to link SNPs to genes[51-53]. Fourth, SNPs that were fine-mapped for multiple genetically uncorrelated traits may shed light on shared biological pathways[54]. Fifth, sets of SNPs causally explaining 50% of common SNP heritability can potentially be used for gene and pathway enrichment analysis[55,56]. Finally, PolyFun can incorporate additional functional annotations at negligible additional computational cost, motivating further efforts to identify conditionally informative annotations.
Our work has several limitations. First, our PIP>0.95 FDR estimates for PolyFun and for other methods are conservative, demonstrating the challenges of exact calibration in fine-mapping. Second, subtle population stratification may lead to spurious fine-mapping results[57]. However, our fine-mapped SNPs are concentrated in associated loci with larger estimated effects, which are relatively less likely to be spurious. Third, we restricted fine-mapping to N=337K unrelated British-ancestry individuals, consistent with previous studies[12]. Hence, our published summary LD information files do not support fine-mapping of UK Biobank data that includes non-British individuals. Fourth, PolyLoc requires analyzing samples distinct from the samples analyzed by PolyFun to avoid winner’s curse. Researchers with access to individual-level genetic data can partition the samples as we have done (we recommend using approximately 75% of the data for fine-mapping and 25% for polygenic localization). Fifth, PolyFun does not support X-chromosome analysis. Sixth, PolyLoc only provides an upper bound on the proportion of SNPs causally explaining a given proportion of SNP-heritability. Finally, multi-ethnic fine-mapping[58] and incorporation of tissue-specific functional annotations[9,13,15,17] may further increase fine-mapping power. Incorporating these into our fine-mapping framework is an avenue for future work.
Online Methods
PolyFun fine-mapping method
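PolyFun first estimates prior causal probabilities for all SNPs and then applies fine-mapping methods such as SuSiE[21] or FINEMAP[22,23] with these prior causal probabilities. Here we describe estimation of the prior causal probabilities.
We model standardized phenotypes y using the linear model $y = \sum_i x_i \beta_i + \epsilon$, where $x_i$ denotes standardized SNP genotypes, $\beta_i$ denotes effect size, and $\epsilon$ is a residual term. We use a point-normal model for $\beta_i$:
$$\beta_i \mid \mathbf{a}_i \sim \begin{cases} \mathcal{N}\left(0, \mathrm{var}[\beta_i \mid \beta_i \neq 0]\right) & \text{with probability } P(\beta_i \neq 0 \mid \mathbf{a}_i) \\ 0 & \text{otherwise} \end{cases}$$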
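where $\mathbf{a}_i$ are the functional annotations of SNP i, $P(\beta_i \neq 0 \mid \mathbf{a}_i)$ is its prior causal probability, and $\mathrm{var}[\beta_i \mid \beta_i \neq 0]$ is its causal variance, which we assume is independent of $\mathbf{a}_i$. This assumption is motivated by our recent work showing that functional enrichment is primarily due to differences in polygenicity rather than differences in effect-size magnitude, which is constrained by negative selection[34].
The key quantity that PolyFun uses to estimate prior causal probabilities is the per-SNP heritability of SNP i, $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ (we refer to this quantity as per-SNP heritability because the total SNP-heritability $\mathrm{var}[\sum_i x_i \beta_i \mid \mathbf{a}]$ is equal to $\sum_i \mathrm{var}[\beta_i \mid \mathbf{a}_i]$, assuming that causal SNP effects have zero mean and are uncorrelated with other SNP effects and with other SNPs conditional on $\mathbf{a}$). PolyFun relates the prior causal probability $P(\beta_i \neq 0 \mid \mathbf{a}_i)$ to the per-SNP heritability $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ via the law of total variance:
$$\mathrm{var}[\beta_i \mid \mathbf{a}_i] = P(\beta_i \neq 0 \mid \mathbf{a}_i) \cdot \mathrm{var}[\beta_i \mid \beta_i \neq 0] \qquad (2)$$
Equation 1 in the main text follows because $P(\beta_i \neq 0 \mid \mathbf{a}_i)$ is proportional to $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ with the proportionality factor $1/\mathrm{var}[\beta_i \mid \beta_i \neq 0]$.
To derive Equation 2 we define the causality indicator $\delta_i = \mathbb{1}[\beta_i \neq 0]$ and apply the law of total variance to $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$:
$$\mathrm{var}[\beta_i \mid \mathbf{a}_i] = E\left[\mathrm{var}[\beta_i \mid \mathbf{a}_i, \delta_i] \mid \mathbf{a}_i\right] + \mathrm{var}\left[E[\beta_i \mid \mathbf{a}_i, \delta_i] \mid \mathbf{a}_i\right] = P(\delta_i = 1 \mid \mathbf{a}_i)\,\mathrm{var}[\beta_i \mid \mathbf{a}_i, \delta_i = 1] = P(\beta_i \neq 0 \mid \mathbf{a}_i)\,\mathrm{var}[\beta_i \mid \beta_i \neq 0]$$
The second term vanishes because $E[\beta_i \mid \mathbf{a}_i, \delta_i] = 0$ for both values of $\delta_i$. The last equality holds because we assume that the causal effect size variance is independent of functional annotations, as explained above.
PolyFun avoids directly estimating the proportionality factor $1/\mathrm{var}[\beta_i \mid \beta_i \neq 0]$ by constraining the prior causal probabilities $P(\beta_i \neq 0 \mid \mathbf{a}_i)$ in each tested locus to sum to 1.0. This constraint implies that each locus is a priori expected to harbor one causal SNP, consistent with previous fine-mapping methods[5,6,22] (this constraint is ignored by PolyFun + SuSiE because SuSiE is invariant to scaling of prior causal probabilities). Hence, the main challenge is estimating the per-SNP heritabilities $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$.
To estimate $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$, PolyFun incorporates a regularized extension of S-LDSC with the baseline-LF model[17,24-26], which we extend to a new version 2.2.UKB (Supplementary Table 1, see below). 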
S-LDSC uses the linear model $\mathrm{var}[\beta_i \mid \mathbf{a}_i] = \sum_c \tau_c a_{ic}$ and jointly estimates all $\tau_c$ parameters by minimizing the term $\sum_i \left[\chi^2_i - \left(n \sum_c \tau_c \ell(i,c) + nb + 1\right)\right]^2$, where c are functional annotations, $\tau_c$ is the coefficient of annotation c, $\chi^2_i$ is the χ2 statistic of SNP i, n is the sample size, b measures the contribution of confounding biases, and $\ell(i,c) = \sum_k a_{kc} r^2_{ik}$ is the LD score of SNP i with respect to annotation c (with $r_{ik}$ the correlation between SNPs i and k).
While S-LDSC produces robust estimates of functional enrichment, it has two limitations in estimating $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$: (i) these estimates can have large standard errors in the presence of many annotations, and (ii) the model may not be robust to model misspecification. To address the first limitation, PolyFun incorporates an L2-regularized extension of S-LDSC. To address the second limitation, PolyFun employs special procedures to ensure robustness to model misspecification. The key idea is to approximate arbitrary complex functional forms of $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ via a piecewise-constant function. To do this, PolyFun partitions SNPs with similar estimated values of $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ (estimated via a possibly misspecified model) into non-overlapping bins; estimates the SNP-heritability causally explained by each bin b; and specifies $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ for SNPs in bin b as the SNP-heritability causally explained by bin b divided by the number of SNPs in bin b. PolyFun avoids winner’s curse by using different data for partitioning SNPs and for per-bin heritability estimation.
In detail, PolyFun robustly specifies prior causal probabilities for all SNPs in a target locus on an odd (resp. even) target chromosome via the following procedure:
1. Estimate annotation coefficients $\hat{\tau}^{\text{even}}_c$ and intercepts $\hat{b}^{\text{even}}$ using only SNPs in even chromosomes via an L2-regularized extension of S-LDSC that minimizes $\sum_{i \in \text{even}} \left[\chi^2_i - \left(n \sum_c \tau_c \ell(i,c) + nb + 1\right)\right]^2 + \lambda \sum_c \tau_c^2$ (resp. $\hat{\tau}^{\text{odd}}_c$ and $\hat{b}^{\text{odd}}$, using only SNPs in odd chromosomes). We select the regularization strength λ from a geometrically-spaced grid of 100 values ranging from 10⁻⁸ to 100, selecting the one that minimizes the average out-of-chromosome error $\frac{1}{|R|} \sum_{r \in R} \sum_{i \in \text{chr } r} \left[\chi^2_i - \left(n \sum_c \hat{\tau}^{\text{even} \setminus r}_c \ell(i,c) + n \hat{b}^{\text{even} \setminus r} + 1\right)\right]^2$, where r iterates over the set R of even (resp. odd) chromosomes, and $\hat{\tau}^{\text{even} \setminus r}$, $\hat{b}^{\text{even} \setminus r}$ are the S-LDSC τ and b estimates, respectively, when applied to all SNPs on even chromosomes except for chromosome r (resp. for odd chromosomes).
2. Compute per-SNP heritabilities $\widehat{\mathrm{var}}[\beta_i \mid \mathbf{a}_i] = \sum_c \hat{\tau}^{\text{even}}_c a_{ic}$ for each SNP i in an odd chromosome (resp. using $\hat{\tau}^{\text{odd}}_c$ for each SNP in an even chromosome).
3. Partition all SNPs into 20 bins with similar values of $\widehat{\mathrm{var}}[\beta_i \mid \mathbf{a}_i]$ using the Ckmedian.1d.dp method[59]. This method partitions SNPs into 20 maximally homogeneous bins such that the average distance of $\widehat{\mathrm{var}}[\beta_i \mid \mathbf{a}_i]$ to the median of the bin of SNP i is minimized. Even though this step uses functional annotations data of the target chromosome, it does not use the summary statistics of SNPs in the target chromosome, which ensures robustness to winner’s curse.
4. Apply S-LDSC with non-negativity constraints to estimate per-SNP heritabilities in each of the 20 bins of all SNPs in odd (resp. even) chromosomes except for the target chromosome r (to avoid using the same data that will be used in fine-mapping), denoted $\hat{\sigma}^2_b$. Afterwards, regularize the estimates by setting all values smaller than $q \cdot \max_{b'} \hat{\sigma}^2_{b'}$ to $q \cdot \max_{b'} \hat{\sigma}^2_{b'}$, using q = 1/100 by default, and rescaling the estimates to have the same sum (over all genome-wide SNPs) as before. The regularization prevents SNPs from having a zero per-SNP heritability, which would exclude them from fine-mapping. We did not apply L2-regularization in this step because we require approximately unbiased estimates, and because standard errors are relatively small under a small number of non-overlapping annotations.
5. Specify a prior causal probability proportional to $\hat{\sigma}^2_b$ to each SNP that is in bin b and that resides in a target locus in chromosome r, such that the prior causal probabilities in the target locus sum to one.
PolyFun uses version 2.2.UKB of the baseline-LF model, which differs from the original baseline-LF model[25] by including MAF≥0.001 SNPs and several new annotations, and omitting annotations that could not be easily extended to account for MAF<0.005 SNPs (Supplementary Table 1). 
Briefly, we use 187 overlapping functional annotations, including 10 common MAF bins (MAF≥0.05); 10 low-frequency MAF bins (0.05>MAF≥0.001); 6 LD-related annotations for common SNPs (levels of LD, predicted allele age, recombination rate, nucleotide diversity, background selection statistic, CpG content); 5 LD-related annotations for low-frequency SNPs; 40 binary functional annotations for common SNPs; 7 continuous functional annotations for common SNPs; 40 binary functional annotations for low-frequency SNPs; 3 continuous functional annotations for low-frequency SNPs; and 66 annotations constructed via windows around other annotations[17]. We did not include a base annotation that includes all SNPs, because such an annotation is linearly dependent on all the MAF bins when S-LDSC uses the same set of SNPs to compute LD-scores and to estimate annotation coefficients.
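The last two steps of the procedure above (regularizing per-bin heritabilities and normalizing priors within a locus) can be illustrated as follows. This is a rough sketch rather than the PolyFun source code: the function names are hypothetical, and the truncation floor is assumed to be q times the largest per-bin estimate, which is our reading of the regularization step.

```python
import numpy as np

def regularize_bin_h2(bin_h2_per_snp, bin_sizes, q=0.01):
    """Truncate small per-SNP heritabilities, then rescale to keep the genome-wide sum."""
    h = np.asarray(bin_h2_per_snp, dtype=float)
    n = np.asarray(bin_sizes, dtype=float)
    total_before = (h * n).sum()              # genome-wide sum before truncation
    floor = q * h.max()                       # ASSUMPTION: floor is q times the largest bin value
    h_trunc = np.maximum(h, floor)
    return h_trunc * total_before / (h_trunc * n).sum()

def locus_priors(snp_bins, bin_h2_per_snp):
    """Prior causal probabilities for the SNPs of one locus, summing to one."""
    w = np.asarray(bin_h2_per_snp, dtype=float)[np.asarray(snp_bins)]
    return w / w.sum()

# Toy example: three bins, the last with a zero estimate
sizes = [100, 1000, 10000]
h2 = regularize_bin_h2([1e-4, 1e-6, 0.0], sizes)
priors = locus_priors([0, 2, 2, 1], h2)       # bin index of each SNP in the locus
```

After regularization no bin has zero per-SNP heritability, so every SNP in the locus receives a positive prior and remains eligible for fine-mapping.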
Main fine-mapping simulations
We simulated summary statistics for 18,212,157 genotyped and imputed MAF≥0.001 autosomal SNPs with INFO score≥0.6 (including short indels, excluding three long-range LD regions; see below), using N=337,491 unrelated British-ancestry individuals from UK Biobank release 3. In most simulations we computed a per-SNP effect variance $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ for every SNP i with annotations $\mathbf{a}_i$ using the baseline-LF (version 2.2.UKB) model, $\mathrm{var}[\beta_i \mid \mathbf{a}_i] = \sum_c \tau_c a_{ic}$, where c are annotations and $\tau_c$ estimates are taken from a fixed-effects meta-analysis of 16 well-powered genetically uncorrelated ($|r_g|<0.2$) UK Biobank traits, scaled such that the total SNP-heritability is the same across all traits (Supplementary Table 3). In some simulations we generated values of $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$ under alternative functional architectures to evaluate the robustness of PolyFun to modeling misspecification (Supplementary Note). Each SNP was set to be causal with probability proportional to $\mathrm{var}[\beta_i \mid \mathbf{a}_i]$, such that the average causal probability was equal to the desired proportion of causal SNPs. We provide technical details about the simulations in the Supplementary Note.
We performed fine-mapping in each of the 10 selected 3Mb loci on chromosome 1 using methods based on SuSiE[21], FINEMAP[22,23], CAVIARBF[20] and fastPAINTOR[19]. Following previous literature[12,28] all methods used in-sample LD (i.e., summary LD information based on the genotypes of the same 337,491 individuals used to generate summary statistics), computed via LDstore[28]. 
For fastPAINTOR-, fastPAINTOR, SuSiE, and PolyFun + SuSiE, we specified a causal effect size variance using an estimator that we developed based on a modified version of HESS[60] rather than using the estimator implemented in these methods, because it improved false discovery rate and power in most simulation settings (Supplementary Note, Supplementary Table 4).
We ran SuSiE 0.7.1.0487 with default values for all parameters except the following: (1) we used 10 causal SNPs per locus; and (2) we estimated a per-locus causal effect size variance (the scaled_prior_variance parameter) via our modified HESS approach. We specified prior causal probabilities via the prior_weights parameter. We modified the SuSiE source code to avoid performing the LD matrix diagnostics (positive-definiteness and symmetry) because they greatly increased memory consumption.
We ran FINEMAP 1.3.1.b with a maximum of 10 causal SNPs per locus and with default settings for all other parameters. We specified prior causal probabilities via the --prior-snps argument.
We ran CAVIARBF 0.2.1 with an AIC-based parameter selection, using ridge regression with regularization parameter λ selected from $\{2^{-10}, 2^{-5}, 2^{-2.5}, 2^{0}, 2^{2.5}, 2^{5}, 100, 1000, 10000, 100000\}$, with a single locus and with up to either 1 or 2 causal SNPs per locus, owing to computational limitations.
We ran fastPAINTOR 3.1 in MCMC mode. We specified a per-locus causal effect size variance (specified via the -variance argument) using our modified HESS approach (as in PolyFun + SuSiE). We avoided truncating the LD matrix (using prop_ld_eigenvalues=1.0) because we used in-sample summary LD information. As fastPAINTOR is generally not designed to work with >10 annotations[18,19] (and was too slow in our simulations to estimate the significance of each annotation and include only conditionally significant annotations as done in ref. [18]), we selected a subset of 10 highly informative annotations by (1) scoring each annotation based on its average contribution to effect variance across all SNPs, using the true τ of the generative model; (2) iteratively selecting top-ranked annotations such that no annotation has correlation >0.3 (in absolute value) with a previously selected annotation, until selecting 10 annotations. We determined that 10 annotations yielded approximately optimal power while maintaining correct calibration (Supplementary Table 4).
For each PIP threshold, we conservatively estimated false discovery rates by setting all PIPs greater than the threshold to the threshold, yielding a uniform false-discovery threshold (Supplementary Note, Supplementary Table 4).
We computed p-values of FDR differences and of power differences of analyses with perturbed PolyFun steps via a Wald test, using a jackknife over simulated datasets to estimate standard errors (Supplementary Note).
Our mismatched reference LD simulations differed from our main simulations in several ways: (i) we generated summary statistics using up to N=44K unrelated (or related) European-ancestry (British or non-British) UK Biobank target samples in most experiments, compared with N=337K in our main simulations, because the UK Biobank includes only 44K unrelated individuals of non-British European ancestry (we used N=293K unrelated British-ancestry UK Biobank target samples in a subset of experiments to more closely match our main simulations); (ii) we computed summary LD information using either N=400, N=4,000, or N=44K unrelated British-ancestry UK Biobank reference samples (either non-overlapping or overlapping with the target samples), or using N=3,567 reference samples from the UK10K cohort[61] (compared with in-sample LD based on the target samples in the main simulations); (iii) we generated summary statistics using individual level genotypes rather than summary LD information (as required when the target sample and the LD reference panel are not the same); (iv) we simulated 3 causal SNPs per locus that jointly explain 0.5% of trait variance, compared with 10 causal SNPs that jointly explain 0.05% of trait variance in our main simulations, to obtain sufficient power despite having a smaller sample size; and (v) in some experiments we used a subset of SNPs for generating causal SNPs or for fine-mapping analysis. We provide technical details of these simulations in the Supplementary Note.
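The rule used in the main simulations for assigning causal status (probability proportional to per-SNP effect variance, with a fixed average) can be sketched as below. The function names are hypothetical and the clipping to [0, 1] is our own safeguard, which slightly lowers the average if any probability would exceed one; the actual simulation code is described in the Supplementary Note.

```python
import numpy as np

def causal_probabilities(per_snp_var, prop_causal):
    """Per-SNP causal probabilities proportional to per-SNP variance, averaging prop_causal."""
    v = np.asarray(per_snp_var, dtype=float)
    p = v * prop_causal / v.mean()
    return np.clip(p, 0.0, 1.0)               # ASSUMPTION: cap probabilities at 1

def sample_causal(probs, seed=0):
    """Draw an independent causal indicator for each SNP."""
    rng = np.random.default_rng(seed)
    return rng.random(len(probs)) < probs

probs = causal_probabilities([1.0, 2.0, 3.0, 4.0], prop_causal=0.1)
is_causal = sample_causal(probs)
```

Here the SNP with four times the per-SNP variance is four times as likely to be drawn as causal, while the mean probability stays at the requested 10%.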
Functionally informed fine-mapping of 49 complex traits in the UK Biobank
We applied SuSiE and PolyFun + SuSiE to fine-map 49 traits in the UK Biobank, using the same data and the same parameter settings described in the Main fine-mapping simulations section. We performed basic QC on each trait as described in our previous publications[30,31]. Specifically, we removed outliers outside the reasonable range for each quantitative trait, and applied quantile normalization within sex strata after correcting for covariates for non-binary traits with non-normal distributions. We computed summary statistics with BOLT-LMM v2.3.3[31] adjusting for sex, age and age squared, assessment center, genotyping platform, and the top 20 principal components (computed as described in ref. [31]), and dilution factor for biochemical traits. As the non-infinitesimal version of BOLT-LMM does not estimate effect sizes, we computed z-scores for fine-mapping by taking the square root of the BOLT-LMM χ2 statistics and multiplying them by the sign of the effect estimate from the infinitesimal version of BOLT-LMM.
We partitioned all autosomal chromosomes into 2,763 overlapping 3Mb-long loci with a 1Mb spacing between the start points of consecutive loci. We computed a PIP for each SNP based on the locus whose center was closest to the SNP (excluding SNPs >1Mb away from the closest center and loci wherein all SNPs had squared marginal effect sizes smaller than 0.00005). We excluded the MHC region (chr6 25.5M-33.5M) and two other long-range LD regions (chr8 8M-12M, chr11 46M-57M)[62] from all analyses, following our observations that both FINEMAP and SuSiE tend to produce spurious results in these regions, finding many PIP=1 SNPs across many traits regardless of their BOLT-LMM p-values. We verified that other previously reported long-range LD regions[62] do not harbor a disproportionate number of PIP>0.95 SNPs. We specified per-locus causal effect variances for SuSiE and PolyFun + SuSiE via our modified HESS approach. 
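The z-score construction and the assignment of SNPs to overlapping loci described above can be sketched as follows. This is an illustrative simplification with hypothetical function names; it assumes loci start at position 0 of a chromosome of known length, which may differ from the coordinates used in the actual analysis.

```python
import numpy as np

def signed_z(chi2, beta_inf):
    """z-score: square root of the chi-square statistic, signed by the infinitesimal-model effect."""
    return np.sign(np.asarray(beta_inf, dtype=float)) * np.sqrt(np.asarray(chi2, dtype=float))

def assign_locus(pos, chrom_len, locus_len=3_000_000, step=1_000_000, max_dist=1_000_000):
    """Index of the locus whose center is closest to each SNP, or -1 if farther than max_dist."""
    # ASSUMPTION: the first locus starts at position 0
    starts = np.arange(0, max(chrom_len - locus_len, 0) + 1, step)
    centers = starts + locus_len // 2
    pos = np.asarray(pos)
    # Fine for a toy example; a real analysis would avoid the full SNP-by-locus matrix
    idx = np.abs(pos[:, None] - centers[None, :]).argmin(axis=1)
    dist = np.abs(pos - centers[idx])
    return np.where(dist <= max_dist, idx, -1), centers

z = signed_z([4.0, 9.0], [-0.1, 0.2])
locus_idx, centers = assign_locus([200_000, 2_100_000, 9_800_000], chrom_len=10_000_000)
```

Because consecutive locus centers are 1Mb apart, only SNPs near chromosome ends fall more than 1Mb from every center and are left unassigned in this sketch.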
For all S-LDSC and fine-mapping analyses we specified a sample size corresponding to the BOLT-LMM effective sample size[31] (given by the true sample size multiplied by the median ratio between χ2 statistics of BOLT-LMM and linear regression across SNPs having BOLT-LMM χ2>30).
All S-LDSC analyses used LD scores computed from in-sample summary LD information (based on imputed SNP dosages rather than sequenced genotypes as in previous publications[24-26], assigning to each SNP the LD score computed in the locus in which it was most central) because they provide better coverage of low-frequency SNPs and are consistent with the fine-mapping analyses. We computed genetic correlations with LDSC, using the same summary statistics used for fine-mapping and restricting the analysis to common SNPs.
We selected a subset of 16 genetically uncorrelated traits by ranking all traits according to the number of PolyFun + SuSiE PIP>0.95 SNPs and greedily selecting top-ranked traits such that no selected trait has $|r_g|>0.2$ with a previously selected trait, excluding traits having either (1) SNP-heritability estimates <0.05 in either the PolyFun dataset (N=337K) or in the PolyLoc dataset (N=122K) (see estimation description below); or (2) an effective sample size <100K in the N=337K dataset (using 4/(1/#cases + 1/#controls) for binary traits).
We estimated the SNP-heritability tagged by PIP>0.95 SNPs and by lead GWAS SNPs via a multivariate linear regression. We regressed all the covariates used in BOLT-LMM out of the phenotypes, performed multivariate linear regression on the residuals (using all PIP>0.95 SNPs as explanatory variables) and reported the adjusted R2 as the SNP-heritability tagged by these SNPs. We verified that the results remained nearly identical regardless of whether we excluded related individuals (Supplementary Table 14). We estimated MAF>0.001 SNP-heritability for trait selection and for Figure 2b by running S-LDSC with all the baseline-LF annotations. 
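The greedy selection of genetically uncorrelated traits and the effective sample size formula for binary traits can be written compactly. The sketch below uses hypothetical function names and a toy genetic correlation matrix; the heritability and sample-size exclusions described above are assumed to have been applied beforehand.

```python
def effective_n_binary(n_cases, n_controls):
    """Effective sample size of a binary trait: 4 / (1/#cases + 1/#controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def greedy_uncorrelated(traits, n_hits, rg, max_abs_rg=0.2):
    """Pick traits in decreasing order of fine-mapped SNP count, skipping any trait
    with |r_g| above the threshold relative to an already selected trait."""
    order = sorted(range(len(traits)), key=lambda i: -n_hits[i])
    selected = []
    for i in order:
        if all(abs(rg[i][j]) <= max_abs_rg for j in selected):
            selected.append(i)
    return [traits[i] for i in selected]

# Toy example: traits A and B are genetically correlated, C is not
rg = [[1.0, 0.5, 0.1],
      [0.5, 1.0, 0.0],
      [0.1, 0.0, 1.0]]
chosen = greedy_uncorrelated(["A", "B", "C"], n_hits=[10, 8, 5], rg=rg)
n_eff = effective_n_binary(1000, 1000)
```

In the toy example B is skipped because of its correlation with the higher-ranked A, so the selection is A followed by C.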
We overrode the automatic removal of very large effect SNPs employed by S-LDSC for hair color, because this removal led to SNP-heritability estimates that were smaller than the linear regression-based estimates, due to the large proportion of SNP-heritability originating from very large-effect SNPs.
We defined top annotations for Table 2, Figure 3, and Supplementary Tables 15–16 by first ranking all annotations according to their functional enrichment among PIP>0.95 SNPs (as in Figure 4; see below), and associating each SNP with its top ranked annotation, using meta-analyzed enrichment.
We selected a subset of genetically uncorrelated traits for each SNP (used in Extended Data Figure 6, Table 2, and Supplementary Table 15), aiming to select traits from as diverse a set of groups as possible (anthropometric, lipids/metabolic, blood, cardiovascular/metabolic disease, other; Extended Data Figure 6, Supplementary Table 8). To this aim, we iterated over trait groups cyclically. For each group containing ≥1 unselected traits with PIP>0.95 for the analyzed SNP, we selected the trait having the smallest average $|r_g|$ with unselected traits from other groups (if there remained any) or from all remaining traits (otherwise), selecting among all traits having $|r_g|<0.2$ with previously selected traits, until no more eligible traits remained. We plotted the ideogram in Extended Data Figure 6 with the PhenoGram[63] software.
We computed enrichment of functional annotations among fine-mapped SNPs (Figure 4) as the ratio between the proportion of common SNPs with PIP above a given threshold having a specific annotation and the proportion of common SNPs having the annotation. We excluded continuous annotations and annotations constructed via windows around other annotations, and merged concordant annotations for common and low-frequency variants. We computed P-values using Fisher’s exact test (meta-analyzed across traits via Fisher’s method). 
We computed standard errors by (1) computing the standard error s of the log of the enrichment via the standard formula for the standard error of relative risk (exploiting the fact that enrichment and relative risk are both ratios of proportions); and (2) computing the standard error of the enrichment via $\sqrt{\left(e^{s^2} - 1\right) e^{2 \ln r + s^2}}$ (i.e., the standard deviation of the exponent of a normal random variable), where r is the original enrichment estimate (meta-analyzed across traits using a fixed-effects meta-analysis). We excluded traits having <10 PIP>0.95 SNPs from the meta-analysis. The annotations shown in Figure 4 are non-synonymous, Conserved_LindbladToh (denoted Conserved), Human_Promoter_Villar_ExAC (denoted Promoter-ExAC), H3K4me3_Trynka (denoted H3K4me3), and Repressed_Hoffman (denoted Repressed) (see Supplementary Table 1 for details).
To compare our fine-mapping results with those of refs. [7,12], we restricted the comparison to SNPs that were not excluded from our fine-mapping procedure (SNPs having MAF≥0.001 in the UK Biobank N=337K dataset, INFO score≥0.6, distance <1Mb away from the closest locus center, and not residing in one of the excluded long-range LD regions). When the same SNP had multiple reported PIPs in ref. [12], we used the entry with the larger PIP. We caution that the comparison with ref. [12] is not a replication analysis because the datasets of ref. [12] and of PolyFun + SuSiE are correlated.
We selected five traits for down-sampling analysis (analyzing N=107K individuals) as the set of traits having (1) the largest number of 3Mb loci harboring a genome-wide significant SNP; (2) >10 PIP>0.95 SNPs in the SuSiE N=107K analysis; and (3) $|r_g|<0.2$ with all other selected traits.
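The enrichment estimate and its standard error for a single trait and annotation can be sketched as below. This is a minimal illustration with a hypothetical function name; it uses the textbook standard error of a log relative risk and the log-normal standard deviation, and it omits the meta-analysis across traits and the Fisher’s exact test.

```python
import math

def enrichment_with_se(k_sel, n_sel, k_all, n_all):
    """Enrichment of an annotation among fine-mapped SNPs, with its standard error.

    k_sel of n_sel fine-mapped SNPs lie in the annotation;
    k_all of n_all genome-wide common SNPs lie in the annotation.
    """
    r = (k_sel / n_sel) / (k_all / n_all)                      # ratio of proportions
    # Standard error of log(relative risk)
    s = math.sqrt(1 / k_sel - 1 / n_sel + 1 / k_all - 1 / n_all)
    # Standard deviation of exp(X) for X ~ Normal(log r, s^2)
    se = math.sqrt((math.exp(s ** 2) - 1) * math.exp(2 * math.log(r) + s ** 2))
    return r, s, se

r, s, se = enrichment_with_se(k_sel=50, n_sel=100, k_all=1000, n_all=100_000)
```

For small s the result is close to the delta-method approximation r·s, which offers a quick sanity check on the computed standard error.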
Polygenic localization
Polygenic localization aims to identify a minimal set of SNPs causally explaining a given proportion $p$ of common SNP heritability (e.g., $p=50\%$). To define polygenic localization, we first define $M^*$ as the smallest integer $k$ such that $\sum_{i=1}^{k}\beta_{s_i}^2 \ge p\sum_{i=1}^{m}\beta_i^2$, where $\beta$ are standardized SNP effect sizes, $s$ denotes a ranking of $\beta$ such that $\beta_{s_1}^2 \ge \beta_{s_2}^2 \ge \dots \ge \beta_{s_m}^2$, and $m$ is the number of common SNPs. Unfortunately, $M^*$ is unknown in practice. Polygenic localization therefore estimates an upper bound of $M^*$, denoted as $M$. We define $M$ as the smallest integer $k'$ such that $\sum_{i=1}^{k'}\beta_{s'_i}^2 \ge p\sum_{i=1}^{m}\beta_i^2$, where $s'$ is a possibly non-optimal ranking of SNPs. We note that $M \ge M^*$ by construction. We provide a full derivation of polygenic localization in the Supplementary Note.
We now provide a brief conceptual description of PolyLoc (a full description is provided in the Supplementary Note). Briefly, PolyLoc proceeds by (1) partitioning SNPs with similar posterior mean per-SNP heritability estimates (using PolyFun + SuSiE estimates) into bins; (2) treating $\beta$ as a zero-mean random variable and jointly estimating $\mathrm{var}[\beta]$ in every bin using S-LDSC; and (3) finding the smallest integer $k$ such that $\sum_{i=1}^{k}\widehat{\mathrm{var}}[\beta_{\hat{s}_i}] \ge p\sum_{i=1}^{m}\widehat{\mathrm{var}}[\beta_i]$, where $\hat{s}$ denotes the original ranking of posterior mean estimates from PolyFun + SuSiE. The use of $\mathrm{var}[\beta_i]$ instead of $\beta_i^2$ relies on the assumption that $\beta$ has zero mean in each bin. The partitioning into bins in step 1 induces a piecewise-linear approximation of the function $f(k)=\sum_{i=1}^{k}\beta_{\hat{s}_i}^2$. We use different datasets to estimate posterior means and to estimate $\mathrm{var}[\beta]$ to avoid winner's curse. Our approach is conservative by design because it uses an imperfect ranking $\hat{s}$ rather than the true ranking $s_1, \dots, s_m$. The degree of conservativeness is a function of fine-mapping power, and thus depends on factors affecting fine-mapping power such as sample size, levels of LD at causal SNPs, MAFs of causal SNPs, and trait polygenicity.
In secondary analyses, we compared PolyLoc to an alternative method that performs polygenic localization based on prior estimates of per-SNP heritability from functional annotations, rather than posterior estimates. This alternative method uses per-SNP heritability estimates and SNP bins from step 4 of PolyFun, based only on the N=337K dataset (noting that it does not suffer from winner's curse because PolyFun applies a partitioning into odd and even chromosomes).
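The final step of polygenic localization (finding the smallest number of top-ranked SNPs whose summed per-SNP variance reaches a target proportion of the total) can be illustrated with a short sketch. This is a toy illustration under stated assumptions, not the PolyLoc implementation: the function name, the per-SNP variances, and both rankings are hypothetical, and the binning and S-LDSC estimation steps are omitted.

```python
def smallest_explaining_set(values, ranking, proportion=0.5):
    """Return the smallest k such that the first k SNPs in `ranking`
    account for at least `proportion` of sum(values)."""
    target = proportion * sum(values)
    running = 0.0
    for k, idx in enumerate(ranking, start=1):
        running += values[idx]
        if running >= target:
            return k
    return len(ranking)


# Hypothetical per-SNP variances for five SNPs (toy numbers).
per_snp_var = [0.40, 0.05, 0.25, 0.10, 0.20]

# Optimal ranking: SNPs sorted by decreasing variance (unknown in practice).
optimal = sorted(range(len(per_snp_var)), key=lambda i: -per_snp_var[i])

# Imperfect ranking, standing in for a ranking derived from fine-mapping estimates.
imperfect = [2, 4, 0, 3, 1]

m_star = smallest_explaining_set(per_snp_var, optimal)
m_upper = smallest_explaining_set(per_snp_var, imperfect)
print(m_star, m_upper)
```

With these toy numbers the imperfect ranking needs more SNPs than the optimal one to reach the same proportion, which illustrates why an estimate based on an imperfect ranking is an upper bound on the minimal set size.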
Data availability
PolyFun fine-mapping results generated in this study are available for public download at http://data.broadinstitute.org/alkesgroup/polyfun_results. Summary LD information generated in this study is available for public download at https://data.broadinstitute.org/alkesgroup/UKBB_LD. Baseline-LF v2.2.UKB annotations and LD-scores for UK Biobank SNPs are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF_v2.2.UKB.tar.gz. Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk).
Code availability
PolyFun and PolyLoc software is available at https://github.com/omerwe/polyfun. SuSiE software is available at https://github.com/stephenslab/susieR. FINEMAP software is available at http://www.christianbenner.com/#.
Assessing the individual impact of step 1 of PolyFun (estimating functional enrichment) via perturbation analysis, by randomly shuffling different proportions of annotation coefficient estimates.
For each evaluated proportion of shuffled annotation coefficient estimates, we report the number of experiments (out of 1,000) that attained each FDR level >0 (left panel) and each power level >0 (right panel). FDR and power are reported with respect to identifying PIP≥0.95 SNPs. Experiments with FDR=0 (resp. power=0) are not reported in the left panel (resp. right panel) to improve clarity. Numerical reports are provided in Supplementary Table 6.
Assessing the individual impact of step 2 of PolyFun (estimating per-SNP heritabilities on odd/even chromosomes) via perturbation analysis, by using both odd and even chromosomes to estimate functional enrichment.
The figure is similar to Extended Data Figure 1 but applies a different perturbation (using both odd and even chromosomes to estimate functional enrichment). Numerical reports are provided in Supplementary Table 6.
Assessing the individual impact of step 3 of PolyFun (partitioning all SNPs into 20 bins of similar per-SNP heritability) via perturbation analysis, by varying the number of per-SNP heritability bins.
The figure is similar to Extended Data Figure 1 but applies a different perturbation (changing the number of per-SNP heritability bins). Numerical reports are provided in Supplementary Table 6.
Assessing the individual impact of step 4 of PolyFun (re-estimating per-SNP heritabilities within each bin excluding the target chromosome) via perturbation analysis, by not excluding the target chromosome from the re-estimation procedure.
The figure is similar to Extended Data Figure 1 but applies a different perturbation (disabling the exclusion of the target chromosome, either when using the default sample size N=320K or when using a smaller sample size of N=10K). Numerical reports are provided in Supplementary Table 6.
Assessing the individual impact of step 5 of PolyFun (specifying prior causal probabilities in proportion to the re-estimated per-SNP heritabilities) via perturbation analysis, by randomly permuting estimated prior causal probabilities.
The figure is similar to Extended Data Figure 1 but applies a different perturbation (randomly permuting estimated prior causal probabilities). Numerical reports are provided in Supplementary Table 6.
Visualization of fine-mapping results for UK Biobank traits.
We display an ideogram of all 2,225 PIP>0.95 fine-mapped SNPs identified by PolyFun + SuSiE across 49 UK Biobank traits. Traits are color-coded into groups (see legend and Supplementary Table 8). White circles indicate SNPs that are pleiotropic for ≥2 genetically uncorrelated traits, with circles to the right of a white circle denoting the genetically uncorrelated traits (max of 5 colored circles due to space limitations). Numerical results are reported in Supplementary Table 10.
Functional enrichment of PolyFun + SuSiE fine-mapped common SNPs for UK Biobank traits.
The figure is analogous to Figure 4 but uses PIPs computed by PolyFun + SuSiE instead of SuSiE. Numerical results are reported in Supplementary Table 26.
Functional enrichment of SuSiE fine-mapped MAF>0.001 SNPs for UK Biobank traits.
The figure is analogous to Figure 4 but uses MAF>0.001 SNPs instead of common (MAF>0.05) SNPs. Numerical results are reported in Supplementary Table 27.
Functional enrichment of SuSiE fine-mapped low-frequency and rare SNPs for UK Biobank traits.
The figure is analogous to Figure 4 but uses only low-frequency and rare SNPs (0.05>MAF>0.001) instead of common (MAF>0.05) SNPs. Numerical results are reported in Supplementary Table 28.