Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses.

Abstract

Horizontal integration of summary statistics from different GWAS traits can be used to evaluate evidence for their shared genetic causality. One popular method to do this is a Bayesian method, coloc, which is attractive in requiring only GWAS summary statistics and no linkage disequilibrium estimates and is now being used routinely to perform thousands of comparisons between traits. Here we show that while most users do not adjust default software values, misspecification of prior parameters can substantially alter posterior inference. We suggest data driven methods to derive sensible prior values, and demonstrate how sensitivity analysis can be used to assess robustness of posterior inference. The flexibility of coloc comes at the expense of an unrealistic assumption of a single causal variant per trait. This assumption can be relaxed by stepwise conditioning, but this requires external software and an LD matrix aligned to study alleles. We have now implemented conditioning within coloc, and propose a new alternative method, masking, that does not require LD and approximates conditioning when causal variants are independent. Importantly, masking can be used in combination with conditioning where allelically aligned LD estimates are available for only a single trait. We have implemented these developments in a new version of coloc which we hope will enable more informed choice of priors and overcome the restriction of the single causal variant assumptions in coloc analysis.

Year: 2020 PMID: 32310995 PMCID: PMC7192519 DOI: 10.1371/journal.pgen.1008720

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917

Introduction

As genome-wide association studies (GWAS) have considered a greater diversity of traits in greater numbers of samples, comparative analyses of GWAS results have become a useful tool to explore the aetiological connections between different traits. For example, estimates of genetic correlation obtained via LD score regression quantify the average proportion of genetic variance of two traits that is shared across the genome, [1] although typically large sample sizes are required in both trait studies for accuracy. [2] Linking traits through genetics overcomes at least one major challenge of observational studies, reverse causality, and with careful design, can also address confounding. Epidemiologists have developed and widely deployed the technique of Mendelian randomization (MR), [3] which has been used, for example, to establish causal effects of factors such as alcohol intake on aspects of health. [4] The method uses a genetic variant or variants with established effects on one trait, and assesses whether a second trait is (proportionally) associated with these instrumental variables. Provided certain assumptions hold, [5] this provides evidence that the first trait is somehow causal for the second. While MR was originally envisaged as a test of causality of specific risk factors for which tests of causality might be confounded in observational studies, it has been extended to routinely assess the potential for any GWAS trait to mediate another. [6] However, the ubiquity of genetic effects on some measurable aspect of human physiology or health, which has prompted suggestions of an omnigenic model, [7] raises concerns that LD between causal variants can violate the MR assumption that the instrumental variable is only associated with the outcome through the “mediating” trait.
[8] This routine testing of all possible mediators is similar in design to the assessment of potential molecular causes of disease, which has been addressed through alternative approaches that focus not on whether one trait is causal for another, but on whether two traits share the same causal variants in a single, LD-defined, genetic region, termed colocalisation. While one such method is built on MR [9] and proceeds by filtering MR-positive associations via a test of heterogeneity in the estimated proportional effect across multiple SNPs in the region, another popular colocalisation method, coloc, [10] avoids MR assumptions altogether. Instead, coloc enumerates every possible configuration of causal variants for each of two traits, and calculates the support for each causal model in the form of a Bayes factor, under an assumption that at most one causal variant per trait exists in the region (see S1 Text). Each configuration corresponds to exactly one of five mutually exclusive hypotheses about association and genetic sharing in the region:

H0: no association with either trait
H1: association with trait 1 only
H2: association with trait 2 only
H3: association with both traits, with distinct causal variants
H4: association with both traits, with a shared causal variant

The coloc approach has also been extended beyond pairs of traits, although computational efficiency scales poorly with numbers of traits [11, 12] unless decisions are binarised, [13] and to deal with GWAS data that share controls, though at the expense of requiring raw genotype data. [11] As a Bayesian method, coloc requires specification of three informative prior probabilities: p1, p2, p12 are, respectively, the prior probabilities that any random SNP in the region is associated with trait 1 only, trait 2 only, or both traits (Fig 1). Although values for these were suggested in the initial proposal, [12] appropriate values should depend on the specific datasets used, particularly for p12, and no specific guidance on how this choice should be made was given.

Fig 1

Each hypothesis for coloc analysis H0…H4 may be enumerated by configurations, one configuration per row shown grouped by hypothesis.

Each circle in this figure represents one of n genetic variants, and is shaded orange if causal for trait 1, blue if causal for trait 2. There are different numbers of configurations for each hypothesis, depending on the number of SNPs in a region, and the prior is set according to three prior probabilities so that all configurations within a hypothesis are equally likely.

One of the strengths of coloc is the simplicity of data required. The assumption of at most one causal variant per trait allows inference to be made through reconstructing joint models across all SNPs from univariate (single SNP) GWAS summary data. [14, 15] Importantly, this requires no reference LD matrix and allows combining data from traits studied in differently structured populations. Further, p-values will suffice if internal or external estimates of minor allele frequency (MAF) are available, so that (unsigned) effect estimates and their standard errors can be re-constructed. However, the single causal variant assumption is convenient rather than realistic and when it does not hold colocalisation effectively tests whether the strongest signals for the two traits colocalise [10] which has been shown to be conservative [16].

e-CAVIAR [17] removes the assumption of a single causal variant per trait by integrating over the fine mapping posteriors for two traits, but requires signed effect estimates that are aligned to a reference LD matrix, that the traits are studied in the same population, and does not allow using any prior knowledge that shared causal variants are more or less likely than distinct variants. Perhaps the most challenging of these is the alignment of signed effect estimates to a reference LD matrix. This can be impossible in the case that signed estimates are not provided due to privacy concerns, [18] or that alleles are not provided. Even where alleles are available, palindromic SNPs (A/T, C/G) cannot be aligned unambiguously particularly for MAF ≈ 0.5.
The assumption of a single causal variant in coloc may be relaxed by successively conditioning on the most significant variants for each trait, and testing for colocalisation between each pair of conditioned signals, although this requires either complete genotype data or use of external software such as CoJo [19] together with signed and LD-aligned effect estimates to allow reconstruction of conditional regression effect estimates. To support more accurate coloc analyses, we explored a variety of data-driven approaches to inform prior choice across a range of traits and developed a framework to explore sensitivity of conclusions to the priors used. Further, we implemented an existing conditioning approach in the coloc package, but also developed an alternative approach to conditioning which does not require aligned LD and effect estimates, to offer an option to deal with multiple causal variants which preserves the simplicity of the data required for coloc analyses.

Results

We used Scopus to identify 60 papers which cited coloc [10] and were published in 2018. Out of these, we extracted the subset of 25 papers that were both applied papers (rather than methodological) and for which full text could be accessed (S1 Table). The studies covered a variety of trait pairs, generally integrating a disease GWAS with molecular quantitative trait loci (QTL) data, [20-39] but also comparing pairs of disease GWAS, [40] eQTL and pQTL [41, 42] or eQTL and other molecular traits. [43, 44] Only four studies considered the potential for multiple causal variants in a region, either discussing the implications on their results, or using conditioning in at least one trait, and 22 out of 25 studies used the software default priors across this diverse range of trait pairs. Given that it is likely that the prior probability of colocalisation may depend on the trait pairs under consideration, we decided to evaluate the effect of mis-specifying prior parameters and/or not conditioning when multiple causal variants exist.

The importance and elicitation of prior parameter values

Before examining the robustness of inference to changes in prior values, we elucidate some properties of prior parameters. While priors are expressed per SNP, our hypotheses and posterior relate to a region—a set of n neighbouring SNPs. The prior that one SNP in the region is causally associated with trait 1 is ≈ np1 (and similarly np2 for trait 2, np12 for colocalisation). All these scale with the number of SNPs—the larger the set of SNPs we consider, the greater the chance one of them is causal for any trait. Despite this, the prior odds for H4/H1—colocalisation compared to association of trait 1 only—remains constant at p12/p1. The prior for H3 (two distinct variants for the two traits) is ≈ n(n − 1)p1 p2 which scales with the square of n. This means that prior odds of the two hypotheses of greatest interest, H4/H3, depends not only on the per SNP prior of causality for one or other trait, but also on the number of SNPs in a region, to the extent that the same p1, p2, p12 may favour either H3 or H4 as larger regions are considered (Fig 2). This effect can be understood by noting that both H3 and H4 imply that each trait has exactly one causal variant in the region. Simple combinatorics implies that as the number of SNPs in a region increases, then the number of ways two different SNPs can be causal for the two traits (H3) increases more rapidly than the number of ways one SNP can be causal for both (H4). Hence, H3 becomes relatively more likely than H4 as the number of SNPs in the region increases.
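The scaling argument above can be checked numerically. The following is a minimal sketch that uses only the approximate region-level priors quoted in the text (n p1, n p2, n(n − 1) p1 p2 and n p12); the function names are hypothetical and are not part of the coloc package.

```python
def approx_region_priors(n_snps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Approximate (unnormalised) region-level priors from per-SNP priors."""
    return {
        "H1": n_snps * p1,                        # one SNP causal for trait 1 only
        "H2": n_snps * p2,                        # one SNP causal for trait 2 only
        "H3": n_snps * (n_snps - 1) * p1 * p2,    # two distinct causal SNPs
        "H4": n_snps * p12,                       # one shared causal SNP
    }


def crossover_n(p1=1e-4, p2=1e-4, p12=1e-5):
    """Region size at which the approximate priors for H3 and H4 are equal.

    Solves n (n - 1) p1 p2 = n p12, i.e. n = 1 + p12 / (p1 p2).
    """
    return 1 + p12 / (p1 * p2)


for n in (100, 1000, 5000):
    priors = approx_region_priors(n)
    # H4/H1 stays at p12/p1, while H4/H3 falls as the region grows.
    print(n, priors["H4"] / priors["H1"], priors["H4"] / priors["H3"])

print("P(H3) = P(H4) at about", round(crossover_n()), "SNPs")
```

With the illustrative values p1 = p2 = 10^−4 and p12 = 10^−5, the two priors cross at roughly 1001 SNPs: smaller regions favour H4 a priori and larger regions favour H3.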

Fig 2

Effects of varying p12 on the prior for H4 (coloured lines) compared to H3 (dashed line) as a function of the number of SNPs in the region.

For all plots p1 = p2 = 10^−4 is constant. The coloured squares highlight the points where P(H3) = P(H4) for different p12.

Marginal priors

To elicit values for p1, p2, we reparameterise, focusing on the possible marginal events for any SNP: A1, that the SNP is causally associated with trait 1, with probability q1 = p1 + p12; and A2, that the SNP is causally associated with trait 2, with probability q2 = p2 + p12. Note that in this notation, A1 and A2 are not mutually exclusive, so that colocalisation is A1 ∩ A2. q1, q2 can be estimated empirically by considering evidence from the wealth of single trait association data that already exists. For eQTLs, we use GTEx data [45] and find that q. (that is, q1 or q2) depends on the MAF of the SNPs considered, which reflects variable power with fewer true eQTL variants detectable at lower MAF, and on the search window around the gene considered, as previously noted, tending to 10^−4 for common SNPs and windows of ∼1 Mb (Fig 3).

Fig 3

Determining plausible priors q1, q2.

(a) q. estimated for eQTLs as the ratio of the estimated number of LD-independent significant eQTL variants to the number of SNPs considered for an eQTL analysis in GTEx whole blood samples, in successively larger windows around a gene TSS. Separate lines show findings in 5 equal groups of MAF, with the top and bottom groups labelled. (b) The number of hits claimed per study according to the GWAS Catalog. q. could be estimated as number of hits / number of common SNPs (∼2,000,000). (c) Posterior probability of association at a single SNP as a function of -log10 p values for varying values of q.. We considered both case/control and quantitative trait designs, and a range of MAF (0.05-0.5) and sample size (2000, 5000, 10000). The relationship between -log10 p (x axis) and posterior probability of association (y axis) is consistent across all designs, affected only by the prior probability of association (q1, q2). The vertical line indicates p = 5 × 10^−8, the conventional genome-wide significance threshold in European populations.

The GWAS Catalog [46] enables us to consider something similar by aggregating over 5000 GWAS studies. We find, as expected, and again as previously noted, [47] that the number of hits per study increases steadily with increasing sample size (Fig 3), but that the count also depends on the class of trait considered, with “harder” endpoints such as breast cancer and heel bone mineral density identifying orders of magnitude more associations than “weaker” endpoints such as tendency to strenuous sports or activity levels. The largest studies find ∼100–1000 hits out of ∼2 million common SNPs, leading to estimates that between 5 in 100,000 and 5 in 10,000 common SNPs are detectably causal for these traits, which corresponds to q. ∈ [5 × 10^−5, 5 × 10^−4]. Even with the largest studies, these estimates must be considered likely to continue to increase with sample size, and therefore conservative.
Using conservative priors for p1, p2 in colocalisation analysis is likely to reduce power to detect either shared or distinct causal variants, because weaker signals may be wrongly interpreted as trait-unique or null. However, estimates from the largest available studies also represent an upper bound on the proportion of variants likely to be detectably associated in any new study from the same class of traits, and therefore relaxing the priors further might result in over-stating the evidence for causal variants and erring towards false detection of shared or distinct causal variants. An alternative approach is to choose the prior according to the p-value that we would consider significant. The threshold of p < 5 × 10^−8 has been widely adopted as “genome-wide significant” for GWAS studies in European populations. Across a range of designs (case/control or quantitative trait, with varying MAF and sample size), we see that a prior of q. = 10^−4 gives a strong posterior probability of association (≈ 0.94) at this threshold. The default coloc marginal prior of q1 = q2 = 10^−4 + p12 ≈ 10^−4 is thus supported by the convergence of these three approaches to values of the order of 10^−4.
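The GWAS Catalog estimate described above is simple arithmetic, sketched here for concreteness. The function name is hypothetical, and the figure of roughly 2 million common SNPs is the approximation quoted in the text rather than an exact count.

```python
def marginal_prior_from_hits(n_hits, n_common_snps=2_000_000):
    """Crude per-SNP marginal prior: claimed hits divided by common SNPs tested."""
    if n_common_snps <= 0:
        raise ValueError("n_common_snps must be positive")
    return n_hits / n_common_snps


# The range of hit counts quoted for the largest studies.
for hits in (100, 1000):
    print(hits, "hits ->", marginal_prior_from_hits(hits))
```

The two hit counts bracket the interval from 5 × 10^−5 to 5 × 10^−4 given in the text, which contains the value 10^−4.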

Prior probability of joint or conditional causality

q1 and q2 themselves place some constraints on p12. On the one hand, the chance of joint causality cannot be greater than the chance of causal association with either trait, so p12 ≤ min(q1, q2). On the other hand, if traits were independent, then causal variants for each trait would happen to co-occur at the same location with probability q1 × q2. However, simulations show that the distribution of expected posterior probabilities varies considerably with p12 over this range (Fig 4), indicating that we need to make some effort to elicit plausible values. The results suggest that the coloc default of p12 = 10^−5 may be overly liberal, with data simulated under H3 having posterior support for H4, particularly for smaller samples, and that p12 = 5 × 10^−6 may be a more generally robust choice.
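To make the width of this range concrete, the sketch below computes the two limits described above for given marginal priors. The function name is hypothetical; the lower limit is the independence value q1 × q2 and the upper limit is the smaller of the two marginals.

```python
import math


def p12_range(q1, q2):
    """Return (independence value, upper limit) for p12 given marginals q1, q2."""
    independent = q1 * q2      # causal variants co-occur purely by chance
    upper = min(q1, q2)        # joint causality cannot exceed either marginal
    return independent, upper


low, high = p12_range(1e-4, 1e-4)
print("p12 between", low, "and", high)
print("orders of magnitude spanned:", math.log10(high / low))
```

With q1 = q2 = 10^−4 the range spans about four orders of magnitude, and both values discussed in the text (10^−5 and 5 × 10^−6) sit well inside it, which is why the data alone cannot be expected to compensate for an arbitrary choice.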

Fig 4

Distribution of expected posterior probabilities across a wide range of simulated data.

In all analyses we fixed p2 = p1 = 10^−4 and varied p12. Coloured bar heights represent the average posterior probability for each hypothesis over the set of simulations for a given simulated hypothesis and sample size.

We consider different approaches to data-driven estimation of p12. First, we can set a lower bound if we take into account that not all of the genome is understood to be functional. Estimates of the functional proportion vary considerably, from 25% [48] to 80%. [49] Even for traits that are genetically independent, knowing that a SNP is causal for one trait implies it is functional, and thus more likely to be causal for another trait than a random SNP that may or may not be functional. Assuming the proportion of genetic variants that are functional is f, the probability of co-occurrence by chance alone is q1 q2/f (see S1 Text). In the case of comparing two GWAS studies, it may be possible to estimate the genetic correlation, r. We show in S1 Text that, when shared variants do not have any systematically different distribution of allele frequencies or effects compared to non-shared variants, [equation; see S1 Text] where n12, n1, n2 are the numbers of variants shared, distinct to trait 1 and distinct to trait 2. Putting these together, we find [equation].

Second, where studies of both traits are well powered, then methods for joint analysis of trait pairs may be informative. For example, gwas-pw [50] extends the original coloc by using empirical Bayes to estimate per-hypothesis priors via joint analysis of all regions genomewide. However, this comes at a cost of ignoring the dependence of per-hypothesis priors on the number of SNPs in a region, and even in simulated data did not generate consistent estimates. The latter may reflect the limited information that exists in any pair of GWAS (the number of regions where detectable signals exist for both traits). Nonetheless, such an approach can probably give a useful order of magnitude estimate for p12.
Finally, in the absence of data about joint trait association at the genome-wide level, it is necessary to rely more on investigator judgement, and here it may be helpful to consider conditional probabilities, p12 = q1|2 × q2 = q2|1 × q1. The term q1|2 represents the probability that a SNP, already known to be causal for trait 2, is also causal for trait 1. In asymmetric analyses such as GWAS and eQTL, it may be simpler to condition on one event rather than the other: does the investigator have a clearer idea of the chance that a SNP that causally regulates gene expression in a given tissue is causally associated with a disease, or of the chance that a SNP that is causally associated with a disease does so via transcriptional regulation in that same tissue? To aid translation of priors between the two parameterisations discussed here, we have created an online tool “coloc explorer” at https://chr1swallace.shinyapps.io/coloc-priors.
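The translation between parameterisations can be sketched in a few lines. This is an illustration only, not the code behind the coloc explorer tool: the function names are hypothetical, and it assumes the marginals are q1 = p1 + p12 and q2 = p2 + p12 (consistent with the default marginal prior of 10^−4 + p12 quoted in the text) and that conditional probabilities follow the usual definition, q1|2 = p12 / q2.

```python
def to_marginal_conditional(p1, p2, p12):
    """Per-SNP priors (p1, p2, p12) -> marginal and conditional probabilities."""
    q1 = p1 + p12              # assumed: causal for trait 1, shared or not
    q2 = p2 + p12              # assumed: causal for trait 2, shared or not
    return {"q1": q1, "q2": q2, "q1|2": p12 / q2, "q2|1": p12 / q1}


def from_marginal_conditional(q1, q2, q1_given_2):
    """Marginals plus P(causal for trait 1 | causal for trait 2) -> (p1, p2, p12)."""
    p12 = q1_given_2 * q2
    if p12 > min(q1, q2):
        raise ValueError("implied p12 exceeds a marginal prior")
    return {"p1": q1 - p12, "p2": q2 - p12, "p12": p12}


def chance_cooccurrence(q1, q2, functional_fraction):
    """Chance-alone co-occurrence q1 q2 / f when a fraction f of variants is functional."""
    return q1 * q2 / functional_fraction


print(to_marginal_conditional(1e-4, 1e-4, 1e-5))
print(chance_cooccurrence(1e-4, 1e-4, 0.25))
```

Working with q1|2 lets an investigator state a belief such as "about one in ten causal eQTL variants for this gene also affects the disease" and read off the p12 it implies, rather than choosing p12 directly.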

Sensitivity analysis

In the expected case that an investigator does not have a strong prior belief in a single value for p12, we can use sensitivity analysis to consider whether conclusions are robust over a range of plausible values. Helpfully, it is not necessary to reanalyse the complete dataset multiple times. Given that P(Hi | D, π) ∝ P(D | Hi) × P(Hi | π), where D represents study data and π = (p1, p2, p12) is the prior parameter vector used for analysis, we can derive posterior probabilities under an alternative prior parameter π* as P(Hi | D, π*) ∝ P(Hi | D, π) × P(Hi | π*) / P(Hi | π), and so we can rapidly explore sensitivity of inference to changes to p12. Fig 5 shows an example where conclusions depend heavily on the relative prior belief in H3 and H4, and a conclusion of colocalisation by a decision rule of P(H4|D, π) > 0.5 is only valid if prior beliefs are that H4 is at least as likely as H3. An alternative example where results are robust over a wide range of p12 is shown in S1 Fig. Detailed instructions to run a sensitivity analysis are given at http://chr1swallace.github.io/coloc/articles/a04_sensitivity.html.
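The reweighting idea can be sketched as follows. This is a hedged illustration, not the coloc implementation: the function names are hypothetical, the per-hypothesis priors use the approximations given earlier in the text (n p1, n p2, n(n − 1) p1 p2, n p12) with H0 taking the remainder, which is an assumption, and the example posterior is invented for demonstration.

```python
def hypothesis_priors(n_snps, p1, p2, p12):
    """Approximate priors for H0..H4 in a region of n_snps SNPs (H0 = remainder)."""
    h1 = n_snps * p1
    h2 = n_snps * p2
    h3 = n_snps * (n_snps - 1) * p1 * p2
    h4 = n_snps * p12
    h0 = 1.0 - (h1 + h2 + h3 + h4)
    if h0 <= 0:
        raise ValueError("per-SNP priors too large for this region size")
    return [h0, h1, h2, h3, h4]


def reweight_posterior(posterior, prior_old, prior_new):
    """Posterior under new priors: old posterior times the prior ratio, renormalised."""
    unnormalised = [
        post * new / old
        for post, old, new in zip(posterior, prior_old, prior_new)
    ]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# Hypothetical posterior for H0..H4 obtained with p12 = 1e-5 in a 1000-SNP region.
posterior = [0.0, 0.01, 0.01, 0.40, 0.58]
old = hypothesis_priors(1000, 1e-4, 1e-4, 1e-5)
for new_p12 in (1e-5, 5e-6, 1e-6):
    new = hypothesis_priors(1000, 1e-4, 1e-4, new_p12)
    print(new_p12, [round(x, 3) for x in reweight_posterior(posterior, old, new)])
```

Because only the prior ratio changes, a whole grid of p12 values can be scanned without touching the summary statistics again, which is what makes a plot like Fig 5 cheap to produce.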

Fig 5

Example of sensitivity analysis on a dataset which shows evidence for colocalisation at a predefined rule of posterior P(H4) > 0.5 only when the prior beliefs in H3 and H4 are approximately equal.

The left hand panels show local Manhattan plots for the two traits, while the right hand panels show prior and posterior probabilities for H0-H4 as a function of p12. The dashed vertical line indicates the value of p12 used in initial analysis (the value about which sensitivity is to be checked). H0 is omitted from the prior plot to enable the relative difference for the other hypotheses to be seen.

Conditioning and masking to allow for multiple causal variants

In order to deal with multiple causal variants in a region, we implemented the CoJo approach [19] within the coloc package. We also propose an alternative to conditioning which does not depend on allelic alignment and can be used with p-values alone: masking. Stepwise regression proceeds by identifying the top SNP, and then re-estimating association statistics across all other SNPs to test whether they provide any additional information to infer the trait of interest. Conditional effect estimates at SNPs in LD with the top SNP(s) differ from their unconditional values, so that they capture the residual evidence for association, but conditional and unconditional effect estimates are (effectively) the same at SNPs independent of the top SNP(s). Our proposed masking algorithm relaxes the assumption of a single causal variant by instead assuming that if multiple causal variants exist for any individual trait, they are in linkage equilibrium. It therefore first identifies lead SNPs, then successively masks all SNPs in LD with the top signal(s), testing for significant association in the remainder, and adding SNPs sequentially while residual association remains (Fig 6). When colocalising, each lead SNP is taken in turn, and any SNPs in LD with any other lead SNP are masked, by setting the per-SNP Bayes factor to 1 for any SNP-specific hypothesis relating to that SNP/trait pair. We have implemented both approaches in the development version of the coloc package, https://github.com/chr1swallace/coloc/tree/condmask, and document their use at http://chr1swallace.github.io/coloc/articles/a05_conditioning.html.
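A rough sketch of the two masking steps described above is given below. It is not the coloc implementation: the function names, the p-value threshold and the r2 threshold are all hypothetical choices for illustration, and the input is assumed to be a list of p-values plus a square matrix of unsigned r2 values (with 1 on the diagonal), which needs no allele alignment.

```python
def find_lead_snps_by_masking(pvalues, r2, p_threshold=1e-6, r2_threshold=0.01):
    """Pick lead SNPs one at a time, masking SNPs in LD with those already chosen."""
    available = set(range(len(pvalues)))
    leads = []
    while available:
        best = min(available, key=lambda i: pvalues[i])
        if pvalues[best] > p_threshold:
            break                      # no residual association remains
        leads.append(best)
        available.discard(best)
        # Restrict the search to SNPs not in LD with the new lead SNP.
        available = {i for i in available if r2[best][i] <= r2_threshold}
    return leads


def mask_log_bayes_factors(log_bf, r2, leads, keep, r2_threshold=0.01):
    """Set log BF to 0 (BF = 1) at SNPs in LD with any lead SNP other than `keep`."""
    masked = list(log_bf)
    for lead in leads:
        if lead == keep:
            continue
        for i in range(len(masked)):
            if r2[lead][i] > r2_threshold:
                masked[i] = 0.0
    return masked


# Toy region: SNPs 0-1 form one LD block, SNPs 3-4 another, SNP 2 is independent.
r2 = [
    [1.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.0, 0.8, 1.0],
]
pvalues = [1e-12, 1e-9, 0.3, 1e-8, 1e-7]
leads = find_lead_snps_by_masking(pvalues, r2)
print(leads)
print(mask_log_bayes_factors([10.0, 6.0, 0.1, 7.0, 5.0], r2, leads, keep=leads[0]))
```

In the toy region the first pass finds one lead SNP per LD block; masking with the first lead kept then neutralises the second block, so a subsequent colocalisation analysis sees only the first signal for that trait.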

Fig 6

Masking as an alternative strategy to conditioning when attempting to colocalise trait signals with multiple causal variants in a region.

Top panel: input local Manhattan plots, with causal variants for each trait highlighted in red. We can use conditioning (left column) to perform multiple colocalisation analyses in a region. First, lead SNPs for each signal are identified through successively conditioning on selected SNPs and adding the most significant SNP out of the remainder, until some significance threshold is no longer reached. Then we condition on all but one lead SNP for each parallel coloc analysis. Note that when multiple lead SNPs are identified for each trait, e.g. n and m for traits 1 and 2 respectively, then n × m coloc analyses are performed. When an allele-aligned LD matrix is not available, an alternative is masking (right column), which differs by successively restricting the search space to SNPs not in LD with any lead SNPs instead of conditioning. Multiple coloc analyses are again performed, but setting the per SNP Bayes factor to 1 for hypotheses containing SNPs in LD with any but one of the lead SNPs. Note that for convenience of display, all SNPs in r2 > α with the lead SNP are assumed to be in a contiguous block, shaded gray.

We compared conditioning and masking to single coloc analysis across a variety of simulated datasets (Figs 7 and 8). A single coloc comparison generally relates to the strongest signals for each of the two traits, as previously reported, [10] which can miss colocalising signals that are secondary to a primary independent signal (Fig 7, row 3) or that have differently ordered effect sizes (Fig 8, row 5). Conditioning allows more distinct comparisons and shows a marked improvement on single coloc, in particular being able to identify a greater proportion of the truly colocalising signals. Masking increases the number of comparisons compared to single coloc, but is less informative than conditioning. In particular, the number of comparisons that cannot be clearly assigned to a specific causal variant pair (at least one lead SNP does not have r2 > 0.8 with a causal variant) increases when multiple causal variants are in LD (S2 and S3 Figs), and this fraction of comparisons is often inaccurate, finding posterior support for H3 when H4 is true.
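The n × m bookkeeping described in the Fig 6 caption can be sketched as below. The function name and SNP identifiers are hypothetical; the sketch only enumerates which lead SNP is kept and which are conditioned on (or masked) in each comparison, and does not run any colocalisation analysis.

```python
from itertools import product


def coloc_comparisons(leads_trait1, leads_trait2):
    """List every pairing of lead SNPs, with the other leads to adjust for."""
    comparisons = []
    for lead1, lead2 in product(leads_trait1, leads_trait2):
        comparisons.append({
            "focus": (lead1, lead2),
            # Condition on (or mask) every other lead SNP for each trait.
            "adjust_trait1": [s for s in leads_trait1 if s != lead1],
            "adjust_trait2": [s for s in leads_trait2 if s != lead2],
        })
    return comparisons


# Hypothetical lead SNPs: two signals for trait 1, one for trait 2.
for comparison in coloc_comparisons(["snpA", "snpB"], ["snpC"]):
    print(comparison)
```

Each entry corresponds to one coloc run, so a region with two signals for trait 1 and one for trait 2 yields two comparisons, matching the layout of Fig 7.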

Fig 7

Average posterior probabilities for each hypothesis under different analysis strategies when trait 1 has two causal variants, A and B, and trait 2 has just one.

The left column shows the identity of causal variants for each trait and their relative effect sizes under four different models. The right column shows the average posterior that can be assigned to specific comparisons of variants for trait 1: trait 2. We exploit our knowledge of the identity of the causal variants in simulated data to label each comparison according to LD between the lead SNP for each trait and the simulated causal variants. When labels cannot be unambiguously assigned (r2 < 0.8 with any causal variant) we use “?”.

Fig 8

Average posterior probabilities for each hypothesis under different analysis strategies when both traits have two causal variants.

Information is displayed as described in Fig 7.

Discussion

This paper has focused on two practical aspects of Bayesian colocalisation analysis that hitherto have not received detailed attention. The ability of Bayesian methods to incorporate prior knowledge and beliefs is a strength of the coloc approach, but also places the onus on a researcher to evaluate their prior beliefs. Elicitation of informative priors is a subject that has received much attention in the statistical literature [51] but rather less within the genetics community. Nonetheless, the use of Bayesian methods in genomics is growing in popularity, as a natural way to fit joint models to large and complex data sets and to enable integrative analysis over different traits or datasets. When data are large, and the number of events is also large, then empirical Bayes can enable an analyst to learn the prior from the same data used for testing. However, in the case of smaller studies or less common events, the wealth of existing information from other large studies as well as investigators’ own beliefs can be used. For coloc, the choice of marginal prior parameter values can be readily informed in this way. For joint causality this is harder, and while we suggest and walk through several alternative ways of doing this, the conclusions we draw are not universally applicable; each investigator should use both available data and their own judgement to elicit their own prior beliefs and those of their co-investigators. Perhaps the most widely applicable are the results of simulations, which suggest that values of the order p12 ≈ 5 × 10^−6 lead to robust inference over a range of scenarios, but the adoption of sensitivity analysis will help evaluate robustness of inference to changes in prior parameter values.
Attempts to colocalise disease and eQTL signals have ranged from underwhelming [52] to positive.[53] One key difference between outcomes is the disease-specific relevance of the cell types considered, which is consistent with variable chromatin state enrichment in different GWAS according to cell type.[54] For example, studies considering the overlap of open chromatin and GWAS signals have convincingly shown that tissue relevance varies by up to 10 fold, [55] with pancreatic islets of greatest relevance for traits like insulin sensitivity and immune cells for immune-mediated diseases.[54] This suggests that p12 should depend explicitly on the specific pair of traits under consideration, including cell type in the case of eQTL or chromatin mark studies. One avenue for future exploration is whether fold change in enrichment of open chromatin/GWAS signal overlap between cell types could be used to modulate p12 and select larger values for more a priori relevant tissues. The other focus of this paper is on dealing with multiple causal variants for single traits in a single region. Single coloc can be misleading when there are completely shared causal variants in the two traits, but with different effect sizes, such that colocalisation concludes there are single effects in each trait, different to each other (e.g. row 5 of Fig 8). Inference is much improved with conditioning, and we hope that by including the conditioning method within coloc we will enable more widespread use of this step. Note that if the two traits are measured in different populations, then colocalisation can still be performed, with a separate LD matrix for each. However, if the summary statistics from a single trait are the results of meta analysis of different populations, then conditioning needs to be performed in each population separately. One advantage of coloc has been the minimal amount of data pre-processing required. 
In particular, there is no need to harmonize alleles between the two datasets or to some reference dataset. However, harmonization cannot be avoided if multiple causal variants are to be dealt with via conditioning. Here, we propose successively masking the most associated SNPs and SNPs in LD with them. This has conceptual similarities to clumping, used in polygenic risk score construction to select the strongest signal in each LD-independent set of SNPs, [56] and to the division of the genome into LD-independent blocks, [57] but differs from each. Our motivation is inverted compared to that for clumping: we aim to identify the set of SNPs whose GWAS summary statistics are likely to be unrelated to the masked signal, rather than select a single SNP from the masked group. We also select smaller sets of SNPs than found by dividing the genome into blocks, because we select SNPs only according to LD with the sentinel SNP, rather than finding breakpoints such that every SNP in a block is likely to have minimal LD with any SNP outside that block. While masking loses accuracy in comparison to conditioning, it improves on single coloc, and importantly does not appear to lead to erroneous positive conclusions for H4 when H3 is true, although the reverse (supporting H3 for a secondary comparison when H4 is true) can occur when causal variants are themselves in LD. Therefore secondary H3 conclusions should be treated with some caution, but secondary H4 conclusions may signal true colocalisations that would have otherwise been missed. Often a researcher may be colocalising results from one dataset for which they have complete information (e.g. because it was generated in their lab) with a public disease GWAS with less information, and here we recommend the hybrid strategy of conditioning in the dataset with full information and masking in the public dataset.
Masking is also likely to avoid substantial errors in the results of approximate conditioning that can occasionally result from small deviations between the LD estimated in a reference population and that in the study sample, particularly when the reference population is smaller than that used for the GWAS. [58]

While we have discussed the thought process required to consider prior parameter values, thought is also required to interpret partially colocalising signals (i.e. a convincing mixture of one colocalising and one non-colocalising variant). When the two datasets are different disease GWAS, it may be reasonable that they share only one signal, with the alternate signal operating through a different mechanism. But if there are two signals for an eQTL only one of which colocalises with a disease signal, then this should be interpreted with greater caution than complete colocalisation. It suggests that there are two ways of modifying expression of a gene but that only one of those ways is also associated with variable disease risk. This might mean that the right gene has been identified in the wrong tissue, given the overlap in eQTL signals between tissues, [45] but it might also indicate incidental colocalisation. Similarly, lack of colocalisation may indicate only that the correct tissue or state has not been assayed. We anticipate that systematic analysis of multiple tissues and genes with a single disease may lead to a set of posterior probabilities that are jointly more amenable to interpretation than a single isolated analysis. However, colocalisation will always be limited by its basis in analysis of observational data, and experimental manipulation through CRISPR or through genotype-targeted assays will be required to establish causality.
In summary, we find that coloc default values for the prior probabilities of single trait association, p1 and p2, are well supported by data across a range of data types, but that the choice of p12 needs careful thought, and is expected to vary according to the pair of traits being considered. We recommend taking some time to do this before any analysis, documenting and justifying choices, and using the coloc explorer app to translate between per-SNP and per-hypothesis values. The simulations here (Fig 4) suggest that p12 = 5×10−5 provides a reasonable balance between power and false positive calls, but it is unlikely that any single point distribution on p12 captures all prior knowledge. As varying p12 can sometimes have a substantial impact on inference, we strongly advise users to perform sensitivity analysis for key results. Both the justification of choices and the results of sensitivity analyses should be presented to accompany any published results.
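As an illustration of the translation between per-SNP and per-hypothesis priors and of a sensitivity analysis over p12, the following is a minimal Python sketch. It is not the coloc implementation. It assumes the usual single-causal-variant formulation, in which a region of n SNPs has prior mass n·p1, n·p2, n(n−1)·p1·p2 and n·p12 on H1 to H4, and in which per-hypothesis support is obtained by summing per-SNP Bayes factors; the derivation itself is in S1 Text, which is not reproduced here, so these formulas should be read as assumptions. The function names and the log Bayes factors are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hypothesis_priors(n_snps, p1, p2, p12):
    """Translate per-SNP priors into (approximate) per-hypothesis priors."""
    h1 = n_snps * p1
    h2 = n_snps * p2
    h3 = n_snps * (n_snps - 1) * p1 * p2
    h4 = n_snps * p12
    return {"H0": 1.0 - (h1 + h2 + h3 + h4), "H1": h1, "H2": h2, "H3": h3, "H4": h4}


def coloc_posteriors(lbf1, lbf2, p1, p2, p12):
    """Posterior probabilities of H0..H4 from per-SNP log Bayes factors.

    Assumes at most one causal variant per trait and at least two SNPs.
    """
    s1 = logsumexp(lbf1)
    s2 = logsumexp(lbf2)
    s12 = logsumexp([a + b for a, b in zip(lbf1, lbf2)])
    # Sum over pairs i != j of BF1_i * BF2_j, computed on the log scale as
    # log(exp(s1 + s2) - exp(s12)).
    l3 = s1 + s2 + math.log1p(-math.exp(s12 - (s1 + s2)))
    log_unnorm = [
        0.0,                                # H0: no association
        math.log(p1) + s1,                  # H1: trait 1 only
        math.log(p2) + s2,                  # H2: trait 2 only
        math.log(p1) + math.log(p2) + l3,   # H3: distinct causal variants
        math.log(p12) + s12,                # H4: shared causal variant
    ]
    norm = logsumexp(log_unnorm)
    return [math.exp(v - norm) for v in log_unnorm]


# Hypothetical per-SNP log Bayes factors for two traits over five SNPs.
lbf_trait1 = [0.2, 1.0, 9.0, 2.5, 0.1]
lbf_trait2 = [0.0, 0.8, 7.5, 1.5, 0.3]

# Sensitivity analysis: hold p1 and p2 fixed and vary p12.
for p12 in (1e-6, 5e-6, 1e-5, 5e-5):
    pp = coloc_posteriors(lbf_trait1, lbf_trait2, 1e-4, 1e-4, p12)
    print(f"p12={p12:.0e}  PP.H3={pp[3]:.3f}  PP.H4={pp[4]:.3f}")
```

If a decision rule such as posterior P(H4) > 0.5 holds across the range of p12 values considered plausible, the conclusion can be regarded as robust to the prior; if it flips within that range, the choice of p12 needs explicit justification.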

Materials and methods

Code to run the simulations and analyses described below is available at https://github.com/chr1swallace/coloc-mask-paper. A statistical description of the coloc method, including calculation of per-SNP and per-hypothesis Bayes factors and posterior probabilities, is given in S1 Text.

To calculate the posterior probability of association shown in Fig 3c and 3d, we use the Bayes factor for association at a single SNP defined in S1 Text, BF1. We calculate the posterior probability for association as a function of the prior probability that a SNP is associated with the trait, π, as

$$P(J_1 \mid \text{data}) = \frac{\pi \, \mathrm{BF}_1}{\pi \, \mathrm{BF}_1 + (1 - \pi)},$$

where we use J1, J0 to denote the competing hypotheses of association and non-association at this SNP.
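The dependence of the single-SNP posterior on the prior π can be sketched in a few lines of Python. This is an illustration only: it applies Bayes' rule to a Bayes factor for association versus no association, so that the posterior odds equal the prior odds π/(1 − π) multiplied by the Bayes factor. The function name and the example Bayes factor of 10^4 are hypothetical and are not taken from the paper.

```python
def posterior_association(bf1, prior):
    """Posterior probability of association at one SNP.

    bf1   : Bayes factor for association versus no association
    prior : prior probability (pi) that the SNP is associated
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bf1 < 0.0:
        raise ValueError("a Bayes factor cannot be negative")
    return prior * bf1 / (prior * bf1 + (1.0 - prior))


# The same evidence leads to very different conclusions under different priors.
for prior in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"pi={prior:.0e}  posterior={posterior_association(1e4, prior):.4f}")
```

A Bayes factor of 1 leaves the prior unchanged, which is a convenient check on any implementation.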

Simulations

We evaluated different prior parameter settings, sensitivity analysis, and strategies for dealing with multiple causal variants by simulation. In each case, we simulated GWAS data by sampling 2N haplotypes of length M SNPs for N individuals from 1000 Genomes samples (either EUR or YRI), and selected one or two causal variants at random from amongst common SNPs (MAF>5%) according to the question being addressed. Effect estimates at each variant were sampled from the set {0.17, 0.33, 0.50, 0.67, 0.83, 1.00, 1.17, 1.33, 1.50}, sample sizes N from the set {100, 200, 500, 1000, 2000, 5000, 10000} and number of SNPs M from {250, 500, 750}. Quantitative traits with residual standard deviation 1 were then simulated according to linear models, i.e. as

$$y = \sum_i b_i G_i + e,$$

where i indexes causal variants, $b_i$ and $G_i$ are the effect estimate and genotype at variant i, and $e \sim N(0, 1)$. For all analyses, we used p1 = p2 = 10−4 and varied p12 as described in the text.
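The trait model can be illustrated with a short Python sketch. It makes one important simplification that should be kept in mind: genotypes are drawn independently from a binomial distribution given a minor allele frequency, rather than by sampling 1000 Genomes haplotypes as described above, so there is no LD between variants. All function names, allele frequencies and the seed are hypothetical.

```python
import random


def simulate_trait(n_individuals, mafs, effects, seed=1):
    """Simulate causal genotypes and a quantitative trait y = sum_i b_i G_i + e."""
    rng = random.Random(seed)
    genotypes, trait = [], []
    for _ in range(n_individuals):
        # Genotype = number of minor alleles on two independently drawn
        # haplotypes (an assumption; the paper samples real haplotypes).
        g = [sum(rng.random() < maf for _ in range(2)) for maf in mafs]
        e = rng.gauss(0.0, 1.0)  # residual with standard deviation 1
        y = sum(b * gi for b, gi in zip(effects, g)) + e
        genotypes.append(g)
        trait.append(y)
    return genotypes, trait


def ols_slope(x, y):
    """Least-squares slope of y on x, i.e. a single-SNP effect estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Two causal variants with effects taken from the set used in the simulations.
G, y = simulate_trait(5000, mafs=[0.20, 0.35], effects=[0.33, 0.50])
first_variant = [row[0] for row in G]
print(f"estimated effect at variant 1: {ols_slope(first_variant, y):.3f}")
```

Because the two simulated variants are independent here, the marginal estimate at each should be close to its true effect; with real haplotypes, LD between causal variants would bias marginal estimates, which is the situation conditioning and masking are designed to address.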

GTEx analysis

We used GTEx data to estimate the probability that a random SNP could be causally associated with the expression of a gene within some bp-defined window. We analysed GTEx v7 Whole Blood significant eQTLs, downloaded from https://storage.googleapis.com/gtex_analysis_v7/single_tissue_eqtl_data/GTEx_Analysis_v7_eQTL.tar.gz on 25 June 2019. We used masking to define independent signals within this set for each gene (r2 < 0.01) using 1000 Genomes EUR samples to estimate LD. We estimated q as the ratio of the number of significant lead eQTLs in multiples of 100 kb windows around the TSS to the number of SNPs in 1000 Genomes with SNPs grouped by MAF into 5 groups: [0, 0.1], (0.1, 0.2], (0.2, 0.3], (0.3, 0.4], (0.4, 0.5].
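The use of masking to define independent signals can be sketched as a greedy procedure: take the most strongly associated SNP as a sentinel, mask it together with every SNP whose r2 with it reaches the threshold, and repeat on what remains. The Python sketch below is an illustration under stated assumptions, not the code used for the analysis: the association statistic is taken to be −log10(p), the significance cut-off and the small r2 matrix are invented, and all names are hypothetical. It also does not reproduce how coloc handles masked SNPs in a subsequent colocalisation analysis.

```python
def independent_signals(stat, r2, r2_threshold=0.01, min_stat=0.0, max_signals=10):
    """Greedy masking to pick LD-independent sentinel SNPs.

    stat : per-SNP association strength (larger = stronger), e.g. -log10(p)
    r2   : square matrix (list of lists) of pairwise r2 between SNPs
    Returns the indices of sentinel SNPs in order of selection.
    """
    n = len(stat)
    masked = [False] * n
    sentinels = []
    while len(sentinels) < max_signals:
        candidates = [i for i in range(n) if not masked[i] and stat[i] >= min_stat]
        if not candidates:
            break
        lead = max(candidates, key=lambda i: stat[i])
        sentinels.append(lead)
        # Mask the sentinel and everything in LD with it.
        for j in range(n):
            if j == lead or r2[lead][j] >= r2_threshold:
                masked[j] = True
    return sentinels


# Hypothetical region of five SNPs: SNPs 0 and 1 are in strong LD,
# SNPs 2 and 3 are in moderate LD, SNP 4 is independent but not significant.
neg_log10_p = [12.0, 10.5, 3.0, 9.0, 1.0]
ld = [
    [1.0,   0.8,   0.0, 0.001, 0.0],
    [0.8,   1.0,   0.0, 0.002, 0.0],
    [0.0,   0.0,   1.0, 0.3,   0.0],
    [0.001, 0.002, 0.3, 1.0,   0.0],
    [0.0,   0.0,   0.0, 0.0,   1.0],
]
print(independent_signals(neg_log10_p, ld, r2_threshold=0.01, min_stat=5.0))
```

In this toy region the procedure selects SNP 0, masks SNP 1 with it, and then selects SNP 3 as a second independent signal.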

GWAS catalog analysis

We used the GWAS summaries in the GWAS catalog (https://www.ebi.ac.uk/gwas/api/search/downloads/full, download date: 12 June 2019) to estimate the proportion of common SNPs that were independently associated with any given case/control or quantitative trait and examined how this varied according to reported sample size.

Summary of applied papers from 2018 using coloc.

(PDF)

Supporting mathematical derivations.

(PDF)

Example of sensitivity analysis on a dataset which shows evidence for colocalisation at a predefined rule of posterior P(H4) > 0.5 across a wide range of p12.

(TIF)

Average posterior probabilities for each hypothesis when trait 1 has two causal variants, and trait 2 has just one, according to whether the maximum r2 between multiple causal variants is ≤ 0.01 or > 0.01.

(TIF)

Average posterior probabilities for each hypothesis when both traits have two causal variants, according to whether the maximum r2 between multiple causal variants is ≤ 0.01 or > 0.01.

(TIF)

26 Feb 2020

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr Wallace,

Thank you very much for submitting your Research Article entitled 'Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.
Yours sincerely,

Michael P. Epstein, Associate Editor, PLOS Genetics
Hua Tang, Section Editor: Natural Variation, PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:
Reviewer #1: Colocalization is an increasingly important aspect of genetic fine mapping efforts (>60 papers in 2018) but, unusually in statistical genetics, the most popular software (“coloc”) implements a Bayesian analysis with subjective priors. This paper demonstrates the potential sensitivity of coloc to the prior probability of colocalization, examines a huge amount of data to elicit suggestions for setting reasonable values, and provides software for performing sensitivity analysis. In addition, the assumption of one causal variant per region per trait is examined and a new approach, called masking, is suggested for situations in which current methods cannot be applied. Overall this paper gives useful guidance to users of coloc and provides insights into the method that should be of value.

Minor comments

1. P4 L52 “ubiquity of genetic effects … concordant with an omnigenic model” – suggests that such ubiquity has been established when it remains a conjecture. Some rewording needed.

2. The Introduction starts off by introducing MR and appears to motivate colocalization primarily as a way to validate instruments in MR studies. But, as seen elsewhere in the paper, most of the applications are in delineating molecular pathways to disease. I’d suggest reworking the opening paragraph to better reflect the broader motivations for colocalization.

3. P7 L101 full text could be accessed for only 25 of 60 papers. Was this due to limitations of institutional subscriptions? Could not the corresponding authors provide manuscripts for research purposes?

4. P7 L104 it would be interesting to know also how many papers used eCaviar or some other method to deal with multiple causal variants. Also, how often did the original discovery studies perform conditional analyses and rule out additional causal variants? So that when going to colocalization, the single causal variant assumption can be justified to some extent.

5. P7 L107 “prior probability … will depend” – should say “may depend” since at this point we haven’t established this, and anyway since priors are subjective the user is free to believe that there is no dependence on the traits (but may then draw the wrong conclusion).

6. P8 L124 “more likely” should be “relatively more likely”, otherwise this sentence is confusing. Initially I found this sentence counter-intuitive – seems that by looking at fewer SNPs we are more likely to find colocalization – but the point is that the prior probability of colocalization is higher relative to distinct variants when fewer SNPs are considered. However the lower number of SNPs would provide less evidence for colocalization so this is a false economy. Anyway some interpretation should be added to this and the previous paragraph as it is unclear what one should conclude from the observations.

7. P8 L132 note that all the estimates of p’s and q’s are based on statistically significant SNPs, and the number of truly associated variants must be larger. So the elicited priors must be lower bounds. What implications does this have for the final inferences?

8. P9 L145 not clear how to get a posterior probability of association from just a prior and a p-value.

9. P10 L163, 165 the Appendix was not available to review.

10. P12 L208 “unlinked” -> “not in linkage disequilibrium”. There is a difference between linkage and LD.

11. P12 The masking method still needs an LD matrix, so the only real advantage over CoJo is that there is no need to align the alleles.

12. P12 The masking method looks a lot like “clumping” as often used, for example, in constructing polygenic risk scores. Please clarify the difference, or use the same term to prevent jargon creep.

13. P213 Figure 6 caption, “setting to 1 the Bayes factor” –the main text suggests setting the log Bayes factor to -3. Log in what base?

14. P14 L244 is it feasible to make the sensitivity analysis a default action in coloc, with the results being returned in the same object as the posteriors?

15. P16 the Discussion would benefit from a summary take-home message, such as that the default values of p1 and p2 are OK but p12 needs more thought (and a summary of how to do this would also help).

Typos etc

1. P3 L32 “underly” -> “underlie”

2. P4 L59 “For example…” – the sentence has no active verb.

3. P7 L117 delete “a”; change final “,” to “.”

4. P8 L140 the double “-“ is confusing, suggest just saying “to” or writing as an interval.

5. P9 L150 “One” -> “On”

6. P11 L178 in the equation below, can delete the intersection with A2 in the third expression.

7. P11 L180 spelling of “asymmetric”

8. P12 Figure 5 caption line 2, “belief” -> “beliefs”. What does the dotted line marked “results” mean?

9. P12 L212 “is” -> “are”

10. P14 L234 “are” -> “is”

11. P16 L288 “interpretable” -> “interpretation”

12. References are a bit sloppy, eg page numbers for refs 11 and 14.

13. Supplement P1, footnote 4 “P=1105” etc looks incorrect.

Reviewer #2: This paper considers two important extensions to the currently most popular and influential colocalization method/software "coloc": a more suitable prior specification (than the current default) and relaxing the assumption of only one causal SNP. In particular, the first problem has been largely ignored in practice while its implication is significant, as the author has clearly shown in the paper. Although the proposed methods are not technically sophisticated, they can be tremendously useful as implemented in the "coloc" software. The paper was well written. I only have two very minor comments.

Minor comments:

1. Prior elicitation is a well known and general problem in Bayesian statistics, both important and challenging. I agree with the author on all her points, and commend the author for providing a useful online tool "coloc explorer". However, without a "default" prior, I am not sure how useful it would be to a "typical" biologist without deep understanding of Bayesian statistics or "coloc" method; in fact, I would be a bit worried that someone might do "prior mining" to try to get more significant results. Some comments or guidelines might be helpful to a typical user.

2. I completely agree with the author on both the advantages and limitations of the conditioning approach as compared to the proposed "masking" approach. However, if I understand correctly, with a typical small genomic region of interest, one would potentially mask out ALL SNPs in the region that are in LD with the lead SNP; in other words, is the new assumption simply that there is at most only one causal SNP in EACH LD block? If true, it is still like doing coloc analysis under the single causal SNP assumption for each LD block, which can be too restrictive given that there are only about two thousand (approximately independent) LD blocks in the human genome. Some clarifications and comments would be helpful.

Reviewer #3: This manuscript investigated how to derive data driven priors for best power of COLOC, provided a sensitivity analysis framework to assess the robustness of priors, and proposed a new masking approach for dealing with scenarios with multiple signals per region. It is very useful to provide guidelines for users of COLOC about how to setup priors to achieve the best power. However, this paper does not provide a clear guideline to readers. I have the following comments:

1) It would help refresh reader’s mind if a brief description about the statistical procedure of the COLOC tool could be provided either in the Introduction section along with the five stated hypotheses, or at the beginning of the Results section.

2) I think it would be helpful to make a clear guideline table for readers, e.g., suggestive p1, p2, p12 prior values for a few different combinations of number of SNPs in the test region, total number of trait signals, if multiple signals exist in the test region. Or a such table could be provided for GTEx expression traits of different tissue types, which will provide readers a concrete example.

3) It would be helpful if the authors could provide some descriptions about “coloc explorer” and “condmask coloc” and how to implement these two tools in the supplementary text.

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

11 Mar 2020

Submitted filename: coloc response.pdf

17 Mar 2020

Dear Dr Wallace,

We are pleased to inform you that your manuscript entitled "Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses" has been editorially accepted for publication in PLOS Genetics. Congratulations!
Yours sincerely,

Michael P. Epstein, Associate Editor, PLOS Genetics
Hua Tang, Section Editor: Natural Variation, PLOS Genetics

9 Apr 2020

PGENETICS-D-19-02090R1
Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses

Dear Dr Wallace,

We are pleased to inform you that your manuscript entitled "Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

With kind regards,

Kaitlin Butler
PLOS Genetics
On behalf of: The PLOS Genetics Team

56 in total

1. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease?

Authors: George Davey Smith; Shah Ebrahim
Journal: Int J Epidemiol Date: 2003-02 Impact factor: 7.196

Review 2. How to avoid bias when comparing bone marrow transplantation with chemotherapy.

Authors: R Gray; K Wheatley
Journal: Bone Marrow Transplant Date: 1991 Impact factor: 5.483

3. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets.

Authors: Zhihong Zhu; Futao Zhang; Han Hu; Andrew Bakshi; Matthew R Robinson; Joseph E Powell; Grant W Montgomery; Michael E Goddard; Naomi R Wray; Peter M Visscher; Jian Yang
Journal: Nat Genet Date: 2016-03-28 Impact factor: 38.330

4. Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies.

Authors: Christian Benner; Aki S Havulinna; Marjo-Riitta Järvelin; Veikko Salomaa; Samuli Ripatti; Matti Pirinen
Journal: Am J Hum Genet Date: 2017-09-21 Impact factor: 11.025

Review 5. Research review: Polygenic methods and their application to psychiatric traits.

Authors: Naomi R Wray; Sang Hong Lee; Divya Mehta; Anna A E Vinkhuyzen; Frank Dudbridge; Christel M Middeldorp
Journal: J Child Psychol Psychiatry Date: 2014-08-01 Impact factor: 8.982

Review 6. Defining functional DNA elements in the human genome.

Authors: Manolis Kellis; Barbara Wold; Michael P Snyder; Bradley E Bernstein; Anshul Kundaje; Georgi K Marinov; Lucas D Ward; Ewan Birney; Gregory E Crawford; Job Dekker; Ian Dunham; Laura L Elnitski; Peggy J Farnham; Elise A Feingold; Mark Gerstein; Morgan C Giddings; David M Gilbert; Thomas R Gingeras; Eric D Green; Roderic Guigo; Tim Hubbard; Jim Kent; Jason D Lieb; Richard M Myers; Michael J Pazin; Bing Ren; John A Stamatoyannopoulos; Zhiping Weng; Kevin P White; Ross C Hardison
Journal: Proc Natl Acad Sci U S A Date: 2014-04-21 Impact factor: 12.779

7. Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function.

Authors: Annah B Wyss; Tamar Sofer; Mi Kyeong Lee; Natalie Terzikhan; Jennifer N Nguyen; Lies Lahousse; Jeanne C Latourelle; Albert Vernon Smith; Traci M Bartz; Mary F Feitosa; Wei Gao; Tarunveer S Ahluwalia; Wenbo Tang; Christopher Oldmeadow; Qing Duan; Kim de Jong; Mary K Wojczynski; Xin-Qun Wang; Raymond Noordam; Fernando Pires Hartwig; Victoria E Jackson; Tianyuan Wang; Ma'en Obeidat; Brian D Hobbs; Tianxiao Huan; Hongsheng Gui; Margaret M Parker; Donglei Hu; Lauren S Mogil; Gleb Kichaev; Jianping Jin; Mariaelisa Graff; Tamara B Harris; Ravi Kalhan; Susan R Heckbert; Lavinia Paternoster; Kristin M Burkart; Yongmei Liu; Elizabeth G Holliday; James G Wilson; Judith M Vonk; Jason L Sanders; R Graham Barr; Renée de Mutsert; Ana Maria Baptista Menezes; Hieab H H Adams; Maarten van den Berge; Roby Joehanes; Albert M Levin; Jennifer Liberto; Lenore J Launer; Alanna C Morrison; Colleen M Sitlani; Juan C Celedón; Stephen B Kritchevsky; Rodney J Scott; Kaare Christensen; Jerome I Rotter; Tobias N Bonten; Fernando César Wehrmeister; Yohan Bossé; Shujie Xiao; Sam Oh; Nora Franceschini; Jennifer A Brody; Robert C Kaplan; Kurt Lohman; Mark McEvoy; Michael A Province; Frits R Rosendaal; Kent D Taylor; David C Nickle; L Keoki Williams; Esteban G Burchard; Heather E Wheeler; Don D Sin; Vilmundur Gudnason; Kari E North; Myriam Fornage; Bruce M Psaty; Richard H Myers; George O'Connor; Torben Hansen; Cathy C Laurie; Patricia A Cassano; Joohon Sung; Woo Jin Kim; John R Attia; Leslie Lange; H Marike Boezen; Bharat Thyagarajan; Stephen S Rich; Dennis O Mook-Kanamori; Bernardo Lessa Horta; André G Uitterlinden; Hae Kyung Im; Michael H Cho; Guy G Brusselle; Sina A Gharib; Josée Dupuis; Ani Manichaikul; Stephanie J London
Journal: Nat Commun Date: 2018-07-30 Impact factor: 14.919

8. Elevated polygenic burden for autism is associated with differential DNA methylation at birth.

Authors: Eilis Hannon; Diana Schendel; Christine Ladd-Acosta; Jakob Grove; Christine Søholm Hansen; Shan V Andrews; David Michael Hougaard; Michaeline Bresnahan; Ole Mors; Mads Vilhelm Hollegaard; Marie Bækvad-Hansen; Mady Hornig; Preben Bo Mortensen; Anders D Børglum; Thomas Werge; Marianne Giørtz Pedersen; Merete Nordentoft; Joseph Buxbaum; M Daniele Fallin; Jonas Bybjerg-Grauholm; Abraham Reichenberg; Jonathan Mill
Journal: Genome Med Date: 2018-03-28 Impact factor: 11.117

9. Co-occurring expression and methylation QTLs allow detection of common causal variants and shared biological mechanisms.

Authors: Brandon L Pierce; Lin Tong; Maria Argos; Kathryn Demanelis; Farzana Jasmine; Muhammad Rakibuz-Zaman; Golam Sarwar; Md Tariqul Islam; Hasan Shahriar; Tariqul Islam; Mahfuzar Rahman; Md Yunus; Muhammad G Kibriya; Lin S Chen; Habibul Ahsan
Journal: Nat Commun Date: 2018-02-23 Impact factor: 17.694

10. Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia.

Authors: Julien Bryois; Melanie E Garrett; Lingyun Song; Alexias Safi; Paola Giusti-Rodriguez; Graham D Johnson; Annie W Shieh; Alfonso Buil; John F Fullard; Panos Roussos; Pamela Sklar; Schahram Akbarian; Vahram Haroutunian; Craig A Stockmeier; Gregory A Wray; Kevin P White; Chunyu Liu; Timothy E Reddy; Allison Ashley-Koch; Patrick F Sullivan; Gregory E Crawford
Journal: Nat Commun Date: 2018-08-07 Impact factor: 14.919

36 in total

1. Cross-ancestry genome-wide meta-analysis of 61,047 cases and 947,237 controls identifies new susceptibility loci contributing to lung cancer.

Authors: Jinyoung Byun; Younghun Han; Yafang Li; Jun Xia; Erping Long; Jiyeon Choi; Xiangjun Xiao; Meng Zhu; Wen Zhou; Ryan Sun; Yohan Bossé; Zhuoyi Song; Ann Schwartz; Christine Lusk; Thorunn Rafnar; Kari Stefansson; Tongwu Zhang; Wei Zhao; Rowland W Pettit; Yanhong Liu; Xihao Li; Hufeng Zhou; Kyle M Walsh; Ivan Gorlov; Olga Gorlova; Dakai Zhu; Susan M Rosenberg; Susan Pinney; Joan E Bailey-Wilson; Diptasri Mandal; Mariza de Andrade; Colette Gaba; James C Willey; Ming You; Marshall Anderson; John K Wiencke; Demetrius Albanes; Stephan Lam; Adonina Tardon; Chu Chen; Gary Goodman; Stig Bojeson; Hermann Brenner; Maria Teresa Landi; Stephen J Chanock; Mattias Johansson; Thomas Muley; Angela Risch; H-Erich Wichmann; Heike Bickeböller; David C Christiani; Gad Rennert; Susanne Arnold; John K Field; Sanjay Shete; Loic Le Marchand; Olle Melander; Hans Brunnstrom; Geoffrey Liu; Angeline S Andrew; Lambertus A Kiemeney; Hongbing Shen; Shanbeh Zienolddiny; Kjell Grankvist; Mikael Johansson; Neil Caporaso; Angela Cox; Yun-Chul Hong; Jian-Min Yuan; Philip Lazarus; Matthew B Schabath; Melinda C Aldrich; Alpa Patel; Qing Lan; Nathaniel Rothman; Fiona Taylor; Linda Kachuri; John S Witte; Lori C Sakoda; Margaret Spitz; Paul Brennan; Xihong Lin; James McKay; Rayjean J Hung; Christopher I Amos
Journal: Nat Genet Date: 2022-08-01 Impact factor: 41.307

2. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer's disease.

Authors: Ole A Andreassen; Danielle Posthuma; Douglas P Wightman; Iris E Jansen; Jeanne E Savage; Alexey A Shadrin; Shahram Bahrami; Dominic Holland; Arvid Rongve; Sigrid Børte; Bendik S Winsvold; Ole Kristian Drange; Amy E Martinsen; Anne Heidi Skogholt; Cristen Willer; Geir Bråthen; Ingunn Bosnes; Jonas Bille Nielsen; Lars G Fritsche; Laurent F Thomas; Linda M Pedersen; Maiken E Gabrielsen; Marianne Bakke Johnsen; Tore Wergeland Meisingset; Wei Zhou; Petroula Proitsi; Angela Hodges; Richard Dobson; Latha Velayudhan; Karl Heilbron; Adam Auton; Julia M Sealock; Lea K Davis; Nancy L Pedersen; Chandra A Reynolds; Ida K Karlsson; Sigurdur Magnusson; Hreinn Stefansson; Steinunn Thordardottir; Palmi V Jonsson; Jon Snaedal; Anna Zettergren; Ingmar Skoog; Silke Kern; Margda Waern; Henrik Zetterberg; Kaj Blennow; Eystein Stordal; Kristian Hveem; John-Anker Zwart; Lavinia Athanasiu; Per Selnes; Ingvild Saltvedt; Sigrid B Sando; Ingun Ulstein; Srdjan Djurovic; Tormod Fladby; Dag Aarsland; Geir Selbæk; Stephan Ripke; Kari Stefansson
Journal: Nat Genet Date: 2021-09-07 Impact factor: 41.307

3. PolarMorphism enables discovery of shared genetic variants across multiple traits from GWAS summary statistics.

Authors: Joanna von Berg; Michelle Ten Dam; Sander W van der Laan; Jeroen de Ridder
Journal: Bioinformatics Date: 2022-06-24 Impact factor: 6.931

4. Immune disease risk variants regulate gene expression dynamics during CD4⁺ T cell activation.

Authors: Blagoje Soskic; Eddie Cano-Gamez; Deborah J Smyth; Kirsty Ambridge; Ziying Ke; Julie C Matte; Lara Bossini-Castillo; Joanna Kaplanis; Lucia Ramirez-Navarro; Anna Lorenc; Nikolina Nakic; Jorge Esparza-Gordillo; Wendy Rowan; David Wille; David F Tough; Paola G Bronson; Gosia Trynka
Journal: Nat Genet Date: 2022-05-26 Impact factor: 41.307

5. Using genetic variants to evaluate the causal effect of cholesterol lowering on head and neck cancer risk: A Mendelian randomization study.

Authors: Mark Gormley; James Yarmolinsky; Tom Dudding; Kimberley Burrows; Richard M Martin; Steven Thomas; Jessica Tyrrell; Paul Brennan; Miranda Pring; Stefania Boccia; Andrew F Olshan; Brenda Diergaarde; Rayjean J Hung; Geoffrey Liu; Danny Legge; Eloiza H Tajara; Patricia Severino; Martin Lacko; Andrew R Ness; George Davey Smith; Emma E Vincent; Rebecca C Richmond
Journal: PLoS Genet Date: 2021-04-22 Impact factor: 5.917

6. Discovery and fine-mapping of kidney function loci in first genome-wide association study in Africans.

Authors: Segun Fatumo; Tinashe Chikowore; Robert Kalyesubula; Rebecca N Nsubuga; Gershim Asiki; Oyekanmi Nashiru; Janet Seeley; Amelia C Crampin; Dorothea Nitsch; Liam Smeeth; Pontiano Kaleebu; Stephen Burgess; Moffat Nyirenda; Nora Franceschini; Andrew P Morris; Laurie Tomlinson; Robert Newton
Journal: Hum Mol Genet Date: 2021-07-28 Impact factor: 6.150

7. Identification of Novel Pleiotropic SNPs Associated with Osteoporosis and Rheumatoid Arthritis.

Authors: Ying-Qi Liu; Yong Liu; Qiang Zhang; Tao Xiao; Hong-Wen Deng
Journal: Calcif Tissue Int Date: 2021-03-19 Impact factor: 4.333

8. Genome sequencing analysis identifies new loci associated with Lewy body dementia and provides insights into its genetic architecture.

Authors: Ruth Chia; Marya S Sabir; Sara Bandres-Ciga; Sara Saez-Atienzar; Regina H Reynolds; Emil Gustavsson; Ronald L Walton; Sarah Ahmed; Coralie Viollet; Jinhui Ding; Mary B Makarious; Monica Diez-Fairen; Makayla K Portley; Zalak Shah; Yevgeniya Abramzon; Dena G Hernandez; Cornelis Blauwendraat; David J Stone; John Eicher; Laura Parkkinen; Olaf Ansorge; Lorraine Clark; Lawrence S Honig; Karen Marder; Afina Lemstra; Peter St George-Hyslop; Elisabet Londos; Kevin Morgan; Tammaryn Lashley; Thomas T Warner; Zane Jaunmuktane; Douglas Galasko; Isabel Santana; Pentti J Tienari; Liisa Myllykangas; Minna Oinas; Nigel J Cairns; John C Morris; Glenda M Halliday; Vivianna M Van Deerlin; John Q Trojanowski; Maurizio Grassano; Andrea Calvo; Gabriele Mora; Antonio Canosa; Gianluca Floris; Ryan C Bohannan; Francesca Brett; Ziv Gan-Or; Joshua T Geiger; Anni Moore; Patrick May; Rejko Krüger; David S Goldstein; Grisel Lopez; Nahid Tayebi; Ellen Sidransky; Lucy Norcliffe-Kaufmann; Jose-Alberto Palma; Horacio Kaufmann; Vikram G Shakkottai; Matthew Perkins; Kathy L Newell; Thomas Gasser; Claudia Schulte; Francesco Landi; Erika Salvi; Daniele Cusi; Eliezer Masliah; Ronald C Kim; Chad A Caraway; Edwin S Monuki; Maura Brunetti; Ted M Dawson; Liana S Rosenthal; Marilyn S Albert; Olga Pletnikova; Juan C Troncoso; Margaret E Flanagan; Qinwen Mao; Eileen H Bigio; Eloy Rodríguez-Rodríguez; Jon Infante; Carmen Lage; Isabel González-Aramburu; Pascual Sanchez-Juan; Bernardino Ghetti; Julia Keith; Sandra E Black; Mario Masellis; Ekaterina Rogaeva; Charles Duyckaerts; Alexis Brice; Suzanne Lesage; Georgia Xiromerisiou; Matthew J Barrett; Bension S Tilley; Steve Gentleman; Giancarlo Logroscino; Geidy E Serrano; Thomas G Beach; Ian G McKeith; Alan J Thomas; Johannes Attems; Christopher M Morris; Laura Palmer; Seth Love; Claire Troakes; Safa Al-Sarraj; Angela K Hodges; Dag Aarsland; Gregory Klein; Scott M Kaiser; Randy Woltjer; Pau Pastor; Lynn M Bekris; James B Leverenz; Lilah M Besser; Amanda Kuzma; Alan E Renton; Alison Goate; David A Bennett; Clemens R Scherzer; Huw R Morris; Raffaele Ferrari; Diego Albani; Stuart Pickering-Brown; Kelley Faber; Walter A Kukull; Estrella Morenas-Rodriguez; Alberto Lleó; Juan Fortea; Daniel Alcolea; Jordi Clarimon; Mike A Nalls; Luigi Ferrucci; Susan M Resnick; Toshiko Tanaka; Tatiana M Foroud; Neill R Graff-Radford; Zbigniew K Wszolek; Tanis Ferman; Bradley F Boeve; John A Hardy; Eric J Topol; Ali Torkamani; Andrew B Singleton; Mina Ryten; Dennis W Dickson; Adriano Chiò; Owen A Ross; J Raphael Gibbs; Clifton L Dalgard; Bryan J Traynor; Sonja W Scholz
Journal: Nat Genet Date: 2021-02-15 Impact factor: 38.330

9. Untangling the genetic link between type 1 and type 2 diabetes using functional genomics.

Authors: Denis M Nyaga; Mark H Vickers; Craig Jefferies; Tayaza Fadason; Justin M O'Sullivan
Journal: Sci Rep Date: 2021-07-06 Impact factor: 4.379

10. Genome-wide association study identifies five risk loci for pernicious anemia.

Authors: Triin Laisk; Maarja Lepamets; Mariann Koel; Erik Abner; Reedik Mägi
Journal: Nat Commun Date: 2021-06-18 Impact factor: 14.919