Detecting local genetic correlations with scan statistics.

Hanmin Guo1,2, James J Li3,4, Qiongshi Lu5, Lin Hou6,7,8.   

Abstract

Genetic correlation analysis has quickly gained popularity in the past few years and provided insights into the genetic etiology of numerous complex diseases. However, existing approaches oversimplify the shared genetic architecture between different phenotypes and cannot effectively identify precise genetic regions contributing to the genetic correlation. In this work, we introduce LOGODetect, a powerful and efficient statistical method to identify small genome segments harboring local genetic correlation signals. LOGODetect automatically identifies genetic regions showing consistent associations with multiple phenotypes through a scan statistic approach. It uses summary association statistics from genome-wide association studies (GWAS) as input and is robust to sample overlap between studies. Applied to seven phenotypically distinct but genetically correlated neuropsychiatric traits, we identify 227 non-overlapping genome regions associated with multiple traits, including multiple hub regions showing concordant effects on five or more traits. Our method addresses critical limitations in existing analytic strategies and may have wide applications in post-GWAS analysis.

Year:  2021        PMID: 33795679      PMCID: PMC8016883          DOI: 10.1038/s41467-021-22334-6

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Introduction

Genome-wide association studies (GWASs) have been carried out for numerous complex traits and diseases, identifying tens of thousands of single-nucleotide polymorphisms (SNPs) associated with these phenotypes. However, our understanding of most traits’ genetic basis remains incomplete, in part due to the limited power and interpretability of the traditional GWAS approach that correlates one trait with one SNP at a time. Recently, statistical methods that jointly model multiple phenotypes have quickly gained popularity in human genetics research[1-3]. Leveraging pervasive pleiotropy in the human genome, these methods enhanced the statistical power to identify genetic associations[1,4-7], improved the accuracy of genetic risk prediction[8,9], revealed novel genetic sharing across diverse phenotypes[10-12], and provided great insights into the genetic basis of a variety of diseases and traits[13,14]. Genetic similarity between traits can be modeled at different scales. Methods that identify SNPs associated with multiple phenotypes have achieved some success[15-17]. However, most complex human traits and their genetic overlaps are highly polygenic, with top SNPs showing weak to moderate effects[18-20]. Thus, single SNP-based methods modeling pleiotropic effects may not be sufficient to characterize the full landscape of genetic similarity of complex traits. An alternative approach is to estimate the genetic correlation between different traits[10,12,21,22]. These methods effectively utilize genome-wide genetic data, including SNPs that do not reach statistical significance in GWAS, to quantify the overall genetic sharing between two traits. In addition, recent methodological advances have enabled estimation of genetic correlation with GWAS summary statistics[10,11,23], making these approaches widely applicable to a large number of complex phenotypes. 
With these advances, genetic correlation analysis has become a routine procedure in post-GWAS analysis and was implemented in almost all large-scale GWASs published in the past few years. However, despite improved statistical power and wide applications, genetic correlation approaches fail to provide detailed, mechanistic insights because they reduce complex genetic sharing to a single metric. Two recent methods improved genetic correlation analysis by providing local[12] and annotation-stratified estimates[11]. However, these methods rely on strong prior evidence about which local region or functional annotation to investigate. When applied to hypothesis-free scans, statistical power is reduced. In this work, we introduce LOGODetect (LOcal Genetic cOrrelation Detector), a method that uses scan statistics to identify genome segments harboring local genetic correlation between two complex traits. Compared to other methods, LOGODetect does not pre-specify candidate regions of interest; instead, it automatically detects regions with shared genetic components with high resolution and statistical power. In addition, LOGODetect only uses GWAS summary statistics as input and is robust to sample overlap between GWASs. We demonstrate its performance through extensive simulations and analysis of well-powered GWASs for seven distinct but genetically correlated neuropsychiatric traits[24,25]. Our analysis implicates a collection of hub regions (small genome segments harboring local genetic correlations for multiple trait pairs) in the genome that underlie the risk for several of these traits.

Results

Method overview

Our goal is to identify genome segments showing consistent association patterns with two different traits. Here, we provide an overview of our approach; the technical details are discussed in the “Methods” section. We propose the following scan statistic:

$$Q(R) = \frac{\sum_{i \in R} z_{1i} z_{2i} / l_i^{\theta}}{\sqrt{|R|}}$$

to quantify the extent of local genetic similarity in a genome region, where R is the index set for all SNPs in the region, $z_{1i}$ and $z_{2i}$ are the association z-scores for the ith SNP with the two traits, $l_i$ is the linkage disequilibrium (LD) score for the ith SNP[10], and θ controls the impact of LD. Q(R) extends the scan statistic proposed for single trait analysis[26,27] to the framework of detecting local genetic correlation. The scan statistic Q(R) is an LD score-weighted inner product of local z-scores from two GWASs and is conceptually similar to local genetic correlation: regions with high absolute values of Q(R) show concordant association patterns across multiple SNPs in the region, and the sign of Q(R) shows whether the correlation is positive or negative. Of note, when the candidate region is the whole genome and θ is equal to 1, the scan statistic is an estimator for the global genetic covariance[11]. In our framework, we do not assume that per-SNP genetic covariance is the same for all SNPs across the genome, but assume that genetic covariance is localized in some small genome regions. Therefore, we use the scan statistic in a local region as a metric to detect significant local genetic sharing. A full discussion of the functional form of the scan statistic Q(R) is provided in Supplementary Notes. We search for genome segments with the highest |Q(R)| values by scanning the genome while allowing the segment size to vary (Fig. 1). Because we assume that the global genetic covariance can be solely attributed to some small regions, the identified segments should collectively recapitulate a large proportion of the genetic covariance of the two traits. Therefore, we select the optimal tuning parameter θ by maximizing the aggregated genetic covariance of all the identified regions. Statistical evidence of genetic sharing is assessed using a Monte Carlo approach.
Fig. 1

LOGODetect workflow.

a The inputs of LOGODetect include GWAS summary statistics for two traits and a reference panel for LD estimation. b Scan statistic is defined over a region, as the LD-weighted inner product of two z-score vectors in this region. A large absolute value of the scan statistic would hint at local genetic correlation. c LOGODetect identifies genome segments showing consistent associations with two different traits.
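
The scan described in this overview can be illustrated with a minimal Python sketch. It is not the LOGODetect implementation. It assumes that Q(R) is the LD score-weighted inner product of the two z-score vectors scaled by the square root of the number of SNPs in the region, so that windows of different sizes are comparable. The function names (`scan_statistic`, `best_segment`, `monte_carlo_pvalue`), the brute-force search over window lengths, and the null simulation with independent standard-normal z-scores (that is, ignoring LD between SNPs) are all simplifying assumptions made for illustration.

```python
import numpy as np

def scan_statistic(z1, z2, ld, theta):
    """Q(R) for one region: LD-score-weighted inner product, scaled by sqrt(|R|)."""
    z1, z2, ld = (np.asarray(a, dtype=float) for a in (z1, z2, ld))
    return float(np.sum(z1 * z2 / ld**theta) / np.sqrt(len(z1)))

def best_segment(z1, z2, ld, theta, min_len=2, max_len=50):
    """Brute-force scan: return (start, end, Q) of the window with the largest |Q|.

    Windows are half-open index ranges [start, end) over consecutive SNPs.
    """
    w = np.asarray(z1, dtype=float) * np.asarray(z2, dtype=float)
    w = w / np.asarray(ld, dtype=float) ** theta
    csum = np.concatenate([[0.0], np.cumsum(w)])  # prefix sums of weighted products
    best = (0, 0, 0.0)
    for length in range(min_len, min(max_len, len(w)) + 1):
        # Q for every window of this length, computed from prefix sums
        q = (csum[length:] - csum[:-length]) / np.sqrt(length)
        i = int(np.argmax(np.abs(q)))
        if abs(q[i]) > abs(best[2]):
            best = (i, i + length, float(q[i]))
    return best

def monte_carlo_pvalue(q_obs, ld, theta, n_sim=200, seed=0, **scan_kwargs):
    """Compare |q_obs| with the null distribution of the maximum |Q|.

    Assumption: null z-scores are independent N(0, 1) draws (no LD, no sample overlap).
    """
    rng = np.random.default_rng(seed)
    n = len(ld)
    exceed = 0
    for _ in range(n_sim):
        null_q = best_segment(rng.standard_normal(n), rng.standard_normal(n),
                              ld, theta, **scan_kwargs)[2]
        exceed += abs(null_q) >= abs(q_obs)
    return (1 + exceed) / (n_sim + 1)

# Toy usage: a planted concordant segment at SNP indices 2..4
z = [0, 0, 3, 3, 3, 0, 0]
start, end, q = best_segment(z, z, ld=[1] * 7, theta=1.0)
p = monte_carlo_pvalue(q, [1] * 7, theta=1.0)
```

Taking the maximum of |Q| over all windows in each null draw is one simple way to account for the many overlapping windows examined by the scan; a realistic null would need to preserve the LD structure of the reference panel.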

Simulation results

We conducted simulations to compare the performance of LOGODetect with three existing methods: ρ-HESS[12], coloc[28], and gwas-pw[17]. ρ-HESS estimates local genetic correlation in pre-specified genomic regions based on a fixed-effect model, and coloc and gwas-pw are Bayesian approaches that estimate the posterior probability of colocalization for two traits. Note that the definition of genetic covariance (correlation) in our study is consistent with the traditional definition of covariance (correlation) of additive genetic effects under fixed-effect model[10]. We used HAPGEN2[29] to simulate genotypes for 100,000 samples based on 503 individuals with European ancestry from the 1000 Genomes Project Phase 3 data[30], and assessed the type I error of the four approaches under a variety of settings (see the “Methods” section; Supplementary Notes). First, we simulated phenotypes under an infinitesimal model in which genetic effects were assumed to be the same for all SNPs. We evaluated our method across a range of heritability combinations for two traits. We then compared different methods in two additional model settings representing diverse genetic architecture: a heritability-enrichment model where 3% of randomly selected SNPs explain 30% of trait heritability and the LDAK model[31] with MAF-dependent and LD-dependent architecture. In addition, we investigated if overlapping samples between two studies, mis-specified models with non-normal effects, and binary phenotypes would bias the inference. The family-wise type I error rate of our method was well-calibrated in all simulation settings with varying heritability values, extent of sample overlap, and genetic architecture (Supplementary Tables 1–8), showcasing the statistical robustness of LOGODetect. Type I errors for ρ-HESS were too conservative when heritability varies from 0.01 to 0.05 but showed substantial inflation when heritability was large (e.g. 0.2) (Supplementary Table 9). 
We also assessed the statistical power of LOGODetect under various settings. Three different metrics (i.e. signal points detection rate, signal segments detection rate, and G-score) were used to quantify the statistical power (see the “Methods” section). Signal points detection rate and signal segments detection rate measure the sensitivity at the SNP level and segment level, respectively. However, they do not reflect the specificity of the method, as both metrics will be 1 if the identified region is the entire genome. G-score is a more informative alternative, which jointly quantifies specificity and sensitivity. First, we evaluated the power of LOGODetect with different θ under a heritability enrichment model, where a higher level of heritability was attributed to correlated regions (Supplementary Fig. 1). LOGODetect with adaptive θ achieved universally higher statistical power in all three measures compared to the fixed θ approach, which demonstrated that maximizing aggregated genetic covariance of the identified regions could lead to a reasonable estimate of θ. Further, we compared different methods under a heritability enrichment model. As heritability increases, LOGODetect showed improvements in all three measures of statistical power without inflating the type I error (Fig. 2a–c). LOGODetect achieved greater signal points detection rates compared to ρ-HESS when heritability is low to moderate, compared to gwas-pw when heritability is moderate to high, and compared to coloc in all heritability settings (Fig. 2a). Moreover, LOGODetect showed almost universally higher signal segments detection rates and G-scores compared to the other three methods (Fig. 2b, c). LOGODetect achieved only slightly lower signal segments detection rates than ρ-HESS in one exceptional case when heritability is 0.05. We obtained consistent results under the heritability enrichment model with varying proportion of heritability (Fig. 2d–f). 
The gain of G-score can be attributed to the fact that LOGODetect flexibly and precisely identifies true signal regions, while ρ-HESS, coloc, and gwas-pw pre-specify candidate regions, which in general are much larger than the true signal regions, regardless of the disease phenotype. We also investigated if sample overlaps or binary phenotypes would affect the performance of our method. In addition, we compared statistical power of different approaches under mis-specified models, including LDAK model[31] with MAF-dependent and LD-dependent effect sizes, non-infinitesimal models with sparse effects, and infinitesimal models with heavy-tailed effect distributions. Details of simulation settings are shown in the Supplementary Notes. We obtained consistent results under all simulation settings (Supplementary Figs. 2–10). Finally, the presence of correlated signal regions in the genome would not inflate type I error in non-signal regions (Supplementary Tables 10 and 11).
Fig. 2

Assessment of statistical power under a heritability-enrichment model with varying trait heritability and heritability enrichment.

The Y-axis shows the statistical power assessed by three different metrics: a, d signal points detection rate measures sensitivity at the SNP level, b, e signal segments detection rate measures sensitivity at the segment level, and c, f G-score jointly measures specificity and sensitivity. The heritability represents the trait heritability on chromosome 1 and the proportion of heritability represents the proportion of the trait heritability explained by the signal regions. Significance cutoffs for gwas-pw are adjusted so that the empirical type I error rate is controlled at 0.05. Details on the above three metrics and the adjustment procedure for the significance cutoff are discussed in the “Methods” section. Source data are provided as a Source Data file.
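
The two sensitivity metrics can be sketched as follows. This is an illustrative reading of the descriptions above rather than the definitions in the “Methods” section: segments are assumed to be half-open ranges of SNP indices, a true segment is assumed to count as detected when at least one of its SNPs is covered, and the function names are hypothetical. The G-score is not sketched because its definition is not given in this section.

```python
def _covered_snps(segments):
    """Set of SNP indices covered by a list of half-open (start, end) segments."""
    return {i for start, end in segments for i in range(start, end)}

def point_detection_rate(true_segments, found_segments):
    """SNP-level sensitivity: fraction of true signal SNPs inside detected regions."""
    truth = _covered_snps(true_segments)
    return len(truth & _covered_snps(found_segments)) / len(truth)

def segment_detection_rate(true_segments, found_segments):
    """Segment-level sensitivity: fraction of true segments hit by any detected region."""
    found = _covered_snps(found_segments)
    hits = sum(1 for start, end in true_segments
               if any(i in found for i in range(start, end)))
    return hits / len(true_segments)

true_segments = [(10, 20), (50, 60)]
found_segments = [(12, 18), (100, 110)]
point_rate = point_detection_rate(true_segments, found_segments)      # 6 of 20 SNPs
segment_rate = segment_detection_rate(true_segments, found_segments)  # 1 of 2 segments

# Declaring the whole genome significant maximizes both metrics,
# which is why they do not reflect specificity.
whole_genome = [(0, 1000)]
trivial_point_rate = point_detection_rate(true_segments, whole_genome)
trivial_segment_rate = segment_detection_rate(true_segments, whole_genome)
```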

Application to seven neuropsychiatric traits

Previous studies have revealed pervasive pleiotropy[32-34] and genetic covariance[35-38] among neuropsychiatric traits. However, there is limited understanding of the specific genetic loci contributing to multiple traits. We applied LOGODetect to study the pairwise local genetic correlation between seven neuropsychiatric traits (Supplementary Table 12): bipolar disorder (BIP; n = 51,710), schizophrenia (SCZ; n = 105,318), major depressive disorder (MDD; n = 173,005), neuroticism (NEU; n = 390,278), attention-deficit/hyperactivity disorder (ADHD; n = 53,293), autism spectrum disorder (ASD; n = 46,350), and intelligence (IQ; n = 269,867), using summary statistics from the latest GWASs[39-45]. We adaptively selected the best θ by maximizing the genetic covariance in all identified regions (Supplementary Table 13). In total, we identified 410 regions (merged into 227 non-overlapping segments) showing concordant associations with multiple traits (FDR < 0.05; Fig. 3a and Supplementary Figs. 11–28). 274 of the 410 regions showed positive correlations (Supplementary Data 1). Size of the identified genome segments varied from 4 KB to 1.6 MB (Supplementary Fig. 29). The number of significant segments identified in our analysis is proportional to the absolute value of genetic correlation between each trait pair (Supplementary Fig. 30; correlation r = 0.23). We identified 56 shared genomic regions for BIP and SCZ (Fig. 3b; genetic correlation rg = 0.68, p = 9.14e−87), 53 regions for SCZ and IQ (Supplementary Fig. 18; genetic correlation rg = −0.23, p = 4.36e−28), 40 regions for MDD and NEU (Supplementary Fig. 26; rg = 0.78, p = 6.38e−41), and 261 regions for 16 other trait pairs, which is consistent with the strong genetic overlap between these traits[46-49]. Overall, we identified strong genetic sharing (higher genetic correlation and more shared genome segments) among BIP, SCZ, MDD, and NEU and among MDD, ADHD, ASD, and IQ. 
Sharing between these two clusters was relatively weaker, which is consistent with previous reports[50].
Fig. 3

LOGODetect identifies genome regions contributing to multiple neuropsychiatric traits.

a Heatmap shows the genetic correlation estimates (upper triangle) and the number of segments with local genetic correlation identified by LOGODetect (lower triangle) between the seven neuropsychiatric traits; barplot shows the observed-scale heritability estimates and standard errors of the seven traits using LDSC[10]. b Mirrored Manhattan plot for BIP and SCZ. The 56 shared genome regions identified by LOGODetect are highlighted in red. One locus on chromosome 6 in SCZ is truncated at 20 for visualization purposes.
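
Collapsing the regions detected across trait pairs into non-overlapping segments is an interval-merging step. The sketch below shows one standard way to do it; it is not taken from the LOGODetect code. The `(chrom, start, end)` tuple layout and the choice to merge regions that merely touch are assumptions made here.

```python
def merge_regions(regions):
    """Merge overlapping (chrom, start, end) regions into non-overlapping segments."""
    merged = []
    for chrom, start, end in sorted(regions):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps (or touches) the previous segment on the same chromosome: extend it
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

# Toy coordinates, not taken from the study
regions = [
    ("chr11", 100, 200),
    ("chr11", 150, 300),
    ("chr3", 10, 20),
    ("chr11", 400, 500),
]
segments = merge_regions(regions)
```

Sorting first guarantees that each region only needs to be compared with the most recently built segment, so the merge runs in a single pass after the sort.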

LOGODetect identifies precise regions with genetic sharing

To benchmark the performance of LOGODetect against existing approaches, we also applied ρ-HESS, coloc, and gwas-pw to the same seven neuropsychiatric traits. We first assumed full sample overlap, as suggested in the original paper that introduced ρ-HESS. In total, ρ-HESS detected 778 regions for BIP and SCZ, and 304 regions for SCZ and IQ (FDR < 0.05; Supplementary Table 14). It only detected three regions for MDD and NEU, and failed to detect any significant local genetic correlation for any pair among MDD, ADHD, and ASD. We also estimated the shared sample sizes based on the reported size of cohorts included in multiple studies (Supplementary Table 15), and used these approximated values as inputs for ρ-HESS to correct for sample overlap bias. The results remained consistent (Supplementary Table 14). The colocalization methods also detected strong genetic sharing between BIP and SCZ, between SCZ and NEU, and between SCZ and IQ (posterior probability > 0.95; Supplementary Table 14). We used the analysis of BIP and SCZ as an example to further illustrate the performance of LOGODetect. We used genetic covariance enrichment to quantify the precision of identified signal regions (Supplementary Notes). First, regions identified by LOGODetect showed the highest enrichment of genetic covariance compared to other methods (Fig. 4a). Although ρ-HESS identified more shared regions between BIP and SCZ, the enrichment of genetic covariance was 9.4-fold higher in the regions identified by LOGODetect, which is concordant with the simulation results based on G-scores. Second, we broke down the regions identified by ρ-HESS, coloc, and gwas-pw into two subsets: regions that overlap and regions that do not overlap with those identified by LOGODetect. The regions overlapping with LOGODetect results showed a higher enrichment for genetic covariance, while enrichment in the regions identified by other methods alone was substantially lower (Fig. 4b). 
Third, to avoid comparison at an arbitrary significance cutoff, we ranked the regions identified by LOGODetect, ρ-HESS, coloc, and gwas-pw, by the corresponding p-values and posterior probability separately, and evaluated the proportion of explained genetic covariance at various thresholds. LOGODetect substantially outperformed other methods, explaining more genetic covariance with the same proportion of SNPs (Fig. 4c; Supplementary Figs. 31–50). We also used estimated overlapping sample sizes to de-bias ρ-HESS estimates and results remained consistent (Supplementary Fig. 51).
Fig. 4

LOGODetect identifies precise genomic regions harboring local genetic correlations.

Genetic covariance and its corresponding enrichment were calculated using stratified-LDSC[10]. a Genetic covariance fold enrichment (i.e. the ratio between the proportion of total genetic covariance and the proportion of the total SNP counts) in regions identified by LOGODetect, ρ-HESS, coloc, and gwas-pw, respectively. b Genetic covariance fold enrichment in regions identified by ρ-HESS, coloc, and gwas-pw that also overlapped with LOGODetect findings, and regions identified by ρ-HESS, coloc, and gwas-pw alone. c Genetic covariance explained and proportion of SNPs covered by regions identified by LOGODetect, ρ-HESS, coloc, and gwas-pw.

There are two reasons why our method showed improved performance compared to the other methods. First, ρ-HESS and the colocalization methods pre-specify regions of interest, which are generally much larger than the signal regions harboring true genetic sharing (Supplementary Fig. 52), while our scanning approach is data-adaptive and can precisely identify the boundaries for signal regions. Second, both BIP (heritability h2 = 0.35) and SCZ (h2 = 0.43) have high SNP heritability. As demonstrated in the simulations, regions identified by ρ-HESS may include a non-negligible proportion of false positive findings. Further, we evaluated the identified regions in an independent replication cohort. We tested whether the significantly correlated regions between BIP and SCZ can be replicated in the UK Biobank (UKBB). The summary statistics of BIP (ncase = 1064, ncontrol = 365,476) and SCZ (ncase = 571, ncontrol = 365,476) in the UKBB were collected (Supplementary Table 16). Due to the unbalanced case-control ratio and limited effective sample size, we used aggregated genetic covariance to evaluate the replication (Supplementary Notes). 
Stratified-LDSC was not applicable due to the imbalanced sample sizes of cases and controls; therefore, we applied GNOVA[11] for stratified genetic covariance analysis of the regions identified by each of the four methods in the UKBB data. The regions identified by LOGODetect and ρ-HESS both showed significant genetic covariance, but the regions identified by LOGODetect had a 6.7-fold higher genetic covariance enrichment than those identified by ρ-HESS, which demonstrates again that LOGODetect can more precisely detect the true signal regions (Table 1; Supplementary Fig. 53). Regions identified by gwas-pw showed no significant genetic covariance, while regions identified by coloc showed significant genetic covariance with the opposite sign.
Table 1

Stratified genetic covariance analysis on UKBB replication cohorts.

| Method | Genetic Cov (a) | s.e. | p-value | Proportion of genetic cov (%) | Proportion of SNPs (%) | Fold enrichment |
|---|---|---|---|---|---|---|
| LOGODetect | 2.18e−4 | 6.65e−5 | 1.04e−3 | 11.50 | 1.15 | 10.02 |
| ρ-HESS | 6.62e−4 | 3.16e−4 | 3.61e−2 | 30.00 | 20.12 | 1.49 |
| coloc | −5.70e−5 | 2.30e−5 | 1.33e−2 | −2.85 (b) | 0.34 | −8.36 (b) |
| gwas-pw | 3.84e−5 | 6.54e−5 | 5.57e−1 | 1.92 | 1.61 | 1.20 |

(a) Genetic Cov represents estimated genetic covariance of the identified regions using GNOVA.

(b) Genetic covariance of regions identified by coloc has opposite sign compared to global genetic covariance, therefore the corresponding proportion of genetic covariance and fold enrichment are negative.
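
As a worked check of Table 1, fold enrichment is the proportion of genetic covariance divided by the proportion of SNPs, following the definition in the Fig. 4 legend. The short sketch below recomputes it from the rounded percentages in the table, so small discrepancies with the reported fold enrichments are expected; the dictionary layout and names are illustrative only.

```python
# (proportion of genetic covariance %, proportion of SNPs %) from Table 1
table1 = {
    "LOGODetect": (11.50, 1.15),
    "rho-HESS": (30.00, 20.12),
    "coloc": (-2.85, 0.34),
    "gwas-pw": (1.92, 1.61),
}

def fold_enrichment(prop_cov_pct, prop_snp_pct):
    """Ratio of the share of genetic covariance to the share of SNPs."""
    return prop_cov_pct / prop_snp_pct

fold = {method: fold_enrichment(*values) for method, values in table1.items()}

# Relative enrichment of LOGODetect regions over rho-HESS regions
ratio = fold["LOGODetect"] / fold["rho-HESS"]

for method, value in fold.items():
    print(f"{method}: {value:.2f}")
print(f"LOGODetect vs rho-HESS: {ratio:.1f}-fold")
```

The recomputed ratio agrees with the 6.7-fold difference quoted in the text, and the negative value for coloc reflects the opposite-sign covariance noted in footnote b.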

We also replicated findings for body-mass index (BMI) and height, for which independent replication cohorts of large sample size are available (Supplementary Notes). We identified 24 regions with significant local genetic correlation in the discovery analysis. Seventeen of the 24 regions identified in the discovery stage were successfully replicated, suggesting that LOGODetect identifies replicable genomic regions with local genetic correlations (Supplementary Table 17).

Tissue enrichment of hub regions shared by neuropsychiatric traits

We used 66 GenoSkyline-Plus tissue-specific functional annotations[51] to investigate the functional relevance of the genomic regions found to harbor local genetic correlations among seven neuropsychiatric traits (Supplementary Table 18). We used permutation tests to assess the enrichment of genome regions shared by multiple traits in these annotation tracks. Genome regions identified by LOGODetect were significantly enriched in eight brain regions (minimum enrichment = 1.50, p = 4.00e−4) (Fig. 5a). In addition to brain tissues, regions shared by neuropsychiatric traits were also strongly enriched in mononuclear cells from peripheral blood (enrichment = 1.93, p = 1.00e−5) and pancreatic islets (enrichment = 2.11, p = 1.00e−5). Of note, annotated functional regions in mononuclear cells and pancreatic islets have substantial overlaps with annotations of brain tissues (Fig. 5b). After conditioning on functional regions in the brain, the enrichment in pancreatic islets was substantially reduced (enrichment = 1.1, p = 0.224; Fig. 5c), while enrichment in mononuclear cells remained significant (enrichment = 1.66, p = 3.55e−3).
Fig. 5

Tissue-specific enrichment of genome regions conferring risk for multiple neuropsychiatric traits.

a Permutation test results over 66 cell-type-specific annotations. Fold enrichment is labeled next to each bar. b The overlap of predicted functional regions in pancreatic islets, mononuclear cells from peripheral blood, and eight brain regions. c Enrichment in the predicted functional regions in pancreatic islets and mononuclear cells from peripheral blood after conditioning on the annotation overlap with brain regions.
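
A permutation test of this kind can be sketched as below. The permutation scheme used in the study is not described in this section, so the sketch assumes one common choice: circularly shifting the annotation track relative to the identified regions, which preserves the size and spacing of the regions. The single linear genome, the boolean per-base annotation array, and the function names are simplifications introduced for illustration.

```python
import numpy as np

def enrichment(annot, segments):
    """Fraction of segment bases that are annotated, relative to the genome-wide fraction."""
    covered = np.zeros(len(annot), dtype=bool)
    for start, end in segments:
        covered[start:end] = True
    return annot[covered].mean() / annot.mean()

def permutation_test(annot, segments, n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment of segments in an annotation."""
    rng = np.random.default_rng(seed)
    observed = enrichment(annot, segments)
    null = np.empty(n_perm)
    for k in range(n_perm):
        # Assumed null: random circular shift of the annotation along the genome
        shift = int(rng.integers(len(annot)))
        null[k] = enrichment(np.roll(annot, shift), segments)
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Toy genome of 1000 bases with 10% annotated; one region fully inside the annotation
annot = np.zeros(1000, dtype=bool)
annot[100:200] = True
observed, p_value = permutation_test(annot, [(120, 180)])
```

Adding 1 to both the numerator and the denominator keeps the p-value strictly positive, so its smallest attainable value is set by the number of permutations.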

To further assess whether enrichments are truly tissue-specific, we performed conditional analysis on six generic annotations (i.e., coding regions, enhancers, introns, promoters, 5′UTRs, and 3′UTRs, extended by a 500-bp window around each annotation) in Finucane et al. [52]. After conditioning on these annotations, the enrichment in brain tissues remained significant (minimum enrichment = 1.37, p = 1.98e−3), suggesting that the observed enrichment in the functional genome in these brain tissues was not driven by generic annotations alone (Supplementary Fig. 54). We also ran Gene Ontology-enrichment analysis using FUMA[53]. The 968 genes in regions detected by LOGODetect were significantly enriched for 83 GO terms (Supplementary Table 19) after multiple testing correction, including RNA metabolic process (p = 5.36e−13), nucleolus (p = 9.30e−6), and protein arginine deiminase activity (p = 7.35e−9).

Hub regions contributing to multiple neuropsychiatric traits

Next, we investigated hub regions shared by five or more traits. Among the 227 non-overlapping genome regions identified in our analysis, 91 regions were identified in two or more different trait pairs (Supplementary Data 2). The five regions identified in at least seven pair-wise analyses are summarized in Supplementary Table 20. Notably, LOGODetect consistently identified these hub regions in more trait pairs compared to other methods. These hub regions show consistent associations with multiple neuropsychiatric traits and can potentially reveal key mechanisms and pathways underlying the shared genetics across traits. The region showing significant correlation in nine pair-wise analyses is a locus spanning 711 KB on chromosome 11 (Fig. 6). Interestingly, two independent peaks were identified in this region between SCZ and NEU and between MDD and NEU. SNPs in this region have previously reached genome-wide significance in the SCZ[40] (lead SNP rs2514218; p = 2.42e−12), NEU[45] (lead SNP rs35738585; p = 2.47e−17), and IQ GWAS[44] (lead SNP rs2885208; p = 4.58e−8). Additionally, SNPs at this locus showed consistent associations with BIP (lead SNP rs10502165; p = 3.90e−5). More importantly, this genome region covers the NTAD (NCAM1-TTC12-ANKK1-DRD2) gene cluster. Multiple variants of DRD2 and NCAM1 are reported to be associated with BIP, SCZ, MDD, and NEU[54-56]. Also, multiple eQTLs for DRD2 (lead SNP rs6589381; p = 1.10e−14) and NCAM1 (lead SNP rs1079021; p = 9.20e−16) are located in the region.
Fig. 6

Putative target genes for the hub region in chr11 shared by nine neuropsychiatric trait pairs.

Locuszoom plot, recombination rate, and the gene names are provided. The colored bands denote the location of the significant region and the trait pair in which it was detected.

Another region on chromosome 11 spans 488 KB and shows significant correlations in seven pair-wise analyses (Supplementary Fig. 55). IGSF9B, a potential risk gene for SCZ[40] and IQ[44], and its multiple eQTLs (lead SNP rs558709; p = 1.80e−13) are located in this genomic region. The third hub region is located on chromosome 14 spanning 694 KB and shows significant correlations in seven trait pairs (Supplementary Fig. 56). Gene PRKD1 largely overlaps with this locus, and FOXG1, which is an associated gene for SCZ[40] and IQ[44], lies about 200 KB away from the locus. In addition, multiple eQTLs for PRKD1 (lead SNP rs80019464; p = 6.40e−5) and FOXG1 (lead SNP rs138384350; p = 6.10e−7), are located in the region. The fourth region on chromosome 3 spans 258 KB and was identified in seven pairs (Supplementary Fig. 57). Notably, most parts of this genomic region are covered by the gene FOXP1, which is an implicated risk gene for SCZ[40] and IQ[44]. The final region spans 450 KB on chromosome 10. This region was identified in seven trait pairs (Supplementary Fig. 58) and largely overlaps with SORCS3, a previously implicated risk gene for MDD and ADHD[42,57,58].
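
The hub-region bookkeeping described in this section (how many trait pairs, and which traits, implicate each non-overlapping segment) can be sketched as follows. The data layout, the function name, and the toy coordinates are hypothetical; only the counting logic is the point.

```python
from collections import defaultdict

def hub_summary(segments, hits):
    """Count distinct trait pairs and traits per non-overlapping segment.

    segments: list of (chrom, start, end) tuples
    hits: list of (chrom, start, end, (trait_a, trait_b)) per-pair detections
    """
    pairs = defaultdict(set)
    for chrom, start, end, pair in hits:
        for seg in segments:
            # Half-open interval overlap on the same chromosome
            if seg[0] == chrom and start < seg[2] and end > seg[1]:
                pairs[seg].add(frozenset(pair))  # (A, B) and (B, A) count once
    return {
        seg: {"n_pairs": len(p), "traits": sorted(set().union(*p))}
        for seg, p in pairs.items()
    }

# Toy example, not the coordinates reported in the study
segments = [("chr11", 100, 300)]
hits = [
    ("chr11", 100, 200, ("SCZ", "NEU")),
    ("chr11", 150, 300, ("MDD", "NEU")),
    ("chr11", 120, 180, ("NEU", "SCZ")),
]
summary = hub_summary(segments, hits)
```

A segment would then be flagged as a hub by thresholding either count, for example on the number of traits or the number of pair-wise analyses.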

Discussion

Through simulations and analyses of GWAS data, we demonstrated that our method effectively identifies genetic regions that show concordant associations across multiple complex traits with high resolution and statistical power. Compared to existing approaches, LOGODetect has greater statistical power and is robust across various heritability settings and in the presence of sample overlap. Applied to well-powered GWASs for seven phenotypically distinct but genetically correlated neuropsychiatric traits, LOGODetect identified numerous shared genomic regions, including hub regions that showed consistent effects for more than four traits. The regions identified by LOGODetect explain a larger portion of genetic covariance than those identified by existing approaches. Furthermore, the enrichment holds true in independent replication studies.

Two genes (i.e. DRD2 and NCAM1) are located in the hub region on chromosome 11 (Fig. 6). DRD2, also known as dopamine receptor D2, encodes the D2 subtype of the dopamine receptor. The dopamine hypothesis of schizophrenia suggests that dopaminergic pathways are overactive in schizophrenia[59]. In addition, multiple DRD2 variants are reported to be associated with psychiatric disorders[54]. NCAM1, short for neural cell adhesion molecule 1, plays an important role in the formation of plexiform layers, neurite fasciculation, nerve–muscle interactions and other aspects of neural development[60]. Expression of PSA-NCAM is increased under antidepressant treatment, while it is reduced in animal models of depression and in depressed patients[56]. Notably, NCAM1 was identified by LOGODetect as an implicated gene for MDD, but it could not be identified by the other three methods.

Other identified hub regions also included a handful of interesting candidate genes. IGSF9B (Supplementary Fig. 55) encodes a brain-specific cell adhesion molecule which is highly expressed in GABAergic interneurons, concentrated to hippocampal and cortical inhibitory synapses for their development into interneurons[61]. Interestingly, the promotion of inhibitory synapse development by IGSF9B is coupled with NLGN2, loss-of-function variants of which were found in autism and schizophrenia patients[62,63]. PRKD1 (Supplementary Fig. 56) encodes a serine/threonine protein kinase which is important in many cellular processes and regulates neuronal polarity, synapse formation, and synaptic plasticity[64-66]. FOXG1 (Supplementary Fig. 56) encodes the fork-head box protein G1, which is strongly expressed in neural tissues and operates as a transcriptional repressor essential in brain development[67]. It was suggested that the PRKD1 locus regulates FOXG1 in a cis-acting way, and is associated with the FOXG1 syndrome including mental retardation, absent language, and dyskinesia[67]. FOXP1 (Supplementary Fig. 57) is a member of the FOXP transcription factor subfamily. It is expressed in the cerebral cortex, striatum, and spinal cord of the central nervous system, and is shown to regulate striatum development, motor neuron migration, and midbrain dopamine neuron differentiation[68]. FOXP1 is associated with ASD, speech delay, and intellectual disability[69,70]. SORCS3 (Supplementary Fig. 58) is highly expressed in the CA1 region of the hippocampus, and is involved in synaptic depression and spatial learning ability[71,72]. It is also known to play an important role in protein networks associated with PICK1, NGF, and PDGF-BB[73,74], which have been implicated in ADHD, ASD, MDD, and SCZ[75-78].

Our method still has some limitations. First, the goal of LOGODetect is to identify genomic regions harboring local genetic correlations. We do not give an explicit estimate of local genetic correlation, but the sign of the correlation can be inferred. 
Although local genetic correlation in identified regions can in principle be estimated by other methods (e.g., ρ-HESS), this remains a statistically challenging problem. As shown in our simulations, the estimation is inaccurate. Under the null hypothesis that local genetic correlation is zero, ρ-HESS underestimates the standard error of local genetic covariance when heritability is high and leads to inflated type I error rates, and it overestimates the standard error of local genetic covariance when heritability is low and leads to deflated type I error rates. We note that the deflation of type I error observed for ρ-HESS is not contradictory to the results published in the ρ-HESS paper[12]. ρ-HESS was formulated as an estimation problem instead of the hypothesis testing problem in our manuscript. In their paper, the authors showed simulation results demonstrating that the local genetic correlation can be accurately estimated when the true parameter is 0. However, the evidence shown in the ρ-HESS paper could not rule out deflation when the method is used for inference. These problems are further exacerbated when ρ-HESS is applied to the very small local genomic regions identified by LOGODetect. Second, LOGODetect scans a large number of genomic regions to search for local regions where the scan statistic significantly deviates from the null distribution. We currently do not have an analytical solution to derive or approximate the theoretical null distribution. Instead, a Monte Carlo approach is employed to quantify the null distribution of the maximal scan statistic, which is computationally expensive. Third, we proposed an empirical method to select the tuning parameter based on the aggregated genetic covariance of the identified regions. 
Although there is no theoretical guarantee, we have conducted extensive simulations to demonstrate that the empirical strategy to estimate θ works well with frequently used genetic models and leads to superior performance of LOGODetect compared to competitive methods, in terms of error control and statistical power. Fourth, our simulations are conducted with simulated genotypes based on the European ancestry individual data in the 1000 Genomes Project. It would be interesting to simulate data with various population structures to test our method. In real data applications, GWAS summary statistics are usually corrected for the genomic control factor, thus we expect population structure to have minor impact on the performance of LOGODetect. Fifth, several recent methods have been proposed to jointly model more than two GWAS traits to infer the structure of shared genetics across multiple phenotypes[14,47,79]. A future direction is to generalize our method to search for genomic regions shared by more than two traits. Finally, LOGODetect studies genetic correlation from GWAS data, which uses a bivariate random effect model and defines the genetic correlation as the correlation between SNPs[10,18,21,80]. Under this model, the definition of genetic correlation is consistent with the traditional definition of correlation of additive genetic effects[10]. Yet the concept should be distinguished from the additive genetic correlation, since the estimation could be biased to the partial effects of tag SNPs, and the causal effects of untagged SNPs would be absorbed to effect of random error term[81]. Taken together, we have introduced LOGODetect, a scan statistic method to identify local genetic regions showing correlated effects with multiple neuropsychiatric traits. 
Complementary to single SNP-based approaches for pleiotropy mapping[17,28] and genetic correlation estimation methods utilizing genome-wide data[10,21], our method elucidates the shared genetic architecture between two traits by identifying local genomic segments that show concordant effects. The candidate genes and regions we identified may be tapping into a set of transdiagnostic mechanisms that underlie all of psychopathology (i.e., the “p” or general factor[47]). In practice, LOGODetect can be used in combination with other methods to further improve statistical power and biological interpretability. For example, it may be of interest to first screen the genome by identifying larger genetic regions[12,82] or certain functional annotations[11] enriched for the shared genetics between two traits. Then, LOGODetect can be applied to these candidate regions to identify the precise genetic segments that explain such sharing. Since high-dimensional sampling remains a challenge, a multi-tier analytical strategy would improve statistical power and reduce the computational burden of the analysis. We believe that LOGODetect has addressed some key limitations in the current practice of cross-trait genetic correlation analysis and will benefit complex trait genetics research.

Methods

Genetic model

Suppose two standardized traits $y_1$ and $y_2$ follow the linear model with random effects:

$$y_1 = X\beta + \varepsilon, \qquad y_2 = Z\gamma + \delta,$$

where X and Z are fixed and standardized genotype matrices with M columns (i.e. the number of SNPs is M); ε and δ are non-genetic effects; β and γ are M-dimensional vectors denoting genetic effects. They follow the multivariate normal distribution:

$$\begin{pmatrix} \beta \\ \gamma \end{pmatrix} \sim N\left(0, \begin{pmatrix} \frac{h_1^2}{M} I_M & \frac{\rho_g}{K} I_K \\ \frac{\rho_g}{K} I_K & \frac{h_2^2}{M} I_M \end{pmatrix}\right),$$

where $h_1^2$ and $h_2^2$ denote the heritability for the two traits; $\rho_g$ is the global genetic covariance between the two traits; $I_K$ is a diagonal matrix whose ith diagonal element equals 1 if the effects of the ith SNP on the two traits (i.e. $\beta_i$ and $\gamma_i$) are correlated and equals 0 otherwise; K is the number of SNPs such that β and γ are correlated, i.e., $K = \mathrm{tr}(I_K)$. β and γ are independent of the non-genetic effects ε and δ. The statistical model described here is similar to the polygenic model used in genetic correlation estimation[10]. The difference is that we allow local genetic sharing and do not assume the global genetic covariance to be equally attributed to all SNPs in the whole genome. Compared to the local genetic correlation estimation method in the literature[12], we do not assume genetic effects to be fixed. Instead, our framework is a direct generalization of the model developed for global genetic correlation estimation[10,11]. Under the alternative hypothesis, we denote the non-overlapping genetic regions that contribute to multiple traits as $R_1, \ldots, R_m$ and their union as $R = \bigcup_{j=1}^{m} R_j$, such that $(I_K)_{ii} = 1$ if and only if $i \in R$. Under the null hypothesis, the two traits share no genetic covariance, i.e., $\rho_g = 0$.

Scan statistic and scanning procedure

We use a scan statistic approach to identify regions showing correlated effects between different traits. This type of approach has been used for burden tests in a single-trait setting[83]. Suppose $n_1$ and $n_2$ are the sample sizes for the two GWASs, respectively, and we first consider the simpler case in which there is no sample overlap between the two GWASs. Additionally, we denote the association z-scores for the two traits as $z_1 = X^{T}y_1/\sqrt{n_1}$ and $z_2 = Z^{T}y_2/\sqrt{n_2}$. Then, we can define the scan statistic:

$$Q(R) = \frac{\sum_{i \in R} z_{1i} z_{2i}}{\left(\sum_{i \in R} l_i\right)^{\theta}},$$

where R is the index set for SNPs in a genome region, $l_i$ is the LD score[80] for the ith SNP computed within a 1 MB window, and θ is a tuning parameter that controls how strongly we penalize the LD structure. If SNPs in the region show strong, concordant effects on both traits, then the inner product $\sum_{i \in R} z_{1i} z_{2i}$ will tend to have a larger absolute value and therefore yield a larger scan statistic. On the contrary, if the two traits are genetically independent in the local region, then the corresponding scan statistic will be close to 0. Therefore, the scan statistic is informative for detecting local genetic correlation. The purpose of the LD score term in the denominator is to normalize the effect of LD. The expected absolute value of $\sum_{i \in R} z_{1i} z_{2i}$ is larger in regions with strong LD (Supplementary Fig. 59; Supplementary Notes). Without the normalization term in the denominator, the method would favor regions with large LD that may not be of biological interest. Finally, we use the maximal scan statistic over all possible regions as the test statistic:

$$Q_{\max} = \max_{R:\, |R| \le C} |Q(R)|,$$

where C is a pre-specified parameter that defines the upper bound on the number of SNPs in a region. In practice, C can be set based on the number of SNPs in the dataset (e.g. the average number of SNPs in 1 million bases). LOGODetect takes advantage of this flexible framework to scan local regions with varying sizes. Compared to a sliding-window approach based on a pre-specified window size, our method is more appealing since the size of the signal region could vary substantially by locus and by trait.

We use a Monte Carlo approach to assess the distribution of $Q_{\max}$ under the null hypothesis. We draw 5000 pseudo-samples under the null distribution using a procedure detailed in the next section. Then, we estimate the empirical null distribution of $Q_{\max}$ and its 95% upper quantile, $q_{0.95}$. Taken together, the scanning procedure works as follows. We scan the genome to find $\hat{R}$ such that $|Q(\hat{R})|$ reaches the maximum. If $|Q(\hat{R})| > q_{0.95}$, we claim that $\hat{R}$ is a significant signal region and remove these SNPs from the analysis. Then, we repeat the procedure on the remaining SNPs until no region is declared significant. This procedure controls the family-wise type I error rate. Calculating $Q(R)$ over all possible candidate regions is computationally expensive, so in practice we constrain $|R|$ to be a multiple of 10, which reduces the computational burden by ~10-fold with minimal reduction in accuracy. Finally, regions that are no more than 100 KB away from each other are merged into a single region.
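
The scan described above can be illustrated with a minimal sketch. It is not the LOGODetect implementation: the function names `scan_statistic` and `max_scan`, the arguments `max_snps` (playing the role of C) and `step` (the region-size multiple), and the use of contiguous index ranges are assumptions made for illustration. Prefix sums let each candidate region be evaluated in constant time.

```python
from itertools import accumulate


def scan_statistic(z1, z2, ld, start, end, theta):
    """Q(R) for the contiguous SNP index range [start, end)."""
    numerator = sum(a * b for a, b in zip(z1[start:end], z2[start:end]))
    denominator = sum(ld[start:end]) ** theta
    return numerator / denominator


def max_scan(z1, z2, ld, theta=0.5, max_snps=2000, step=1):
    """Return (|Q|, start, end) for the region that maximizes |Q(R)|.

    Assumes all LD scores are positive. Only region sizes that are
    multiples of `step` and at most `max_snps` are considered.
    """
    # Prefix sums of z-score products and of LD scores.
    prod = [0.0] + list(accumulate(a * b for a, b in zip(z1, z2)))
    lsum = [0.0] + list(accumulate(ld))
    m = len(z1)
    best = (0.0, 0, 0)
    for start in range(m):
        for size in range(step, min(max_snps, m - start) + 1, step):
            end = start + size
            q = (prod[end] - prod[start]) / (lsum[end] - lsum[start]) ** theta
            if abs(q) > best[0]:
                best = (abs(q), start, end)
    return best


# Toy example: SNPs 1-3 carry concordant signals in both traits.
z1 = [0.1, 3.0, 3.2, 2.9, -0.2]
z2 = [-0.1, 2.8, 3.1, 3.0, 0.1]
ld = [1.0] * 5
q_abs, start, end = max_scan(z1, z2, ld, theta=0.5)
```

In a Monte Carlo setting, the same `max_scan` call would be applied to each pseudo-z-score pair to build the empirical null distribution of the maximal statistic. With `theta=1.0` and equal LD scores, the statistic reduces to an average of per-SNP products, so the maximum falls on a single SNP.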

Choice of parameter θ

Parameter θ affects the size of identified regions. A relatively long segment may not have a large absolute value of the scan statistic, due to the penalty in the denominator $\left(\sum_{i \in R} l_i\right)^{\theta}$. A larger θ implies a stronger penalty and is hence more likely to detect smaller signal segments. In particular, when θ equals 1, $|Q(R)|$ will attain a local maximum with $R$ containing only one variant. A reasonable range for θ is between 0 and 1. In practice, it is important to choose the “best” θ adaptively from the data. We used the proportion of genetic covariance of the identified regions as the metric. We varied the value of θ in the candidate set, and chose the best θ such that the corresponding identified regions have the largest genetic covariance. In general, one can use any subset of [0, 1] as the candidate set of θ values. However, extensively searching for θ substantially increases the computation time. In practice, we suggest that the set {0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7} would be sufficient. Denote the regions detected by LOGODetect under parameter θ as $R_1(\theta), \ldots, R_{m_\theta}(\theta)$. We denote their union as $R(\theta)$ and the genetic covariance in $R(\theta)$ as $\rho(\theta)$. In theory, $\rho(\theta) = \rho_g\,|R(\theta) \cap R| / |R|$, where $R$ is the union set of true signal regions, $\rho_g$ is the global genetic covariance, and $|R|$ is the number of SNPs in $R$. In practice, the true signal regions are unknown. $\rho(\theta)$ can be estimated using stratified-LDSC[10]. Let $\pi(\theta) = \rho(\theta)/\rho_g$ be the proportion of the global genetic covariance explained by $R(\theta)$. We assume that and π(θ) = 0 if . We calculate $\pi(\theta)$ for a candidate set of θ values, and then we determine θ adaptively from the data via the following optimization problem:

$$\hat{\theta} = \arg\max_{\theta} \pi(\theta).$$

Monte Carlo simulation of pseudo-z-score vectors

In order to simulate the null distribution of $Q_{\max}$, we need to generate pseudo-z-score vectors. When two GWASs do not have sample overlap, it can be verified that

$$\mathrm{Var}(z_1) = \frac{h_1^2}{M n_1} X^{T}XX^{T}X + \frac{1 - h_1^2}{n_1} X^{T}X,$$

and similarly for $z_2$. Therefore, under $H_0$, the combined z-score vector $z = (z_1^{T}, z_2^{T})^{T}$ asymptotically follows a multivariate normal distribution with mean 0 and block-diagonal covariance matrix $\mathrm{diag}(\mathrm{Var}(z_1), \mathrm{Var}(z_2))$. Because individual-level genotype data are hard to obtain in practice due to privacy concerns, it is useful to base the analysis only on summary statistics. Here by using a reference panel (e.g. the 1000 Genomes Project Phase 3 data[30]), $X^{T}X/n_1$ and $Z^{T}Z/n_2$ can be estimated as $V$, $X^{T}XX^{T}X$ and $Z^{T}ZZ^{T}Z$ can be estimated as , where n is the sample size of the reference panel and V is the LD matrix of the reference panel. The heritability of the two traits can be estimated through LDSC[80]. After plugging in the reference LD matrix, we have asymptotically under the null. The random multivariate normal vectors have a complex covariance structure, which is computationally challenging as the dimension of the vector can be as high as $10^7$ in GWAS. We developed a computationally tractable method that leverages the LD structure in the genome. First, we split the high-dimensional vector z into sub-vectors $z_{(1)}, \ldots, z_{(B)}$. These sub-vectors are defined by genome position, each spanning a 1 MB genome block, i.e. chr1: 0–1 MB, chr1: 1–2 MB, etc. We denote the variance matrix of z as Σ, which can be written in block matrix form. Denote $\Sigma_{ij}$ as the submatrix of Σ with rows indexed by the ith block and columns indexed by the jth block. Then we use a block-wise tridiagonal matrix to approximate Σ by shrinking $\Sigma_{ij}$ to 0 if $|i - j| > 1$. This approximation is reasonable in the context of GWAS since SNPs should be independent if they are physically far apart. Then, we can use an iterative approach to generate each block by conditioning on the previous block via the conditional normal distribution:

$$z_{(i)} \mid z_{(i-1)} \sim N\left(\Sigma_{i,i-1}\Sigma_{i-1,i-1}^{-1} z_{(i-1)},\; \Sigma_{ii} - \Sigma_{i,i-1}\Sigma_{i-1,i-1}^{-1}\Sigma_{i-1,i}\right).$$

In practice, $\Sigma_{i-1,i-1}$ may be rank deficient and therefore not invertible. 
We adopt the truncated singular value decomposition (TSVD) method[84] and use the top q singular values and their corresponding singular vectors to calculate the inverse matrix. For numerical stability, we choose q to be as large as possible such that the condition number is <1000[85]. Finally, we standardize each pseudo z-score vector so that it has the same mean and variance as the z-score vector in the real data.
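
The variance of the z-score vector used in this section can be worked out in a few lines. The derivation below is a sketch under stated assumptions that are not spelled out in the text above: the z-scores are taken to be $z_1 = X^{T}y_1/\sqrt{n_1}$, the non-genetic effect is taken to have variance $(1 - h_1^2) I$ (so that the trait is standardized), and the finite size of the reference panel is ignored when $X^{T}X/n_1$ is replaced by $V$.

```latex
\begin{align*}
\mathrm{Var}(y_1) &= \frac{h_1^2}{M} X X^{T} + (1 - h_1^2)\, I, \\
\mathrm{Var}(z_1) &= \frac{1}{n_1} X^{T}\, \mathrm{Var}(y_1)\, X
  = \frac{h_1^2}{M n_1} X^{T}X X^{T}X + \frac{1 - h_1^2}{n_1} X^{T}X, \\
\mathrm{Var}(z_1) &\approx \frac{n_1 h_1^2}{M} V^2 + (1 - h_1^2)\, V
  \qquad \text{when } X^{T}X / n_1 \approx V .
\end{align*}
```

Under this approximation the jth diagonal element is $n_1 h_1^2 l_j / M + 1 - h_1^2$, since the jth diagonal element of $V^2$ is the sum of squared correlations, i.e. the LD score $l_j$. This agrees with the usual LD score regression relationship between the expected squared z-score and the LD score.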

Application to binary traits

So far, we have based the derivation on the setting that both input traits are continuous. This is a common approach to introducing genetic correlation methodology[10,11]. However, most genetic correlation methods, including LOGODetect, can be directly applied to GWAS summary statistics of binary outcomes[10,11]. It is known that under the liability threshold model, the following formulas hold[10]:

$$h_{i,\mathrm{obs}}^2 = h_i^2\, \frac{\phi\left(\Phi^{-1}(1 - P_i)\right)^2 S_i (1 - S_i)}{P_i^2 (1 - P_i)^2}, \quad i = 1, 2,$$

$$\rho_{\mathrm{obs}} = \rho_g\, \frac{\phi\left(\Phi^{-1}(1 - P_1)\right) \phi\left(\Phi^{-1}(1 - P_2)\right) \sqrt{S_1 (1 - S_1) S_2 (1 - S_2)}}{P_1 (1 - P_1) P_2 (1 - P_2)},$$

where $h_{i,\mathrm{obs}}^2$ and $\rho_{\mathrm{obs}}$ denote heritability and genetic covariance on the observed scale, respectively; $P_1$ and $P_2$ denote population prevalence for the two traits; $S_1$ and $S_2$ denote sample prevalence for the two traits; $\phi$ and $\Phi$ denote the standard normal density and its cumulative distribution function, respectively. When applying LOGODetect to binary traits, we replace $h_i^2$ (i.e., heritability on the liability scale) with $h_{i,\mathrm{obs}}^2$ (i.e., heritability on the observed scale).
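
As an illustration of the scale conversion, the sketch below applies the standard liability-threshold transformation using only the Python standard library. The function names are hypothetical. The formula used is the commonly cited one (the squared normal density at the liability threshold times the sample case-control variance, divided by the squared population case-control variance), which is assumed here to match the formulas referred to above.

```python
from statistics import NormalDist


def observed_scale_factor(pop_prev, sample_prev):
    """Multiplier that maps liability-scale heritability to the observed scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold
    density = std_normal.pdf(threshold)             # normal density at the threshold
    return (density ** 2 * sample_prev * (1.0 - sample_prev)
            / (pop_prev ** 2 * (1.0 - pop_prev) ** 2))


def h2_liability_to_observed(h2_liab, pop_prev, sample_prev):
    """Heritability on the observed (0/1) scale."""
    return h2_liab * observed_scale_factor(pop_prev, sample_prev)


def cov_liability_to_observed(rho_liab, pop1, samp1, pop2, samp2):
    """Genetic covariance on the observed scale for two binary traits."""
    factor = (observed_scale_factor(pop1, samp1)
              * observed_scale_factor(pop2, samp2)) ** 0.5
    return rho_liab * factor


# Example: a trait with 1% population prevalence in a balanced case-control study.
h2_obs = h2_liability_to_observed(0.2, pop_prev=0.01, sample_prev=0.5)
```

For a rare disease studied in a balanced case-control design, the observed-scale value is typically larger than the liability-scale value because cases are heavily oversampled.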

Extension for sample overlaps

Suppose there are shared samples in the two GWASs, then the linear models can be restated aswhere are the standardized phenotypes of all individuals in each GWAS. are standardized genotypes of all individuals in each GWAS. are the non-genetic effects where . It can be shown that While and have the same form as no sample overlaps setting. By using reference panel, can be replaced by V. Therefore, under , the combined z-score vectors asymptotically follows multivariate normal distributions Note that the variance matrix can be split into two terms asif is positive, and can be split into two terms as if is negative. We can independently simulate pseudosamples following the normal distribution with mean 0 and each variance term, respectively. Finally, by adding up two vectors simulated with respect to different variance terms, we get the pseudo z-score vector of interest. In particular, the parameters appearing in the z-score null distribution are not of our interest, but we need their values while doing Monte Carlo sampling of . We adopt LDSC[10] to estimate them. Note that LDSC is based on random effect random design model setup, which is incompatible with our model assumption, yet we believe it should yield little consequence.

Genome partition and FDR control

We separated the genome into 204 LD blocks using ldetect[86]. Each LD block spans 15 MB on average. We applied LOGODetect to each LD block separately and identified the local regions with p-value < 0.05 under a family-wise type I error control. We aggregated all the candidate regions across different LD blocks, and applied Benjamini–Hochberg procedure[87] to control FDR with a cutoff of 0.05, accounting for the multiple testing problem concerning all LD blocks.
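
The second-stage multiple-testing correction can be sketched as follows. This is a generic Benjamini–Hochberg step-up procedure written for illustration, not code from the LOGODetect software; the function name and the input format (one p-value per candidate region) are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value passes its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Example: p-values for candidate regions pooled across LD blocks.
region_p_values = [0.041, 0.001, 0.216, 0.008]
significant = benjamini_hochberg(region_p_values, alpha=0.05)
```

Because the procedure is step-up, a region whose p-value misses its own threshold is still rejected when a region with a larger p-value passes a later threshold.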

Computation time

The major computation step in LOGODetect is to compute the maximal scan statistic in real data and in Monte Carlo samples. The computation time depends on the number of SNPs in GWAS. For a typical GWAS with 6 million SNPs, it takes about 12 h on a 2.5GHz cluster with 22 computation cores.

Simulation settings

Based on 503 individuals with European ancestry from the 1000 Genomes Project Phase 3 data, we simulated genotype data for 100,000 individuals with minor allele frequency (MAF) > 5% on chromosome 1 using HAPGEN2[29]. 336,532 variants remained in the dataset after removing strand-ambiguous SNPs. Samples were randomly divided into two subsets with equal sample size, each with 50,000 individuals. We used each subset to simulate the phenotype data. First, we performed simulations under the null hypothesis to see whether our approach would produce false positive findings. We follow the infinitesimal model, where the effect size level of all the normalized SNPs are the same, and the per-normalized-SNP genetic effect was drawn from a normal distribution for both traits. To realistically model the polygenic genetic architecture with different levels of genetic effects, we attributed 30% of the trait heritability to 5000 randomly chosen SNPs, while the remaining SNPs explain 70% of the trait heritability. The per-SNP genetic effect was drawn from a normal distribution for SNPs with high heritability enrichment, and from for SNPs with low heritability enrichment. The trait heritability was set to vary from 0.01 to 0.05 in each scenario. Note that a heritability value of 0.01 or 0.05 on chromosome 1 will approximately correspond to heritability values of 0.12 or 0.60 in the whole genome, which are realistic values for typical GWAS traits. Each simulation setting was repeated for 100 times. Next, we performed simulations to assess the statistical power under a heritability enrichment model. We randomly selected N = 5 segments, each containing L = 1000 SNPs, as the signal regions shared between two traits. We attributed of trait heritability to the signal regions. 
The genetic effect size for the SNPs in the signal regions follows a multivariate normal distribution The genetic effect size for the SNPs outside the signal regions follows a different multivariate normal distribution without local genetic correlation. The trait heritability h2 was set to vary from 0.01 to 0.05, and the correlation of genetic effect size of two traits ρ was set to 0.9. Each simulation setting was repeated for 100 times. Further simulation settings are described in detail in the Supplementary Notes. We adjusted the significance cutoff of different approaches to achieve the same type I error. For coloc and gwas-pw, in those heritability settings with empirical type I error >0.05, we increased the cutoff of the posterior probabilities so that the empirical type I error is controlled at 0.05.

Evaluate model performance

We use three different metrics to quantify the performance of our approach. Denote the true signal segments as , and the segments detected by LOGODetect as . We define the signal points detection rate as the number of true signal SNPs detected by LOGODetect divided by the number of true signal SNPs, that is . Similarly, we define signal segments detection rate as the number of true signal segments detected by LOGODetect divided by the number of true signal segments, namely , where we call a segment true positive if it overlaps with a true signal segment. Signal points detection rate and signal segments detection rate aim to measure the sensitivity at the SNPs level and segments level, respectively. To take the extent of the overlap into consideration, we also followed[88] to define , the G-score with respect to a signal region , as , and further define the G-score measure as . The G-score aims to measure the specificity and sensitivity together. The three metrics were also applied to quantify ρ-HESS, coloc, and gwas-pw.
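
The two detection rates can be computed as in the sketch below, which covers only the signal points and signal segments detection rates and not the G-score. Segments are represented as half-open ranges of SNP indices; this representation and the function names are assumptions made for illustration.

```python
def snp_set(segments):
    """Set of SNP indices covered by a list of half-open (start, end) segments."""
    covered = set()
    for start, end in segments:
        covered.update(range(start, end))
    return covered


def signal_points_detection_rate(true_segments, detected_segments):
    """Fraction of true signal SNPs that fall inside a detected segment."""
    true_snps = snp_set(true_segments)
    return len(true_snps & snp_set(detected_segments)) / len(true_snps)


def signal_segments_detection_rate(true_segments, detected_segments):
    """Fraction of true signal segments overlapped by at least one detected segment."""
    detected = snp_set(detected_segments)
    hits = sum(
        1 for start, end in true_segments
        if any(i in detected for i in range(start, end))
    )
    return hits / len(true_segments)


# Example: two true segments, one of them partially recovered.
true_segments = [(0, 10), (20, 30)]
detected_segments = [(5, 12), (40, 45)]
point_rate = signal_points_detection_rate(true_segments, detected_segments)
segment_rate = signal_segments_detection_rate(true_segments, detected_segments)
```

In this toy case 5 of the 20 true signal SNPs are covered and one of the two true segments is overlapped, which shows how the segment-level rate can be much higher than the SNP-level rate when overlaps are partial.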

Implementation of different methods

We used ldetect[86] to pre-specify 1703 approximately LD-independent blocks (spanning 1.6 Mb on average) as candidate genomic regions, as suggested by ρ-HESS and gwas-pw. We also used these LD-independent blocks as candidate genomic regions for coloc. In simulation studies, we used 133 approximately LD-independent regions in chromosome 1 as the pre-specified genomic regions for ρ-HESS, coloc, and gwas-pw. For ρ-HESS, the 1000 Genomes Project Phase 3 data[30] was used as the reference panel, the number of eigenvectors used in the truncated-SVD for LD matrix inversion is determined as 50 by default, and the minimum eigenvalue cut off in truncated-SVD is determined as 1.0 by default, as suggested by the ρ-HESS software (https://huwenboshi.github.io/hess/). ρ-HESS reported the estimate and significance of local genetic correlation for each candidate genomic region, and we applied Benjamini–Hochberg procedure[87] to control FDR with a cutoff of 0.05, accounting for the multiple testing problem concerning all genomic regions. Coloc (https://CRAN.R-project.org/package=coloc) and gwas-pw (https://github.com/joepickrell/gwas-pw) estimated the posterior probability that two traits shared at least one causal SNP for each genomic region, and those genomic regions with posterior probability above 0.95 are determined as identified regions. We used LDSC (https://github.com/bulik/ldsc) to estimate heritability in each chromosome. Stratified-LDSC was used to estimate genetic covariance of the identified regions. In detail, we manually created two annotations: the identified regions and the remaining genome, then we ran the standard LDSC software to calculate the genetic covariance and the proportion of genetic covariance of each annotation. For both LDSC and stratified-LDSC, LD scores were computed with the standard LDSC software from 503 individuals with European ancestry from the 1000 Genomes Project Phase 3 data. 
Both methods were applied with an unconstrained intercept, using all SNPs as observations in the dependent variable and LD scores as regression weights.

Application of LOGODetect to seven neuropsychiatric traits

We applied LOGODetect to seven neuropsychiatric traits. The European ancestry genotype data from the 1000 Genomes Project were used as the reference panel to estimate the LD matrix. For each GWAS dataset, indels and SNPs not present in the reference panel were removed. SNPs with MAF < 0.01 in the reference panel were also removed. Then, for each trait pair, we filtered out all strand-ambiguous SNPs and took the overlap of the remaining SNPs. For SNPs whose effect alleles were the same in the two GWASs, the original z-scores were used. For SNPs whose effect alleles were reversed between the two GWASs, we reversed the sign of the z-score in the second GWAS accordingly. Thus, the allele coding schemes between any two studies were consistent. We then applied LOGODetect to perform the downstream analysis.
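
The allele harmonization step can be sketched as below. The input format (a dictionary from SNP ID to effect allele, other allele and z-score) and the function names are hypothetical. Dropping SNPs whose allele pairs match in neither orientation is an additional assumption of this sketch, and reference-panel and MAF filtering are not shown.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT.get(allele1) == allele2


def harmonize(gwas1, gwas2):
    """Align z-scores of two GWASs to the effect allele of the first.

    Each input maps SNP id -> (effect_allele, other_allele, z).
    Returns SNP id -> (z1, z2) for the SNPs that are kept.
    """
    merged = {}
    for snp, (ea1, oa1, z1) in gwas1.items():
        if snp not in gwas2:
            continue  # keep only SNPs present in both studies
        if ea1 not in COMPLEMENT or oa1 not in COMPLEMENT:
            continue  # drop indels and other non-SNP variants
        if is_strand_ambiguous(ea1, oa1):
            continue  # drop strand-ambiguous SNPs
        ea2, oa2, z2 = gwas2[snp]
        if (ea2, oa2) == (ea1, oa1):
            merged[snp] = (z1, z2)    # same effect allele: keep the sign
        elif (ea2, oa2) == (oa1, ea1):
            merged[snp] = (z1, -z2)   # reversed effect allele: flip the sign
        # Any other allele combination is dropped (assumption of this sketch).
    return merged


gwas_a = {"rs1": ("A", "G", 2.0), "rs2": ("A", "T", 1.0),
          "rs3": ("C", "T", -0.5), "rs4": ("G", "A", 0.3)}
gwas_b = {"rs1": ("G", "A", 1.5), "rs2": ("A", "T", 0.7),
          "rs3": ("C", "T", 0.9)}
aligned = harmonize(gwas_a, gwas_b)
```

In this example `rs1` has its second z-score flipped, `rs2` is removed as strand-ambiguous, `rs3` is kept unchanged, and `rs4` is removed because it is absent from the second study.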

Enrichment analysis

We aggregated 227 non-overlapping segments identified by LOGODetect in seven neuropsychiatric traits and investigated if these segments are enriched in predicted functional regions for a given tissue or cell type. Tissue or cell type-specific functional regions were defined using GenoSkyline-Plus annotations and dichotomized with a cutoff of 0.5. The annotation is robust to the cutoff due to the bimodal pattern in raw GenoSkyline-Plus annotation scores. To assess the statistical significance of enrichment, we randomly selected 227 non-overlapping segments across the genome while matching their sizes with the detected segments, and calculated the overlaps with GenoSkyline-Plus annotations. We repeated the permutation procedure 100,000 times to evaluate the significance of the observed overlap. We also assessed whether the detected regions were enriched in non-brain tissue types after adjusting for the overlap of brain and non-brain annotations. Specifically, for the pancreatic islets cell type annotation, we removed the annotations that overlap with any of the eight significant brain cell type annotations to define the conditional annotation of pancreatic islets. The same procedure was taken to define the conditional annotation of mononuclear cells from peripheral blood. Afterwards, permutation tests were performed on these two conditional annotations. We performed conditional analysis on six generic annotations including coding regions, enhancers, introns, promoters, 5′UTRs and 3′UTRs (extended by a 500-bp window around each of the annotations) in Finucane et al. [52] by removing the overlapped regions between each generic annotation and the brain tissue-specific annotations (merged from eight significant brain cell type annotations). We used permutation test to assess the statistical significance of enrichment in conditional analyses. 
Using GENCODE V33lift37 on the UCSC genome browser, we extracted 968 genes with recognized Ensembl IDs in the genomic regions found to harbor local genetic correlations among seven neuropsychiatric traits. We used FUMA[53] to run the Gene Ontology enrichment analysis with these 968 genes.
  84 in total

Review 1.  Dissecting the genetics of complex traits using summary association statistics.

Authors:  Bogdan Pasaniuc; Alkes L Price
Journal:  Nat Rev Genet       Date:  2016-11-14       Impact factor: 53.242

2.  Spatiotemporal expression analysis of the growth factor receptor SorCS3.

Authors:  Sandra Oetjen; Claudia Mahlke; Irm Hermans-Borgmeyer; Guido Hermey
Journal:  J Comp Neurol       Date:  2014-05-02       Impact factor: 3.215

3.  Recurrent infection progressively disables host protection against intestinal inflammation.

Authors:  Won Ho Yang; Douglas M Heithoff; Peter V Aziz; Markus Sperandio; Victor Nizet; Michael J Mahan; Jamey D Marth
Journal:  Science       Date:  2017-12-22       Impact factor: 47.728

4.  Serum nerve growth factor (NGF) levels in children with attention deficit/hyperactivity disorder (ADHD).

Authors:  Esra Guney; Mehmet Fatih Ceylan; Mehmet Kara; Neslihan Tekin; Zeynep Goker; Gulser Senses Dinc; Onder Ozturk; Sevda Eker; Murat Kizilgun
Journal:  Neurosci Lett       Date:  2013-12-19       Impact factor: 3.046

Review 5.  The schizophrenic faces of PICK1.

Authors:  Kumlesh K Dev; Jeremy M Henley
Journal:  Trends Pharmacol Sci       Date:  2006-09-29       Impact factor: 14.819

6.  Genome-wide efficient mixed-model analysis for association studies.

Authors:  Xiang Zhou; Matthew Stephens
Journal:  Nat Genet       Date:  2012-06-17       Impact factor: 38.330

7.  Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits.

Authors:  Shashaank Vattikuti; Juen Guo; Carson C Chow
Journal:  PLoS Genet       Date:  2012-03-29       Impact factor: 5.917

8.  An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations.

Authors:  Arunabha Majumdar; Tanushree Haldar; Sourabh Bhattacharya; John S Witte
Journal:  PLoS Genet       Date:  2018-02-12       Impact factor: 5.917

9.  Analysis of shared heritability in common disorders of the brain.

Authors:  Verneri Anttila; Brendan Bulik-Sullivan; Hilary K Finucane; Raymond K Walters; Jose Bras; Laramie Duncan; Valentina Escott-Price; Guido J Falcone; Padhraig Gormley; Rainer Malik; Nikolaos A Patsopoulos; Stephan Ripke; Zhi Wei; Dongmei Yu; Phil H Lee; Patrick Turley; Benjamin Grenier-Boley; Vincent Chouraki; Yoichiro Kamatani; Claudine Berr; Luc Letenneur; Didier Hannequin; Philippe Amouyel; Anne Boland; Jean-François Deleuze; Emmanuelle Duron; Badri N Vardarajan; Christiane Reitz; Alison M Goate; Matthew J Huentelman; M Ilyas Kamboh; Eric B Larson; Ekaterina Rogaeva; Peter St George-Hyslop; Hakon Hakonarson; Walter A Kukull; Lindsay A Farrer; Lisa L Barnes; Thomas G Beach; F Yesim Demirci; Elizabeth Head; Christine M Hulette; Gregory A Jicha; John S K Kauwe; Jeffrey A Kaye; James B Leverenz; Allan I Levey; Andrew P Lieberman; Vernon S Pankratz; Wayne W Poon; Joseph F Quinn; Andrew J Saykin; Lon S Schneider; Amanda G Smith; Joshua A Sonnen; Robert A Stern; Vivianna M Van Deerlin; Linda J Van Eldik; Denise Harold; Giancarlo Russo; David C Rubinsztein; Anthony Bayer; Magda Tsolaki; Petra Proitsi; Nick C Fox; Harald Hampel; Michael J Owen; Simon Mead; Peter Passmore; Kevin Morgan; Markus M Nöthen; Martin Rossor; Michelle K Lupton; Per Hoffmann; Johannes Kornhuber; Brian Lawlor; Andrew McQuillin; Ammar Al-Chalabi; Joshua C Bis; Agustin Ruiz; Mercè Boada; Sudha Seshadri; Alexa Beiser; Kenneth Rice; Sven J van der Lee; Philip L De Jager; Daniel H Geschwind; Matthias Riemenschneider; Steffi Riedel-Heller; Jerome I Rotter; Gerhard Ransmayr; Bradley T Hyman; Carlos Cruchaga; Montserrat Alegret; Bendik Winsvold; Priit Palta; Kai-How Farh; Ester Cuenca-Leon; Nicholas Furlotte; Tobias Kurth; Lannie Ligthart; Gisela M Terwindt; Tobias Freilinger; Caroline Ran; Scott D Gordon; Guntram Borck; Hieab H H Adams; Terho Lehtimäki; Juho Wedenoja; Julie E Buring; Markus Schürks; Maria Hrafnsdottir; Jouke-Jan Hottenga; Brenda Penninx; Ville Artto; Mari Kaunisto; Salli 
Vepsäläinen; Nicholas G Martin; Grant W Montgomery; Mitja I Kurki; Eija Hämäläinen; Hailiang Huang; Jie Huang; Cynthia Sandor; Caleb Webber; Bertram Muller-Myhsok; Stefan Schreiber; Veikko Salomaa; Elizabeth Loehrer; Hartmut Göbel; Alfons Macaya; Patricia Pozo-Rosich; Thomas Hansen; Thomas Werge; Jaakko Kaprio; Andres Metspalu; Christian Kubisch; Michel D Ferrari; Andrea C Belin; Arn M J M van den Maagdenberg; John-Anker Zwart; Dorret Boomsma; Nicholas Eriksson; Jes Olesen; Daniel I Chasman; Dale R Nyholt; Andreja Avbersek; Larry Baum; Samuel Berkovic; Jonathan Bradfield; Russell J Buono; Claudia B Catarino; Patrick Cossette; Peter De Jonghe; Chantal Depondt; Dennis Dlugos; Thomas N Ferraro; Jacqueline French; Helle Hjalgrim; Jennifer Jamnadas-Khoda; Reetta Kälviäinen; Wolfram S Kunz; Holger Lerche; Costin Leu; Dick Lindhout; Warren Lo; Daniel Lowenstein; Mark McCormack; Rikke S Møller; Anne Molloy; Ping-Wing Ng; Karen Oliver; Michael Privitera; Rodney Radtke; Ann-Kathrin Ruppert; Thomas Sander; Steven Schachter; Christoph Schankin; Ingrid Scheffer; Susanne Schoch; Sanjay M Sisodiya; Philip Smith; Michael Sperling; Pasquale Striano; Rainer Surges; G Neil Thomas; Frank Visscher; Christopher D Whelan; Federico Zara; Erin L Heinzen; Anthony Marson; Felicitas Becker; Hans Stroink; Fritz Zimprich; Thomas Gasser; Raphael Gibbs; Peter Heutink; Maria Martinez; Huw R Morris; Manu Sharma; Mina Ryten; Kin Y Mok; Sara Pulit; Steve Bevan; Elizabeth Holliday; John Attia; Thomas Battey; Giorgio Boncoraglio; Vincent Thijs; Wei-Min Chen; Braxton Mitchell; Peter Rothwell; Pankaj Sharma; Cathie Sudlow; Astrid Vicente; Hugh Markus; Christina Kourkoulis; Joana Pera; Miriam Raffeld; Scott Silliman; Vesna Boraska Perica; Laura M Thornton; Laura M Huckins; N William Rayner; Cathryn M Lewis; Monica Gratacos; Filip Rybakowski; Anna Keski-Rahkonen; Anu Raevuori; James I Hudson; Ted Reichborn-Kjennerud; Palmiero Monteleone; Andreas Karwautz; Katrin Mannik; Jessica H Baker; Julie K O'Toole; 
Sara E Trace; Oliver S P Davis; Sietske G Helder; Stefan Ehrlich; Beate Herpertz-Dahlmann; Unna N Danner; Annemarie A van Elburg; Maurizio Clementi; Monica Forzan; Elisa Docampo; Jolanta Lissowska; Joanna Hauser; Alfonso Tortorella; Mario Maj; Fragiskos Gonidakis; Konstantinos Tziouvas; Hana Papezova; Zeynep Yilmaz; Gudrun Wagner; Sarah Cohen-Woods; Stefan Herms; Antonio Julià; Raquel Rabionet; Danielle M Dick; Samuli Ripatti; Ole A Andreassen; Thomas Espeseth; Astri J Lundervold; Vidar M Steen; Dalila Pinto; Stephen W Scherer; Harald Aschauer; Alexandra Schosser; Lars Alfredsson; Leonid Padyukov; Katherine A Halmi; James Mitchell; Michael Strober; Andrew W Bergen; Walter Kaye; Jin Peng Szatkiewicz; Bru Cormand; Josep Antoni Ramos-Quiroga; Cristina Sánchez-Mora; Marta Ribasés; Miguel Casas; Amaia Hervas; Maria Jesús Arranz; Jan Haavik; Tetyana Zayats; Stefan Johansson; Nigel Williams; Astrid Dempfle; Aribert Rothenberger; Jonna Kuntsi; Robert D Oades; Tobias Banaschewski; Barbara Franke; Jan K Buitelaar; Alejandro Arias Vasquez; Alysa E Doyle; Andreas Reif; Klaus-Peter Lesch; Christine Freitag; Olga Rivero; Haukur Palmason; Marcel Romanos; Kate Langley; Marcella Rietschel; Stephanie H Witt; Soeren Dalsgaard; Anders D Børglum; Irwin Waldman; Beth Wilmot; Nikolas Molly; Claiton H D Bau; Jennifer Crosbie; Russell Schachar; Sandra K Loo; James J McGough; Eugenio H Grevet; Sarah E Medland; Elise Robinson; Lauren A Weiss; Elena Bacchelli; Anthony Bailey; Vanessa Bal; Agatino Battaglia; Catalina Betancur; Patrick Bolton; Rita Cantor; Patrícia Celestino-Soper; Geraldine Dawson; Silvia De Rubeis; Frederico Duque; Andrew Green; Sabine M Klauck; Marion Leboyer; Pat Levitt; Elena Maestrini; Shrikant Mane; Daniel Moreno-De-Luca; Jeremy Parr; Regina Regan; Abraham Reichenberg; Sven Sandin; Jacob Vorstman; Thomas Wassink; Ellen Wijsman; Edwin Cook; Susan Santangelo; Richard Delorme; Bernadette Rogé; Tiago Magalhaes; Dan Arking; Thomas G Schulze; Robert C Thompson; Jana 
Strohmaier; Keith Matthews; Ingrid Melle; Derek Morris; Douglas Blackwood; Andrew McIntosh; Sarah E Bergen; Martin Schalling; Stéphane Jamain; Anna Maaser; Sascha B Fischer; Céline S Reinbold; Janice M Fullerton; José Guzman-Parra; Fermin Mayoral; Peter R Schofield; Sven Cichon; Thomas W Mühleisen; Franziska Degenhardt; Johannes Schumacher; Michael Bauer; Philip B Mitchell; Elliot S Gershon; John Rice; James B Potash; Peter P Zandi; Nick Craddock; I Nicol Ferrier; Martin Alda; Guy A Rouleau; Gustavo Turecki; Roel Ophoff; Carlos Pato; Adebayo Anjorin; Eli Stahl; Markus Leber; Piotr M Czerski; Cristiana Cruceanu; Ian R Jones; Danielle Posthuma; Till F M Andlauer; Andreas J Forstner; Fabian Streit; Bernhard T Baune; Tracy Air; Grant Sinnamon; Naomi R Wray; Donald J MacIntyre; David Porteous; Georg Homuth; Margarita Rivera; Jakob Grove; Christel M Middeldorp; Ian Hickie; Michele Pergadia; Divya Mehta; Johannes H Smit; Rick Jansen; Eco de Geus; Erin Dunn; Qingqin S Li; Matthias Nauck; Robert A Schoevers; Aartjan Tf Beekman; James A Knowles; Alexander Viktorin; Paul Arnold; Cathy L Barr; Gabriel Bedoya-Berrio; O Joseph Bienvenu; Helena Brentani; Christie Burton; Beatriz Camarena; Carolina Cappi; Danielle Cath; Maria Cavallini; Daniele Cusi; Sabrina Darrow; Damiaan Denys; Eske M Derks; Andrea Dietrich; Thomas Fernandez; Martijn Figee; Nelson Freimer; Gloria Gerber; Marco Grados; Erica Greenberg; Gregory L Hanna; Andreas Hartmann; Matthew E Hirschtritt; Pieter J Hoekstra; Alden Huang; Chaim Huyser; Cornelia Illmann; Michael Jenike; Samuel Kuperman; Bennett Leventhal; Christine Lochner; Gholson J Lyon; Fabio Macciardi; Marcos Madruga-Garrido; Irene A Malaty; Athanasios Maras; Lauren McGrath; Eurípedes C Miguel; Pablo Mir; Gerald Nestadt; Humberto Nicolini; Michael S Okun; Andrew Pakstis; Peristera Paschou; John Piacentini; Christopher Pittenger; Kerstin Plessen; Vasily Ramensky; Eliana M Ramos; Victor Reus; Margaret A Richter; Mark A Riddle; Mary M Robertson; Veit Roessner; 
Maria Rosário; Jack F Samuels; Paul Sandor; Dan J Stein; Fotis Tsetsos; Filip Van Nieuwerburgh; Sarah Weatherall; Jens R Wendland; Tomasz Wolanczyk; Yulia Worbe; Gwyneth Zai; Fernando S Goes; Nicole McLaughlin; Paul S Nestadt; Hans-Jorgen Grabe; Christel Depienne; Anuar Konkashbaev; Nuria Lanzagorta; Ana Valencia-Duarte; Elvira Bramon; Nancy Buccola; Wiepke Cahn; Murray Cairns; Siow A Chong; David Cohen; Benedicto Crespo-Facorro; James Crowley; Michael Davidson; Lynn DeLisi; Timothy Dinan; Gary Donohoe; Elodie Drapeau; Jubao Duan; Lieuwe Haan; David Hougaard; Sena Karachanak-Yankova; Andrey Khrunin; Janis Klovins; Vaidutis Kučinskas; Jimmy Lee Chee Keong; Svetlana Limborska; Carmel Loughland; Jouko Lönnqvist; Brion Maher; Manuel Mattheisen; Colm McDonald; Kieran C Murphy; Igor Nenadic; Jim van Os; Christos Pantelis; Michele Pato; Tracey Petryshen; Digby Quested; Panos Roussos; Alan R Sanders; Ulrich Schall; Sibylle G Schwab; Kang Sim; Hon-Cheong So; Elisabeth Stögmann; Mythily Subramaniam; Draga Toncheva; John Waddington; James Walters; Mark Weiser; Wei Cheng; Robert Cloninger; David Curtis; Pablo V Gejman; Frans Henskens; Morten Mattingsdal; Sang-Yun Oh; Rodney Scott; Bradley Webb; Gerome Breen; Claire Churchhouse; Cynthia M Bulik; Mark Daly; Martin Dichgans; Stephen V Faraone; Rita Guerreiro; Peter Holmans; Kenneth S Kendler; Bobby Koeleman; Carol A Mathews; Alkes Price; Jeremiah Scharf; Pamela Sklar; Julie Williams; Nicholas W Wood; Chris Cotsapas; Aarno Palotie; Jordan W Smoller; Patrick Sullivan; Jonathan Rosand; Aiden Corvin; Benjamin M Neale; Jonathan M Schott; Richard Anney; Josephine Elia; Maria Grigoroiu-Serbanescu; Howard J Edenberg; Robin Murray
Journal:  Science       Date:  2018-06-22       Impact factor: 47.728

10.  Single-trait and multi-trait genome-wide association analyses identify novel loci for blood pressure in African-ancestry populations.

Authors:  Jingjing Liang; Thu H Le; Digna R Velez Edwards; Bamidele O Tayo; Kyle J Gaulton; Jennifer A Smith; Yingchang Lu; Richard A Jensen; Guanjie Chen; Lisa R Yanek; Karen Schwander; Salman M Tajuddin; Tamar Sofer; Wonji Kim; James Kayima; Colin A McKenzie; Ervin Fox; Michael A Nalls; J Hunter Young; Yan V Sun; Jacqueline M Lane; Sylvia Cechova; Jie Zhou; Hua Tang; Myriam Fornage; Solomon K Musani; Heming Wang; Juyoung Lee; Adebowale Adeyemo; Albert W Dreisbach; Terrence Forrester; Pei-Lun Chu; Anne Cappola; Michele K Evans; Alanna C Morrison; Lisa W Martin; Kerri L Wiggins; Qin Hui; Wei Zhao; Rebecca D Jackson; Erin B Ware; Jessica D Faul; Alex P Reiner; Michael Bray; Joshua C Denny; Thomas H Mosley; Walter Palmas; Xiuqing Guo; George J Papanicolaou; Alan D Penman; Joseph F Polak; Kenneth Rice; Ken D Taylor; Eric Boerwinkle; Erwin P Bottinger; Kiang Liu; Neil Risch; Steven C Hunt; Charles Kooperberg; Alan B Zonderman; Cathy C Laurie; Diane M Becker; Jianwen Cai; Ruth J F Loos; Bruce M Psaty; David R Weir; Sharon L R Kardia; Donna K Arnett; Sungho Won; Todd L Edwards; Susan Redline; Richard S Cooper; D C Rao; Jerome I Rotter; Charles Rotimi; Daniel Levy; Aravinda Chakravarti; Xiaofeng Zhu; Nora Franceschini
Journal:  PLoS Genet       Date:  2017-05-12       Impact factor: 6.020


Review 1.  Recent innovations and in-depth aspects of post-genome wide association study (Post-GWAS) to understand the genetic basis of complex phenotypes.

Authors:  Zahra Mortezaei; Mahmood Tavallaei
Journal:  Heredity (Edinb)       Date:  2021-10-23       Impact factor: 3.821

2.  Leveraging the local genetic structure for trans-ancestry association mapping.

Authors:  Jiashun Xiao; Mingxuan Cai; Xinyi Yu; Xianghong Hu; Gang Chen; Xiang Wan; Can Yang
Journal:  Am J Hum Genet       Date:  2022-06-16       Impact factor: 11.043

3.  Quantifying concordant genetic effects of de novo mutations on multiple disorders.

Authors:  Hanmin Guo; Lin Hou; Yu Shi; Sheng Chih Jin; Xue Zeng; Boyang Li; Richard P Lifton; Martina Brueckner; Hongyu Zhao; Qiongshi Lu
Journal:  Elife       Date:  2022-06-06       Impact factor: 8.713

4.  An integrated framework for local genetic correlation analysis.

Authors:  Josefin Werme; Sophie van der Sluis; Danielle Posthuma; Christiaan A de Leeuw
Journal:  Nat Genet       Date:  2022-03-14       Impact factor: 41.307

5.  SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits.

Authors:  Yiliang Zhang; Qiongshi Lu; Yixuan Ye; Kunling Huang; Wei Liu; Yuchang Wu; Xiaoyuan Zhong; Boyang Li; Zhaolong Yu; Brittany G Travers; Donna M Werling; James J Li; Hongyu Zhao
Journal:  Genome Biol       Date:  2021-09-07       Impact factor: 17.906

6.  Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics.

Authors:  Yiliang Zhang; Youshu Cheng; Wei Jiang; Yixuan Ye; Qiongshi Lu; Hongyu Zhao
Journal:  Brief Bioinform       Date:  2021-09-02       Impact factor: 11.622

7.  Genetic Correlation and Bidirectional Causal Association Between Type 2 Diabetes and Pulmonary Function.

Authors:  Jiahao Zhu; Huanling Zhao; Dingwan Chen; Lap Ah Tse; Sanjay Kinra; Yingjun Li
Journal:  Front Endocrinol (Lausanne)       Date:  2021-11-25       Impact factor: 5.555

8.  Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure.

Authors:  Bo Yu; Pan Li; Qiangfeng Cliff Zhang; Lin Hou
Journal:  Nat Commun       Date:  2022-07-22       Impact factor: 17.694

9.  Sex-specific genetic association between psychiatric disorders and cognition, behavior and brain imaging in children and adults.

Authors:  Yuanyuan Gui; Xiaocheng Zhou; Zixin Wang; Yiliang Zhang; Zhaobin Wang; Geyu Zhou; Yize Zhao; Manhua Liu; Hui Lu; Hongyu Zhao
Journal:  Transl Psychiatry       Date:  2022-08-26       Impact factor: 7.989
