Amanda Dobbyn1, Laura M Huckins1, James Boocock2, Laura G Sloofman3, Benjamin S Glicksberg4, Claudia Giambartolomei5, Gabriel E Hoffman6, Thanneer M Perumal7, Kiran Girdhar1, Yan Jiang8, Towfique Raj9, Douglas M Ruderfer10, Robin S Kramer11, Dalila Pinto12, Schahram Akbarian8, Panos Roussos13, Enrico Domenici14, Bernie Devlin15, Pamela Sklar16, Eli A Stahl17, Solveig K Sieberts18. 1. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 2. Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA. 3. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Next Generation Healthcare, Mount Sinai Health System, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 5. Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA. 6. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 7. Systems Biology, Sage Bionetworks, Seattle, WA 98109, USA. 8. 
Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 9. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 10. Division of Genetic Medicine, Department of Medicine, Psychiatry and Biomedical Informatics, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37235, USA. 11. Human Brain Collection Core, National Institute of Mental Health, Bethesda, MD 20892, USA. 12. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry and Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 13. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 14. 
Laboratory of Neurogenomic Biomarkers, Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy; The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto (TN), Italy. 15. Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA. 16. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 17. Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. Electronic address: eli.stahl@mssm.edu. 18. Systems Biology, Sage Bionetworks, Seattle, WA 98109, USA. Electronic address: solly.sieberts@sagebase.org.
Abstract
Causal genes and variants within genome-wide association study (GWAS) loci can be identified by integrating GWAS statistics with expression quantitative trait loci (eQTL) and determining which variants underlie both GWAS and eQTL signals. Most analyses, however, consider only the marginal eQTL signal, rather than dissect this signal into multiple conditionally independent signals for each gene. Here we show that analyzing conditional eQTL signatures, which could be important under specific cellular or temporal contexts, leads to improved fine mapping of GWAS associations. Using genotypes and gene expression levels from post-mortem human brain samples (n = 467) reported by the CommonMind Consortium (CMC), we find that conditional eQTL are widespread; 63% of genes with primary eQTL also have conditional eQTL. In addition, genomic features associated with conditional eQTL are consistent with context-specific (e.g., tissue-, cell type-, or developmental time point-specific) regulation of gene expression. Integrating the 2014 Psychiatric Genomics Consortium schizophrenia (SCZ) GWAS and CMC primary and conditional eQTL data reveals 40 loci with strong evidence for co-localization (posterior probability > 0.8), including six loci with co-localization of conditional eQTL. Our co-localization analyses support previously reported genes, identify novel genes associated with schizophrenia risk, and provide specific hypotheses for their functional follow-up.
Significant advances in understanding the genetic architecture of schizophrenia (MIM: 181500) have occurred within the last 10 years. However, for common variants identified in genome-wide association studies (GWASs), the success in locus identification is not yet matched by an understanding of their underlying basic mechanism or effect on pathophysiology. Expression quantitative trait loci (eQTL), which are responsible for a significant proportion of variation in gene expression, could serve as a link between the numerous non-coding genetic associations that have been identified in GWASs and susceptibility to common diseases directly through their association with gene expression regulation.1, 2, 3, 4 Accordingly, results from eQTL mapping studies have been successfully utilized to identify genes and causal variants from GWASs for various complex phenotypes, including asthma (MIM: 600807), body mass index (MIM: 601665), celiac disease (MIM: 212750), and Crohn disease (MIM: 266600).5, 6, 7, 8
Studies integrating eQTL and GWAS data have almost exclusively used marginal association statistics, which typically represent the primary, or most significant, eQTL signal, when assessing co-localization with GWASs, ignoring other SNPs that affect expression independently of the primary eQTL for a given gene. However, recent findings indicating that conditionally independent eQTL are widespread9, 10, 11, 12 motivate examination of the extent to which considering conditional eQTL may provide additional power to identify likely causal genes in a GWAS locus. 
Recent reports provide evidence that conditional eQTL are less frequently shared across tissues than primary eQTL and, like tissue- and cell type-specific eQTL, are often found more distally to the genes they regulate.10, 13, 14 These lines of evidence suggest that conditionally independent eQTL may contribute to tissue-specific or other context-specific gene regulation (e.g., specific to a particular cell type, developmental stage, or stimulation condition). One mechanism by which disease risk could potentially be mediated by a conditional eQTL is the disruption of a tissue-specific enhancer by a given variant, leading to the dysregulation of the relevant eGene in only the tissue for which the enhancer is specific. For example, an eQTL affecting Parkinson disease risk through expression of SNCA was recently shown to act through the disruption of an enhancer; if this enhancer is specific to a disease-relevant cell type, such as nerve cells of the substantia nigra, then it could manifest as a conditional eQTL since it would be only partially represented in brain homogenate.
Here, we leveraged genotype and dorsolateral prefrontal cortex (DLPFC) expression data provided by the CommonMind Consortium (CMC) to elucidate the role of conditional eQTL in the etiology of schizophrenia (SCZ). Currently comprising the largest existing postmortem brain genomic resource at nearly 600 samples, the CMC is generating and making publicly available an unprecedented array of functional genomic data, including gene expression (RNA sequencing), histone modification (chromatin immunoprecipitation [ChIP-seq]), and SNP genotypes, from individuals with psychiatric disorders as well as unaffected controls. We utilized SNP dosage and RNA-sequencing (RNA-seq) data from the CMC to identify primary and conditionally independent eQTL. 
We then characterized the resulting eQTL on various genomic attributes including distance to transcription start site and their genes’ specificities across tissues, cell types, and developmental periods. In addition, we quantified enrichment of primary and conditional eQTL in promoter and enhancer functional genomic elements inferred from epigenomic data. Finally, we isolated each independent eQTL signal by conducting a series of “all-but-one” conditional analyses for genes with multiple independent eQTL and then assessed the overlap between all eQTL association signals and the schizophrenia GWAS signals.
Material and Methods
CommonMind Consortium Data
We used pre-QC’ed genotype and expression data from the CommonMind Consortium; detailed information on quality control, data adjustment, and normalization procedures can be found in Fromer et al. Briefly, samples were genotyped at 958,178 markers using the Illumina Infinium HumanOmniExpressExome array and markers were removed on the basis of having no alternate alleles, having a genotyping call rate ≤ 0.98, or having a Hardy-Weinberg p value < 5 × 10^−5. After QC, 668 individuals genotyped at 767,368 markers were used for imputation. Phasing was performed on each chromosome using ShapeIt v2.r790, and variants were imputed in 5 Mb segments with Impute v2.3.1 using the 1000 Genomes Phase 1 integrated reference panel, excluding singleton variants. After phasing and imputation, and after filtering out variants with INFO < 0.8 or MAF < 0.05, the number of markers included in the analysis totaled approximately 6.4 million. Gene expression was assayed via RNA-seq using 100 base pair paired-end reads and was mapped to human Ensembl gene reference (v.70) using TopHat v.2.0.9 and Bowtie v.2.1.0. After discarding genes with less than 1 CPM (counts per million) in at least 50% of the samples, RNA-seq data for a total of 16,423 Ensembl genes were considered for analysis. The expression data were voom-adjusted for both known covariates (RIN, library batch, institution, diagnosis, post-mortem interval, and sex) and 20 surrogate variables identified via surrogate variable analysis (SVA). After the removal of samples that did not pass RNA sample QC (including but not limited to: having RIN < 5.5, having less than 50 million total reads or more than 5% of reads aligning to rRNA, having any discordance between genotyping and RNA-seq data, and having RNA outlier status or evidence for contamination) and retaining only genetically identified European-ancestry individuals, a total of 467 samples was used for downstream analyses. 
These 467 individuals comprised 209 SCZ-affected case subjects, 52 AFF (bipolar, major depressive disorder, or mood disorder, unspecified)-affected case subjects, and 206 control subjects.
eQTL Identification
An overview of our workflow can be found in Figure S1. First, to identify primary and conditional cis-eQTL, we conducted a forward stepwise conditional analysis implemented in MatrixEQTL using genotype data at 6.4 million markers and RNA-seq data for 16,423 genes. FDR was initially assessed using the Benjamini-Hochberg algorithm across all cis-eQTL tests within each chromosome. FDR was not re-assessed at each conditional step; instead, a fixed p value threshold was used as the inclusion criterion in the stepwise model selection. For each gene with at least one cis-eQTL (gene ± 1 Mb) association at a 5% false discovery rate (FDR), the most significant SNP was added as a covariate in order to identify additional independent associations (considered significant if the p value achieved was less than that corresponding to the initial 5% FDR for primary eQTL). This procedure was repeated iteratively until no further eQTL met the p value threshold. We used a linear regression model, adjusting for diagnosis and five ancestry covariates inferred by GemTools. Following eQTL identification, only autosomal eQTL were retained for downstream analyses.
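The stepwise procedure described above can be illustrated with a minimal sketch for a single gene. This is not the MatrixEQTL implementation used in the study: the function names (`bh_threshold`, `forward_stepwise`) are hypothetical, covariates are handled by sequential orthogonalization, and p values use a normal approximation to the t distribution, which is an assumption that is only reasonable for large sample sizes.

```python
import math

def bh_threshold(pvals, q=0.05):
    """Largest p value rejected by Benjamini-Hochberg at FDR q (None if none)."""
    m = len(pvals)
    passing = [p for k, p in enumerate(sorted(pvals), start=1) if p <= q * k / m]
    return max(passing) if passing else None

def _project_out(v, basis):
    """Residual of v after removing its projection onto orthogonal basis vectors."""
    out = list(v)
    for b in basis:
        coef = sum(x * y for x, y in zip(out, b)) / sum(y * y for y in b)
        out = [x - coef * y for x, y in zip(out, b)]
    return out

def _assoc_p(y_res, g_res, df):
    """Two-sided p value for a partial correlation (normal approximation)."""
    syy = sum(v * v for v in y_res)
    sgg = sum(v * v for v in g_res)
    if syy < 1e-12 or sgg < 1e-12:
        return 1.0  # nothing left to explain, or SNP collinear with covariates
    r = sum(a * b for a, b in zip(y_res, g_res)) / math.sqrt(syy * sgg)
    r2 = min(r * r, 1.0 - 1e-12)
    t = abs(r) * math.sqrt(df / (1.0 - r2))
    return math.erfc(t / math.sqrt(2.0))

def forward_stepwise(expr, dosages, covariates, p_threshold):
    """Return indices of SNPs selected as independent eQTL, in order of entry."""
    n = len(expr)
    basis = []
    # Intercept plus fixed covariates (e.g., diagnosis, ancestry components).
    for c in [[1.0] * n] + [list(c) for c in covariates]:
        res = _project_out(c, basis)
        if sum(v * v for v in res) > 1e-12:
            basis.append(res)
    selected = []
    while True:
        y_res = _project_out(expr, basis)
        df = n - len(basis) - 1
        best = None
        for j, g in enumerate(dosages):
            if j in selected:
                continue
            p = _assoc_p(y_res, _project_out(g, basis), df)
            if best is None or p < best[0]:
                best = (p, j)
        # Fixed threshold at every step; FDR is not re-assessed.
        if best is None or best[0] >= p_threshold:
            return selected
        selected.append(best[1])
        basis.append(_project_out(dosages[best[1]], basis))
```

In this sketch, `p_threshold` would be the p value corresponding to the initial 5% FDR (for example, the value returned by `bh_threshold` over the primary cis-eQTL tests), and it is held fixed across conditional steps.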
Replication in Independent Datasets
Replication was performed in the HBCC microarray cohort (dbGaP: phs000979, see Web Resources) and in the ROSMAP RNA-seq cohort by fitting the stepwise regression models identified in the CMC data. For cases in which a marker was unavailable in the replication cohort, all models including that marker (i.e., for that eQTL and higher-order eQTL conditional on it, for a given gene) were omitted from replication.
Data from the HBCC cohort were QC’ed and normalized as described in Fromer et al. DLPFC tissue was profiled on the Illumina HumanHT-12_V4 BeadChip and normalized in an analogous manner to the CMC data. Genotypes were obtained using the HumanHap650Yv3 or Human1MDuov3 chips and imputed using the 1000 Genomes Phase 1 reference panel. Replication of the eQTL models was performed on 279 genetically inferred European-ancestry samples (76 control subjects, 72 SCZ-affected subjects, 43 BP-affected subjects, 88 MDD-affected subjects), adjusting for diagnosis and five ancestry components.
ROSMAP data were obtained from the AMP-AD Knowledge Portal (see Web Resources). Quantile normalized FPKM expression values were adjusted for age of death, RIN, PMI, and 31 hidden confounders from SVA, conditional on diagnosis. Only genes with FPKM > 0 in more than 50 samples were retained. QC’ed genotypes were also obtained from the AMP-AD Knowledge Portal and imputed to the Haplotype Reference Consortium (v.1.1) reference panel via the Michigan Imputation Server. Only markers with imputation quality score R^2 ≥ 0.7 were considered in the replication analysis. GemTools was used to infer ancestry components as was done for the CMC data above. After QC, 494 samples were used for eQTL replication in a linear regression model that also adjusted for diagnosis (Alzheimer disease, mild cognitive impairment, no cognitive impairment, and other) and four ancestry components.
Modeling Number of eQTL per Gene on Genomic Features
We considered three genomic features (gene length, number of LD blocks in the cis-region, and genic constraint score) for our modeling analyses. Gene lengths were calculated using Ensembl gene locations. We obtained LD blocks from the LDetect Bitbucket site to tally the number of blocks overlapping each gene’s cis-region (gene ± 1 Mb). We obtained loss-of-function-based genic constraint scores from the Exome Aggregation Consortium (ExAC). A negative binomial generalized linear regression model was used to model the number of eQTL per gene based on the above variables; results were qualitatively the same using linear regression of Box-Cox transformed eQTL numbers. Backward-forward stepwise regression using the full model with interaction terms for these three variables was used to determine the relationship between genomic attributes and eQTL number. These analyses were implemented in R. The cis-heritability of gene expression was estimated using the same CMC data that were used for eQTL detection, including all markers in the cis-region and implemented in GCTA. SNP-heritability estimates were then added to the modeling procedure described above.
Tissue, cell type, and developmental time point specificity were measured using the expression specificity metric Tau.26, 27 Tissue specificity for each gene was calculated using publicly available expression data for 53 tissues from the GTEx project (release V6p). Expression for each tissue was summarized as the log2 of the median expression plus one, and then used to calculate tissue specificity Tau. Cell type specificity for each gene was computed using publicly available single-cell RNA-sequencing expression data generated from human cortex and hippocampus tissues. Raw expression counts for 285 cells comprising six major cell types of the brain were obtained from GEO (GSE67835) and counts data were library normalized to CPM. 
Expression for each cell type was summarized as the log2 of the mean expression plus one, and then used to compute cell type specificity Tau. Developmental time point specificity for each gene was calculated using publicly available DLPFC expression data for 27 time points, clustered into eight biologically relevant groups, from the BrainSpan atlas (see Web Resources). Eight developmental periods were defined as follows: early prenatal (8–12 pcw), early mid-prenatal (13–17 pcw), late mid-prenatal (19–24 pcw), late prenatal (25–37 pcw), infancy (4 months–1 year), childhood (2–11 years), adolescence (13–19 years), and adulthood (21+ years). Expression for each time point was summarized as the log2 of the median expression plus one and then used to calculate developmental period specificity Tau. Each Tau was added to the above model for eQTL number individually, as well as all together.
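As an illustration of the specificity calculation, the following minimal sketch computes Tau for one gene from a per-context expression summary, using the standard definition Tau = Σ(1 − x_i / x_max) / (n − 1). The function names, the handling of genes with zero expression everywhere, and the reading of "expression plus one" as log2(summary + 1) are assumptions made for the sketch rather than details taken from the study.

```python
import math
from statistics import mean, median

def summarize(samples_by_context, stat=median):
    """Summarize expression per context as log2(stat(expression) + 1).

    Use stat=median for tissues and developmental periods and stat=mean for
    cell types, following the summaries described in the text.
    """
    return {ctx: math.log2(stat(vals) + 1.0)
            for ctx, vals in samples_by_context.items()}

def tau(profile):
    """Tau specificity index: 0 = uniform expression, 1 = one context only."""
    x = list(profile)
    n = len(x)
    if n < 2:
        raise ValueError("Tau needs at least two contexts")
    x_max = max(x)
    if x_max <= 0:
        return 0.0  # assumption: a gene expressed nowhere is treated as non-specific
    return sum(1.0 - v / x_max for v in x) / (n - 1)
```

For example, `tau(summarize(expression_by_tissue).values())` would give a tissue Tau for a hypothetical dictionary `expression_by_tissue` mapping each tissue to its per-sample expression values.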
Enrichment Analyses
We divided eQTL into separate subgroups by stepwise conditional order (first, second, and greater than second) and created sets of matched SNPs drawn from the SNPsnap database for each subgroup, matching on minor allele frequency, gene density (number of genes within 1 Mb of the SNP), distance from SNP to TSS of the nearest gene, and LD (number of LD-partners within r2 ≥ 0.8). For each subgroup of eQTL, we performed a logistic regression of status as eQTL or matched SNP on overlap with functional annotation, including the four SNP matching parameters as covariates. Enrichment was taken as the regression coefficient estimate, interpretable as the log-odds ratio for being an eQTL given a functional annotation. Functional annotations tested included: brain promoters and enhancers (union of all brain region TssA and Enh+EnhG intervals, respectively, from the NIH Roadmap Epigenomics Project ChromHMM core 15-state model), brain-specific promoters and enhancers (the union of all brain region TssA and Enh+EnhG intervals, excluding those present in seven other non-brain tissues/cell types: primary T helper cells from peripheral blood, osteoblast primary cells, HUES64 cells, adipose nuclei, liver, NHLF lung fibroblast primary cells, and NHEK-epidermal keratinocyte primary cells), and pre-frontal cortex (PFC) neuronal (NeuN+) and non-neuronal (NeuN−) nucleus H3K4me3 and H3K27ac ChIP-seq marks from the CMC. For each data source, active promoter and enhancer (or H3K4me3 and H3K27ac) annotations were tested for enrichment jointly. This analysis was repeated but restricting to matched SNPs located within 1 Mb of any of the 16,423 genes that were tested for eQTL, in order to determine whether the enrichment estimates were inflated due to the proximity of our primary and conditional eQTL to brain-expressed genes, which may be more likely to occur near active regulatory regions in the brain. 
In addition, to ensure that any enrichment patterns observed were not due to varying effect size among primary and conditional eQTL, the enrichment analyses were also carried out taking into account the variance in expression explained by each eQTL. Variance explained (R^2) was estimated using the variancePartition R package, and eQTL were stratified into three R^2 bins: bin 1, 1 × 10^−2 ≤ R^2 ≤ 1.75 × 10^−2; bin 2, 1.75 × 10^−2 ≤ R^2 ≤ 2.25 × 10^−2; and bin 3, 2.25 × 10^−2 ≤ R^2 ≤ 3 × 10^−2. Logistic regression of status as eQTL or matched SNP was then carried out separately for each R^2 bin, within each eQTL order.
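To make the enrichment estimate concrete: in the simplest case of a single binary annotation and no covariates, the logistic regression coefficient equals the log-odds ratio of the 2 × 2 table of SNP status by annotation overlap. The sketch below computes that quantity with its Woolf standard error. It deliberately omits the four SNP-matching covariates included in the models described above, and the function and argument names are hypothetical.

```python
import math

def log_odds_ratio(eqtl_in, eqtl_out, matched_in, matched_out):
    """Log-odds ratio (and Woolf SE) for being an eQTL given annotation overlap.

    eqtl_in / eqtl_out:       eQTL inside / outside the annotation
    matched_in / matched_out: matched SNPs inside / outside the annotation
    All four counts must be positive in this simplified sketch.
    """
    log_or = math.log((eqtl_in * matched_out) / (eqtl_out * matched_in))
    se = math.sqrt(1 / eqtl_in + 1 / eqtl_out + 1 / matched_in + 1 / matched_out)
    return log_or, se
```

A positive value indicates that eQTL overlap the annotation more often than matched SNPs do; the covariate-adjusted coefficient from the full model would generally differ from this unadjusted value.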
Conditional eQTL Analyses
In order to isolate each conditionally independent cis-eQTL association, we carried out a series of “all-but-one” conditional analyses, implemented within MatrixEQTL, for each gene possessing more than one independent eQTL. As these conditional eQTL signals were to be used to test for co-localization with the SCZ GWAS signals, we limited these analyses to those genes (346 in total) with eQTL overlapping GWAS loci. For each of these genes, we conducted an all-but-one analysis for each independent eQTL by regressing the given gene’s expression data on the dosage data, including all of the other independent eQTL for that gene as covariates in addition to diagnosis and five ancestry components. For example, three conditional analyses would be conducted for a gene with three independent eQTL: one analysis conditioning on the secondary and tertiary eQTL, one analysis conditioning on the primary and tertiary, and one analysis conditioning on the primary and secondary. In this manner we generated summary statistics for each independent eQTL in isolation, conditional on all of the other independent eQTL for that gene.
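The bookkeeping for the all-but-one analyses can be sketched as follows: for a gene with k independent eQTL, each eQTL is tested in turn while the other k − 1 are added to the base covariates. The function name and the covariate labels are hypothetical; the sketch only enumerates the regression designs and does not fit them.

```python
def all_but_one_designs(eqtl_snps, base_covariates):
    """List one regression design per independent eQTL of a gene.

    Each design tests one eQTL region while conditioning on all of the other
    independent eQTL for that gene, plus the base covariates.
    """
    designs = []
    for i, target in enumerate(eqtl_snps):
        others = eqtl_snps[:i] + eqtl_snps[i + 1:]
        designs.append({
            "target": target,
            "covariates": list(base_covariates) + others,
        })
    return designs
```

For a gene with three independent eQTL this yields three designs, matching the example in the text.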
Co-localization Analyses
For our co-localization analyses, we used summary statistics and genomic intervals from the 2014 Psychiatric Genomics Consortium (PGC) SCZ GWAS. We included 217 loci at a p value threshold of 1 × 10^−6 (excluding the MHC locus), defined these loci by their LD r^2 ≥ 0.6 with the lead SNP, and then merged overlapping loci. GWAS and eQTL signatures were qualitatively compared using p value-p value (P-P) plots, rendered in R, and LocusZoom plots.
Multiple methods that aim to identify GWAS-eQTL co-localized loci are currently available.37, 38, 39, 40, 41, 42 We chose to further develop coloc for our co-localization analyses for several reasons: (1) it uses data from all SNPs within a locus; (2) it avoids the computational burden or approximate results of Bayesian inferential methods for causal variants,41, 42 which rely on reference panel estimates of linkage disequilibrium (LD); and (3) it has been widely used43, 44, 45 including in direct comparisons of GWAS-eQTL co-localization methods.42, 46 We tested for co-localization using an updated version of coloc R functions, which we name coloc2 (see Web Resources), and incorporated several improvements to the method. First, coloc2 pre-processes data by aligning eQTL and GWAS summary statistics for each eQTL cis-region. Second, the coloc2 model optionally incorporates changes implemented in gwas-pw. Briefly, we implemented likelihood estimation of mixture proportions of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized) from genome-wide data. Coloc2 uses these proportions as priors (or optionally, coloc default or user-specified priors) in the empirical Bayesian calculation of the posterior probability of co-localization for each locus (eQTL cis-region). 
Coloc2 averages per-SNP Wakefield asymptotic Bayes factors (WABF) across three different values for the WABF prior variance term, 0.01, 0.1, and 0.5, and provides options for specifying phenotypic variance, estimating it from case-control proportions or estimating it from the data.
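The posterior calculation can be sketched as follows. This is not the coloc2 code: it follows the standard coloc formulation for the five hypotheses, uses the coloc default priors (1 × 10^−4, 1 × 10^−4, 1 × 10^−5) rather than priors estimated from genome-wide data, and assumes that the averaging over prior variances is done on the Bayes factor scale. All function names are hypothetical, and `v` denotes the squared standard error of a SNP's effect estimate.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def log_abf(z, v, w):
    """Log Wakefield approximate Bayes factor in favor of association.

    z: effect estimate / standard error; v: squared standard error;
    w: prior variance of the effect size.
    """
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def averaged_log_abf(z, v, prior_vars=(0.01, 0.1, 0.5)):
    """Log of the mean Bayes factor across prior variances (assumed scale)."""
    terms = [log_abf(z, v, w) for w in prior_vars]
    return logsumexp(terms) - math.log(len(terms))

def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log Bayes factors.

    l1: GWAS log ABFs; l2: eQTL log ABFs, aligned on the same SNPs.
    """
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    lh = [
        0.0,                                                      # H0
        math.log(p1) + s1,                                        # H1
        math.log(p2) + s2,                                        # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),   # H3
        math.log(p12) + s12,                                      # H4
    ]
    denom = logsumexp(lh)
    return [math.exp(x - denom) for x in lh]
```

Under this sketch, a locus would be counted as co-localized when the last element of the returned list (the H4 posterior) exceeds 0.8, the threshold quoted in the abstract.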
Results
Identification of eQTL
Primary and conditional eQTL were identified using genotype and RNA-seq data from the CommonMind Consortium post-mortem DLPFC samples (467 European-ancestry case and control subjects). We identified 12,813 primary and 16,082 conditional eQTL, totaling 28,895 independent eQTL. Of the genes tested, 81% (12,813 of 15,817 autosomal genes) had at least one eQTL and 63% of these (51% of all genes) also had at least one conditional eQTL, with an average of 1.83 independent eQTL per gene (2.26 among those with at least one eQTL) (Figure 1A). Conversely, when examining the distributions for the number of genes whose expression was affected by each eQTL (Table S1), the majority of eQTL were specific for a single gene, and only a small fraction of eQTL, 1.47%, affected more than one gene, with a maximum of six genes affected by a single eQTL.
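The summary figures in this paragraph follow from the reported counts, as the short check below shows. The only input not given as a count is the 63% of eGenes with a conditional eQTL, which is taken directly from the text; the variable names are illustrative.

```python
primary = 12_813            # genes with a primary eQTL (one primary eQTL each)
conditional = 16_082        # conditional eQTL
autosomal_genes = 15_817    # autosomal genes tested
frac_with_conditional = 0.63  # share of eGenes with >= 1 conditional eQTL (from the text)

total = primary + conditional
pct_genes_with_eqtl = 100 * primary / autosomal_genes
pct_all_with_conditional = 100 * frac_with_conditional * primary / autosomal_genes
mean_per_gene = total / autosomal_genes
mean_per_egene = total / primary

print(total)                            # 28895
print(round(pct_genes_with_eqtl))       # 81
print(round(pct_all_with_conditional))  # 51
print(round(mean_per_gene, 2))          # 1.83
print(round(mean_per_egene, 2))         # 2.26
```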
Figure 1
Characterization of Conditional eQTL
(A) Counts of the numbers of genes (y axis) regulated by at least N (1 ≤ N ≤ 16) independent eQTL (x axis).
(B) Median Tau value (y axis) for genes with N independent eQTL (x axis), colored by Tau type (cell type, developmental time point, or tissue type Tau).
(C) Density plot representing the distance from eSNP to eGene transcription start site (TSS), colored by eQTL order. Dashed lines represent the median distance to TSS for each order of eQTL.
We tested for replication of conditional eQTL in two independent datasets, the National Institute of Mental Health’s Human Brain Collection Core (HBCC, n = 279, microarray expression data) and the Religious Orders Study/Memory and Aging Project (ROSMAP, n = 494, RNA-seq expression). For each gene the same models were evaluated that were identified in forward-stepwise conditional analysis in the CMC data. We observed significant evidence of replication for both primary and conditional eQTL in the HBCC and ROSMAP post-mortem brain cohorts (Table S2). The estimated proportion of true associations (π) in ROSMAP was 0.57 and 0.26 for primary and conditional eQTL, respectively; in HBCC π was 0.46 and 0.20 for primary and conditional eQTL. Therefore, replication was stronger for primary than for conditional eQTL, as expected given their stronger effect sizes. Replication rates were somewhat higher in the RNA-seq ROSMAP data than in HBCC.
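The text does not state which estimator was used for π. One common choice, shown here purely as an assumption, is Storey's estimator: the proportion of null tests π0 is estimated from the density of replication p values above a cutoff λ, and π = 1 − π0. The function name and the default λ = 0.5 are illustrative.

```python
def pi1_estimate(pvals, lam=0.5):
    """Estimate the proportion of true associations among replication tests.

    Null p values are uniform, so the fraction above `lam`, rescaled by
    (1 - lam), estimates the null proportion pi0; pi1 = 1 - pi0.
    """
    m = len(pvals)
    if m == 0:
        raise ValueError("no p values supplied")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    pi0 = sum(1 for p in pvals if p > lam) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)
```

With this estimator, a value of 0.57 would mean that about 57% of the tested eQTL models show a non-null signal in the replication cohort.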
Genomic Characterization of Primary and Conditional eQTL
The features for which primary and conditional eQTL and their respective eGenes displayed identifiable differences included distance from eQTL to its gene’s transcription start site (TSS), gene length, LD blocks per genic cis-region, genic constraint score, and genic cis-SNP-heritability. According to prior results, eQTL that are shared across tissues and cell types tend to be located closer to transcription start sites than context-specific eQTL;13, 14 we therefore first examined the relationship between primary or conditional eQTL status and distance to genic TSS. Primary eQTL fall closer to the TSS than conditional eQTL (Figure 1C): primary eQTL occur at a median distance of 70.4 kb from the TSS versus a median distance of 302 kb for conditional eQTL. This difference holds true even more proximally to the TSS (Figure S2); 8.1% and 2.5% of primary and conditional eQTL, respectively, fall within 3 kb of the TSS. We next characterized the relationship between the number of independent eQTL per gene and three different genomic features: gene length, number of LD blocks in the gene’s cis-region (±1 Mb), and Exome Aggregation Consortium (ExAC) genic constraint score, including possible interactions. The best multivariate model for eQTL number included gene length, number of LD blocks, and genic constraint as predictors, as well as a gene length-LD blocks interaction (Table 1). The number of independent eQTL was positively correlated with gene length and number of LD blocks and negatively correlated with genic constraint score (Figure S3). We then examined the variance of gene expression explained by cis-region SNPs, or cis-SNP-heritability, estimated by linear mixed model variance component analysis (Figure S4). We found a strong effect of estimated cis-heritability on number of independent eQTL (Table 1, Figure S5). 
In a joint model with cis-SNP-heritability, the main effects of gene length, number of LD blocks, and genic constraint on eQTL number remained at least nominally significant.
Table 1
Number of eQTL per Gene Modeled on Genomic Features
Predictor                        Model 1                            Model 2                            Model 3
                                 Estimate  Robust SE  Pr(>|z|)      Estimate  Robust SE  Pr(>|z|)      Estimate  Robust SE  Pr(>|z|)
log(Gene length)                 0.27      0.04       5.16E−12      0.16      0.03       2.20E−06      0.17      0.03       9.87E−07
LD blocks                        0.59      0.17       6.47E−04      0.33      0.15       2.92E−02      0.37      0.15       1.55E−02
log(Gene length): LD blocks      −0.03     0.02       7.77E−02      −0.01     0.01       5.65E−01      −0.01     0.01       4.11E−01
Constraint                       −0.61     0.03       5.93E−85      −0.20     0.03       2.93E−13      −0.15     0.03       5.41E−08
cis-heritability                 –         –          –             7.03      0.18       0.00          7.02      0.18       0.00
Tau (tissue)                     –         –          –             –         –          –             0.08      0.08       2.76E−01
Tau (DLPFC cell type)            –         –          –             –         –          –             0.20      0.09       3.69E−02
Tau (developmental time point)   –         –          –             –         –          –             0.17      0.09       5.99E−02
We then addressed whether genes with conditional eQTL exhibit greater context specificity as measured by the robust expression specificity metric Tau.26, 27 We calculated Tau across 53 tissues from the Genotype-Tissue Expression (GTEx) project, across 6 DLPFC cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and oligodendrocyte progenitor cells) from single-cell RNA-seq, and across 8 developmental periods (early prenatal, early mid-prenatal, late mid-prenatal, late prenatal, infant, child, adolescent, and adult) from the BrainSpan atlas DLPFC RNA-seq data. We confirmed that higher values of Tau reflect expression specificity by comparing the distributions of all three Tau measures for all genes with the distributions for a subset of housekeeping genes (Figure S6). We found positive correlations between eQTL number and tissue, cell type, and developmental time point specificities (Figure 1B, Table 1, Table S3, Figure S7). In a joint model, the strongest correlation was with DLPFC cell type Tau, which is consistent with previous data demonstrating tissue-specific, cell type-dependent expression in blood; however, we note that all three Tau sets were inter-correlated (Table S3).
Epigenetic Enrichment Analyses
One way in which eQTL may affect gene expression is through alteration of cis-regulatory elements such as promoters and enhancers. Putative causal eSNPs have been shown to be enriched in genomic regions containing functional annotations such as DNase hypersensitive sites, transcription factor binding sites, promoters, and enhancers.51, 52, 53, 54 Our observation that conditional eQTL fall farther from transcription start sites than primary eQTL led us to hypothesize that primary eQTL may affect transcription levels by altering functional sites in promoters whereas conditional eQTL may do so by altering more distal regulatory elements such as enhancers. We therefore assessed enrichment of primary and conditional eQTL in brain active promoter (TssA) and enhancer (merged Enh and EnhG) states derived from the NIH Roadmap Epigenomics Project,32, 33 and in H3K4me3 and H3K27ac neuronal (NeuN+) and non-neuronal (NeuN−) ChIP-seq peaks from a subset of the CMC post-mortem DLPFC samples. The overlap of H3K4me3 and H3K27ac ChIP-seq peaks was used as a proxy for active promoters, and H3K27ac peaks that do not overlap H3K4me3 peaks were used as a (relatively non-specific) proxy for enhancers. We performed logistic regression of SNP status (eQTL versus random matched SNP) on overlap with functional annotations, separately for each eQTL order (primary, secondary, and greater than secondary).
Primary and conditional eQTL were significantly enriched in both promoter and enhancer chromatin states from REMC brain and CMC DLPFC tissues, with greatest enrichments overall observed in PFC neuronal (NeuN+) promoters and enhancers (Figure 2, Table S4). We found that whereas active promoter enrichments in all tissue/cell types markedly decreased with higher conditional order of eQTL, enhancer enrichments either only slightly decreased (REMC brain and PFC NeuN+, Figures 2A and 2C) or remained level (REMC brain-specific, Figure 2B).
Though there was also significant enrichment of eQTL in non-neuronal nuclei (NeuN−) promoters and enhancers, this trend of a marked decrease in active promoters but steady levels of enhancer enrichment with greater eQTL order was not observed for non-neuronal PFC nuclei (Figure 2D). This greater decrease in enrichment for promoters compared to enhancers with increasing eQTL order was not confounded by an excess of eQTL near brain-expressed genes in comparison to matched SNPs (Figure S8, Table S5) and furthermore was not an artifact of varying effect size with eQTL order; the same overall pattern was observed when stratifying eQTL by variance in expression explained (R2) and comparing enrichment across eQTL order, within each R2 bin (Figures S9–S12, Table S6).
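The enrichment estimate can be illustrated with a small sketch. With a single binary predictor (annotation overlap) and no further covariates, the maximum-likelihood logistic regression coefficient equals the log odds ratio of the 2×2 table of SNP status by overlap, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d). The code below relies on that identity; the function name and all counts are hypothetical, and any covariates used in the actual regressions are not modeled.

```python
import math


def enrichment_log_odds(eqtl_overlap, eqtl_total, matched_overlap, matched_total):
    """Log odds ratio (logistic regression coefficient) with a 95% Wald CI.

    a, b: eQTL SNPs inside / outside the annotation.
    c, d: matched random SNPs inside / outside the annotation.
    """
    a = eqtl_overlap
    b = eqtl_total - eqtl_overlap
    c = matched_overlap
    d = matched_total - matched_overlap
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta, beta - 1.96 * se, beta + 1.96 * se


# Hypothetical counts: 300 of 1,000 eQTL and 150 of 1,000 matched SNPs
# overlap an annotation.
beta, ci_low, ci_high = enrichment_log_odds(300, 1000, 150, 1000)
```

Running this separately for each eQTL order (1, 2, and 3 or greater) would give the kind of coefficient and 95% CI plotted per panel in Figure 2.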
Figure 2
Enrichments of Primary and Conditional eQTL in Active Regulatory Annotations
Plotted are enrichments (regression coefficient estimate ± 95% CI from logistic regression, y axes) of primary (x axis eQTL order = 1) and conditional (eQTL order = 2, ≥ 3) eQTL in functional annotations.
(A and B) Enrichment in brain (union of all individual brain regions) and brain-specific (present in brain but not in seven other non-brain tissues) active promoter (green) and enhancer (orange) ChromHMM states from the NIH Roadmap Epigenomics Project.
(C) Enrichment in neuronal nuclei (NeuN+) for active promoters (intersection of DLPFC H3K4me3 and H3K27ac ChIP-seq peaks, green) and enhancers (H3K27ac peaks that do not overlap H3K4me3 peaks, orange).
(D) Enrichments in the same annotations, but for DLPFC non-neuronal nuclei (NeuN−).
eQTL Co-localization with SCZ GWAS
We performed co-localization analyses in order to evaluate the extent of overlap between eQTL and GWAS signatures in schizophrenia and to identify putative causal genes from GWAS associations. Considering 217 loci (Table S7) with lead SNPs reaching a significance threshold of p < 1 × 10−6 from the 2014 Psychiatric Genomics Consortium (PGC) schizophrenia GWAS, we tabulated the number of primary and conditional eQTL falling within GWAS loci. A total of 114 out of 217 loci contained primary and/or conditional eQTL for 346 genes; 110 of these genes had one eQTL only and 236 genes had more than one independent eQTL.
To quantitatively compare the SCZ GWAS and eQTL association signatures, we modified the R package coloc for Bayesian inference of co-localization between the two sets of summary statistics across each gene’s cis-region. Coloc2, our modified implementation of coloc, analyzes the hierarchical model of gwas-pw, with likelihood-based estimation of dataset-wide probabilities of five hypotheses (H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both but not co-localized; and H4, both and co-localized). We then used these probabilities as priors to calculate empirical Bayesian posterior probabilities for the five hypotheses for each locus, in particular PPH4 for co-localization.
For genes with conditional eQTL overlapping SCZ GWAS loci, summary statistics from all-but-one conditional eQTL analyses were assessed for co-localization with the GWAS signature (Figure 3). To illustrate this analytical strategy, we show eQTL results for the iron responsive element binding protein 2 gene IREB2 (MIM: 147582, chr15:78729773–78793798) as an example (Figure 4). Forward stepwise selection analysis identified two independent cis-eQTL for IREB2.
In order to generate summary statistics for each eQTL in isolation, we conducted two all-but-one conditional analyses, in each analysis conditioning on all but a focal independent eQTL (for IREB2 this entailed conditioning on only one eQTL per conditional analysis, but involved conditioning on up to six eQTL per gene across all genes considered in the SCZ co-localization analysis). We then tested for co-localization between the GWAS and all of the eQTL summary statistics resulting from the above conditioning analysis using coloc2 (Table S12). In the case of IREB2, the conditional eQTL (rs7171869) was implicated as co-localized with the GWAS signal at this locus with a posterior probability for co-localization (PPH4) of 0.94. A qualitative examination of the IREB2 locus supported the coloc2 results: the correlation between the GWAS p values and conditional eQTL p values was higher than that between the GWAS and primary eQTL p values (Figure 4A). In addition, the GWAS signature for the locus more closely resembled the conditional eQTL signature than either the non-conditional eQTL signature or the primary eQTL signature (Figure 4B).
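The all-but-one conditioning step can be sketched as follows. This is a minimal illustration, assuming ordinary least squares on genotype dosages with no additional covariates: for a focal eQTL, expression and each cis-SNP are residualized on the other independent eQTL, and the effect of each SNP is then estimated from the residuals (equivalent to the multiple-regression coefficient by the Frisch-Waugh-Lovell theorem). The function names, SNP labels, and toy data are hypothetical and are not taken from the study's pipeline.

```python
def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(target, covariates):
    """Residuals of target after regressing on an intercept plus covariates.

    Covariates are orthogonalized (Gram-Schmidt) so that correlated
    conditioning SNPs are handled correctly.
    """
    basis = []
    for cov in covariates:
        q = _center(cov)
        for b in basis:
            coef = _dot(q, b) / _dot(b, b)
            q = [x - coef * y for x, y in zip(q, b)]
        if _dot(q, q) > 1e-12:  # skip covariates that add no new direction
            basis.append(q)
    resid = _center(target)
    for b in basis:
        coef = _dot(resid, b) / _dot(b, b)
        resid = [x - coef * y for x, y in zip(resid, b)]
    return resid


def all_but_one_scan(expression, cis_genotypes, independent_eqtl, focal):
    """Effect of each cis-SNP, conditioning on every independent eQTL except focal."""
    conditioning = [cis_genotypes[s] for s in independent_eqtl if s != focal]
    y_res = residualize(expression, conditioning)
    betas = {}
    for snp, dosage in cis_genotypes.items():
        if snp in independent_eqtl and snp != focal:
            continue  # conditioned-on SNPs have no residual variation
        g_res = residualize(dosage, conditioning)
        variance = _dot(g_res, g_res)
        if variance > 1e-12:
            betas[snp] = _dot(g_res, y_res) / variance
    return betas


# Toy gene with two independent eQTL (hypothetical dosages and effects).
cis_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 0, 1, 1, 2, 2, 1, 0],
}
expression = [2.0 * a + 1.0 * b
              for a, b in zip(cis_genotypes["snp_a"], cis_genotypes["snp_b"])]
lead_snps = ["snp_a", "snp_b"]
scan_a = all_but_one_scan(expression, cis_genotypes, lead_snps, focal="snp_a")
scan_b = all_but_one_scan(expression, cis_genotypes, lead_snps, focal="snp_b")
```

In a real analysis each scan would cover every SNP in the cis-region and would return standard errors or p values as well, so that one set of summary statistics per independent eQTL can be passed to the co-localization test.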
Figure 3
All-but-One Conditional Analysis to Isolate Independent eQTL Signatures
(A) Hypothetical GWAS signature (top, green) at a given locus and an overlapping hypothetical eQTL signature (bottom, purple), which comprises two independent eQTL.
(B) Same hypothetical GWAS and eQTL signatures after the all-but-one conditional eQTL analysis isolating the primary (red) and secondary (blue) eQTL signatures. Before conditional analysis there is a lack of co-localization between the GWAS signature and eQTL signature. After all-but-one conditional analysis, there is evidence for co-localization between the conditional (secondary) eQTL and GWAS signatures.
Figure 4
GWAS Signature for IREB2 Co-localizes with the Conditional eQTL Signature
(A) P-P plots comparing −log10 p values from GWAS (y axes) and all-but-one conditional eQTL analysis (x axes), which show the highest correlation to be between the GWAS and the conditional eQTL rs7171869 (blue, bottom).
(B) LocusZoom plots for the IREB2 locus, where the GWAS signal (top) more closely resembles the conditional eQTL signal (rs7171869, bottom) than the primary eQTL signal (rs11639224, third from top) or non-conditional eQTL signal (second from top). For all LocusZoom plots, LD is colored with respect to the GWAS lead SNP (rs8042374, labeled).
We found that 40 loci contained genes with strong evidence of co-localization between eQTL and GWAS signatures, with posterior probability of H4 (PPH4) ≥ 0.8 (Table 2). When restricting to genome-wide significance for the GWAS, we found co-localization in 24 of the 108 loci. Given the correlations between number of independent eQTL and expression specificity scores (Tau) across tissues, cell types, and development, we tabulated the reported genes’ Tau percentiles and expression levels, to highlight contexts in which the genes are specifically expressed (Table 2, Table S8).
We acknowledge that while posterior probability PPH4 ≥ 0.8 demonstrates strong Bayesian evidence for co-localization, it is an arbitrary threshold for characterizing loci as GWAS-eQTL co-localized; we find that many loci with PPH4 ≥ 0.5 appear qualitatively consistent with co-localization.
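For readers unfamiliar with how a PPH4 value arises, the sketch below shows the standard coloc-style calculation: per-SNP Wakefield approximate Bayes factors are combined across the region into posterior probabilities for H0 through H4. It is not coloc2 itself. In particular, the priors here are fixed at assumed values (p1 = p2 = 1e-4, p12 = 1e-5), whereas coloc2 as described above estimates dataset-wide prior probabilities by likelihood. The z-scores, standard error, and prior effect-size SD are hypothetical.

```python
import math


def log_abf(z, se, prior_sd):
    """Wakefield log approximate Bayes factor for one SNP."""
    v = se * se
    w = prior_sd * prior_sd
    r = w / (w + v)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)), valid for a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf_gwas, labf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PPH0, PPH1, PPH2, PPH3, PPH4] for one region."""
    if len(labf_gwas) != len(labf_eqtl) or len(labf_gwas) < 2:
        raise ValueError("need matched log ABFs for at least two SNPs")
    s1 = log_sum_exp(labf_gwas)
    s2 = log_sum_exp(labf_eqtl)
    s12 = log_sum_exp([a + b for a, b in zip(labf_gwas, labf_eqtl)])
    log_h = [
        0.0,                                                      # H0
        math.log(p1) + s1,                                        # H1
        math.log(p2) + s2,                                        # H2
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                      # H4
    ]
    total = log_sum_exp(log_h)
    return [math.exp(v - total) for v in log_h]


def z_to_labf(z_scores):
    # Assumed values: se = 0.05 and prior SD = 0.2 for every SNP.
    return [log_abf(z, se=0.05, prior_sd=0.2) for z in z_scores]


gwas = z_to_labf([1, 2, 6, 2, 1])
shared_peak = coloc_posteriors(gwas, z_to_labf([1, 2, 7, 2, 1]))
distinct_peaks = coloc_posteriors(gwas, z_to_labf([7, 2, 1, 1, 1]))
```

In this toy setting the shared-peak region puts nearly all posterior mass on H4, while the region with peaks at different SNPs puts it on H3, which is the distinction the PPH4 ≥ 0.8 threshold is meant to capture.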
Table 2
GWAS-eQTL Co-localized Loci
| Chr | GWAS Locus Start | GWAS Locus End | GWAS Lead SNP | GWAS p Value | eSNP | eSNP p Value | Primary/Conditional | PPH4 | Gene | Relevant Tissue/Cell Type/Developmental Period |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2372401 | 2402501 | rs4648845 | 4.03E−09 | rs12037821 | 4.9E−04 | conditional | 0.87 | SLC35E2 | –/–/early mid-prenatal |
| 1 | 8355697 | 8638984 | rs301797 | 2.03E−09 | rs138050288 | 1.8E−04 | primary | 0.95 | RERE | –/–/– |
| 1 | 30412551 | 30443951 | rs1498232 | 1.28E−09 | rs2015244 | 1.8E−08 | primary | 0.99 | PTPRU | –/neurons/early mid-prenatal |
| 1 | 163582923 | 163766623 | rs7521492 | 5.64E−07 | rs10799961 | 3.18E−11 | primary | 0.91 | PBX1 | –/–/early prenatal |
| 1 | 205015255 | 205189455 | rs16937 | 8.69E−07 | rs12724651 | 7.31E−07 | primary | 0.89 | TMEM81 | –/neurons/– |
|  |  |  |  |  | rs12031350 | 8.15E−06 | conditional | 0.87 | RBBP5 | –/–/– |
| 1 | 214137889 | 214163689 | rs7529073 | 9.69E−07 | rs1431983 | 1.67E−04 | conditional | 0.93 | PROX1-AS1 | cerebellar hemisphere/neurons/adult |
| 2 | 73194203 | 73900439 | rs56145559 | 8.42E−08 | rs11679809 | 1.85E−34 | primary | 0.86 | ALMS1P | testis/–/– |
| 2 | 110262036 | 110398236 | rs9330316 | 7.69E−08 | rs892464 | 2.35E−26 | primary | 0.92 | SEPT10 | –/–/late prenatal |
| 2 | 198148577 | 198835577 | rs6434928 | 1.48E−11 | rs12621129 | 6.06E−12 | primary | 0.94 | SF3B1 | –/–/– |
| 2 | 200715237 | 201247789 | rs281768 | 1.78E−14 | rs35220450 | 3.46E−14 | primary | 0.95 | FTCDNL1, AC073043.2 | –/–/adult |
|  |  |  |  |  | rs186546506 | 8.77E−04 | conditional | 0.83 | LINC01792, AC007163.3 | putamen (basal ganglia)/–/adult |
| 2 | 208371631 | 208531731 | rs2709410 | 5.75E−07 | rs34171849 | 5.86E−17 | primary | 0.88 | METTL21A | –/–/– |
|  |  |  |  |  | rs2551656 | 2.85E−09 | primary | 0.86 | CREB1 | –/–/early prenatal |
| 2 | 220033801 | 220071601 | rs6707588 | 9.51E−07 | rs13404754 | 1.08E−09 | primary | 0.92 | CNPPD1 | –/–/– |
| 3 | 36843183 | 36945783 | rs75968099 | 3.39E−12 | rs9834970 | 1.88E−05 | primary | 0.94 | DCLK3 | nerve - tibial/neurons/infant |
| 3 | 52281078 | 53539269 | rs2535627 | 3.96E−11 | rs6801235 | 2.81E−08 | conditional | 0.86 | PPM1M | –/neurons/late prenatal |
| 3 | 63792650 | 64004050 | rs832187 | 2.58E−08 | rs113386200 | 1.95E−12 | primary | 0.98 | THOC7 | –/–/– |
| 3 | 135807405 | 136615405 | rs7432375 | 5.27E−11 | rs10935184 | 7.71E−25 | primary | 0.93 | PCCB | –/–/– |
| 4 | 170357552 | 170646052 | rs10520163 | 1.02E−08 | rs7438 | 1.02E−09 | primary | 0.97 | CLCN3 | –/–/– |
| 5 | 45291475 | 46404116 | rs1501357 | 1.24E−08 | rs9292918 | 4.45E−05 | primary | 0.94 | BRCAT54, RP11-53O19.1 | –/–/adult |
| 6 | 83779798 | 84407274 | rs3798869 | 8.57E−10 | rs2016358 | 1.19E−09 | primary | 0.90 | SNAP91 | cerebellar hemisphere/–/– |
| 6 | 108875527 | 109019327 | rs9398171 | 3.37E−08 | rs111727905 | 3.84E−06 | primary | 0.97 | ZNF259P1 | –/–/early mid-prenatal |
| 7 | 21485312 | 21545712 | rs73060317 | 6.60E−07 | rs141984481 | 3.59E−05 | primary | 0.92 | SP4 | –/–/early prenatal |
| 8 | 8088038 | 10056127 | rs2945232 | 2.03E−08 | rs2980441 | 7.68E−69 | primary | 0.82 | FAM86B3P | –/–/adolescent |
| 8 | 26181524 | 26279124 | rs1042992 | 2.27E−07 | rs17055186 | 3.06E−24 | conditional | 0.91 | SDAD1P1 | testis/–/adult |
| 8 | 38020424 | 38310924 | rs57709857 | 2.32E−07 | rs201999919 | 1.70E−07 | primary | 0.88 | WHSC1L1 | –/–/early prenatal |
| 8 | 144822546 | 144871746 | rs11784536 | 1.83E−07 | rs12541792 | 6.45E−35 | primary | 0.90 | FAM83H | esophagus - mucosa/oligodendrocytes/adolescent |
| 9 | 26839508 | 26909408 | rs10967586 | 4.75E−07 | rs12345197 | 3.90E−06 | primary | 0.80 | IFT74 | –/–/– |
| 11 | 46340213 | 46751213 | rs7951870 | 1.97E−11 | rs16938506 | 5.08E−05 | primary | 0.88 | MDK | –/–/early mid-prenatal |
| 12 | 57428314 | 57497814 | rs324017 | 2.13E−07 | rs4559 | 2.02E−05 | conditional | 0.91 | STAT6 | –/microglia/adolescent |
| 14 | 35421614 | 35847614 | rs77477310 | 1.52E−07 | rs1028449 | 8.09E−04 | primary | 0.84 | RP11-85K15.2 | –/–/– |
| 15 | 78803032 | 78926732 | rs8042374 | 1.87E−12 | rs7171869 | 1.44E−04 | conditional | 0.94 | IREB2 | –/–/early prenatal |
| 15 | 84661161 | 85153461 | rs950169 | 7.62E−11 | rs35677834 | 1.54E−34 | primary | 0.80 | LOC101929479, RP11-561C5.3 | ovary/–/early mid-prenatal |
| 15 | 91416560 | 91436560 | rs4702 | 2.30E−12 | rs4702 | 4.49E−13 | primary | 1.00 | FURIN | –/endothelial cells/adolescent |
| 16 | 4447751 | 4596451 | rs6500602 | 2.79E−07 | rs3747580 | 4.75E−16 | primary | 0.90 | CORO7 | –/–/– |
|  |  |  |  |  | rs8046295 | 2.68E−11 | primary | 0.89 | NMRAL1 | –/–/– |
| 16 | 29924377 | 30144877 | rs12691307 | 1.30E−10 | rs4788203 | 1.95E−05 | primary | 0.88 | TMEM219 | –/–/– |
|  |  |  |  |  | rs3935873 | 7.46E−14 | primary | 0.87 | INO80E | –/neurons/– |
|  |  |  |  |  | rs4787491 | 1.60E−04 | conditional | 0.82 | DOC2A | brain - cortex/neurons/adolescent |
| 16 | 58669293 | 58691393 | rs12325245 | 1.15E−08 | rs11647976 | 4.83E−04 | primary | 0.94 | CNOT1 | –/–/– |
| 17 | 17722402 | 18030202 | rs8082590 | 6.84E−09 | rs4072739 | 4.74E−13 | primary | 0.92 | DRG2 | –/–/– |
| 19 | 11839736 | 11859736 | rs72986630 | 4.64E−08 | rs72986630 | 2.20E−14 | primary | 1.00 | ZNF823 | –/endothelial cells/early prenatal |
| 19 | 19374022 | 19658022 | rs2905426 | 6.92E−09 | rs2965199 | 9.22E−36 | primary | 0.87 | GATAD2A | –/–/– |
| 19 | 50067499 | 50135399 | rs56873913 | 2.19E−07 | rs5023763 | 9.32E−05 | primary | 0.93 | SNRNP70 | –/–/– |
| 22 | 41408556 | 42689414 | rs9607782 | 6.76E−12 | rs200447424 | 1.87E−04 | primary | 0.96 | RANGAP1 | –/–/– |
Importantly, for 6 of the 40 co-localizing loci, a conditional rather than primary eQTL co-localized with the GWAS with compelling qualitative support (Table 2, Figure 4, Table S11, Figures S13–S17). The genes showing strong evidence for conditional eQTL co-localization include SLC35E2, PROX1-AS1 (MIM: 601546), PPM1M (MIM: 608979), SDAD1P1, STAT6 (MIM: 601512), and IREB2. Also notable are the occurrences of complex patterns of co-localization for some loci; for example, three loci showed evidence for co-localization with a primary eQTL for one gene and a conditional eQTL for another.
Comparison with Previous Co-localization Analyses
In the prior CMC study, a GWAS-eQTL co-localization analysis implemented in Sherlock and using non-conditional eQTL summary statistics reported a total of 18 co-localized loci, representing 17% of the 108 genome-wide significant loci examined. Through our all-but-one conditional co-localization analysis, we replicate the majority of their findings and detect an additional 13 instances of co-localization, bringing the total number of co-localizations when considering only the genome-wide significant (and not including the MHC) loci up to 24 (representing 22% of these 108 loci) (Table S9). These 13 comprise instances of conditional eQTL co-localization (for genes SLC35E2 and IREB2) and improved detection of primary eQTL co-localization due to isolation of independent eQTL signatures and our choice of co-localization software (coloc2). Of the six co-localized loci identified in the previous but not current analysis, three resulted from differences in study design such as GWAS locus definition and eQTL overlap criteria, and two were suggestive in the current analysis (0.65 < PPH4 < 0.8). The one remaining discrepant locus (chr8:143302933–143403527) was found to co-localize with TSNARE1 eQTL previously (Sherlock p = 8.24 × 10−7) but not here (coloc2 primary eQTL PPH4 = 0.074, PPH3 = 0.93). A qualitative comparison of the eQTL and GWAS data (Figure S18) did not appear to support co-localization; while the strongest GWAS association and the strongest eQTL are in close physical proximity, the LD between the two index SNPs is low (r2∼0.2–0.4). Additionally, our attempts to disentangle independent eQTL signal via conditional analysis do not reveal the GWAS index SNP to be in high LD with any of the conditionally independent eQTL peaks.
We also compared our conditional co-localization results with those from non-conditional eQTL analysis, using coloc2 and the same SCZ GWAS loci (Table S10).
Conditional and non-conditional coloc2 results were highly concordant, with slightly higher PPH4s resulting from the same WABFs due to a higher prior probability of co-localization estimated in the non-conditional coloc2 analysis. Thirty-five loci were co-localized in both analyses; five loci that were co-localized in the non-conditional analysis only were highly suggestive in the conditional analysis (0.65 < PPH4 < 0.8), and the five loci that were co-localized only in the conditional coloc2 analysis involved conditional eQTL, emphasizing the utility of the conditional analysis. This conditional eQTL co-localization represents a substantial proportion (∼15%) of all instances of co-localization, and furthermore could reflect context-specific differential expression that has the potential to implicate cell types, tissue types, and developmental stages that are relevant to disease etiology.
Discussion
We utilized genotype and expression data from 467 human post-mortem brain samples from the DLPFC to conduct eQTL mapping analyses, to characterize both primary and conditional eQTL. We then identified co-localization between SCZ GWAS and eQTL association signals, comprising both primary and conditional eQTL. Our principal findings include four major observations. First, we detect that conditional eQTL are widespread in the brain tissue samples we investigated. In 63% of genes with at least one eQTL, we found multiple statistically independent eQTL (representing 8,136 genes). In addition, conditional eQTL make substantial contributions to regulatory genetic variation, as there is a strong association between eQTL number and gene expression cis-SNP-heritability. This demonstrates that genetic variation affecting RNA abundance is incompletely characterized by focusing on only one primary eQTL per gene, which is the case currently for most eQTL studies.
Second, we find the genomics of conditional eQTL and their genes are consistent with complex, context-specific regulation of gene expression, which may be conferred through overlap with distal regulatory elements. Genes with more independent eQTL tend to be larger and span multiple recombination hotspot intervals, and tend to be less constrained at the protein level. While these associations may reflect in part greater power to detect independent eQTL that are not in linkage disequilibrium and explain more phenotypic variance, they are also consistent with more complex regulation and greater potential for regulatory genetic variation. Context-specific genetic regulation of expression could manifest as conditional eQTL signal in the analysis of expression from a heterogeneous source. For example, eQTL in naive and stimulated (LPS, IFN) monocytes may occur as either primary or conditional eQTL in our CMC data, due to related microglial cells being present in brain tissue homogenate.
We found that 60 stimulation-specific eQTL (FDR < 0.01 in interferon or lipopolysaccharide stimulated monocytes, but FDR ≥ 0.05 in naive monocytes) were also conditional eQTL in DLPFC. Notably, rs7171787, a conditional (tertiary) eQTL in our DLPFC analysis, is a stimulation-specific monocyte eQTL for the neurodevelopmental56, 57, 58 gene CYFIP1. In our data, associations with specificity of expression across tissues, developmental periods, and cell types determined from single-cell RNA-sequencing data suggest that context specificity plays a role in the occurrence of multiple statistically independent eQTL. Cell type specificity is particularly strongly correlated with eQTL number, consistent with those cell types being present in the current tissue homogenate data. Since previous studies have shown the importance of developmental59, 60, 61, 62 or cell-specific contributions61, 63, 64, 65, 66 to schizophrenia, interrogation of independent eQTL effects may elucidate developmental or tissue-specific effects obscured in whole-tissue eQTL studies.
This context specificity of expression regulation is potentially mediated through overlap of eSNPs with distal regulatory elements, such as enhancers. Conditional eQTL occur farther from transcription start sites than primary eQTL, consistent with effects on enhancers. In addition, while both primary and conditional eQTL are enriched in both active promoter and enhancer regions, their enrichment in active promoters diminishes with increasing conditional eQTL order. In other words, conditional eQTL show greater enrichment in enhancers relative to promoters than do primary eQTL.
Third, we have identified a number of candidate genes for which genetic variation for expression co-localizes with genetic variation for schizophrenia risk (Table 2), including cases of co-localization with conditional eQTL.
Genetic co-localization is expected if gene expression causally mediates disease risk, although we recognize that co-localization could also result from pleiotropy or linkage, particularly in regions of extensive linkage disequilibrium and haplotype structure.40, 67 We also note that several co-localization methods have recently been developed,37, 38, 40, 41, 42 and direct comparisons have found broad concordance among these methods and a high degree of specificity of positive results using coloc.42, 45, 46 However, some differences in results would likely be achieved using alternative co-localization methods.
Our analyses prioritize 27 genes within 24 genome-wide significant (GWAS p < 5 × 10−8) SCZ loci and 19 genes in 17 suggestive (p < 1 × 10−6) loci. In addition to a number of previously implicated SCZ risk genes, our findings include several genes not previously considered as candidates, in some cases (e.g., SLC35E2, PTPRU [MIM: 602454], LINC01792, DCLK3, PPM1M, LOC101929479) because the genes themselves do not overlap the GWAS locus regions but their eQTL do. In examining these genes for expression specificity in GTEx tissues, brain sample cell types from single-cell RNA-seq, and in BrainSpan DLPFC developmental periods (Tables 2 and S8), we find their expression contexts show a diversity of patterns and can provide clues to generate specific hypotheses for functional follow-up of their potential roles in SCZ. Interestingly, genes broadly expressed across cell types tend to show prenatal expression.
Fourth, we highlight the importance of examining conditional eQTL for co-localization with GWASs. In at least 6 out of 40 loci showing GWAS-eQTL co-localization, a conditional eQTL signal co-localizes with SCZ risk. This is likely to be a conservative estimate, as the smaller effect sizes of conditional eQTL result in bias against detection of conditional GWAS-eQTL co-localization.
If we had considered only primary eQTL in the analyses, these instances of co-localization would not have been identified. Among our highlighted conditional eQTL-GWAS co-localized genes are IREB2, STAT6, and PROX1-AS1. IREB2 (iron responsive element binding protein 2) is a key regulator of iron homeostasis68, 69 that has been previously implicated in neurodegenerative disorders.70, 71 Mouse knockouts of the IREB2 homolog Irp2 exhibit impairments in coordination and balance, exploration, and nociception. The immune-related transcription factor STAT6 induces interleukin 4 (IL-4)-mediated anti-apoptotic activity of T helper cells, and the locus is associated with migraine72, 73 and brain glioma as well as several immune/inflammatory diseases.75, 76, 77 STAT6 also activates neuronal progenitor/stem cells and neurogenesis, making it intriguing as an immune-related SCZ candidate given recent observations about the role of the complement factor 4 (C4) gene as a SCZ risk gene and prior work potentially implicating microglia. Consistent with a role in immune-mediated synaptic pruning, STAT6 expression is broadly postnatal and shows specificity for microglia (Table S8). PROX1-AS1 encodes a lncRNA that has been implicated as aberrantly expressed in several cancers, is upregulated in the cell cycle S-phase, and promotes G1/S transition in cell culture. As a potential regulator of the Prospero Homeobox 1 (PROX1) transcription factor, it could be involved in development and cell differentiation in several tissues, including oligodendrocytes and GABAergic interneurons in the brain. PROX1-AS1 expression is specific to neurons and mature oligodendrocytes and is expressed postnatally (Table S8).
In conclusion, we find that conditional eQTL are widespread and are consistent with complex and context-specific regulation.
Accounting for conditional eQTL leads to new findings of GWAS-eQTL co-localization and generates specific hypotheses for the role of gene expression regulation in disease etiology. The analytical strategy presented here could be implemented as a means of identification of putatively causal genes for any phenotype in which GWAS summary statistics and expression and genotype data from the GWAS phenotype-relevant tissue are available. Conditional eQTL that co-localize with disease risk may reflect regulatory mechanisms that are important in a key developmental period or individual cell type and may be missed when focusing on primary eQTL discovered in adult whole tissue. As further efforts are made to generate data across ranges of tissues or individual cell types, we may have a better ability to directly identify regulatory variants specific to these contexts. However, if a variant is primarily active in a very specific time point or stimulus condition, capturing data reflecting this condition will remain challenging. Conditional co-localization analysis in well-powered eQTL cohorts may best identify the genes driving these trait associations, though further validation work will be required to understand the mechanism by which the gene contributes to disease risk.
Consortia
CMC leadership: Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffmann-La Roche Ltd), Lara Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner, and Barbara Lipska (NIMH). Additional members of CMC: A. Ercument Cicek, Cong Lu, Kathryn Roeder, Lu Xie (Carnegie Mellon Univ.); Konrad Talbot (Cedars-Sinai Medical Center); Scott E. Hemby (High Point Univ.); Laurent Essioux (Hoffmann-La Roche); Andrew Browne, Andrew Chess, Aaron Topol, Alexander Charney, Amanda Dobbyn, Ben Readhead, Bin Zhang, Dalila Pinto, David A. Bennett, David H. Kavanagh, Douglas M. Ruderfer, Eli A. Stahl, Eric E. Schadt, Gabriel E. Hoffman, Hardik R. Shah, Jun Zhu, Jessica S. Johnson, John F. Fullard, Joel T. Dudley, Kiran Girdhar, Kristen J. Brennand, Laura G. Sloofman, Laura M. Huckins, Menachem Fromer, Milind C. Mahajan, Panos Roussos, Schahram Akbarian, Shaun M. Purcell, Tymor Hamamsy, Towfique Raj, Vahram Haroutunian, Ying-Chih Wang, Zeynep H. Gümüş (Mount Sinai School of Med.); Geetha Senthil, Robin Kramer (NIMH); Benjamin A. Logsdon, Jonathan M.J. Derry, Kristen K. Dang, Solveig K. Sieberts, Thanneer M. Perumal (Sage Bionetworks); Roberto Visintainer (Univ. Trento, Italy); Leslie A. Shinobu (Takeda); Patrick F. Sullivan (Univ. North Carolina); and Lambertus L. Klei (Univ. Pittsburgh School of Med.).
Haritunians; David J Hunter; Robert C Kaplan; Karen L Mohlke; Jeffrey R O'Connell; Leena Peltonen; David Schlessinger; David P Strachan; Cornelia M van Duijn; H-Erich Wichmann; Timothy M Frayling; Unnur Thorsteinsdottir; Gonçalo R Abecasis; Inês Barroso; Michael Boehnke; Kari Stefansson; Kari E North; Mark I McCarthy; Joel N Hirschhorn; Erik Ingelsson; Ruth J F Loos Journal: Nat Genet Date: 2010-10-10 Impact factor: 38.330
Authors: Hreinn Stefansson; Dan Rujescu; Sven Cichon; Olli P H Pietiläinen; Andres Ingason; Stacy Steinberg; Ragnheidur Fossdal; Engilbert Sigurdsson; Thordur Sigmundsson; Jacobine E Buizer-Voskamp; Thomas Hansen; Klaus D Jakobsen; Pierandrea Muglia; Clyde Francks; Paul M Matthews; Arnaldur Gylfason; Bjarni V Halldorsson; Daniel Gudbjartsson; Thorgeir E Thorgeirsson; Asgeir Sigurdsson; Adalbjorg Jonasdottir; Aslaug Jonasdottir; Asgeir Bjornsson; Sigurborg Mattiasdottir; Thorarinn Blondal; Magnus Haraldsson; Brynja B Magnusdottir; Ina Giegling; Hans-Jürgen Möller; Annette Hartmann; Kevin V Shianna; Dongliang Ge; Anna C Need; Caroline Crombie; Gillian Fraser; Nicholas Walker; Jouko Lonnqvist; Jaana Suvisaari; Annamarie Tuulio-Henriksson; Tiina Paunio; Timi Toulopoulou; Elvira Bramon; Marta Di Forti; Robin Murray; Mirella Ruggeri; Evangelos Vassos; Sarah Tosato; Muriel Walshe; Tao Li; Catalina Vasilescu; Thomas W Mühleisen; August G Wang; Henrik Ullum; Srdjan Djurovic; Ingrid Melle; Jes Olesen; Lambertus A Kiemeney; Barbara Franke; Chiara Sabatti; Nelson B Freimer; Jeffrey R Gulcher; Unnur Thorsteinsdottir; Augustine Kong; Ole A Andreassen; Roel A Ophoff; Alexander Georgi; Marcella Rietschel; Thomas Werge; Hannes Petursson; David B Goldstein; Markus M Nöthen; Leena Peltonen; David A Collier; David St Clair; Kari Stefansson Journal: Nature Date: 2008-09-11 Impact factor: 49.962
Authors: Laura M Huckins; Amanda Dobbyn; Douglas M Ruderfer; Gabriel Hoffman; Weiqing Wang; Antonio F Pardiñas; Veera M Rajagopal; Thomas D Als; Hoang T Nguyen; Kiran Girdhar; James Boocock; Panos Roussos; Menachem Fromer; Robin Kramer; Enrico Domenici; Eric R Gamazon; Shaun Purcell; Ditte Demontis; Anders D Børglum; James T R Walters; Michael C O'Donovan; Patrick Sullivan; Michael J Owen; Bernie Devlin; Solveig K Sieberts; Nancy J Cox; Hae Kyung Im; Pamela Sklar; Eli A Stahl Journal: Nat Genet Date: 2019-03-25 Impact factor: 38.330
Authors: Lulu Shang; Jennifer A Smith; Wei Zhao; Minjung Kho; Stephen T Turner; Thomas H Mosley; Sharon L R Kardia; Xiang Zhou Journal: Am J Hum Genet Date: 2020-03-26 Impact factor: 11.025
Authors: Max Lam; W David Hill; Joey W Trampush; Jin Yu; Emma Knowles; Gail Davies; Eli Stahl; Laura Huckins; David C Liewald; Srdjan Djurovic; Ingrid Melle; Kjetil Sundet; Andrea Christoforou; Ivar Reinvang; Pamela DeRosse; Astri J Lundervold; Vidar M Steen; Thomas Espeseth; Katri Räikkönen; Elisabeth Widen; Aarno Palotie; Johan G Eriksson; Ina Giegling; Bettina Konte; Annette M Hartmann; Panos Roussos; Stella Giakoumaki; Katherine E Burdick; Antony Payton; William Ollier; Ornit Chiba-Falek; Deborah K Attix; Anna C Need; Elizabeth T Cirulli; Aristotle N Voineskos; Nikos C Stefanis; Dimitrios Avramopoulos; Alex Hatzimanolis; Dan E Arking; Nikolaos Smyrnis; Robert M Bilder; Nelson A Freimer; Tyrone D Cannon; Edythe London; Russell A Poldrack; Fred W Sabb; Eliza Congdon; Emily Drabant Conley; Matthew A Scult; Dwight Dickinson; Richard E Straub; Gary Donohoe; Derek Morris; Aiden Corvin; Michael Gill; Ahmad R Hariri; Daniel R Weinberger; Neil Pendleton; Panos Bitsios; Dan Rujescu; Jari Lahti; Stephanie Le Hellard; Matthew C Keller; Ole A Andreassen; Ian J Deary; David C Glahn; Anil K Malhotra; Todd Lencz Journal: Am J Hum Genet Date: 2019-08-01 Impact factor: 11.025
Authors: Chelsea K Raulerson; Arthur Ko; John C Kidd; Kevin W Currin; Sarah M Brotman; Maren E Cannon; Ying Wu; Cassandra N Spracklen; Anne U Jackson; Heather M Stringham; Ryan P Welch; Christian Fuchsberger; Adam E Locke; Narisu Narisu; Aldons J Lusis; Mete Civelek; Terrence S Furey; Johanna Kuusisto; Francis S Collins; Michael Boehnke; Laura J Scott; Dan-Yu Lin; Michael I Love; Markku Laakso; Päivi Pajukanta; Karen L Mohlke Journal: Am J Hum Genet Date: 2019-09-26 Impact factor: 11.025