Hilary K Finucane, Yakir A Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, Giulio Genovese, Arpiar Saunders, Evan Macosko, Samuela Pollack, John R B Perry, Jason D Buenrostro, Bradley E Bernstein, Soumya Raychaudhuri, Steven McCarroll, Benjamin M Neale, Alkes L Price.
Abstract
We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.
Year: 2018 PMID: 29632380 PMCID: PMC5896795 DOI: 10.1038/s41588-018-0081-4
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
List of gene expression data sets used in this study. We analyzed five gene expression data sets: two (GTEx and Franke lab) containing a wide range of tissues and three (Cahoy, PsychENCODE, ImmGen) with more detailed information about a particular tissue.
| Name | Organism | Tissues/cell types | Technology |
|---|---|---|---|
| GTEx | Human | 53 tissues/cell types | RNA-seq |
| Franke lab | Human/mouse/rat | 152 tissues/cell types | Array |
| Cahoy | Mouse | 3 brain cell types | Array |
| PsychENCODE | Human | 2 neuronal cell types | RNA-seq |
| ImmGen | Mouse | 292 immune cell types | Array |
Figure 1. Overview of the approach. For each tissue in our gene expression data set, we compute a t-statistic for differential expression for each gene. We then rank genes by t-statistic, take the top 10% of genes, and add a 100-kb window to get a genome annotation. We use stratified LD score regression[7] to test whether this annotation is significantly enriched for per-SNP heritability, conditional on the baseline model[7] and the set of all genes.
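The annotation-building step described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the input layout (a dict of per-gene t-statistics and a dict of gene coordinates), the choice to pad both sides of each gene, and the merging of overlapping windows are all assumptions made for the example. Only the 10% fraction and the 100-kb window come from the caption.

```python
from typing import Dict, List, Tuple

WINDOW_BP = 100_000   # 100-kb window (from the caption)
TOP_FRACTION = 0.10   # top 10% of genes (from the caption)

Interval = Tuple[str, int, int]  # (chromosome, start, end); hypothetical layout


def top_genes(t_stats: Dict[str, float], fraction: float = TOP_FRACTION) -> List[str]:
    """Rank genes by t-statistic (largest first) and keep the top fraction."""
    ranked = sorted(t_stats, key=t_stats.get, reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


def build_annotation(
    t_stats: Dict[str, float],
    gene_coords: Dict[str, Interval],
    window: int = WINDOW_BP,
) -> List[Interval]:
    """Pad each top gene by `window` bp and merge overlapping intervals.

    Assumption: the window is added on both sides of the gene; the
    caption does not say how the window is placed.
    """
    padded = []
    for gene in top_genes(t_stats):
        chrom, start, end = gene_coords[gene]
        padded.append((chrom, max(0, start - window), end + window))

    padded.sort()
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

The resulting intervals would then be converted to a per-SNP annotation and passed to stratified LD score regression together with the baseline model and the all-genes annotation; that step is outside the scope of this sketch.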
Figure 2. Results of the multiple-tissue analysis for selected traits. Results for the remaining traits are displayed in Figure S1. Each point represents a tissue/cell type from either the GTEx data set or the Franke lab data set. Large points pass the FDR < 5% cutoff, –log10(P) = 2.75. GWAS data are described in Table S4, gene expression data are described in the Online Methods and Tables S2-3, and the statistical method is described in the Overview of Methods and the Online Methods. Numerical results are reported in Table S6.
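To illustrate how an FDR cutoff translates into a single –log10(P) threshold like the one quoted in the caption, here is a small Python sketch. It assumes the Benjamini-Hochberg step-up procedure; the caption does not name the FDR procedure used, so this choice, the function names, and the toy p-values in the usage note are illustrative assumptions rather than the authors' method or data.

```python
import math
from typing import List, Optional


def bh_threshold(p_values: List[float], fdr: float = 0.05) -> Optional[float]:
    """Return the largest p-value passing Benjamini-Hochberg at level `fdr`.

    Returns None when no test passes. Every p-value at or below the
    returned value is declared significant.
    """
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


def neg_log10(p: float) -> float:
    """Convert a p-value to the –log10(P) scale used on the figure axes."""
    return -math.log10(p)
```

For example, with the made-up p-values `[0.001, 0.01, 0.02, 0.5, 0.9]`, `bh_threshold` returns `0.02`, and points with –log10(P) at or above `neg_log10(0.02)` would be drawn as large points.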
Figure 3. Validation of gene expression results with chromatin data. (A) Examples of validation using chromatin data (bottom) of results from gene expression data (top), for selected traits. Results using chromatin data for all traits are displayed in Figure S5, with numerical results in Table S7. For the chromatin results, each point represents a track of peaks for H3K4me3, H3K4me1, H3K9ac, H3K27ac, H3K36me3, or DHS in a single tissue/cell type. (B) Results using gene expression data (including GTEx), Roadmap, and EN-TEx, for migraine (all subtypes) and migraine without aura. For both subfigures, large points pass the FDR < 5% cutoff, –log10(P) = 2.85 (chromatin) or –log10(P) = 2.75 (gene expression). GWAS data are described in Table S4; gene expression data and chromatin data are described in the Online Methods, Tables S2-3, and Table S7; and the statistical method is described in the Overview of Methods and the Online Methods.
Figure 4. Results of the brain analysis for selected traits. Numerical results for all traits are reported in Table S8. (A) Results from within-brain analysis of 13 brain regions in GTEx, classified into four groups, for seven of 12 brain-related traits. Large points passed the FDR < 5% cutoff, –log10(P) = 2.34. (B) Results from the data of Cahoy et al. on three brain cell types for seven of 12 brain-related traits. Large points passed the FDR < 5% cutoff, –log10(P) = 2.22. (C) Results from PsychENCODE data on two neuronal subtypes for three of five neuron-related traits. Large points passed the Bonferroni significance threshold in this analysis, –log10(P) = 2.06. GWAS data are described in Table S4, gene expression data are described in the Online Methods and Table S8, and the statistical method is described in the Overview of Methods and the Online Methods.
Figure 5. Results of the analysis of ImmGen gene expression data (top) and hematopoiesis ATAC-seq data (bottom) for selected traits. Results for the remaining traits are displayed in Figure S9. Large points passed the FDR < 5% cutoff, –log10(P) = 3.03 (gene expression) or –log10(P) = 2.32 (chromatin). Numerical results are reported in Table S10. GWAS data are described in Table S4, gene expression and chromatin data are described in the Online Methods and Table S10, and the statistical method is described in the Overview of Methods and the Online Methods.