An atlas of chromatin accessibility in the adult human brain.

John F Fullard1,2,3, Mads E Hauberg1,2,4,5,6, Jaroslav Bendl1,2,3, Gabor Egervari1,2,7, Maria-Daniela Cirnaru8, Sarah M Reach3, Jan Motl9, Michelle E Ehrlich3,8,10, Yasmin L Hurd1,2,7, Panos Roussos1,2,3,11.   

Abstract

Most common genetic risk variants associated with neuropsychiatric disease are noncoding and are thought to exert their effects by disrupting the function of cis regulatory elements (CREs), including promoters and enhancers. Within each cell, chromatin is arranged in specific patterns to expose the repertoire of CREs required for optimal spatiotemporal regulation of gene expression. To further understand the complex mechanisms that modulate transcription in the brain, we used frozen postmortem samples to generate the largest human brain and cell-type-specific open chromatin data set to date. Using the Assay for Transposase Accessible Chromatin followed by sequencing (ATAC-seq), we created maps of chromatin accessibility in two cell types (neurons and non-neurons) across 14 distinct brain regions of five individuals. Chromatin structure varies markedly by cell type, with neuronal chromatin displaying higher regional variability than that of non-neurons. Among our findings is an open chromatin region (OCR) specific to neurons of the striatum. When placed in the mouse, a human sequence derived from this OCR recapitulates the cell type and regional expression pattern predicted by our ATAC-seq experiments. Furthermore, differentially accessible chromatin overlaps with the genetic architecture of neuropsychiatric traits and identifies differences in molecular pathways and biological functions. By leveraging transcription factor binding analysis, we identify protein-coding and long noncoding RNAs (lncRNAs) with cell-type and brain region specificity. Our data provide a valuable resource to the research community and we provide this human brain chromatin accessibility atlas as an online database "Brain Open Chromatin Atlas (BOCA)" to facilitate interpretation.
© 2018 Fullard et al.; Published by Cold Spring Harbor Laboratory Press.

Year:  2018        PMID: 29945882      PMCID: PMC6071637          DOI: 10.1101/gr.232488.117

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


Within the human brain, combinational binding of transcription factors at chromatin accessible cis regulatory elements (CREs), such as promoters and enhancers, orchestrates gene expression in different cell types and brain regions. Understanding the role of CREs in the human brain is of great interest, because the majority of common genetic risk variants associated with neuropsychiatric disease affect transcriptional regulatory mechanisms as opposed to protein structure and function (Maurano et al. 2012; Gusev et al. 2014; Roussos et al. 2014; Roadmap Epigenomics Consortium 2015; Fullard et al. 2017). Previous efforts to map CREs in human brain were limited either by their use of homogenate tissue, consisting of a mixture of markedly different cell types (Maurano et al. 2012; Andersson et al. 2014; Roadmap Epigenomics Consortium 2015) or focused on a single cortical region (Fullard et al. 2017). To further understand the role of CREs in human brain function, we sought to generate a comprehensive map of open chromatin regions (OCRs). We applied ATAC-seq to postmortem nuclei extracted from two broad cell types—neuronal (NeuN+) and non-neuronal (NeuN−)—isolated from 14 discrete brain regions of five adult individuals. Chromatin accessibility varies enormously between neurons and non-neurons, and both show enrichment with known cell type markers. Although the pattern of open chromatin in non-neurons is largely invariable, significant variability in neuronal chromatin structure is observed across different brain regions with the most extensive differences seen between neurons of the cortical regions, hippocampus, thalamus, and striatum. We identify numerous cell-type– and region-specific OCRs. Transcription factor (TF) footprinting analysis infers cell type differences in the regulation of gene expression and identifies protein-coding and long noncoding RNAs (lncRNA) with cell type and brain region specificity. 
Moreover, cell- and region-specific differentially accessible OCRs are enriched for genetic variants associated with neuropsychiatric traits. Overall, our findings emphasize the importance of conducting cell-type– and region-specific epigenetic studies to elucidate regulatory and disease-associated mechanisms in the human brain. Our data provide a valuable resource to the research community, and we provide our raw data and genome browser tracks to facilitate further studies of gene regulation in the human brain.

Results

Maps of chromatin accessibility in neuronal and non-neuronal nuclei across 14 brain regions

To map chromatin accessibility in neuronal and non-neuronal nuclei across 14 brain regions (Fig. 1; Supplemental Fig. S1), we applied fluorescence-activated nuclear sorting (FANS) followed by ATAC-seq to 122 nuclear preparations obtained from 14 brain regions of five control subjects (Supplemental Table S1). We processed these data bioinformatically (Supplemental Fig. S2), and multiple metrics, including genotype concordance, gender, and evaluation of cell types, did not indicate sample mislabeling or contamination (Supplemental Fig. S3). Quality control (QC) metrics (confirmed by visual inspection of the mapped reads) led to the exclusion of seven libraries, leaving a final total of 115 libraries (Supplemental Table S2; Supplemental Fig. S3A). Overall, we obtained 4.3 billion (average of 37.8 million) uniquely mapped reads after removing duplicate reads (mean 24%) and those aligning to the mitochondrial genome (mean 1%) (Supplemental Table S3). Samples within the same brain region and cell type were very strongly correlated (Pearson correlation, r = 0.913), indicating high reproducibility among the samples (Supplemental Fig. S4). To assess the quality of our data, we compared them to five publicly available data sets generated using more optimal starting material, such as fresh tissue and cell lines (Supplemental Fig. S5; Qu et al. 2015; Corces et al. 2016, 2017; Novakovic et al. 2016; Banovich et al. 2018). Our data compared favorably, showing the lowest fraction of mitochondrial reads and the highest number of uniquely mapped, nonduplicated, paired-end reads.
Figure 1.

Schematic outline of the study design. Dissections from 14 brain regions of five control subjects were obtained from frozen human postmortem tissue. We combined fluorescence-activated nuclear sorting (FANS) with ATAC-seq, followed by downstream analyses, to identify cell-type–specific open chromatin regions. The brain regions and abbreviations are described in Supplemental Table S2.

We detected an average of 73,350 and 42,942 OCRs for neuronal and non-neuronal libraries, accounting for 1.05% and 0.709% of the genome, respectively (Supplemental Fig. S6A). Analysis of known neuron-specific and non-neuron-specific genes indicates that our data identify OCRs in a cell-type–specific manner (Fig. 2A,B). The neuronal OCRs were more distal to transcription start sites (TSSs) than non-neuronal OCRs (Fig. 2C; Supplemental Fig. S6B). Further, there was a high overlap of OCRs within the neuronal and non-neuronal samples across the different brain regions, with >56.6% and 67.7% of OCRs found in two or more neuronal and non-neuronal samples, respectively. In general, promoter OCRs and non-neuronal OCRs were more frequently identified in multiple samples (Fig. 2D; Supplemental Fig. S7). Jointly, these findings suggest higher regional variability of OCRs and more distal regulation of genes in neurons compared to non-neurons.
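The promoter/distal distinction used here (the Figure 2 caption defines promoter OCRs as those within 3 kb of a TSS) can be illustrated with a minimal sketch. Two details are assumptions not stated in the text: distance is measured from the OCR midpoint, and the 3 kb boundary is treated as inclusive. The coordinates and the function names `distance_to_nearest_tss` and `classify_ocr` are hypothetical.

```python
from bisect import bisect_left

PROMOTER_WINDOW_BP = 3_000  # threshold taken from the Figure 2 caption


def distance_to_nearest_tss(position, sorted_tss):
    """Absolute distance from a position to the closest TSS (same chromosome)."""
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be the nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in candidates)


def classify_ocr(start, end, sorted_tss):
    """Label an OCR as 'promoter' or 'distal' (assumption: use the OCR midpoint)."""
    midpoint = (start + end) // 2
    distance = distance_to_nearest_tss(midpoint, sorted_tss)
    return "promoter" if distance <= PROMOTER_WINDOW_BP else "distal"


# Hypothetical TSS positions and OCR intervals on a single chromosome.
tss_positions = sorted([10_000, 50_000, 120_000])
ocrs = [(9_500, 10_300), (60_000, 60_800), (118_000, 118_400)]
labels = [classify_ocr(start, end, tss_positions) for start, end in ocrs]
print(labels)
```

A real analysis would work per chromosome and would normally rely on an annotation-aware tool; the binary search here only shows the logic of the 3 kb rule.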
Figure 2.

Comparisons between neuronal and non-neuronal OCRs of various brain regions. Representative cell-type–specific open chromatin tracks in the dorsolateral prefrontal cortex (DLPFC) and hippocampus at known neuron-specific (CAMK2A) (A) and non-neuron-specific (OLIG1 and OLIG2) genes (B). (C) Neuronal and non-neuronal OCRs show distinct distribution of genomic contexts. OCRs within 3 kb of a TSS were considered as promoter OCRs. (D) The distribution of the number of brain regions in which a consensus OCR was found, stratified by cell type and promoter/nonpromoter OCRs. OCRs within 3 kb of a TSS were considered as promoter OCRs. (E) Clustering of the individual samples (n = 115) using t-SNE. Brain regions are grouped in six broad areas: (AMY) amygdala; (HIP) hippocampus; (MDT) mediodorsal thalamus; (NCX) neocortex; (PVC) primary visual cortex; (STR) striatum. (F) Distribution of statistical dissimilarity (quantified based on the proportion of true tests, pi1) for inter- and intra-cell-type pairwise comparisons. Larger pi1 indicates a larger fraction of OCRs estimated to be different between samples based on pairwise comparisons. (G) Multidimensional scaling of brain regions and cell types (n = 28) using the pi1 estimates of statistical dissimilarity as distance. Same abbreviations as in E. The MDT non-neuronal group is immediately adjacent to, and partly obscured by, the leftmost non-neuronal striatum group.

Cell type and regional differences in chromatin accessibility

To quantitatively analyze differences among cell types and brain regions, we generated a consensus set of 300,444 OCRs by taking the union of peaks called in the individual cells/brain regions (Methods). We next quantified how many reads overlapped each consensus OCR. Covariate analyses (Methods) revealed that, besides cell type and brain region, fraction of reads within peaks (FRiP) explains a large proportion of variation in our data (Supplemental Fig. S8). After covariate correction, all variables besides cell type and brain region explained <1% of variance. t-SNE-based clustering using the adjusted read counts clearly separated neuronal from non-neuronal samples (Fig. 2E). We also observed a more modest separation among neuronal samples into neocortical and subcortical regions (hippocampus, striatum, and thalamus), indicating that regional differences are more prominent in neurons. To further assess differences in cell type and/or brain region, we performed pairwise comparisons among all samples and quantified the level of statistical significance based on the proportion of true tests, pi1. The pi1 (which equals 1 − pi0) is an estimate of the fraction of OCRs that are differentially accessible between two groups; “1” corresponds to all OCRs estimated to have differential accessibility, whereas “0” corresponds to none of the OCRs having differential accessibility. This yielded results comparable to the t-SNE-based clustering: Among the pairwise comparisons, those between neuronal and non-neuronal cells (inter-cell-type comparison) showed a large pi1 (median = 0.59, SD = 0.10). For the intra-cell-type comparisons, there was, on average, a higher pi1 among pairs of neuronal samples than pairs of non-neuronal samples (median = 0.27, SD = 0.19 versus median = 0.064, SD = 0.10) (Fig. 2F). 
Furthermore, multidimensional scaling of the samples, based on the pi1 estimates as the distance metric, showed a clear distinction between neurons and non-neurons in the first dimension (Fig. 2G). Here, the neuronal samples displayed distinct clustering among different regions of the brain in the second dimension. Within the neocortex, the primary visual cortex has the most unique profile. The hippocampus and amygdala clusters showed a more similar profile with the neocortical regions when compared to the mediodorsal thalamus and striatum (putamen and nucleus accumbens). These findings are in agreement with those identified by gene expression analysis of homogenate tissue (Kang et al. 2011) and suggest a significant neuronal contribution to the regional variability described in the previous study (Kang et al. 2011). To define cell-type– and brain region–specific OCRs, we next performed differential chromatin accessibility analysis. For the brain region analysis, we only considered neuronal samples due to the comparably minor variance seen in non-neuronal samples. The cell type analysis identified 221,957 neuronal and 46,299 non-neuronal differential OCRs at false discovery rate (FDR) of 5% (Supplemental Figs. S9, S10A; Supplemental Table S4). Regional analysis identified neuronal OCRs specific to neocortex (61,410), primary visual cortex (22,248), hippocampus (11,535), mediodorsal thalamus (42,560), and striatum (97,707) at FDR 5% (Supplemental Figs. S9, S10B; Supplemental Table S4). Due to the complementary nature of the two approaches, in the following sections we used these cell-specific (neuronal and non-neuronal) and region-specific (neocortex, primary visual cortex, hippocampus, striatum, and mediodorsal thalamus) OCRs in parallel with all OCRs identified in each brain region and cell type.
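To make the pi1 statistic used in this section concrete, the following is a minimal sketch of one common way to estimate it from a vector of per-OCR P-values: a Storey-type estimate of pi0 at a single fixed threshold. The fixed threshold `lam=0.5`, the function name `estimate_pi1`, and the toy P-values are assumptions for illustration; the procedure actually used is described in the Methods and may differ (for example, by smoothing over several thresholds).

```python
def estimate_pi1(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 from P-values (Storey-type, fixed threshold).

    Under the null, P-values are uniform, so the density of P-values above
    `lam` estimates the fraction of true null tests (pi0).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one P-value")
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = n_above / (m * (1.0 - lam))
    pi0 = min(1.0, max(0.0, pi0))  # keep the estimate inside [0, 1]
    return 1.0 - pi0


# Toy example: six small P-values (likely true differences) and four null-like ones.
toy_p_values = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.3, 0.55, 0.7, 0.9]
print(estimate_pi1(toy_p_values))

# A perfectly uniform grid of P-values gives pi1 = 0 (no differential OCRs).
uniform_p_values = [(i + 0.5) / 100 for i in range(100)]
print(estimate_pi1(uniform_p_values))
```

Read this way, a pairwise comparison with pi1 near 0.59 means roughly 59% of consensus OCRs are estimated to differ in accessibility between the two groups.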

Overlap with existing epigenomic annotations, cell/region-specific genes, and biological processes

We compared the OCRs with existing epigenetic data from the NIH Roadmap Epigenomics Mapping Consortium (REMC) (Ernst and Kellis 2015; Roadmap Epigenomics Consortium 2015), considering both DNase-seq (Supplemental Fig. S11) and chromatin states (Supplemental Fig. S12) from homogenate brain tissue, brain-derived cells, and nonbrain tissues (referred to as “Other”). Overall, we identified a higher overlap between our OCRs and REMC brain-related DNase-seq and active chromatin states. Notably, we saw a comparatively higher overlap with non-neuronal-specific OCRs than with those of neurons (Fig. 3A). This may be an indication that many neuron-specific regulatory elements are not captured when studying homogenate tissue due to an abundance of non-neurons relative to neurons.
Figure 3.

Overlap with other epigenomes and biological functions. (A) Overlap between DNase-seq OCRs and promoter/primary enhancer states of 127 epigenomes from REMC and neuronal and non-neuronal OCRs identified by ATAC-seq. Samples from REMC are split into three groups: brain tissue, brain-derived cells, and nonbrain tissues (referred to as “Other”). The full results for the individual REMC samples are shown in Supplemental Figures S11 and S12. (B) Overlap between cell- and region-specific open chromatin (ATAC-seq) and gene sets representing biological processes and pathways. Only those that were within the top five most significant gene sets in one or more ATAC-seq categories are shown. Pathways were clustered by the Jaccard index using the WardD method based on the overlap between the genes in the different gene sets and not the enrichments. This was done to show how enrichments varied by cell type and region in terms of related pathways. (#) FDR < 0.001; (·) FDR < 0.05; (Bi) BIOCARTA; (GO) gene ontology; (KG) KEGG; (Re) REACTOME. In this analysis, the region-specific OCRs were derived from neuronal samples only.

To examine the overlap with cell- and region-specific genes, as well as genes involved in various biological processes, we next used the approach from GREAT (Methods; McLean et al. 2010). Using cell-type–specific genes (Zhang et al. 2014; Zeisel et al. 2015), we identified an overlap between neuronal OCRs and genes of pyramidal cells and interneurons, whereas non-neuronal OCRs overlapped with oligodendrocyte- and astrocyte-specific genes (Supplemental Fig. S13). Similarly, we explored the overlap with genes displaying region-specific expression profiles (Supplemental Fig. S14; Hawrylycz et al. 2012) and showed that region-specific OCRs overlapped predominantly with genes expressed in the same brain region. 
Although this analysis describes high-order enrichment of OCRs with region-specific genes, regional and cell-type specificity of chromatin accessibility is readily visualized at the gene level. As representative examples, we considered genes with preferential expression in cortical regions (SATB2, GJD4, STX1A, and CALHM1), mediodorsal thalamus (CHRNA2 and PLCD4), and striatum (DRD2, ADORA2A, and RGS9) (Supplemental Fig. S15). Finally, we agnostically examined the overlap with biological processes and pathways (Fig. 3B; Supplemental Fig. S16). In this analysis, neuron-specific OCRs overlapped ion channels and a range of brain-related functions, whereas non-neuronal OCRs overlapped with terms relating, among others, to the NOTCH pathway, gliogenesis, and ensheathment of neurons.
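The Figure 3B caption notes that pathways were clustered by the Jaccard index computed on the genes shared between gene sets rather than on the enrichments. As a small illustration of that distance, the sketch below computes pairwise Jaccard distances for three gene sets. The set names and gene labels are placeholders, not real pathway memberships, and the subsequent Ward clustering step is not shown.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Placeholder gene sets (hypothetical names, for illustration only).
gene_sets = {
    "set_1": {"geneA", "geneB", "geneC", "geneD"},
    "set_2": {"geneC", "geneD", "geneE"},
    "set_3": {"geneX", "geneY"},
}

names = sorted(gene_sets)
# Jaccard distance = 1 - Jaccard index; this is what a clustering step would consume.
distance = {
    (p, q): 1.0 - jaccard_index(gene_sets[p], gene_sets[q])
    for p in names
    for q in names
}
for p in names:
    print(p, [round(distance[(p, q)], 2) for q in names])
```

Clustering on gene membership rather than on enrichment values groups related pathways together, so that differences in enrichment across cell types and regions can be read within each group of related terms.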

Overlap of open chromatin with neuropsychiatric traits

We used an LD-score partitioned heritability approach (Finucane et al. 2015) to assess the overlap of OCRs with genetic variants associated with 15 neuropsychiatric and unrelated traits. We found significant enrichment only for neuropsychiatric traits (Fig. 4A; Supplemental Fig. S17; Supplemental Table S5). Among the cell- and region-specific OCRs, for example, neuronal- and striatal-specific OCRs were enriched for schizophrenia-associated variants, whereas neocortical- and striatal-specific OCRs were enriched for variants correlated with educational attainment. Further exploration of OCRs identified in each brain region and cell type showed that neuronal OCRs in hippocampus, nucleus accumbens, and superior temporal cortex provide the most significant enrichment with schizophrenia risk variants (Fig. 4B). These findings are in agreement with a recent study highlighting striatal medium spiny neurons and hippocampal CA1 pyramidal neurons in schizophrenia (Skene et al. 2018) and with DRD2, an antipsychotic drug target, being highly expressed in medium spiny neurons (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014; Skene et al. 2018). By applying the LD-score partitioned heritability approach to OCRs from homogenate brain or other tissues, we observed the strongest enrichment of schizophrenia risk variants with neuronal ATAC-seq and homogenate fetal brain OCRs (Supplemental Fig. S18; Supplemental Table S5), which is consistent with the neurodevelopmental hypothesis of schizophrenia (Rapoport et al. 2012).
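In partitioned heritability analyses of this kind, enrichment of an annotation is conventionally defined as the proportion of SNP heritability falling in the annotation divided by the proportion of SNPs it contains. The sketch below shows only that final ratio; it does not perform LD-score regression, and all numbers and the function name `heritability_enrichment` are hypothetical.

```python
def heritability_enrichment(h2_in_annotation, h2_total,
                            snps_in_annotation, snps_total):
    """Proportion of heritability in an annotation over its proportion of SNPs."""
    prop_h2 = h2_in_annotation / h2_total
    prop_snps = snps_in_annotation / snps_total
    return prop_h2 / prop_snps


# Hypothetical values: an OCR set covering 2% of SNPs that carries 20% of
# the SNP heritability of a trait.
enrichment = heritability_enrichment(
    h2_in_annotation=0.06, h2_total=0.30,
    snps_in_annotation=20_000, snps_total=1_000_000,
)
print(round(enrichment, 2))
```

An enrichment above 1 means the OCR set carries more heritability than expected from its size; significance is assessed separately from the standard error that the regression procedure supplies.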
Figure 4.

Overlap between genetic variants associated with various complex traits and identified OCRs assayed using LD-score partitioned heritability. (A) Overlap between cell-type– and region-specific OCRs and genetic risk variants of various traits. The region-specific OCRs are based only on neuronal samples. (B) Overlap between all OCRs identified in 14 brain regions by two cell types and schizophrenia genetic risk variants. OCRs were in all cases padded with 1000 bp to also capture adjacent genetic variants. (Chronotype) whether one is a morning or an evening person; (·) nominally significant; (#) significant after FDR correction of multiple testing across all traits and OCRs sets.

Classifying brain sample epigenomes using machine learning

We applied a support vector machine approach to identify OCR signatures that predict cell type and brain region in ATAC-seq samples of unknown origin (Supplemental Table S6). For accurate classification of cell type (neuron versus non-neuron) alone, cell type and cortical/subcortical regions, and cell type and five different brain regions defined based on the differential chromatin accessibility analysis, signatures of 3, 115, and 252 OCRs were needed, respectively (Supplemental Figs. S19A–D, S20; Supplemental Tables S7, S8). To corroborate our finding, we tested our models on independent ATAC-seq data sets (Egervari et al. 2017; Fullard et al. 2017). Here, the cell type classifier (3 OCR signature) achieved perfect accuracy in distinguishing neuronal and non-neuronal samples (Supplemental Fig. S19E; Supplemental Table S7). The cell type and cortical/subcortical classifier (115 OCR signature) attained an overall accuracy of 90% (Supplemental Fig. S19F; Supplemental Table S7). However, the classification of non-neuronal samples into cortical and subcortical groups seemed more challenging, yielding an accuracy of 86% versus an accuracy of 96% obtained for neurons. This difference was also evident using the cell-type and five-brain-region model (252 OCR signature). Although overall accuracy using the validation data set attained 85% (Supplemental Fig. S19F; Supplemental Table S7), neuronal and non-neuronal subgroups were classified with accuracies of 92% and 79%, respectively. The difficulties in classifying the brain region of non-neuronal samples mirror the previously observed small inter-region differences in MDS clustering and pi1 estimates and provide evidence for lesser regional variability among non-neuronal cells.
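As an illustration of how a small OCR signature could separate cell types, here is a minimal linear support vector machine trained by sub-gradient descent on the hinge loss. It is a sketch under stated assumptions, not the authors' pipeline: the three-feature accessibility values are invented toy data standing in for a 3-OCR signature, the kernel is linear, and the hyperparameters (`epochs`, `learning_rate`, `reg`) are arbitrary.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Sample violates the margin: move toward it, with weight decay.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is safely classified: apply only the regularization step.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (neuronal) or -1 (non-neuronal) from the sign of the decision score."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented accessibility values at three hypothetical signature OCRs.
train_x = [
    [5.0, 1.0, 1.2], [4.5, 0.8, 1.0], [5.2, 1.1, 0.9],  # neuronal (+1)
    [1.0, 4.8, 5.1], [0.9, 5.2, 4.7], [1.2, 4.5, 5.0],  # non-neuronal (-1)
]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
predictions = [predict(w, b, x) for x in train_x]
accuracy = sum(p == y for p, y in zip(predictions, train_y)) / len(train_y)
print(accuracy)
```

In practice the classifier would be evaluated on held-out data, as was done here with the independent ATAC-seq data sets, rather than on the training samples.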

Regulatory effects of transcription factor binding on gene expression

To explore gene regulation in the brain, we performed footprinting analysis using PIQ (Sherwood et al. 2014) to infer transcription factor (TF) binding within the OCRs for cell type and region, independently. This approach utilized 431 TF binding motifs representing 807 TFs aggregated from a meta-database (Methods; Weirauch et al. 2014). We estimated a regulatory score for the impact of each TF on gene expression by weighting each TF binding site by the probability of that site being bound and the distance to the TSS. We found the overall regulatory score of a gene (sum of regulatory scores across all TFs for a given gene) to correlate markedly with gene expression (range of Spearman's rho: 0.318–0.523) (Supplemental Fig. S21), which is greater than the null (estimated based on permutation analysis: mean = −1.1 × 10⁻⁴, 95% CI = −2 × 10⁻⁴ to −1.5 × 10⁻⁵). The correlation is higher for brain-derived expression compared to whole blood, and this difference is more prominent in neuronal ATAC-seq libraries (Supplemental Fig. S21A). We next explored cell type (neuronal and non-neuronal) and brain region (cortical and subcortical) regulatory differences among samples at the gene level (protein-coding, lncRNA, and miRNA) by using a regulatory divergence score. This score takes into account both the difference in regulatory burden between samples and the regulatory divergence (defined as one minus the correlation of the gene regulation) (Methods; Qu et al. 2015). For protein-coding genes, this approach highlighted, among others, HOOK1 and KCNB2 for neuronal cells and SOX8 and HES1 for non-neuronal cells (Fig. 5A). The top 500 most neuronal and top 500 most non-neuronal genes were enriched in biologically relevant pathways. Similar analysis identified multiple protein-coding genes, including PPP1R1B, DRD2, and CACNG4 for subcortical neurons, and NRN1 and SERTM1 for cortical neurons (Fig. 5A). PPP1R1B had the twelfth highest subcortical regulatory divergence score. 
OCRs in the promoter and upstream enhancer region of PPP1R1B are only present in neurons of the striatum (Fig. 5B). PPP1R1B (also known as DARPP-32; dopamine and cAMP-regulated phosphoprotein) shows high expression in the dopaminoceptive medium spiny projection neurons (MSNs) of the striatum. Although frequently used as a marker of MSNs, PPP1R1B is actually widely active throughout the forebrain and in the Purkinje cells of the cerebellum (Ouimet et al. 1984; Brené et al. 1994). To validate the function of the PPP1R1B regulatory elements identified by ATAC-seq, we constructed a vector extending from 4.5 kb upstream of the TSS through the 5′ end of Exon 2 (Supplemental Fig. S22) engineered to express EGFP downstream from an Internal Ribosomal Entry Site (IRES). Pronuclear injection of this transgenic vector into mice yielded nine transgene-positive animals. Histological examination of the brain at 2 mo of age showed that seven of nine expressed EGFP in the majority of dorsal and ventral MSNs, and in the piriform cortex, a site of endogenous PPP1R1B (DARPP-32) expression (Fig. 5C–G). None showed expression outside of these regions, including in other regions with endogenous PPP1R1B expression.
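The regulatory score and regulatory divergence score described in this section can be sketched as follows. The exact weighting is given in the Methods and is not reproduced here, so several pieces are assumptions: an exponential distance decay with a hypothetical 10 kb scale (`DECAY_BP`), Pearson correlation across TFs as the measure of similarity in gene regulation, and a signed burden difference. TF names and binding sites are placeholders.

```python
import math

DECAY_BP = 10_000  # hypothetical distance scale; not taken from the paper


def site_weight(p_bound, distance_to_tss, decay_bp=DECAY_BP):
    """Weight a binding site by its binding probability and distance to the TSS."""
    return p_bound * math.exp(-abs(distance_to_tss) / decay_bp)


def tf_scores(sites):
    """Sum site weights per TF for one gene; sites are (tf, p_bound, distance)."""
    scores = {}
    for tf, p_bound, distance in sites:
        scores[tf] = scores.get(tf, 0.0) + site_weight(p_bound, distance)
    return scores


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def divergence_score(scores_a, scores_b):
    """(Burden in A minus burden in B) multiplied by (1 - correlation across TFs)."""
    tfs = sorted(set(scores_a) | set(scores_b))
    a = [scores_a.get(tf, 0.0) for tf in tfs]
    b = [scores_b.get(tf, 0.0) for tf in tfs]
    burden_difference = sum(a) - sum(b)  # overall regulatory score = sum over TFs
    return burden_difference * (1.0 - pearson(a, b))


# Placeholder binding sites for one gene in two sample groups.
neuron_sites = [("TF_A", 0.9, 500), ("TF_A", 0.7, -2_000), ("TF_B", 0.6, 8_000)]
glia_sites = [("TF_B", 0.5, 1_000), ("TF_C", 0.8, -300)]
neuron_scores = tf_scores(neuron_sites)
glia_scores = tf_scores(glia_sites)
print(round(divergence_score(neuron_scores, glia_scores), 3))
```

With this sign convention, a positive score marks a gene that carries more inferred regulation in the first group and is regulated by a different set of TFs; identical profiles give a score of zero.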
Figure 5.

Identification of cell- and region-specific regulation of protein-coding genes. (A) Ranking of protein-coding genes based on their regulatory divergence score averaged across all neuronal versus all non-neuronal samples (left) and cortical neuronal samples versus subcortical neuronal samples (right). The regulatory divergence score is a combined measure for the difference in the regulatory burden for each gene, multiplied by how different the regulatory landscape is surrounding the gene (Methods). A gene set enrichment analysis using general gene sets and the top 500 most specific genes for either cell type/region using a one-sided Fisher's exact test was performed—the top three gene sets with P-values corrected for multiple testing using FDR are indicated. SOX8, AC009041.2, and LMF1 are all located in the same genetic locus. (B) Regional plot in the PPP1R1B locus showing OCRs. The promoter OCR and putative proximal enhancer OCRs are highlighted (dashed box). (C) The identified human PPP1R1B upstream OCR along with Exon 1, Intron 1, and the 5′ end of Exon 2 were used to direct expression of EGFP in transgenic mice. Expression identified with anti-PPP1R1B and DAB is restricted to the dorsal (dStr) and ventral striatum (vStr) (dorsal > ventral) and their projections (globus pallidus [gp] and substantia nigra [sn]) and the piriform cortex (pc). The black box indicates the region shown at higher magnification using immunofluorescence in D–G: (D) anti-EGFP (green); (E) anti-PPP1R1B (DARPP-32) (red); (F) DAPI (blue); (G) a merged image. EGFP is expressed exclusively in PPP1R1B positive neurons.

We performed a similar regulatory divergence analysis using a recent lncRNA gene assembly (Hon et al. 2017) and identified potential differentially regulated lncRNAs with cell type (neuronal and non-neuronal) and brain region (cortical and subcortical) specificity (Fig. 6A). 
We applied the same data used to create the aforementioned assembly and confirmed the cell type and brain region specificity of identified lncRNAs in closely related tissues. For example, non-neuronal and subcortical lncRNAs are more abundant in expression profiles derived from white matter (Neuron Projection Bundle in Fig. 6A) and striatum, respectively. Furthermore, cell type and regional specificity was validated by qPCR gene expression studies for two lncRNAs from each group (neuronal, non-neuronal, cortical, and subcortical) (Fig. 6B; Supplemental Table S9). Finally, we examined the regulation of microRNA genes in a similar manner (Supplemental Fig. S23). This analysis identified a number of differentially expressed miRNAs, including mir-124-1 for neurons, let-7a-3 for non-neurons, mir-148a for subcortical neurons, and mir-3139 for cortical neurons.
Figure 6.

Identification of cell- and region-specific regulation of lncRNA. (A) Top ranking of lncRNA genes based on their regulatory divergence score averaged across all neuronal versus all non-neuronal samples (left) and cortical neuronal samples versus subcortical neuronal samples (right). The regulatory divergence score is a combined measure of the difference in the regulatory burden for each gene multiplied by how different the regulatory landscape is surrounding that gene (Methods). lncRNA genes were obtained from the FANTOM CAT Robust category, from which only genes from the category “far from protein-coding genes” were retained. Genes with coding status “uncertain” were excluded. (Bottom) Heatmaps of whether gene expression (CAGE) identified genes associated with the given anatomical structure. “Neuron Projection Bundle” includes samples from the corpus callosum and the optic nerve, which are depleted in neuronal nuclei. Red indicates a high gene density, and blue indicates a low gene density. Numbers in parentheses indicate the number of lncRNAs associated with the ontology. (B) qPCR validation of cell-type–specific (left) and brain region–specific (right) lncRNA identified by a regulatory divergence analysis based on ATAC-seq data. Shown are fold differences in expression for neuronal (positive values) to non-neuronal (negative values) gene expression (left) and cortical (positive values) to subcortical (negative values) (right). Error bars indicate standard deviation. (PFC) prefrontal cortex; (STR) striatum; (*) P < 0.05; (**) P < 0.01; (***) P < 0.005.


Transcription factors underlying cell and regional differences

To infer TFs that underlie the regulatory differences between cell types and brain regions, we calculated the fold enrichment of footprinted TF binding motifs in the corresponding peaks compared to the background of all peaks (Fig. 7; Supplemental Fig. S24). Because TFs within a given TF family share binding motifs (Weirauch et al. 2014), it is difficult to determine which family members are biologically relevant in a given context. We note, however, that a number of studies support our findings: basic helix-loop-helix (bHLH) TFs in neurons (Lee 1997), RFX1 in neurons and the hippocampus (Ma et al. 2006), and the RORA/RORB nuclear receptor TFs in the dorsal thalamus (Ino 2004). In addition, a recent study has shown that neocortical expression of the bHLH TFs TWIST1 and TWIST2 may be unique to primates, and that both genes show human-specific expression in the neocortex compared to macaque and chimpanzee (Sousa et al. 2017). Together with our finding of enriched exposure of TWIST1 and TWIST2 binding sites in the human neocortex, this implicates these sites in the regulation of primate- and human-specific neocortical genes.
Figure 7.

The top 10 transcription factor binding motifs showing the highest fold enrichment of footprinted binding sites within peaks specific to a given cell type or brain region compared to all peaks. The region-specific TFs are based only on neuronal samples. TF binding motifs are grouped by TF family, and line width indicates the log2-transformed fold enrichment. All shown enrichments were statistically significant after correcting for multiple testing in a one-sided binomial test. Similar plots of TF binding motif enrichments stratified by genomic context are shown in Supplemental Figure S24.
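The enrichment statistic described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `binom_sf` and `motif_enrichment` and the example counts are hypothetical, and the sketch assumes that the background rate is simply the fraction of all peaks carrying a footprinted site for the motif. It computes the fold enrichment, its log2 value, and a one-sided binomial P-value for a single motif; correction for multiple testing is not included.

```python
from math import exp, lgamma, log, log2


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), with terms built in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1.0 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def motif_enrichment(hits_specific, n_specific, hits_all, n_all):
    """Return (fold, log2 fold, one-sided binomial P) for one motif.

    hits_specific / n_specific: peaks with the motif among the specific peaks.
    hits_all / n_all: peaks with the motif among all peaks (the background).
    """
    if hits_all <= 0 or n_all <= 0 or n_specific <= 0:
        raise ValueError("counts must be positive to define a background rate")
    background_rate = hits_all / n_all
    observed_rate = hits_specific / n_specific
    fold = observed_rate / background_rate
    # One-sided test: is the motif seen more often than the background predicts?
    p_value = binom_sf(hits_specific, n_specific, background_rate)
    log2_fold = log2(fold) if fold > 0 else float("-inf")
    return fold, log2_fold, p_value


# Hypothetical counts, for illustration only.
fold, log2_fold, p_value = motif_enrichment(30, 100, 1000, 10000)
print(f"fold={fold:.2f} log2={log2_fold:.2f} P={p_value:.2e}")
```

In practice one such P-value would be computed per motif and per peak set, and the full collection would then be corrected for multiple testing.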


Discussion

The generation of a cell-type– and brain region–specific atlas of open chromatin enabled exploration of gene regulation in the adult human brain with previously unattained detail. Differential accessibility analyses and machine learning inferred cell-type– and brain region–specific signatures of open chromatin. Compared to non-neuronal populations, open chromatin regions in neurons were found to be more extensive, to be more distal to TSS, to show a smaller overlap with previously reported OCRs from bulk brain tissue, to show greater regional variability, and to show significant enrichment in genetic risk variants of various neuropsychiatric traits. Enrichment analysis highlighted an overlap of open chromatin with previously reported genes showing cell-type– and region-specific expression and further implicated cell- and region-specific molecular pathways.

We utilized the open chromatin patterns to infer transcription factor binding and to impute downstream gene regulation and expression. Despite limitations in predicting transcription factor binding, and ambiguity in subsequently linking each OCR to the gene(s) it regulates (Sherwood et al. 2014; Dixon et al. 2015; Maurano et al. 2015), we found a convincing correlation with gene expression studies. Using this regulatory analysis, we predicted cell- and region-specific protein-coding genes, lncRNAs, and microRNAs. As an example, we identified, and functionally validated, regulatory elements of the striatal, neuronal gene PPP1R1B. In addition, we predicted and experimentally validated cell type and regional patterns of lncRNA expression. Finally, we identified cell- and region-specific TFs based on the enrichment of their cognate binding motifs, which are consistent with previous literature. We acknowledge, however, that footprinting analysis based on ATAC-seq data is limited due to the widespread sharing of recognition motifs between TFs (Weirauch et al. 2014), and future studies using other approaches such as ChIP-seq for specific TFs can complement our observations.

The most distinct brain region based on the neuronal OCRs was the striatum (putamen and nucleus accumbens). An explanation for this could be that, in contrast to the other assayed brain regions, the majority of neurons here are GABAergic medium spiny neurons (Kemp and Powell 1971). All experiments were performed on nuclei extracted from frozen postmortem brain specimens. Following thawing of the samples, the cell membrane is lost and, with it, many cell-type–specific antigens that would facilitate separation of different cell types by FANS. Future studies targeting additional neuronal subtypes using single-cell approaches or by cytometric separation into secondary cell subtypes could further elucidate gene regulation across the brain.

In conclusion, our findings indicate the utility of our open chromatin atlas in studying the regulation of gene expression in the brain and the impact of neuropsychiatric disease risk variants. We provide to the research community an atlas of chromatin accessibility in human brain as an online database “Brain Open Chromatin Atlas (BOCA)” to facilitate interpretation and future studies.

Methods

Brain tissue specimens from 14 brain regions of five controls with no history of psychiatric disorder or drug use were sorted on a FACSAria flow cytometer into neuronal (NeuN+) and non-neuronal (NeuN−) nuclei. The Assay for Transposase Accessible Chromatin followed by sequencing (ATAC-seq) was performed using an established protocol (Buenrostro et al. 2013), and libraries were sequenced on a HiSeq 2500 (Illumina), obtaining 2 × 50 paired-end reads. Reads from each sample were aligned to the hg19 (GRCh37; see Supplemental Methods for a note about reference assembly) reference genome using the STAR aligner (Dobin et al. 2013) (v2.5.0). We excluded reads that (1) mapped to more than one locus, using SAMtools (Li et al. 2009); (2) were duplicated, using PICARD (v2.2.4); or (3) mapped to the mitochondrial genome. We merged the BAM files of samples from the same brain region and cell type and subsampled to a uniform depth. We subsequently called peaks using Model-based Analysis of ChIP-seq (MACS, v2.1) (Zhang et al. 2008) and created a joint set of peaks, requiring each peak to be called in at least one of the merged BAM files. After removing peaks overlapping the blacklisted genomic regions, 300,444 peaks remained. We subsequently quantified read counts of all the individual nonmerged samples within these peaks using the featureCounts function in RSubread (v.1.15.0) (Liao et al. 2014). We used the voomWithQualityWeights function from the limma package (Liu et al. 2015) to model the normalized read counts, including fraction of reads within peaks as covariates. We performed differential chromatin accessibility analysis by fitting weighted least-squares linear regression models for the effect of cell type (neuronal and non-neuronal) and/or brain region. P-values were adjusted for multiple hypothesis testing using a false discovery rate (FDR) threshold of ≤5%. The protein interaction quantitation (PIQ) framework (Sherwood et al. 2014) was used to predict transcription factor binding sites from the genome sequence. To integrate functional annotations and GWAS results, we used the LD-score partitioned heritability (Finucane et al. 2015) approach. More details are described in the Supplemental Material.
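As an illustration of the multiple-testing step, the following Python sketch applies the Benjamini-Hochberg procedure to a list of per-peak P-values and keeps those at FDR ≤5%. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function names `benjamini_hochberg` and `significant_peaks` and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_peaks(p_values, fdr=0.05):
    """Indices of peaks whose adjusted P-value is at or below the FDR threshold."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical P-values for four peaks, for illustration only.
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
print(significant_peaks(example))
```

In an R workflow built on limma, as described above, the equivalent adjustment would normally be done inside the package rather than by hand; the sketch only makes the arithmetic explicit.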

Data access

The data from this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE96949. We further provide the online database “Brain Open Chromatin Atlas (BOCA)” as UCSC tracks and download links at our webpage (http://icahn.mssm.edu/boca).
  40 in total

1.  The transcription factor regulatory factor X1 increases the expression of neuronal glutamate transporter type 3.

Authors:  Kaiwen Ma; Shuqiu Zheng; Zhiyi Zuo
Journal:  J Biol Chem       Date:  2006-05-24       Impact factor: 5.157

Review 2.  Basic helix-loop-helix genes in neural development.

Authors:  J E Lee
Journal:  Curr Opin Neurobiol       Date:  1997-02       Impact factor: 6.627

3.  Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases.

Authors:  Alexander Gusev; S Hong Lee; Gosia Trynka; Hilary Finucane; Bjarni J Vilhjálmsson; Han Xu; Chongzhi Zang; Stephan Ripke; Brendan Bulik-Sullivan; Eli Stahl; Anna K Kähler; Christina M Hultman; Shaun M Purcell; Steven A McCarroll; Mark Daly; Bogdan Pasaniuc; Patrick F Sullivan; Benjamin M Neale; Naomi R Wray; Soumya Raychaudhuri; Alkes L Price
Journal:  Am J Hum Genet       Date:  2014-11-06       Impact factor: 11.025

4.  Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

Authors:  Jason Ernst; Manolis Kellis
Journal:  Nat Biotechnol       Date:  2015-02-18       Impact factor: 54.908

5.  Spatio-temporal transcriptome of the human brain.

Authors:  Hyo Jung Kang; Yuka Imamura Kawasawa; Feng Cheng; Ying Zhu; Xuming Xu; Mingfeng Li; André M M Sousa; Mihovil Pletikos; Kyle A Meyer; Goran Sedmak; Tobias Guennel; Yurae Shin; Matthew B Johnson; Zeljka Krsnik; Simone Mayer; Sofia Fertuzinhos; Sheila Umlauf; Steven N Lisgo; Alexander Vortmeyer; Daniel R Weinberger; Shrikant Mane; Thomas M Hyde; Anita Huttner; Mark Reimers; Joel E Kleinman; Nenad Sestan
Journal:  Nature       Date:  2011-10-26       Impact factor: 49.962

6.  Genetic identification of brain cell types underlying schizophrenia.

Authors:  Nathan G Skene; Julien Bryois; Trygve E Bakken; Gerome Breen; James J Crowley; Héléna A Gaspar; Paola Giusti-Rodriguez; Rebecca D Hodge; Jeremy A Miller; Ana B Muñoz-Manchado; Michael C O'Donovan; Michael J Owen; Antonio F Pardiñas; Jesper Ryge; James T R Walters; Sten Linnarsson; Ed S Lein; Patrick F Sullivan; Jens Hjerling-Leffler
Journal:  Nat Genet       Date:  2018-05-21       Impact factor: 38.330

7.  An anatomically comprehensive atlas of the adult human brain transcriptome.

Authors:  Michael J Hawrylycz; Ed S Lein; Angela L Guillozet-Bongaarts; Elaine H Shen; Lydia Ng; Jeremy A Miller; Louie N van de Lagemaat; Kimberly A Smith; Amanda Ebbert; Zackery L Riley; Chris Abajian; Christian F Beckmann; Amy Bernard; Darren Bertagnolli; Andrew F Boe; Preston M Cartagena; M Mallar Chakravarty; Mike Chapin; Jimmy Chong; Rachel A Dalley; Barry David Daly; Chinh Dang; Suvro Datta; Nick Dee; Tim A Dolbeare; Vance Faber; David Feng; David R Fowler; Jeff Goldy; Benjamin W Gregor; Zeb Haradon; David R Haynor; John G Hohmann; Steve Horvath; Robert E Howard; Andreas Jeromin; Jayson M Jochim; Marty Kinnunen; Christopher Lau; Evan T Lazarz; Changkyu Lee; Tracy A Lemon; Ling Li; Yang Li; John A Morris; Caroline C Overly; Patrick D Parker; Sheana E Parry; Melissa Reding; Joshua J Royall; Jay Schulkin; Pedro Adolfo Sequeira; Clifford R Slaughterbeck; Simon C Smith; Andy J Sodt; Susan M Sunkin; Beryl E Swanson; Marquis P Vawter; Derric Williams; Paul Wohnoutka; H Ronald Zielke; Daniel H Geschwind; Patrick R Hof; Stephen M Smith; Christof Koch; Seth G N Grant; Allan R Jones
Journal:  Nature       Date:  2012-09-20       Impact factor: 49.962

8.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

9.  Biological insights from 108 schizophrenia-associated genetic loci.

Authors: 
Journal:  Nature       Date:  2014-07-22       Impact factor: 49.962

10.  Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.

Authors:  M Ryan Corces; Jason D Buenrostro; Beijing Wu; Peyton G Greenside; Steven M Chan; Julie L Koenig; Michael P Snyder; Jonathan K Pritchard; Anshul Kundaje; William J Greenleaf; Ravindra Majeti; Howard Y Chang
Journal:  Nat Genet       Date:  2016-08-15       Impact factor: 38.330

