
DNase I sensitivity QTLs are a major determinant of human expression variation.

Jacob F Degner, Athma A Pai, Roger Pique-Regi, Jean-Baptiste Veyrieras, Daniel J Gaffney, Joseph K Pickrell, Sherryl De Leon, Katelyn Michelini, Noah Lewellen, Gregory E Crawford, Matthew Stephens, Yoav Gilad, Jonathan K Pritchard.

Abstract

The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants 'DNase I sensitivity quantitative trait loci' (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.

Year:  2012        PMID: 22307276      PMCID: PMC3501342          DOI: 10.1038/nature10808

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


It is now well-established that eQTLs are abundant in a wide range of cell-types and in diverse organisms, and recent studies have implicated human eQTLs as important contributors to phenotypic variation[1-5]. However, the regulatory mechanisms by which eQTLs affect gene expression remain poorly understood. One potentially important mechanism is that the alternative alleles at a particular SNP lead to different levels of transcription factor binding or nucleosome occupancy at regulatory sites, which in turn may lead to allele-specific differences in transcription rates[9-12]. In this study, we used high-depth DNaseI-sequencing (DNase-seq) in a panel of 70 individuals and found that, indeed, a large fraction of eQTLs are likely caused by this type of mechanism. DNase-seq is a genome-wide extension of the classical DNaseI footprinting method[13-15]. This assay identifies regions of chromatin that are accessible (or “sensitive”) to cleavage by the DNaseI enzyme; such regions are referred to as DNaseI hypersensitive sites (DHSs). DNaseI sensitivity provides a precise, quantitative marker of open chromatin and correlates well with a variety of other markers of active regulatory regions, including promoter- and enhancer-associated histone marks. Furthermore, bound transcription factors protect the DNA sequence within a binding site from DNaseI cleavage, often producing recognizable “footprints” of reduced DNaseI sensitivity[13,15-17]. We collected DNase-seq data for 70 HapMap Yoruba lymphoblastoid cell lines (LCLs) for which gene expression data and genome-wide genotypes were already available[6-8]. We obtained an average of 39M uniquely mapped DNase-seq reads per sample, providing individual maps of chromatin accessibility for each cell line (see Supplementary Information for all analysis details). These data allowed us to characterize the distribution of DNaseI cuts within individual hypersensitive sites at extremely high resolution.
As expected, the DHSs coincide to a great extent with previously annotated regulatory regions, and DNaseI sensitivity is positively correlated with the expression levels of nearby genes (Figures S6 and S7). Overall, the locations of hypersensitive sites are highly correlated across individuals (Supplementary Information)[11]. Next, we tested for genetic variants that affect local chromatin accessibility. To do this, we divided the genome into non-overlapping 100 bp windows and focused our analysis on the 5% of windows with the highest DNaseI sensitivity (see Supplementary Information). For each individual, we treated the number of DNase-seq reads in a given window, divided by the total number of mapped reads, as a quantitative trait that estimates the level of chromatin accessibility. We then tested for association between individual-specific DNaseI sensitivity in each window and the genotypes of all SNPs and indels in a 40 kb cis candidate region centered on the target window. Using this procedure, we identified associations between genotype and inter-individual variation in DNase-seq read depth in 9,595 windows at FDR = 10% (corresponding to 8,902 distinct DHSs, once we combine adjacent windows whose sensitivity is associated with the same SNP or indel; Figure 1A). We refer to these 8,902 loci as “DNaseI sensitivity QTLs”, or dsQTLs. We additionally considered a much smaller cis candidate region of just 2 kb around each target window and found that the majority of dsQTLs are detected within this smaller region (7,088 associated windows in 6,070 DHSs), suggesting that most dsQTLs lie close to the target DHS. In contrast, we find only weak evidence of trans-acting dsQTLs, likely because our experiment is underpowered to detect them (Supplementary Information).
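The windowed association scan described above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the authors' pipeline: genotypes are coded as 0/1/2 allele counts, each test is summarized by the t-statistic of a single-variant least-squares regression, a 40 kb region is modeled as ±20 kb around the window center, and the helper names (`normalize_by_depth`, `top_windows`, `best_cis_snp`) are hypothetical. The rescaling, covariate adjustment and FDR control used in the paper are omitted.

```python
import numpy as np


def normalize_by_depth(counts, total_mapped):
    """Reads per 100 bp window divided by each individual's total mapped reads.

    counts: (n_individuals, n_windows) read counts
    total_mapped: (n_individuals,) total uniquely mapped reads per individual
    """
    return np.asarray(counts, dtype=float) / np.asarray(total_mapped, dtype=float)[:, None]


def top_windows(sens, frac=0.05):
    """Indices of the `frac` most sensitive windows, ranked by summed sensitivity."""
    k = max(1, int(round(frac * sens.shape[1])))
    return np.argsort(sens.sum(axis=0))[::-1][:k]


def best_cis_snp(y, window_center, genotypes, snp_pos, half_width=20_000):
    """Most strongly associated variant within +/- half_width bp of one window.

    y: (n_individuals,) sensitivity in the window
    genotypes: (n_individuals, n_variants) allele counts 0/1/2
    snp_pos: (n_variants,) variant positions in bp
    Returns (variant index, t-statistic), or (None, 0.0) if nothing is testable.
    """
    y = np.asarray(y, dtype=float)
    snp_pos = np.asarray(snp_pos)
    n = len(y)
    yc = y - y.mean()
    best_j, best_t = None, 0.0
    for j in np.flatnonzero(np.abs(snp_pos - window_center) <= half_width):
        g = genotypes[:, j] - genotypes[:, j].mean()
        denom = np.sqrt((g @ g) * (yc @ yc))
        if denom == 0:
            continue  # monomorphic variant or invariant window
        r = (g @ yc) / denom
        # t-statistic of the regression slope, derived from the correlation r
        t = r * np.sqrt((n - 2) / max(1e-12, 1.0 - r * r))
        if abs(t) > abs(best_t):
            best_j, best_t = int(j), float(t)
    return best_j, best_t
```

In a full scan, `best_cis_snp` would be called once per selected window, and the resulting statistics would then be converted to p-values and thresholded at the chosen FDR.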
For dsQTLs with enough DNase-seq reads overlapping the most significant SNP (n = 892), we confirmed that the fraction of reads carrying each allele in heterozygotes correlates well with the dsQTL effect sizes (Figure 1B; correlation coefficient r = 0.72, P ≪ 10⁻¹⁶).
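One way to compute the predicted and observed allele fractions is sketched below. It assumes an additive model in which each haplotype of a heterozygote contributes half the sensitivity of the corresponding homozygote, so the predicted fraction of reads carrying the major allele is μ_major / (μ_major + μ_minor), where μ denotes the mean sensitivity of each homozygous class. This formula is our reading of a prediction "based on the genotype means", not a quotation of the paper's method, and the function names are hypothetical.

```python
import numpy as np


def predicted_major_fraction(sens, minor_count):
    """Predicted share of heterozygote reads carrying the major allele.

    sens: (n,) sensitivity in the dsQTL window
    minor_count: (n,) genotype coded as number of minor alleles (0, 1, 2)
    Assumes an additive model and at least one individual in each homozygous class.
    """
    sens = np.asarray(sens, dtype=float)
    minor_count = np.asarray(minor_count)
    mu_major = sens[minor_count == 0].mean()  # major-allele homozygotes
    mu_minor = sens[minor_count == 2].mean()  # minor-allele homozygotes
    return mu_major / (mu_major + mu_minor)


def observed_major_fraction(major_reads, minor_reads):
    """Pooled fraction of allele-informative reads carrying the major allele."""
    major = np.sum(major_reads)
    return major / (major + np.sum(minor_reads))


def allelic_concordance(predicted, observed):
    """Pearson correlation between predicted and observed fractions across dsQTLs."""
    return float(np.corrcoef(predicted, observed)[0, 1])
```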
Figure 1

Genome-wide identification of dsQTLs and a typical example

(A) QQ-plots for all tests of association between DNaseI cut rates in 100 bp windows and variants within 2 kb (green) and 40 kb (black) regions centered on the target DHS windows. (B) Allele-specific analysis of dsQTLs in heterozygotes. Plotted are the predicted (x-axis) and observed (y-axis) fractions of reads carrying the major allele, with predictions based on the genotype means. (C) Example of a dsQTL (rs4953223). The black line indicates the position of the associated SNP. (D) Boxplot showing that rs4953223 is strongly associated with local chromatin accessibility (P = 3×10⁻¹³). (E) The T allele, which is associated with low DNaseI sensitivity, disrupts the binding motif of a previously identified NF-κB binding site at this location[14]. (F) NF-κB ChIP-seq data from 10 individuals[7] indicate a strong effect of this SNP on NF-κB binding.

We observed that dsQTLs typically affect chromatin accessibility for about 200-300 bp (Figure 2A). Of the DHSs affected by dsQTLs, 77% lie in chromatin regions predicted by Ernst et al. to be functional in LCLs[18]: 41% in predicted enhancers, 26% in promoters, and 10% in insulators, even though those chromatin states together cover only 6.7% of the genome overall (and 38% of our hypersensitive sites).
Figure 2

Properties of dsQTLs

(A) Aggregated plot of DNaseI sensitivity for high-confidence dsQTLs that lie within the target DHS. Individuals were assigned to the high-sensitivity (blue), heterozygote (green), and low-sensitivity (red) classes. The shading indicates the bootstrap 95% confidence intervals. (B) The peak density of dsQTLs is very tightly focused around the target DHS window. (C) Total fraction of cis-dsQTLs that fall into different categories of distance from the target window (x-axis) and different annotations (y-axis). The total area of each rectangle is proportional to the estimated number of dsQTLs in that category. (D) Boxplot showing the distribution of PWM score differences between the high-sensitivity and low-sensitivity dsQTL alleles. Notches indicate the 95% CI for the median. (E) The x-axis shows the fraction of sequence reads predicted to carry the major allele based on the DNaseI genotype means; the y-axis shows the observed fraction in ChIP-seq data. The lines show the regression fits for each factor separately; the numbers in the legend show the fraction of sites that are in a concordant direction for each factor.

We next studied the properties of cis-acting variants that generate dsQTLs, using a Bayesian hierarchical model that accounts for the uncertainty in which sites are causal[19] (Supplementary Information). This model obtains unbiased estimates of the average properties of causal sites even though, because of linkage disequilibrium, it is typically uncertain which site is causal for any individual dsQTL. As shown in Figures 2B and 2C, most dsQTLs are generated by variants that are close to the target window. We estimate that 56% of dsQTLs are due to variants that lie within the same DHS and that 67% lie within 1 kb of the target window. dsQTLs that lie more than 1 kb from the target window are themselves significantly enriched in non-adjacent DHS windows (2.4-fold compared to matched random SNPs) and are often associated with changes in sensitivity in multiple non-adjacent DHS windows (Figure S15). One intuitive explanation is that dsQTLs may be caused by variants that strengthen or weaken individual transcription factor binding sites, thereby changing transcription factor affinity and local nucleosome occupancy[20-22] and hence DNaseI cut rates. Consistent with this model, an aggregated plot of DNaseI sensitivity at dsQTLs shows a distinct drop in chromatin accessibility around putatively causal SNPs that is reminiscent of transcription factor binding footprints, especially in the genotypes associated with high sensitivity[15-17]. To test the importance of disrupted transcription factor binding sites as a mechanism underlying dsQTLs, we again turned to the Bayesian hierarchical model. We used the union of all published footprint locations in lymphoblastoid cell lines[16-17] and a set of footprints that we identified using the DNase-seq data reported in this study (Supplementary Methods).
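The core idea of a hierarchical model that handles uncertainty about the causal SNP can be illustrated with a deliberately simplified version. Assume one causal variant per dsQTL, give each candidate SNP j a prior weight exp(θ·a_j) that depends on a binary annotation a_j (for example, lying in a footprint), and estimate θ by maximizing the product over loci of Σ_j π_j·BF_j, where π_j is the normalized prior and BF_j is a per-SNP Bayes factor. This is a sketch in the spirit of the cited approach, not the authors' model; the single annotation, the grid search, and the function names are all assumptions.

```python
import numpy as np


def log_marginal(theta, loci):
    """Log-likelihood of enrichment parameter theta under a one-causal-SNP model.

    loci: list of (bf, annot) pairs, one per dsQTL, where bf holds a Bayes factor
    for each candidate SNP and annot is a matching 0/1 annotation vector.
    """
    total = 0.0
    for bf, annot in loci:
        w = np.exp(theta * np.asarray(annot, dtype=float))
        prior = w / w.sum()  # prior probability that each SNP is the causal one
        total += np.log(prior @ np.asarray(bf, dtype=float))
    return total


def fit_enrichment(loci, grid=np.linspace(-3.0, 3.0, 601)):
    """Grid-search maximum-likelihood estimate of exp(theta), the enrichment factor."""
    ll = [log_marginal(t, loci) for t in grid]
    return float(np.exp(grid[int(np.argmax(ll))]))


def posterior_causal(theta, bf, annot):
    """Posterior probability that each candidate SNP at one locus is causal."""
    w = np.exp(theta * np.asarray(annot, dtype=float)) * np.asarray(bf, dtype=float)
    return w / w.sum()
```

Because the likelihood sums over all candidate SNPs at each locus, the estimate of θ does not require deciding which SNP is causal, which is the property the text attributes to the hierarchical model.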
Analysis using the hierarchical model indicated a 3.6-fold enrichment of dsQTLs within transcription factor binding footprints (P ≪ 10⁻¹⁶), controlling for the overall enrichment within DHSs. Additionally, the allele associated with a higher position weight matrix (PWM) score is typically associated with higher chromatin accessibility (P ≪ 10⁻¹⁶), consistent with the expectation that higher transcription factor binding affinity leads to more open chromatin (Figure 2D). Of the dsQTLs that fall within DNase-seq footprints tied to specific transcription factor motifs (using CENTIPEDE[17]), CTCF, CRE and ISRE motifs are the most enriched, while MEF2 motifs are significantly depleted. To further understand the functional consequences of dsQTLs, we examined ChIP-seq data for nine transcription factors collected by the ENCODE Project in one or more lymphoblastoid cell lines[10,23]. Overall, the alleles associated with increased DNaseI sensitivity are strongly associated with increased transcription factor binding (P < 10⁻¹⁶; Figure 2E), indicating that dsQTLs are strong predictors of changes in occupancy by a range of DNA-binding proteins. Given that dsQTLs produce sequence-specific changes in chromatin accessibility and, frequently, changes in transcription factor binding, we hypothesized that a fraction of dsQTL variants might also affect the expression levels of nearby genes. We examined this by testing for association between the most significant variant at each dsQTL detected using the 2 kb window size and the expression levels of nearby genes (i.e., genes with transcription start sites, TSSs, within 100 kb), estimated by sequencing RNA from the same cell lines[8]. Using this approach, we found that 16% of dsQTL SNPs are also significantly associated with variation in the expression levels of at least one nearby gene (FDR = 10%). This represents a very large enrichment over random expectation (450-fold, P ≪ 10⁻¹⁶; Figure 3).
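The PWM comparison can be made concrete with a small log-odds scoring sketch. It assumes a uniform 0.25 background, a pseudocount of 0.5, and that both allele sequences are already aligned to the motif; the function names are hypothetical and the toy motif used to exercise them is invented for illustration, not taken from the paper.

```python
import numpy as np

BASES = "ACGT"  # column order of the count matrix


def pwm_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert an (L, 4) matrix of motif base counts to log2 odds vs. background."""
    c = np.asarray(counts, dtype=float) + pseudocount
    probs = c / c.sum(axis=1, keepdims=True)
    return np.log2(probs / background)


def pwm_score(seq, pwm):
    """Sum of per-position log-odds for a sequence aligned to the motif."""
    seq = seq.upper()
    if len(seq) != pwm.shape[0]:
        raise ValueError("sequence length must equal motif length")
    cols = [BASES.index(base) for base in seq]
    return float(pwm[np.arange(len(cols)), cols].sum())


def allele_score_difference(seq_high, seq_low, pwm):
    """PWM score of the high-sensitivity allele minus that of the low-sensitivity allele."""
    return pwm_score(seq_high, pwm) - pwm_score(seq_low, pwm)
```

In this framing, the distribution summarized in Figure 2D would correspond to `allele_score_difference` computed across dsQTLs that overlap a motif instance; a median above zero means the higher-scoring allele tends to be the more accessible one.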
One example of a joint dsQTL-eQTL is illustrated in Figure 3A, in which a SNP disrupts an interferon-sensitive response element (ISRE) located in the first intron of the SLFN5 gene, leading to both a strong dsQTL and an eQTL for SLFN5. Conversely, out of 1,271 eQTLs detected using RNA-seq data from these cell lines[8], 23% of the most significant SNPs are also dsQTLs (FDR=10%). Using the method of Storey et al.[24] for estimating the proportion of tests where the null hypothesis is false (while accounting for incomplete power), we estimate that 55% of the most significant eQTL SNPs are also dsQTLs and that 39% of the dsQTLs are also eQTLs. Hence dsQTLs are a major mechanism by which genetic variation may impact gene expression levels.
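Storey's estimate of the proportion of tests where the null hypothesis is false can be written in a few lines: π₀ is estimated from the p-values above a threshold λ, which should be dominated by true nulls with uniform p-values, and π₁ = 1 − π₀. Applied to the dsQTL association p-values of the lead eQTL SNPs, π₁ would estimate the share of eQTLs that are also dsQTLs. This sketch uses a single fixed λ = 0.5, which is an assumption; the settings used in the paper are not stated here, and variants of the method choose λ adaptively.

```python
import numpy as np


def storey_pi1(pvalues, lam=0.5):
    """Estimated fraction of tests where the null hypothesis is false.

    pi0 is estimated from the p-values above `lam`, which should be dominated by
    true nulls whose p-values are uniform on (0, 1); the estimate is capped at 1.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    return 1.0 - pi0
```

Unlike counting tests that pass an FDR threshold, this estimate accounts for true effects that are too weak to reach significance, which is why the estimated overlap (55%) exceeds the directly observed one (23%).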
Figure 3

Relationship between dsQTLs and eQTLs

(A) Example of a dsQTL SNP that is also an eQTL for the gene SLFN5. The SNP disrupts an interferon-sensitive response element, thereby changing local chromatin accessibility within the first intron of SLFN5. Expression of SLFN5 has been shown to be inducible by interferon-α in melanoma cell lines. DNase-seq (left column) and RNA-seq (right column) measurements are plotted, stratified by genotype at the putative causal SNP. (B) QQ-plot of the t-statistic for association with gene expression (eQTL) of dsQTL SNPs. The sign of the eQTL t-statistic is with respect to the genotype that increases DNaseI sensitivity.

We observed that for most (70%) of the joint dsQTL-eQTLs, the allele associated with increased chromatin accessibility is also associated with increased gene expression (Figure 3B). Since higher DNaseI sensitivity generally correlates with higher transcription factor occupancy, this suggests that transcription factors bound to DHSs usually act as enhancers. CRE-box and GABP/ETS-box were the most enriched motifs among repressors and enhancers, respectively. The dsQTLs that are also eQTLs (FDR = 10%) are highly enriched around the TSSs of the target genes: for 23% of the joint dsQTL-eQTLs, the associated DHS is within 1 kb of the TSS, and for 39% it is within 10 kb (Figure 4A). This is consistent with previous work showing strong clustering of eQTLs around TSSs[19,25-26]. Nonetheless, there is a significant signal of long-range regulation at distances of up to 100 kb. Additionally, 14% of the joint dsQTL-eQTLs are significant eQTLs for two or more genes, suggesting that some regulatory regions affect more than one gene.
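Whether a direction-concordance rate such as 70% exceeds chance can be checked with an exact two-sided sign test against 50%. The sketch below is illustrative only: the paper's own test for this comparison is not stated in this section, and the function names are hypothetical.

```python
from math import comb


def concordant_count(ds_effects, e_effects):
    """Count pairs where the dsQTL and eQTL effects have the same sign."""
    same = sum(1 for d, e in zip(ds_effects, e_effects) if (d > 0) == (e > 0))
    return same, len(ds_effects)


def sign_test_pvalue(k, n):
    """Exact two-sided binomial test of k concordant pairs out of n against p = 0.5."""
    # The Binomial(n, 0.5) distribution is symmetric, so the two-sided p-value is
    # twice the upper tail beyond the larger of k and n - k, capped at 1.
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```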
Figure 4

Relationship between dsQTLs and eQTLs

(A) Most joint dsQTL-eQTLs lie close to the gene TSS. (B) Effect of various factors on the log odds that a given dsQTL is also an eQTL, while controlling for the strong distance relationship observed in panel A. In annotations (1) and (2), we do not consider the direction of transcription. In annotations (6-8), ChIP-seq is measured on the dsQTL window. One of the most significant annotations in delineating the regulatory regions is defined by the presence of the CTCF insulator element, which reduces the probability that a dsQTL is an eQTL by 2.4-fold. Error bars represent 95% confidence intervals.
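The log-odds effects in panel B come from a logistic model of eQTL status. A minimal Newton-Raphson (iteratively reweighted least squares) fit is sketched below. The design (an intercept, log10 distance to the TSS, and one 0/1 column per annotation) is our assumption, as are the hypothetical function names; the paper's exact parameterization is described in its Supplementary Information.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, p) design matrix whose first column is an intercept of ones
    y: (n,) 0/1 outcomes, e.g. whether a dsQTL-gene pair is a significant eQTL
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def design(dist_to_tss_bp, annotations):
    """Intercept, log10(distance + 1), then one 0/1 column per annotation."""
    d = np.log10(np.asarray(dist_to_tss_bp, dtype=float) + 1.0)
    return np.column_stack([np.ones_like(d), d, np.asarray(annotations, dtype=float)])
```

Under this parameterization, exponentiating an annotation coefficient gives an odds ratio with distance held fixed. For rare outcomes it approximates the fold change in probability; whether the "fold" effects quoted in the text are odds ratios or probability ratios is not stated here.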

We sought to identify additional factors that may influence whether a dsQTL regulates the expression of nearby genes, while controlling for the very strong effect of distance from the TSS (Figure 4B). We observed that a dsQTL is more likely to be an eQTL for the gene with the nearest TSS (1.6-fold, P = 3×10⁻⁴) and is more likely to be an eQTL if it is located within the transcribed region of the gene (2.7-fold, P = 2×10⁻⁹). Further, a dsQTL is 2.6-fold more likely to be an eQTL if it is associated with a DHS that overlaps a DNA methylation QTL[27] (P = 4×10⁻⁴), and 2.4-fold more likely if the associated DHS overlaps a PolII ChIP-seq peak[10] (P = 4×10⁻⁴). Conversely, a dsQTL is significantly less likely to be an eQTL for a gene if an active binding site for the insulator protein CTCF[17] lies between the dsQTL and the gene’s TSS (2.4-fold decrease, P = 1×10⁻¹²). Finally, the presence of the enhancer mark P300 (from ENCODE ChIP-seq data[28]) in the dsQTL window increases the probability that a distal dsQTL (more than 1.5 kb from the TSS) is an eQTL (1.7-fold, P = 1×10⁻⁵).
In summary, we have shown that common genetic variants affect chromatin accessibility at thousands of hypersensitive regions across the human genome. The putative causal variants most often lie within or very near the hypersensitive regions and frequently act by changing the binding affinity of transcription factors. Mapping of DNaseI sensitivity QTLs provides a powerful tool for detecting potentially functional changes in a variety of different types of regulatory elements, and we estimate that roughly half of eQTLs are also dsQTLs. Furthermore, analysis of significantly associated SNPs from genome-wide association studies implicates some of these dsQTLs as potentially underlying a variety of GWAS hits (Supplementary Information). Changes in chromatin accessibility may be a major mechanism linking genetic variation to changes in gene regulation and, ultimately, organismal phenotypes.

Methods Summary

DNase-seq libraries were created as previously described[29], with small modifications. Each library was sequenced on at least two lanes of an Illumina GAIIx. Resulting 20bp sequencing reads were mapped to the human genome sequence (hg18) using an algorithm that we designed specifically to eliminate mappability biases between sequence variants. We divided the genome into 100 bp windows and selected the top 5% in terms of total DNaseI sensitivity. DNaseI sensitivity for each individual in each window was normalized by the total number of mapped reads for that individual. For QTL mapping, the data were further rescaled within and across individuals, and we adjusted the data for an observed individual × GC interaction, as well as for the top four principal components of the DNaseI sensitivity matrix. Genotypes for all available SNPs and indels were obtained from HapMap and 1000 Genomes data and imputed where necessary[6,7,30]. We performed DNase-seq association mapping by regressing the adjusted sensitivity in each window against the genotypes at variants in a 40 kb region centered on each DHS. As validation, we used our DNase-seq reads as well as ChIP-seq reads and DNase-seq reads from ENCODE to confirm that allele-specific reads spanning heterozygous sites at dsQTLs are consistent with the association analysis. We also used RNA-seq data from the same cell lines[8] to study the links between dsQTLs and eQTLs. Finally, we explored the properties of dsQTLs that make them more or less likely to influence gene expression by fitting a logistic model on all dsQTLs, where the eQTL status of each dsQTL-eQTL test is modeled as a function of distance from the TSS and a variety of other annotations. For full details of all methods see the Supplementary Information.
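Adjusting for the top principal components amounts to subtracting a low-rank reconstruction of the standardized sensitivity matrix. The sketch below assumes each window is centered and scaled across individuals before the singular value decomposition; the paper's additional within-individual rescaling and individual × GC correction are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def remove_top_pcs(Y, k=4):
    """Regress the top-k principal components out of a sensitivity matrix.

    Y: (n_individuals, n_windows). Each window is standardized across
    individuals, then the rank-k SVD reconstruction is subtracted, which
    leaves residuals orthogonal to the top-k individual-level PC scores.
    """
    Y = np.asarray(Y, dtype=float)
    sd = Y.std(axis=0)
    sd[sd == 0] = 1.0  # leave invariant windows at zero instead of dividing by 0
    Z = (Y - Y.mean(axis=0)) / sd
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z - (U[:, :k] * S[:k]) @ Vt[:k]
```

Removing broad axes of variation shared across many windows (for example, batch or sample-quality effects) tends to reduce noise in the per-window traits, at the possible cost of removing some genuine widespread signal.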
References (29 in total)

1.  High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells.

Authors:  Alan P Boyle; Lingyun Song; Bum-Kyu Lee; Darin London; Damian Keefe; Ewan Birney; Vishwanath R Iyer; Gregory E Crawford; Terrence S Furey
Journal:  Genome Res       Date:  2010-11-24       Impact factor: 9.043

2.  High-resolution mapping and characterization of open chromatin across the genome.

Authors:  Alan P Boyle; Sean Davis; Hennady P Shulha; Paul Meltzer; Elliott H Margulies; Zhiping Weng; Terrence S Furey; Gregory E Crawford
Journal:  Cell       Date:  2008-01-25       Impact factor: 41.582

3.  Heritable individual-specific and allele-specific chromatin signatures in humans.

Authors:  Ryan McDaniell; Bum-Kyu Lee; Lingyun Song; Zheng Liu; Alan P Boyle; Michael R Erdos; Laura J Scott; Mario A Morken; Katerina S Kucera; Anna Battenhouse; Damian Keefe; Francis S Collins; Huntington F Willard; Jason D Lieb; Terrence S Furey; Gregory E Crawford; Vishwanath R Iyer; Ewan Birney
Journal:  Science       Date:  2010-03-18       Impact factor: 47.728

4.  Variation in transcription factor binding among humans.

Authors:  Maya Kasowski; Fabian Grubert; Christopher Heffelfinger; Manoj Hariharan; Akwasi Asabere; Sebastian M Waszak; Lukas Habegger; Joel Rozowsky; Minyi Shi; Alexander E Urban; Mi-Young Hong; Konrad J Karczewski; Wolfgang Huber; Sherman M Weissman; Mark B Gerstein; Jan O Korbel; Michael Snyder
Journal:  Science       Date:  2010-03-18       Impact factor: 47.728

5.  Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS.

Authors:  Dan L Nicolae; Eric Gamazon; Wei Zhang; Shiwei Duan; M Eileen Dolan; Nancy J Cox
Journal:  PLoS Genet       Date:  2010-04-01       Impact factor: 5.917

6.  Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations.

Authors:  Alexandra C Nica; Stephen B Montgomery; Antigone S Dimas; Barbara E Stranger; Claude Beazley; Inês Barroso; Emmanouil T Dermitzakis
Journal:  PLoS Genet       Date:  2010-04-01       Impact factor: 5.917

7.  Genetic analysis of variation in transcription factor binding in yeast.

Authors:  Wei Zheng; Hongyu Zhao; Eugenio Mancera; Lars M Steinmetz; Michael Snyder
Journal:  Nature       Date:  2010-03-17       Impact factor: 49.962

8.  Understanding mechanisms underlying human gene expression variation with RNA sequencing.

Authors:  Joseph K Pickrell; John C Marioni; Athma A Pai; Jacob F Degner; Barbara E Engelhardt; Everlyne Nkadori; Jean-Baptiste Veyrieras; Matthew Stephens; Yoav Gilad; Jonathan K Pritchard
Journal:  Nature       Date:  2010-03-10       Impact factor: 49.962

9.  Global mapping of protein-DNA interactions in vivo by digital genomic footprinting.

Authors:  Jay R Hesselberth; Xiaoyu Chen; Zhihong Zhang; Peter J Sabo; Richard Sandstrom; Alex P Reynolds; Robert E Thurman; Shane Neph; Michael S Kuehn; William S Noble; Stanley Fields; John A Stamatoyannopoulos
Journal:  Nat Methods       Date:  2009-03-22       Impact factor: 28.547

10.  High-resolution mapping of expression-QTLs yields insight into human gene regulation.

Authors:  Jean-Baptiste Veyrieras; Sridhar Kudaravalli; Su Yeon Kim; Emmanouil T Dermitzakis; Yoav Gilad; Matthew Stephens; Jonathan K Pritchard
Journal:  PLoS Genet       Date:  2008-10-10       Impact factor: 5.917
