Literature DB >> 31492806

Interpreting Coronary Artery Disease Risk Through Gene-Environment Interactions in Gene Regulation.

Anthony S Findley1, Allison L Richards1, Cristiano Petrini1, Adnan Alazizi1, Elizabeth Doman1, Alexander G Shanku1, Gordon O Davis1, Nancy Hauff2, Yoram Sorokin2, Xiaoquan Wen3, Roger Pique-Regi4,2, Francesca Luca4,2.   

Abstract

GWAS and eQTL studies identified thousands of genetic variants associated with complex traits and gene expression. Despite the important role of environmental exposures in complex traits, only a limited number of environmental factors were measured in these studies. Measuring molecular phenotypes in tightly controlled cellular environments provides a more tractable setting to study gene-environment interactions in the absence of other confounding variables. We performed RNA-seq and ATAC-seq in endothelial cells exposed to retinoic acid, dexamethasone, caffeine, and selenium to model genetic and environmental effects on gene regulation in the vascular endothelium-a common site of pathology in cardiovascular disease. We found that genes near regions of differentially accessible chromatin were more likely to be differentially expressed [OR = (3.41, 6.52), [Formula: see text]]. Furthermore, we confirmed that environment-specific changes in transcription factor binding are a key mechanism for cellular response to environmental stimuli. Single nucleotide polymorphisms (SNPs) in these transcription response factor footprints for dexamethasone, caffeine, and retinoic acid were enriched in GTEx eQTLs from artery tissues, indicating that these environmental conditions are latently present in GTEx samples. Additionally, SNPs in footprints for response factors in caffeine are enriched in colocalized eQTLs for coronary artery disease (CAD), suggesting a role for caffeine in CAD risk. By combining GWAS, eQTLs, and response genes, we annotated environmental components that can increase or decrease disease risk through changes in gene expression in 43 genes. Interestingly, each treatment may amplify or buffer genetic risk for CAD, depending on the particular SNP or gene considered.
Copyright © 2019 by the Genetics Society of America.

Entities:  

Keywords:  TWAS; gene expression response; gene regulation; genetic variation; transcription factor

Mesh:

Substances:

Year:  2019        PMID: 31492806      PMCID: PMC6781890          DOI: 10.1534/genetics.119.302419

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


THE vascular endothelium refers to the single layer of cells lining blood vessels that form an interface between the blood and the rest of the body. The endothelium plays an important role in coagulation, thrombosis, leukocyte extravasation, regulation of vascular tone, and angiogenesis (Rajendran ). Endothelial dysfunction has been implicated in many pathological processes, including atherosclerosis, hypertension, tumor angiogenesis, wound healing, and preeclampsia (Gimbrone and García-Cardeña 2016). For many of these conditions, genome-wide association studies (GWAS) have been instrumental in identifying a large number of genetic variants associated with disease (e.g., Nikpay ). However, the molecular mechanisms linking each variant to the disease phenotype are largely unknown. This is because common risk variants for complex traits tend to fall in noncoding regions and affect regulatory mechanisms that are not yet well characterized (Nica ; Gamazon ; Gusev ; Aguet ). DNase-seq and ATAC-seq (Boyle ; Buenrostro ) are experimental approaches developed to profile the chromatin landscape and can be used to identify active regulatory elements. Transcription factors bound to chromatin leave characteristic “footprints,” allowing for the identification of many bound transcription factors in a single experiment (Pique-Regi ). ATAC-seq offers the advantage of requiring a much smaller number of cells, allowing for increased multiplexing of experiments in small volumes. This enables study of cell lines with differing genetic backgrounds across environmental perturbations, such as the addition of a drug or hormone. Current existing annotations for genetic variants capture regulatory mechanisms in only one arbitrary environment, and thus it is necessary to annotate regulatory regions across multiple conditions. Combining chromatin accessibility data with measures of gene expression, such as RNA-seq, provides insights into the regulatory code governing gene expression. By comparing RNA- and ATAC-seq in the presence/absence of perturbations, it becomes possible to comprehensively study the cellular response and link changes in chromatin accessibility and transcription factor binding to changes in gene expression (Pacis ; Nédélec ; Alasoo ; Gate ). The molecular basis of cellular response can be identified by leveraging genetic differences within and across individuals to pinpoint underlying regulatory elements. One such approach that uses genetic variation is quantitative trait loci (QTL) mapping, whereby genetic variants can be linked to differences in gene expression (eQTL) (e.g., Stranger ; Aguet ) or chromatin accessibility (caQTL) (Degner , among others). Results from these methods, which can be used individually or as complements to one another, have been integrated with GWAS findings to suggest a model in which genetic variants disrupt the chromatin architecture, leading to differences in gene expression that affect complex traits (Gamazon ; Gusev ; Hormozdiari ; Wen ). However, the effect of a genetic variant on a molecular pathway, and, ultimately, on the disease condition, may be modulated by environmental factors (Moyerbrailean et al. 2016b; Knowles , 2018). eQTL studies performed on cells exposed to a stimulus can identify genetic variants whose influence on gene expression response is modulated by the environment, known as response eQTLs (reQTLs) (see for example Maranville ; Mangravite ; Nédélec ; Alasoo ; Knowles ). Known environmental risk factors for endothelial dysfunction include diabetes, smoking, obesity, high cholesterol, and hypertension (Hadi ). Dexamethasone, retinoic acid, caffeine, and selenium are four treatments that have been associated with cardiovascular disease (CVD) and/or endothelial dysfunction (Flores-Mateo ; Pan and Baker 2007; Walker 2007; Turnbull ), and that produce strong transcriptional responses in endothelial cells (Moyerbrailean ). Dexamethasone is a potent glucocorticoid that serves as a proxy for stress (Sapolsky ), and excess may promote calcification within arteriosclerotic lesions (Zhu ). Retinoic acid plays a crucial role in the development of the cardiovascular system, and vascular endothelial cells are exposed to high concentrations of retinoic acid and express retinoid receptors (Saito ). Caffeine, the most widely consumed stimulant worldwide, promotes vasodilation in the endothelium and has been the subject of numerous studies regarding its association with CVD, with conflicting results (Ding ; Turnbull ). Selenium is a component of selenoproteins which act as antioxidants, and selenium has been used as a supplement to increase high-density lipoprotein (HDL) cholesterol (Rayman 2012). Recently, the heterogeneity of the contexts represented in complex datasets has been used as proxy for varying environmental conditions (Knowles ; Zhernakova ) to identify gene-environment interaction (GxE) effects. The advantage of these approaches resides in the possibility of exploiting pre-existing large-scale eQTL studies that profile gene expression in tissues from a heterogeneous sample of individuals; however, the differences in environmental conditions between individuals is not explicitly ascertained. To characterize genetic and environmental risk factors for cardiovascular disease, it is critical to investigate gene regulation in a relevant cell type under controlled treatment conditions. Here, we developed annotations of genetic variants in key regulatory elements for gene expression response to inform the role of GxE in coronary artery disease.

Materials and Methods

Cell culture

Human umbilical vein endothelial cells (HUVECs) were isolated from human umbilical cord tissue collected shortly following birth. Umbilical cord tissue specimens were obtained from 14 healthy, full-term pregnant women, admitted to DMC Hutzel Women’s Hospital (Detroit, MI). All specimens for this study were collected following guidelines approved by the institutional review board (#013213 MP4E) of Wayne State University. Cord specimens, between 10 and 30 cm in length, were first rinsed with warm phosphate-buffered saline (PBS) and a blunt-ended needle was inserted into the umbilical vein at one end of the cord, and subsequently clamped in place. The cord was then purged to remove any excess blood from the vein. The other end of the cord was then sealed, in a manner identical to the first end, and prewarmed 0.25% trypsin-EDTA (Gibco) was then injected into the vein. Following a 20 min incubation at 37°, detached HUVECs were rinsed from the vein, collected by centrifugation, counted, and seeded into an appropriate vessel at 10,000 cells/cm2, in EGM-2 growth medium (Lonza). Expanded cultures were cryopreserved prior to be used in the experiments. For additional three HUVEC lines from a previous study (Moyerbrailean ) we collected ATAC-seq data and used previously published RNA-seq data (dbGaP accession number phs001176.v1.p1).

Treatments

Treatment concentrations were the same as used in Moyerbrailean and were derived from the Clinical Guidelines Mayo Clinic Reference Levels (http://www.mayomedicallaboratories.com) and the CDC National Biomonitoring Report Reference Levels (http://www.cdc.gov/biomonitoring/). Concentrations were M for dexamethasone, M for retinoic acid, M for caffeine, and M for selenium. Vehicle controls were included to represent the solvent used to prepare the different treatments, either ethanol (1 μl of ethanol per 10,000 μl of culturing media) or water. Each individual cell line was treated on a separate day. Culturing conditions were the same as those found in Moyerbrailean . Specifically, 3 days before treatment, cells were plated (to passage 7) in eight wells in six-well plates at 5000/cm2 in EGM-2. Following a 24 hr recovery period, the medium was changed to a “starvation medium,” composed of phenol-red free EGM-2, without Hydrocortisone and Vitamin C, and supplemented with 2% CS-FBS. Cell starvation was continued for 48 hr prior to treatment. For each individual cell line, control treatments were performed in duplicate. Cells were treated for 6 hr, and were then collected by scraping the plate on ice. For four cell lines, each well was counted in order to remove 50,000 cells to be used for ATAC-seq while the rest were used for RNA sequencing.

RNA library preparation

Treated cells were collected by centrifugation at 2000 rpm and washed twice using ice-cold PBS. Collected pellets were lysed on the plate, using Lysis/Binding Buffer (Ambion), and frozen at −80°. Polyadenylated mRNAs were subsequently isolated from thawed lysates using the Dynabeads mRNA Direct Kit (Ambion) following the manufacturer’s instructions. RNA-seq libraries were prepared using a protocol modified from the NEBNext Ultradirectional (NEB) library preparation protocol to use 96 Barcodes from BIOOScientific added by ligation, as described in Moyerbrailean . The individual libraries were quantified using the KAPA real-time PCR system, following the manufacturer’s instructions and using a custom-made series of standards obtained from serial dilutions of the phi-X DNA (Illumina). Libraries were pooled and sequenced in multiple sequencing runs for an average of 50 M 300 bp PE reads.

ATAC-seq library preparation

We followed the protocol by Buenrostro to lyse 50,000 cells and prepare ATAC-seq libraries, with the exception that we used the Illumina Nextera Index Kit (Cat #15055290) in the PCR enrichment step. For three of the four individual cell lines (LP-001, LP-002 and H288-L), the cells were not lysed with 0.1% IGEPAL CA-630 before adding the transposase to begin the ATAC-seq protocol. Individual library fragment distributions were assessed on the Agilent Bioanalyzer and pooling proportions were determined using the qPCR Kapa library quantification kit (KAPA Biosystems). Library pools were run on the Illumina NextSeq 500 Desktop sequencer in the Luca/Pique-Regi laboratory. Barcoded libraries of ATAC-seq samples were pooled and sequenced in multiple sequencing runs for an average coverage of 130 M reads.

Alignment of RNA-seq and ATAC-seq

Reads were aligned to the GRCh37 human reference genome using HISAT2 (Kim ) (https://ccb.jhu.edu/software/hisat2/index.shtml, version hisat2-2.0.4), and the human reference genome (GRCh37) with the following options:where represents the location of the genome file (genome_snp for ATAC-seq alignment, genome_snp_tran for RNA-seq alignment), and and represent that sample’s fastq files. The multiple sequencing runs were merged for each sample using samtools (version 2.25.0). We removed PCR duplicates and further removed reads with a quality score of (equating to reads mapped to multiple locations) for the RNA-sequencing analysis, only.

Differential gene expression analysis

To identify differentially expressed (DE) genes, we used DESeq2 (Love ) (R version 3.2.1, DESeq2 version 1.8.1). Gene annotations from Ensembl version 75 were used and transcripts with <20 reads total were discarded. Bedtools coverage was utilized to count reads with -s to account for strandedness and -split for BED12 input. The counts were then utilized in DESeq2 to determine changes in gene expression under the different treatment conditions. Multiple test correction was performed using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995) with a significance threshold of 10%. A gene was considered DE if at least one of its transcripts was DE and had an absolute fold change . The model corrected for the library preparation batch and the sample of origin.

Gene ontology analysis

For each treatment, we tested upregulated and downregulated genes separately for enrichment of biological processes gene ontology terms using the Fisher’s test on the AmiGO 2 web browser (http://amigo.geneontology.org/amigo). Multiple test correction was performed using the Benjamini-Hochberg procedure with a significance threshold of 5% (BH-FDR).

Identification of differentially accessible regions following treatment

To identify differentially accessible regions, we used DESeq2 (Love ) (R version 3.2.1, DESeq2 version 1.8.1). We separated the genome into 300 bp regions and bedtools coverage was used to count reads in these regions. A total of 508,140 regions with >0.25 reads per million (high accessibility regions) were then utilized in DESeq2 using the following model:where represents the internal DESeq2 mean of normalized counts of all samples normalized by sequencing depth for region j and experiment n, is the treatment indicator, and parameter is the treatment effect. Differentially accessible regions (DAR) following exposure to dexamethasone, retinoic acid, caffeine, and selenium, were determined at BH FDR %. We annotated the genomic context of these DARs using chromHMM (Ernst and Kellis 2012), which characterizes chromatin states on the basis of histone marks. We downloaded the precomputed HUVEC annotations from ENCODE (ENCODE Project Consortium 2012). Enrichments were performed with the Fisher’s Exact test. DARs were then compared to gene annotations from Ensembl version 75 to identify those that were within 50 kb of a transcription start site, and a Fisher’s exact test was used to test for enrichment of DARs near DE genes.

Overlap of DE genes and DARs across treatments

We used UpSetR (Conway ) to visualize DE genes and DARs shared across treatments. To calculate correlations across treatments, we subset genes or regions of chromatin that were differentially expressed or accessible in any treatment and computed Spearman correlations on the log-fold change.

Transcription factor binding footprints

To detect which transcription factors have footprints in each condition we adapted CENTIPEDE (Pique-Regi ) to use the fragment length information contained in the ATAC-seq in the footprint model, and to jointly use in parallel the treatment and control conditions in order to ensure that the same footprint shape is used for the same motif in both conditions. As in CENTIPEDE, we need to start from candidate binding sites for a given motif model. For each transcription factor we scan the entire human genome (hg19) for matches to its DNA recognition motif using position weight matrix (PWM) models from TRANSFAC and JASPAR as previously described (Pique-Regi ). Then for each candidate location l we collect all the ATAC-seq fragments which are partitioned into four bins depending on the fragment length: (1) [39-99], (2) [100-139], (3) [140-179], (4) [180-250]. For each fragment, the two Tn5 insertion sites were calculated as the position 4 bp after the 5′-end in the 5′ to 3′ direction. Then, for each candidate motif, a matrix X was constructed to count Tn5 insertion events: each row represented a sequence match to motif in the genome (motif instance), and each column a specific cleavage site at a relative base pair and orientation with respect to the motif instance. We built a matrix for each fragment length bin, each using a window half-size S = 150 bp resulting in columns, where W is the length of the motif in base pair. Finally, we fit the CENTIPEDE model in a subset of ∼5000 instances to learn footprint shapes for each factor as in Moyerbrailean . Then we used CENTIPEDE to analyze the remaining motif instances in the genome and kept those with a CENTIPEDE posterior probability higher than 0.99 to denote locations where the transcription factors are bound, also referred to as “footprints.” We further subset these bound transcription factor sites by including only transcription factors that were active genome-wide in any condition, defined by a significant enrichment (Fisher’s exact test, FDR < 10%) of binding sites in high accessibility regions (see above) compared to a randomly selected set of 500,000 chromatin regions genome-wide (not high accessible regions). There were 882 active factors across all treatments.

Identification of response factors and genetic variants in their binding sites

In order to identify transcription factors that are important for response to treatments, we calculated enrichment scores for each active motif in regions of differentially accessible chromatin using a Fisher’s exact test. We defined motifs that were significantly enriched or depleted in opening or closing chromatin (BH FDR %; Table S3 and Table S4). For the enrichment analysis with TORUS, we relaxed the threshold for response factors to nominal significance (P-value ) to get better estimates for the confidence intervals for the enrichment parameters. For each tissue and in each treatment, we then annotated each single nucleotide polymorphism (SNP) from the 1000 Genomes project (1000 Genomes Project Consortium 2015) as (1) not in a footprint for a treatment response factor; (2) in a footprint for a treatment response factor (Table S2). SNPs in footprints are generally enriched in eQTLs (Wen ). Thus, to account for this treatment-independent effect, we also generated an annotation for SNPs in active footprints in either control condition, which should represent a baseline annotation of candidate regulatory variants. We note that the controls are largely redundant, sharing >93% similarity.

Enrichment of GTEx eQTL and CAD variants in footprints for response factors

To identify treatments for which GTEx eQTLs (Aguet ) are enriched in footprints for response factors, we focused on the three artery tissues present in GTEx: tibial, coronary, and aorta. GTEx identified 11,945, 4378, and 9203 genes with an eQTL (eGenes) in the tibial artery, coronary artery, and aorta tissues, respectively. We then used the Torus package (Wen ) with the “-est” option to calculate the enrichment of eQTLs in response factors for each treatment. Torus quantitatively assesses the enrichment of molecular annotations (e.g., SNPs in response factors) in GWAS or eQTL datasets. TORUS is based on a hierarchical model, where the prior probability of association for each SNP in a locus is modeled with a logistic function that has one hyper-parameter for each category (i.e., nonresponse factor control footprint, response factor footprint, and distance to the transcription start site as overall baseline). The confidence interval for each enrichment parameter is estimated using a profile likelihood procedure. In this analysis, the summary statistics for all SNPs that can be imputed and tested in the eQTL or GWA study are used, regardless of SNP P-value. Here we used the annotations generated above for SNPs in footprints for response factors and control. As additional annotation, we included distance of the SNP to the transcriptional start site of the gene. We meta-analyzed the enrichments for each treatment across tissues using a fixed effects model with inverse variance weights. For the CAD enrichment, we used the same annotations as above on the summary statistics of the CAD GWAS, minus the distance of the SNP to the transcriptional start site of a gene. GWAS SNPs were partitioned into blocks based on linkage disequilibrium (LD) pattern.

Genotyping and colocalization

We performed colocalization as previously described using enloc (Wen ). Briefly, enloc integrates QTL annotations into GWAS analysis by calculating the enrichment of QTL in complex trait-associated genetic variants, which is then used to identify colocalized SNPs between the two annotations. We performed the colocalization between the CARDIoGRAM CAD summary statistics and eQTLs from all GTEx tissues independently. We considered the lead SNP from a locus with a cumulative posterior inclusion probability (PIP) of >0.5 to be colocalized; 168 SNPs colocalize, corresponding to 139 genes. To identify polymorphic colocalized SNPs in our dataset, DNA was isolated from umbilical cords and sent to be genotyped using ultralow sequencing performed by Gencove. In our dataset, 21 of the colocalized SNPs (corresponding to 14 eGenes) are polymorphic, and, therefore, can be tested for reQTL.

reQTL mapping

For reQTL mapping of the colocalized GTEx and CAD GWAS SNPs, we tested all 21 SNPs for reQTLs, corresponding to 33 gene-SNP pairs. We used a linear model to identify reQTLs, testing for the effect of genotype dosage on quantile-normalized gene expression fold change between treatment and control. We generated an empirical null distribution by permuting sample labels, and corrected the treatment P-values based on the ranks observed in the permutation. We performed 2820 permutations, corresponding to 3x the number of tests in the treated samples.

TWAS

We used GAMBIT (https://github.com/corbinq/GAMBIT) to conduct a TWAS on the CARDIoGRAM GWAS using the three artery tissues from GTEx as reference gene expression sets. Using a threshold of , there were 44, 17, and 49 genes significantly associated with CAD for the tibial artery, coronary artery, and aorta, respectively, for a total of 74 unique genes in at least one of the three arteries.

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. Raw sequencing data, including RNA-seq on all 17 individuals, ATAC-seq on 4 individuals, and genotypes for all 17 individuals, have been deposited in the NCBI dbGaP (https://www.ncbi.nlm.nih.gov/gap/) under study ID phs001176.v3.p1.Supplemental text, Supplemental Material, Figure S1, and Tables S1–S8 are available at http://genome.grid.wayne.edu/FUNGEI/data_tables/.

Results

Exposure of endothelial cells to environmental perturbations leads to changes in gene expression and chromatin accessibility

To determine the impact of genetic variation on endothelial cell response to environmental perturbations, we treated human umbilical vein endothelial cells (HUVECs) from 17 unrelated, healthy donors with four compounds (dexamethasone, retinoic acid, caffeine, and selenium) and two vehicle controls. These treatments have been associated with cardiovascular disease and/or endothelial dysfunction (Flores-Mateo ; Pan and Baker 2007; Walker 2007; Turnbull ) and have been shown in a previous study (Moyerbrailean ) to produce strong transcriptional responses in HUVECs. We then assessed changes in gene expression with RNA-seq in all samples and chromatin accessibility with ATAC-seq in four samples at 6 hr following treatment (Figure 1A).
Figure 1

Changes in gene expression and chromatin accessibility in HUVECs treated with dexamethasone, retinoic acid, caffeine, and selenium. (A) Diagram of artery layers and study design. HUVECs were treated separately with four compounds and vehicle controls prior to RNA- and ATAC-seq (Materials and Methods). (B) QQ-plot of P-values from DESeq2 analysis of gene expression in response to retinoic acid. The purple dots represent genes within 50 kb of a DAR, while the gray dots represent genes not within 50 kb of a DAR. (C) Enrichment of DE genes within 50 kb of a DAR. Odds ratio was estimated using Fisher’s exact test, and for all treatments P.

Changes in gene expression and chromatin accessibility in HUVECs treated with dexamethasone, retinoic acid, caffeine, and selenium. (A) Diagram of artery layers and study design. HUVECs were treated separately with four compounds and vehicle controls prior to RNA- and ATAC-seq (Materials and Methods). (B) QQ-plot of P-values from DESeq2 analysis of gene expression in response to retinoic acid. The purple dots represent genes within 50 kb of a DAR, while the gray dots represent genes not within 50 kb of a DAR. (C) Enrichment of DE genes within 50 kb of a DAR. Odds ratio was estimated using Fisher’s exact test, and for all treatments P. Each treatment caused significant transcriptional responses, with >2500 differentially expressed genes in each environment (Benjamini-Hochberg (BH) , ) (Table 1). By gene ontology analysis, we identified an enrichment of differentially expressed genes implicated in biological processes relevant for endothelial function. For example, response to wounding is enriched in genes downregulated in response to dexamethasone, and regulation of blood vessel diameter is enriched in genes downregulated in response to retinoic acid (Table S1).
Table 1

Number of differentially expressed genes and differentially accessible regions

TreatmentDE genesDARs
Retinoic acid4,874222
Dexamethasone2,879133
Caffeine5,7901038
Selenium10,329196
Across the four samples for which ATAC-seq was obtained, we identified between 133 and 1038 DARs in a given treatment condition (Table 1). The smaller number of DARs compared with DE genes could be due to the timescale at which we measured response, the increased stochasticity of chromatin opening and closing relative to gene expression, and/or insufficient statistical power. We used chromHMM (Ernst and Kellis 2012) chromatin states derived from ENCODE HUVECs histone marks (Dunham et al. 2012) to annotate DARs. We found an enrichment of DARs in strong enhancer regions in dexamethasone, retinoic acid, and selenium (OR = 5.31, 3.89, and 3.51, respectively. P for all). However, caffeine DARs are only slightly enriched in strong enhancers (OR , P ). Instead, DARs in caffeine are most enriched for active promoters (OR , P ). This difference between treatments could indicate that different types of factors are involved in each response. We compared the degree to which transcriptional and chromatin changes were shared across treatments by overlapping DE genes and DARs, respectively, across treatments (Figure 2). Thousands of DE genes are shared by at least two treatments, with 312 genes being differentially expressed across all four. Due to the reduced number of DARs, we have less power to detect sharing across treatments (Figure 2B). As a complementary approach, we calculated the correlation of the log-fold change across all treatments for genes differentially expressed (Figure 2A) and regions differentially accessible (Figure 2B) in any condition.
Figure 2

Shared DE genes and DARs across treatments. (A) DE gene sharing. Vertical bars indicate the number of genes differentially expressed in each group of treatments. Inset heatmap depicts the Spearman correlation of the logFC of all DE genes between treatments. (B) DAR sharing. Vertical bars indicate the number of DARs in each group of treatments. Inset heatmap depicts the Spearman correlation of the logFC of all DARs between treatments.

Shared DE genes and DARs across treatments. (A) DE gene sharing. Vertical bars indicate the number of genes differentially expressed in each group of treatments. Inset heatmap depicts the Spearman correlation of the logFC of all DE genes between treatments. (B) DAR sharing. Vertical bars indicate the number of DARs in each group of treatments. Inset heatmap depicts the Spearman correlation of the logFC of all DARs between treatments. To investigate the relationship between differential chromatin accessibility and transcriptional response, we calculated the enrichment of DE genes within 50 kb of a DAR. DE genes in all treatments were significantly enriched near DARs identified in the same treatment. The greatest enrichments were found for dexamethasone and retinoic acid with odds ratios of 3.35 and 3.99, respectively (Figure 1, B and C and Figure S1; Fisher’s exact test; P ). The significant enrichment for DE genes among DARs in dexamethasone and retinoic acid could be due to their mechanism of action, specifically that both treatments are known to work largely through specific nuclear receptors that bind the DNA and influence gene expression directly. Overall these results suggest that changes in chromatin accessibility are an important mechanism through which cells regulate gene expression in response to environmental perturbation.

Specific transcription factors regulate response to treatment

In addition to informing on open chromatin, ATAC-seq data can be analyzed to characterize the transcription factor landscape governing response to treatment. This is particularly important given that the vast majority of genetic variants found to influence complex traits act through gene regulatory regions. In order to identify transcription factors controlling response to each treatment, we used CENTIPEDE (Pique-Regi ) to perform footprinting analysis on the ATAC-seq data. Across all conditions, we identified a total of 882 active transcription factors, defined as those with motifs that have CENTIPEDE footprints and are associated with open chromatin (Materials and Methods, Table S7). We hypothesized that transcription factors that are important in the regulation of a particular treatment will be more likely to be found in DARs. The factors with the greatest enrichments in DARs are shown in Figure 3. We refer to these enriched factors as response factors. We identified 293 factors with footprints in regions that become more accessible after treatment, and 38 factors in regions that are less accessible after treatment (Table 2). For well-studied treatments, like retinoic acid and dexamethasone, our approach identified transcription factors known to be central to the regulation of gene expression in response to those treatments. For example, the top retinoic acid response factor was the retinoic acid receptor (Figure 3B). For dexamethasone, the androgen receptor (AR), glucocorticoid receptor (GR), and progesterone receptor (PR) have similar binding sites, and likely all represent sites of GR binding (Schoenmakers ; Severson ). While for these treatments, we have prior information on the major mechanisms that regulate the cellular response, for caffeine and selenium less is known, and our data can be used to identify the relevant factors (Table S3 and Table S4). Caffeine acts as a phosphodiesterase inhibitor, which increases intracellular cyclic AMP (cAMP) (Essayan 2001). One of the top caffeine response factors in opening chromatin is CREB (cAMP response element-binding protein). In response to selenium treatment the top enriched factors are in regions that become less accessible after treatment (Figure 3).
Figure 3

Identification of response factors. Topmost enriched transcription factor footprints in DARs in each condition. For selenium, the top factors are enriched in regions of closing chromatin. For all other treatments, the top factors are enriched in regions of opening chromatin.

Table 2

Number of response factors enriched in opening and closing chromatin and SNPs in response factors footprints

TreatmentOpening chromatinClosing chromatinAnnotated SNPs
Retinoic acid170438,507
Dexamethasone150218,793
Caffeine261161,887,282
Selenium022710,611
Identification of response factors. Topmost enriched transcription factor footprints in DARs in each condition. For selenium, the top factors are enriched in regions of closing chromatin. For all other treatments, the top factors are enriched in regions of opening chromatin.

Identifying latent environmental factors in large-scale eQTL datasets from vascular tissues

Large-scale efforts to identify expression eQTLs, such as GTEx (Aguet ), have discovered genetic regulators of gene expression in many different tissues, including three types of artery. However, GTEx samples likely represent a composite of different environmental exposures. Here, we propose that latent, unobserved environmental factors contribute to eQTL effects detected in GTEx, and that some of these eQTLs may represent cases of significant interaction effects in specific environmental contexts. To indirectly test this hypothesis, we considered that interactions with latent environments represented by our four in vitro treatments (retinoic acid, caffeine, selenium, and dexamethasone) would be enriched in response factor binding sites. We used centiSNPs (Moyerbrailean ), an extension of the CENTIPEDE framework, to annotate SNPs in response factor footprints, as these SNPs are most likely to disrupt transcription factor binding and modify the transcriptional response. We found that out of 4,104,579 SNPs in footprints for active factors, 56% are in footprints for response factors for at least one of the treatments. We then estimated enrichments for our treatment-specific annotations in eQTLs from the artery samples in GTEx. Using the EM-DAP1 algorithm implemented in the software package TORUS (Wen ), we found enrichment of eQTLs in footprints for dexamethasone, retinoic acid, and caffeine response factors, but not for selenium (Figure 4A). These results suggest that exposure to retinoic acid, caffeine, and dexamethasone (or a closely related factor; e.g., cortisol elevated by stress) are represented as latent environments in the GTEx samples and can modulate eQTL effects.
Figure 4

Latent environments in GTEx and CAD risk. (A) Enrichment of GTEx eQTLs in SNPs within response factor footprints for each treatment. (B) Enrichment of coronary artery disease risk loci in SNPs within response factor footprints for each treatment. (C) For the genes shown in (D), number of concordant (treatment changes gene expression in the same direction as TWAS risk) and discordant (vice versa) gene/treatment pairs. (D) Comparison of logFC of gene expression vs. CAD TWAS z-score. A positive logFC indicates that the treatment increases gene expression. A positive TWAS value indicates that increased expression of the gene is associated with CAD risk.

Latent environments in GTEx and CAD risk. (A) Enrichment of GTEx eQTLs in SNPs within response factor footprints for each treatment. (B) Enrichment of coronary artery disease risk loci in SNPs within response factor footprints for each treatment. (C) For the genes shown in (D), number of concordant (treatment changes gene expression in the same direction as TWAS risk) and discordant (vice versa) gene/treatment pairs. (D) Comparison of logFC of gene expression vs. CAD TWAS z-score. A positive logFC indicates that the treatment increases gene expression. A positive TWAS value indicates that increased expression of the gene is associated with CAD risk.

Latent environments in coronary artery disease

To identify which conditions may be enriched for putative latent GxE in coronary artery disease (CAD), we performed an enrichment analysis for SNPs in footprints for response factors using CAD GWAS data (Nikpay ) (Figure 4B). Among the treatment conditions we explored, we found a strikingly high enrichment of 14.5-fold for CAD-associated variants that are found in caffeine response factor footprints. This enrichment is also significantly higher than that found for footprints that are identified in the control condition, which represent regulatory variants that do not interact with the treatments considered here. This suggests a large number of latent GxE effects in cardiovascular disease risk. To further investigate the role of gene expression changes induced by environmental perturbations in CAD risk, we used Transcriptome Wide Association Study (TWAS). TWAS (Gamazon ; Barbeira ; Gusev ) is an approach based on Mendelian randomization that was developed to infer the causal relationship between gene expression and complex trait variation, using genetic information (genotypes) as instrumental variables. Importantly, TWAS assigns a direction of effect of gene expression on the trait (i.e., whether increased or decreased expression is associated with trait risk). We have performed a TWAS on the CARDIoGRAM GWAS using the three artery tissues from GTEx as reference gene expression sets using the GAMBIT software (https://github.com/corbinq/GAMBIT). There were 44, 17, and 49 genes significantly associated with CAD () for the tibial artery, coronary artery, and aorta, respectively, for a total of 74 unique genes in at least one of the three arteries; 43 of these genes were differentially expressed in at least one condition in our data. For these genes, we compared the TWAS z-score to the change in gene expression in each condition for which the gene is differentially expressed (Figure 4D). Positive TWAS z-scores indicate that increased expression of the gene is associated with increased risk of CAD. Building on the conceptual framework established by Marigorta for coherent and incoherent genetic effects, we consider instances where the TWAS effect and gene expression response act in the same direction to be concordant, indicating that the treatment effect on that gene amplifies the genetic CAD risk (i.e., treatment increases gene expression, and increased gene expression is associated with increased CAD risk). Neither concordant nor discordant effects predominate for any treatment, indicating that the effect of these treatments on CAD risk is not broadly protective or harmful. Rather, it is a composite of effects at many genes, which can be additive to the genetic effect, or, as we show below, can be further influenced by interactions with genetic factors (GxE). To further dissect the molecular mechanism for GxE in CAD risk, we integrated GTEx artery eQTLs with the GWAS summary statistics and GxE effects in gene expression in our data. We first performed colocalization of GTEx eQTLs with CAD GWAS hits using enloc (Wen ) (Materials and Methods) and identified 168 SNP-gene pairs; 40 GTEx tissues had at least one colocalized eQTL. The most represented tissues for colocalized SNPs were the aorta and tibial artery, with 56 and 28 colocalized SNPs, respectively (Table S5). Of the colocalized genes, 14 can be tested for reQTL in our data and 6 have a reQTL in at least one of the treatment conditions considered (permutation P < 0.05, see Table S6). Although with our sample size we do not have the power to detect genome-wide significant signals, we provide examples of our reQTLs (Figure 5). The reQTL for MAT2A acts in different directions depending on the treatment. In Figure 5, A and B, we show MAT2A has a negative reQTL effect following treatment with selenium. However, the reQTL effect is positive following treatment with retinoic acid (first panel of Figure 5C).
Figure 5

reQTLs. (A) Boxplot depicting logFC for selenium reQTL for MAT2A (B) Same reQTL as in (A), but with trendlines representing gene expression in the treatment and control conditions. (C) All reQTLs. Colors indicate gene expression in the treatment condition (blue, dexamethasone; red, caffeine; orange, selenium; purple, caffeine; black, control).

reQTLs. (A) Boxplot depicting logFC for selenium reQTL for MAT2A (B) Same reQTL as in (A), but with trendlines representing gene expression in the treatment and control conditions. (C) All reQTLs. Colors indicate gene expression in the treatment condition (blue, dexamethasone; red, caffeine; orange, selenium; purple, caffeine; black, control). We now develop a conceptual framework that integrates TWAS, gene expression changes, and reQTLs to illustrate how genetic variation modulates the effect of an environmental exposure on a complex trait (Figure 6). We describe eight possible scenarios, which can be divided into four groups (Figure 6A) based on the relationships between TWAS and treatment effect direction, and treatment and reQTL effect direction. A positive TWAS value indicates that increased expression of the gene is associated with increased risk. A positive gene expression change indicates that the treatment increases gene expression. The treatment effect can act in the same direction as the TWAS effect, indicating that the treatment increases risk (Groups 1 and 2), or they can act in opposite directions, where the treatment buffers risk (Groups 3 and 4).
Figure 6

reQTLs modulate environmental effects on complex traits. (A) Table relating TWAS, gene expression, and reQTL effects. (B) Risk associated with treatment vs. genetic modulation by reQTLs. The risk associated with treatment represents the TWAS effect multiplied by the sign of the logFC of the treatment. Positive values indicate that the treatment amplifies CAD risk. Genetic modulation represents the reQTL effect multiplied by the sign of the logFC of the treatment. Positive values indicate that the risk allele amplifies the treatment effect.

reQTLs modulate environmental effects on complex traits. (A) Table relating TWAS, gene expression, and reQTL effects. (B) Risk associated with treatment vs. genetic modulation by reQTLs. The risk associated with treatment represents the TWAS effect multiplied by the sign of the logFC of the treatment. Positive values indicate that the treatment amplifies CAD risk. Genetic modulation represents the reQTL effect multiplied by the sign of the logFC of the treatment. Positive values indicate that the risk allele amplifies the treatment effect. Because our reQTLs originated from GWAS-eQTL colocalization, we have polarized the reQTL effect to the CAD risk allele. Therefore, a positive reQTL effect indicates that the risk allele causes an increased change in gene expression relative to the nonrisk allele following exposure to a treatment. The reQTL and treatment effect can act in the same direction, indicating that the risk allele amplifies the environmental effect (Groups 1 and 4), or opposite direction, buffering the environmental effect (Groups 2 and 3). Of the 10 reQTL we identified, six are in Groups 1 and 3, where the reQTL increases disease risk by amplifying harmful environmental effects (Group 1) or buffering protective environmental effects (Group 3) (Figure 6B). The remaining four are in Groups 2 and 4, where the reQTL decreases disease risk by buffering harmful environmental effects (Group 2) or enhancing protective environmental effects (Group 4). These results show that environmental effects on disease risk are contingent on the individual’s genotype, with certain exposures amplifying a specific genetic risk but buffering others.

Discussion

Vascular endothelial dysfunction has been implicated in a variety of cardiovascular and noncardiovascular diseases. Here we have characterized the transcriptional and chromatin landscape in endothelial cells exposed to dexamethasone, retinoic acid, caffeine, and selenium. While all of these treatments cause significant changes in gene expression and chromatin accessibility, they act through varied pathways. Dexamethasone and retinoic acid are steroid hormones, whose action at the molecular and transcriptional level has been well-studied (Oakley and Cidlowski 2013; di Masi ). Caffeine and selenium have been investigated at the molecular level, particularly with regard to caffeine’s role in inhibiting the breakdown of cAMP and selenium’s role as a cofactor (Echeverri ; Roman ). However, the regulators of gene expression response to these compounds are less characterized, especially in endothelial cells. We were able to confirm with retinoic acid and dexamethasone that our approach in analyzing ATAC-seq data identifies biologically relevant response factors, as we found enrichment of known intracellular steroid receptors like the retinoic acid receptor and glucocorticoid receptor. Upon extension of this framework to caffeine and selenium, we identified key response factors for these treatments that were previously unknown (e.g., Sp2 for caffeine). Combining the ATAC-seq data with known genetic variation in human populations, we have generated a set of annotations for genetic variants in the footprints of treatment response factors (1300 motifs genome-wide) that may affect binding, and, therefore, gene expression response in the different conditions considered. Using this annotation in combination with large-scale genetic studies of gene expression (GTEx) and CAD risk (Cardiogram), we detected potential latent environmental effects and assigned a putative mechanism of action for variants associated with these molecular and organismal traits, through disruption of transcription factor binding. We found that dexamethasone, retinoic acid, and caffeine were enriched in GTEx eQTLs from three different arteries in our response factor annotations. This is not surprising given the widespread use of caffeine in the general population and the expression of the glucocorticoid receptor and retinoic acid receptor in most GTEx tissues (Aguet ). Interestingly, we did not find an enrichment for the selenium annotation, which is expected considering that we are exposed to only limited amounts of selenium on a daily basis. Our observation that genetic loci associated with risk of coronary artery disease are highly enriched in the binding sites of transcription factors that regulate response to caffeine, suggests that caffeine, or a compound that activates a similar regulatory response, modulates coronary artery disease risk. This is an intriguing finding given the extensive number of studies that have explored the link between caffeine, coffee consumption, and heart disease (Ding ; Grosso ; Turnbull ). Here, we developed a novel conceptual framework that combines information on specific environmental perturbations collected in vitro with large-scale heterogeneous datasets interrogating genetic effects on both molecular and organismal phenotypes. While our study demonstrates that there is a significant enrichment for genetic associations in context specific regulatory annotations, our small sample size had limited power to directly validate these latent GxE effects. Nevertheless, this approach can be used to guide the design of future studies to directly test for GxE. Increasing the sample size would likely have resulted in a greater number of genome-wide significant reQTLs, but there is a trade-off between sample size and the number of environmental conditions which can be assayed in one experiment. For example, one of the first studies of reQTLs identified 26 reQTLs in 120 LCLs in response to dexamethasone (Maranville ). More recently, 503 and 244 reQTLs were identified for Salmonella and Listeria in 175 macrophages (Nédélec et al. 2016). In a panel of 45 iPSC-derived cardiomyocyte lines treated with varying concentrations of doxorubicin, 447 reQTL were discovered (Knowles ), while 387 reQTL were discovered in 86 iPSC-derived macrophage lines treated with interferon gamma and Salmonella infection (Alasoo ). Based on these and other studies, it is difficult to determine an ideal sample size, as power to identify reQTLs depends on the stimulus considered and the method used. We show that, in principle, we can leverage molecular phenotyping data in response to tightly controlled treatments to characterize the complex interplay between genotypes and environment and their role in complex traits, highlighting the importance of latent environmental exposures in large-scale datasets. When considering CAD risk, both concordant and discordant GxE effects on gene expression are observed across multiple genes without either trend predominating. This would indicate that the effect of these treatments on CAD risk is not broadly protective or harmful when all genes are considered. Environmental effects therefore may explain the discrepancies between transcriptional risk scores and genetic risk scores (Marigorta ), and the limited portability of genetic risk scores even within populations (Mostafavi ). In the context of precision medicine, these latent environmental exposures need to be accounted for when predicting genetic risk for diseases, as genetic effects measured in large datasets may indeed represent GxE, and, consequently, accurate prediction of genetic risk can be achieved only by accounting for these interactions.
  53 in total

Review 1.  How do glucocorticoids influence stress responses? Integrating permissive, suppressive, stimulatory, and preparative actions.

Authors:  R M Sapolsky; L M Romero; A U Munck
Journal:  Endocr Rev       Date:  2000-02       Impact factor: 19.871

Review 2.  Glucocorticoids and cardiovascular disease.

Authors:  Brian R Walker
Journal:  Eur J Endocrinol       Date:  2007-11       Impact factor: 6.664

3.  Genetic Ancestry and Natural Selection Drive Population Differences in Immune Responses to Pathogens.

Authors:  Yohann Nédélec; Joaquín Sanz; Golshid Baharian; Zachary A Szpiech; Alain Pacis; Anne Dumaine; Jean-Christophe Grenier; Andrew Freiman; Aaron J Sams; Steven Hebert; Ariane Pagé Sabourin; Francesca Luca; Ran Blekhman; Ryan D Hernandez; Roger Pique-Regi; Jenny Tung; Vania Yotova; Luis B Barreiro
Journal:  Cell       Date:  2016-10-20       Impact factor: 41.582

4.  Integrative approaches for large-scale transcriptome-wide association studies.

Authors:  Alexander Gusev; Arthur Ko; Huwenbo Shi; Gaurav Bhatia; Wonil Chung; Brenda W J H Penninx; Rick Jansen; Eco J C de Geus; Dorret I Boomsma; Fred A Wright; Patrick F Sullivan; Elina Nikkola; Marcus Alvarez; Mete Civelek; Aldons J Lusis; Terho Lehtimäki; Emma Raitoharju; Mika Kähönen; Ilkka Seppälä; Olli T Raitakari; Johanna Kuusisto; Markku Laakso; Alkes L Price; Päivi Pajukanta; Bogdan Pasaniuc
Journal:  Nat Genet       Date:  2016-02-08       Impact factor: 38.330

5.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors:  Michael I Love; Wolfgang Huber; Simon Anders
Journal:  Genome Biol       Date:  2014       Impact factor: 13.583

6.  Bacterial infection remodels the DNA methylation landscape of human dendritic cells.

Authors:  Alain Pacis; Ludovic Tailleux; Alexander M Morin; John Lambourne; Julia L MacIsaac; Vania Yotova; Anne Dumaine; Anne Danckaert; Francesca Luca; Jean-Christophe Grenier; Kasper D Hansen; Brigitte Gicquel; Miao Yu; Athma Pai; Chuan He; Jenny Tung; Tomi Pastinen; Michael S Kobor; Roger Pique-Regi; Yoav Gilad; Luis B Barreiro
Journal:  Genome Res       Date:  2015-09-21       Impact factor: 9.043

7.  Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease.

Authors:  Urko M Marigorta; Lee A Denson; Jeffrey S Hyams; Kajari Mondal; Jarod Prince; Thomas D Walters; Anne Griffiths; Joshua D Noe; Wallace V Crandall; Joel R Rosh; David R Mack; Richard Kellermayer; Melvin B Heyman; Susan S Baker; Michael C Stephens; Robert N Baldassano; James F Markowitz; Mi-Ok Kim; Marla C Dubinsky; Judy Cho; Bruce J Aronow; Subra Kugathasan; Greg Gibson
Journal:  Nat Genet       Date:  2017-08-14       Impact factor: 38.330

8.  Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization.

Authors:  Xiaoquan Wen; Roger Pique-Regi; Francesca Luca
Journal:  PLoS Genet       Date:  2017-03-09       Impact factor: 5.917

9.  Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics.

Authors:  Alvaro N Barbeira; Scott P Dickinson; Rodrigo Bonazzola; Jiamao Zheng; Heather E Wheeler; Jason M Torres; Eric S Torstenson; Kaanan P Shah; Tzintzuni Garcia; Todd L Edwards; Eli A Stahl; Laura M Huckins; Dan L Nicolae; Nancy J Cox; Hae Kyung Im
Journal:  Nat Commun       Date:  2018-05-08       Impact factor: 14.919

10.  Genetic determinants of co-accessible chromatin regions in activated T cells across humans.

Authors:  Rachel E Gate; Christine S Cheng; Aviva P Aiden; Atsede Siba; Marcin Tabaka; Dmytro Lituiev; Ido Machol; M Grace Gordon; Meena Subramaniam; Muhammad Shamim; Kendrick L Hougen; Ivo Wortman; Su-Chen Huang; Neva C Durand; Ting Feng; Philip L De Jager; Howard Y Chang; Erez Lieberman Aiden; Christophe Benoist; Michael A Beer; Chun J Ye; Aviv Regev
Journal:  Nat Genet       Date:  2018-07-09       Impact factor: 41.307

View more
  7 in total

1.  Dietary Caffeine Synergizes Adverse Peripheral and Central Responses to Anesthesia in Malignant Hyperthermia Susceptible Mice.

Authors:  Monica Aleman; Rui Zhang; Wei Feng; Lihong Qi; Jose R Lopez; Chelsea Crowe; Yao Dong; Genady Cherednichenko; Isaac N Pessah
Journal:  Mol Pharmacol       Date:  2020-08-06       Impact factor: 4.436

Review 2.  Genetic Insights Into Smooth Muscle Cell Contributions to Coronary Artery Disease.

Authors:  Doris Wong; Adam W Turner; Clint L Miller
Journal:  Arterioscler Thromb Vasc Biol       Date:  2019-06       Impact factor: 8.311

Review 3.  Where Are the Disease-Associated eQTLs?

Authors:  Benjamin D Umans; Alexis Battle; Yoav Gilad
Journal:  Trends Genet       Date:  2020-09-07       Impact factor: 11.639

4.  Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes.

Authors:  Michelle C Ward; Nicholas E Banovich; Abhishek Sarkar; Matthew Stephens; Yoav Gilad
Journal:  Elife       Date:  2021-02-08       Impact factor: 8.140

5.  Functional dynamic genetic effects on gene regulation are specific to particular cell types and environmental conditions.

Authors:  Anthony S Findley; Alan Monziani; Allison L Richards; Katherine Rhodes; Michelle C Ward; Cynthia A Kalita; Adnan Alazizi; Ali Pazokitoroudi; Sriram Sankararaman; Xiaoquan Wen; David E Lanfear; Roger Pique-Regi; Yoav Gilad; Francesca Luca
Journal:  Elife       Date:  2021-05-14       Impact factor: 8.140

6.  Psychosocial experiences modulate asthma-associated genes through gene-environment interactions.

Authors:  Justyna A Resztak; Allison K Farrell; Henriette Mair-Meijers; Adnan Alazizi; Xiaoquan Wen; Derek E Wildman; Samuele Zilioli; Richard B Slatcher; Roger Pique-Regi; Francesca Luca
Journal:  Elife       Date:  2021-06-18       Impact factor: 8.713

7.  Transcription factor regulation of eQTL activity across individuals and tissues.

Authors:  Elise D Flynn; Athena L Tsu; Silva Kasela; Sarah Kim-Hellmuth; Francois Aguet; Kristin G Ardlie; Harmen J Bussemaker; Pejman Mohammadi; Tuuli Lappalainen
Journal:  PLoS Genet       Date:  2022-01-31       Impact factor: 5.917

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.