
Context-dependant enhancers as a reservoir of functional polymorphisms and epigenetic markers linked to alcohol use disorders and comorbidities.

Alasdair MacKenzie¹, Elizabeth A Hay¹, Andrew R McEwan¹.

Abstract

Alcohol use disorder (AUD) is one of the major causes of mortality and morbidity worldwide. It is estimated that 50% of the causes of AUD are heritable. Efforts to determine the genetic determinants governing AUD using genome wide association studies (GWAS) show that the most strongly associated SNPs occur within, or in the vicinity of, genes encoding enzymes that metabolise ethanol. However, these studies were less conclusive in identifying the genes that influence the choice to drink ethanol or why a proportion of the population becomes addicted. Most importantly, these studies also found that over 98% of the 1292 SNPs associated with AUD (p < 1 × 10⁻⁶) lay outside of coding regions and within the poorly understood non-coding genome. Many years of study have shown that functional components of the non-coding genome include enigmatic enhancer elements whose biological role is to modulate levels of gene expression in specific cells, in specific amounts and in response to the correct stimuli. The current short review introduces the functional components of the non-coding genome, such as promoters and enhancers, and critically assesses the latest methods of identifying and characterising their context-dependent roles in AUD and mental health disorders. We then go on to examine what is known about the roles of enhancers, such as the GAL5.1 enhancer, in alcohol intake and explore how enhancers are affected by polymorphic variation and epigenetic markers such as DNA-methylation and how they may influence susceptibility to AUD. The review finishes by discussing the future of AUD genetics and what technologies will need to be brought to bear to understand how genetic and environmentally induced changes in enhancer structure may contribute to the need to drink alcohol to excess.

Entities: Chemical

Keywords: Alcohol use disorder; CRISPR genome editing; Chromatin modification; Comparative genomics; Complex disease; Enhancer; Gene regulation; Genome wide association studies; Mental health; Polymorphisms; Promoter

Year: 2022 PMID: 35712020 PMCID: PMC9101288 DOI: 10.1016/j.addicn.2022.100014

Source DB: PubMed Journal: Addict Neurosci ISSN: 2772-3925

The problem at hand

Alcohol use disorders (AUD) continue to be a major problem in western countries like the UK, partly because of the increased affordability of alcohol, which was 74% more affordable in 2019 than it was in 1987 [1]. Although several reports suggest that moderate alcohol intake has positive effects on cardiovascular health [2], the general effect of problematic alcohol use on society is overwhelmingly negative, with 24% of adults in the UK drinking over the recommended 14 units a week, a level of alcohol intake that increases the risks of developing cancers, cardiovascular diseases and liver disease [3]. Worldwide, alcohol causes 5.3% of all deaths (>3 million people per year) and accounts for 132.6 million disability adjusted life years [3]. Thus, AUD presents a major health, societal and economic burden to countries worldwide. The challenge is to understand the factors which contribute to AUD so that preventive or therapeutic strategies can be designed and implemented.

Genetic studies of AUD

Based on adoption and twin studies, the genetic liability for alcohol abuse is estimated to be around 50% [4]. Genome wide association studies (GWAS, Fig. 1B) have indicated that the greatest risk loci for AUD centre on two genes, alcohol dehydrogenase (ADH1B) and aldehyde dehydrogenase (ALDH2), which encode enzymes that metabolise ethanol [5,6]. These studies suggest that a component in the development of AUD [7] may involve a change in the expression or function of these enzymes, although the precise mechanisms of their involvement in the development of AUD remain to be established. However, the involvement of metabolising enzymes that are primarily expressed in the liver does not address why many people choose to drink alcohol and why alcohol use may develop into an addiction. It is widely understood that the decision to drink alcohol is modulated by regions of the brain that include the hypothalamus, and that addiction involves regions that include the nucleus accumbens [8,9]. Aside from the aforementioned metabolic genes, the majority of genetic risk seems to be spread amongst a large number of variants, each with small effects, a known common feature of the genetics of complex diseases [10]. In addition to identifying the ethanol metabolising enzymes discussed above, two major GWAS identified other loci with an association to AUD and identified several genes including KLB (β-klotho), GCKR (Glucokinase regulatory protein), CADM2 (Cell Adhesion molecule 2), FAM69C (Family with sequence similarity 69, member C), STPG2 (Sperm Tail PG-Rich Repeat Containing 2) and DNAJB14 (DnaJ Heat Shock Protein Family (Hsp40) Member B14), with a subsequent replicating study also identifying JCAD (Junctional Cadherin 5 Associated), SLC39A13 (Solute Carrier Family 39 Member 13) and CRHR1 (Corticotropin Releasing Hormone Receptor 1) [5,6]. The most interesting of these genes is CRHR1, whose involvement in AUD and the stress response has been extensively explored [11].
However, the specific SNP identified in this study (rs62062288) falls within an intron of the MAPT gene (which encodes Tau protein) and over 200 kb away from the transcriptional start site of the CRHR1 gene, possibly reflecting the degree of linkage disequilibrium (LD; where groups of alleles do not segregate randomly in a population) present within this region of the genome. Therefore, it is unclear how this SNP may functionally contribute to the presentation of AUD. Significantly, out of the 1292 SNPs that exceeded the p < 1 × 10⁻⁶ threshold required to achieve significance in GWAS, only 25 fell within exonic regions [6]. These results are fairly typical of the data derived from current GWAS analyses of complex disease, where the vast majority of SNPs which exceed the p < 1 × 10⁻⁶ threshold of significance are intronic or intergenic [10,12]. Based on these observations it is likely that the greatest burden of AUD causing SNPs does not lie within the coding regions of genes but within the unknown, and enigmatic, non-coding genome. Consequently, the main aim of the current review will not be to explore the known genetics or neuroscience of AUD in any depth, a subject which has been well reviewed in a number of other publications [8,9], but to briefly and critically appraise what we know about the information sources contained in the “non-coding” genome that are important in health, what techniques are currently used to understand the role of the non-coding genome in alcohol intake and what we need to do in the future to better understand its biology.
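
The proportion quoted above can be checked with a line or two of arithmetic. The sketch below uses only the two figures given in the text (1292 significant SNPs, 25 of them exonic); the variable names are ours and purely illustrative.

```python
# Figures quoted in the text: 1292 SNPs passed the significance threshold
# and 25 of them fell within exonic regions.
total_snps = 1292
exonic_snps = 25

# Everything that is not exonic is intronic or intergenic, i.e. non-coding.
non_coding_snps = total_snps - exonic_snps
non_coding_fraction = non_coding_snps / total_snps

print(f"{non_coding_snps} of {total_snps} SNPs ({non_coding_fraction:.1%}) are non-coding")
```

This is consistent with the statement in the abstract that over 98% of the associated SNPs lie outside coding regions.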

Fig. 1

Flow diagram demonstrating the relationship between different techniques designed to allow (A-C) identification of putative context dependent enhancers involved in AUD and (D and E) functional validation. ChIP-seq, chromatin immunoprecipitation sequencing; ATAC-seq, assay for transposase-accessible chromatin-sequencing; 5C, carbon copy chromatin conformation capture; GWAS, genome wide association analysis; eQTL, expression quantitative trait loci; WGS, whole genome sequencing; TFBS, transcription factor binding site; LacZ/GFP, β-galactosidase/green fluorescent protein; ELS, early life stress; HFD, high fat diet; QPCR, quantitative PCR; sc/snRNA-seq, single cell/single nucleus RNA-seq; IHC, immunohistochemistry; ISH, in situ hybridisation; 5mC, 5-methyl cytosine; 5hmC, 5-hydroxymethyl-cytosine.

What information is needed to build a healthy human?

Other than the genetic information required to produce correct proteins, what other information sources are contained within the human genome that are important for normal human development, health and behaviour, including ethanol intake? If we consider that 10% of the human genome is under selective pressure (regions of the genome whose mutation rates are constrained due to their importance in survival) and only 1.7–1.9% of the genome encodes proteins, this suggests that at least 8% of the human genome, which does not encode proteins, is essential for health [13]. A high proportion of this genome is likely to be composed of sequences that include promoters, enhancers, silencers and insulator regions that are critical to modulating expression levels of protein coding genes and non-coding functional RNA species, including microRNAs and long non-coding RNA, in specific cells and tissues [14,15]. A comprehensive review further describing promoter and enhancer biology, and the possible involvement of these sequences in a number of diseases, such as cancer and congenital malformations, has recently been published [14]. Intriguingly, gene regulatory sequences are lacking in unicellular organisms, which suggests that these sequences may have evolved to support the development of multicellular organisms, where their role is to coordinate the cell-cell interactions required for organ and body development [16]. Consequently, because we now know that most complex disease associated SNPs are found in the non-coding genome, a common feature of GWAS analyses of complex disease [10,12], it could be argued that the genetic causes of complex disease are less a function of “what genes encode” and more about ensuring that these genes are expressed within the correct cells, at the correct times, in the correct amounts and in response to the correct signal, a property known as context-dependency.

Identifying regulatory elements

If we accept that the non-coding genome acts as the major reservoir of information required to build functional healthy humans (a concept not yet widely accepted), and that this information is compromised in most complex disease, how do we go about identifying the functional components of the non-coding genome? Promoter regions are the best known and understood regulatory regions in the human genome. They largely consist of sequences of DNA next to the transcriptional start sites of genes that are required to bind RNA polymerase II and other proteins that, together, comprise the transcriptional pre-initiation complex [17]. Their function depends on their distance from, and orientation relative to, the transcriptional start sites (TSS) that they control. However, although they are critically important in health and are affected by genetic and epigenetic changes, promoter sequences comprise a smaller proportion of the human genome than even coding regions. In addition, promoters, on their own, are unable to support the high levels of tissue specificity essential for the functional roles of many proteins [18]. Accordingly, other sequences are required to support the expression of genes critical to health in specific cells, in specific amounts and in response to specific cues. These sequences include enigmatic elements known as enhancers that interact with, and increase the activity of, promoter regions [15]. They are functionally distinct from promoters in that they are distance and orientation independent with respect to TSSs. Whilst our understanding of these sequences has increased enormously over the past 20 years, there is still confusion over how they can be reliably identified, what they are, their significance for health and disease and how their activity is affected by polymorphisms or environmental cues.

Attempts to identify enhancers

To date, the greatest effort to understand the non-coding genome is represented by the 2012 ENCODE consortium release, a series of research papers which claimed to have mapped all the protein coding regions as well as all of the enhancer, silencer and insulator regions, thereby representing an “Encyclopaedia of the Human Genome” [13]. These studies were based on the discovery that functional elements within the non-coding genome could be detected thanks to chromatin modification signatures and protein binding [19]. Therefore, a combination of techniques based around next generation sequencing (NGS) or “Big Data” approaches has been developed to understand the regulatory genome at a genome wide level. These include DNase I sensitivity-sequencing (DNase-seq) and “Assay for Transposase-Accessible Chromatin-sequencing” (ATAC-seq), which identify regions of the genome denuded of histones. Chromatin immunoprecipitation sequencing (ChIP-seq) is used to detect the histone modifications and DNA binding proteins thought to be diagnostic of functional components. For active enhancers these include histone 3 lysine 4 mono-methylation (H3K4me), histone 3 lysine 27 acetylation (H3K27ac) and histone acetyltransferase p300 (p300). For active promoters the histone mark is histone 3 lysine 4 tri-methylation (H3K4me3) and for insulators the marker is CCCTC-binding factor (CTCF). Chromatin conformation capture techniques such as 5C/Hi-C allow for the detection of long-distance interactions within the genome, which is now known to be organised into topologically associating domains (TADs), delimited by insulators, which have been conserved not only between tissues but also between species [21]. The influence of enhancer regions within specific regions of the genome is delimited by insulators [21].
In the ENCODE project these NGS based techniques were generally used to analyse the genomes of easily grown and accessible transformed human cell line monocultures [13], [20]. Based on these analyses, it was concluded that >80% of the human genome was functional, a conclusion that was not without controversy [22]. These studies also suggested that regions displaying chromatin signatures characteristic of enhancers were not conserved, an observation subsequently supported by studies based on H3K27ac distributions in the disaggregated liver cells of several vertebrate species [23]. The inference from these studies was that most enhancer sequences arise and are lost rapidly during evolution and that few enhancers are conserved. However, earlier studies demonstrated evidence of functional enhancer heterogeneity, such that enhancers associated with distinct cellular functions may be partitioned based on specific combinations of multiple histone modifications [24]. In addition, a functional study of ENCODE predicted enhancer regions indicated that only 26% of these enhancers were active [25]. Moreover, deleting the genes encoding the Trithorax-related (Trr) proteins, responsible for adding H3K4me chromatin marks (characteristic of enhancers), from the Drosophila genome did not significantly interrupt normal development [26]. Consequently, although chromatin marks will remain extremely useful in the future, there are concerns that extrapolating major conclusions about enhancer activity from a limited number of chromatin marks may not be that helpful in understanding enhancer biology [27]. Accordingly, it has been recommended that the definition of an enhancer detected using “enhancer specific” chromatin modifications should only be accepted if supported by functional data, preferably derived using in-vivo models [28].

Identifying enhancers through expression quantitative trait loci

Techniques such as GWAS have been essential in dissecting the genetic architecture of major diseases including AUD [5,6]. However, another method that has come to the fore in the identification of functional regulatory regions which influence health is the identification of expression quantitative trait loci (eQTLs, Fig. 1B). eQTLs are SNPs whose allelic variants are associated with differences in gene expression [29]. The Genotype–Tissue Expression (GTEx) study (https://gtexportal.org/home/) is the most extensive multi-tissue eQTL catalogue produced to date and is easily accessible via the internet. The GTEx consortium collected data from over 15,000 postmortem tissue specimens from 838 genotyped donors, representing 49 tissues [30]. The GTEx database represents a useful way to identify functional regions within the human genome, and a number of studies on primary human tissues have already made important inroads into identifying the mechanisms that contribute to the aetiology of schizophrenia [31].
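
The idea behind an eQTL, that expression of a gene differs with the allele carried at a SNP, can be illustrated with a minimal sketch. The function name `eqtl_slope` and the toy numbers below are hypothetical and invented purely for illustration; they are not GTEx data, and real eQTL analyses additionally correct for covariates and multiple testing, which this sketch does not attempt.

```python
from statistics import mean

def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies)."""
    d_mean, e_mean = mean(dosages), mean(expression)
    cov = sum((d - d_mean) * (e - e_mean) for d, e in zip(dosages, expression))
    var = sum((d - d_mean) ** 2 for d in dosages)
    if var == 0:
        raise ValueError("all donors share one genotype; no slope can be estimated")
    return cov / var

# Hypothetical toy data: six donors, copies of the alternative allele at one
# SNP and normalised expression of a nearby gene (made-up values).
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope = eqtl_slope(dosages, expression)
print(f"change in expression per extra allele copy: {slope:.2f}")
```

A slope that differs reliably from zero across many donors is what would flag the SNP as a candidate eQTL for that gene in that tissue.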

Context-dependent enhancers and evolutionary conservation

The evolutionary constraints placed on gene coding sequence are a function of the mostly inflexible 3-letter codon usage that determines the sequence and identity of the amino acids that make functional proteins. Thus, when added to the importance of proteins in health and species fitness, it is clear why protein coding sequences are highly conserved. Nevertheless, even though enhancer sequences do not encode proteins, is it possible that enhancer sequences can also be conserved? Can conservation be used as a method to identify context-specific enhancers? To answer these questions, we need to understand how enhancers work and in what form information is stored in their structure. Put most simplistically, enhancers comprise many different transcription-factor binding sites (4–20 bp long) that are clustered together within a short section of DNA typically less than one kilobase in length. The binding site selection of many transcription factors is promiscuous, such that an individual transcription factor can bind several different sequences, although with different levels of affinity [32]. However, a series of elegant experiments have shown that the process of enhancer evolution is not random and that the precise identity, order and spacing of the transcription factor binding sites that make up enhancers, known as enhancer “syntax”, is often functionally constrained and critical in defining tissue specificity [33], [34], [35]. Moreover, enhancer context-dependency is also reliant on the levels of affinity of different TFs for their binding sites within enhancers, whereby optimising the binding site of a given transcription factor produces ectopic activity of the enhancer in different tissues, thereby reducing its specificity [33], [34], [35].
In consequence, although the binding promiscuity of TFs would suggest that enhancers evolve rapidly, the need to preserve syntax and to achieve the finely balanced TF affinity needed for tissue specificity would argue against this. Indeed, most studies of functionally proven enhancers argue strongly in favour of their high degree of conservation through evolution. For example, high levels of sequence conservation are associated with functionally verified enhancers that coordinate expression of the interleukin genes [36], with the Sonic hedgehog (Shh) enhancer [37] and with the Pierre-Robin-sequence (PRS) enhancer [38]. Another exemplary study of SNPs associated with neuroblastoma by GWAS succeeded in identifying a SNP (rs2168101 G>T) within a highly conserved enhancer inside intron 1 of the LIM Domain Only 1 (LMO1) gene [39]. Additionally, high throughput analysis of highly conserved enhancer sequences using reporter genes in mouse embryos demonstrated that >70% of conserved non-coding regions had observable tissue specific enhancer function [40]. So, despite the widespread acceptance of chromatin markers as enhancer proxies, the case for using comparative genomics to detect functional enhancer sequences also remains strong (Fig. 1C). However, once a putative context-specific enhancer has been identified, what methods are available to allow us to deduce its function and how variables such as polymorphic variation, environmental changes and signal transduction events affect this function?
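
The notion of enhancer “syntax” described above (the identity, order and spacing of binding sites) can be made concrete with a small sketch. The factor names, motifs and sequences below are hypothetical, chosen only for illustration, and exact string matching is a deliberate simplification: as noted above, real transcription factors bind a range of related sequences with different affinities.

```python
def find_sites(sequence, motifs):
    """Return (position, factor) for every exact motif match, sorted by position."""
    hits = []
    for factor, motif in motifs.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, factor))
            start = sequence.find(motif, start + 1)
    return sorted(hits)

def syntax(sequence, motifs):
    """Describe the order of binding sites and the gaps (bp) between neighbours."""
    hits = find_sites(sequence, motifs)
    order = [factor for _, factor in hits]
    gaps = [
        hits[i + 1][0] - (hits[i][0] + len(motifs[hits[i][1]]))
        for i in range(len(hits) - 1)
    ]
    return order, gaps

# Hypothetical motifs and sequences, not taken from any real enhancer.
motifs = {"TF_A": "GATAAG", "TF_B": "CACGTG"}
enhancer = "TTGATAAGCCCCCACGTGAA"
swapped = "TTCACGTGCCCCGATAAGAA"

order, gaps = syntax(enhancer, motifs)
swapped_order, swapped_gaps = syntax(swapped, motifs)
print(order, gaps)
print(swapped_order, swapped_gaps)
```

The two toy sequences contain the same sites with the same spacing but in the opposite order, which is the kind of difference that a purely compositional comparison would miss and that syntax-aware comparison is meant to capture.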

Methods of validating putative enhancer function

If identifying enhancers seems problematic, analysing their context-dependent functional activity is even more tricky and is the subject of much disagreement and debate. Enhancer activity is most often initially assayed using reporter assays, where putative enhancer DNA is cloned into a reporter plasmid that also contains a promoter region driving the expression of an easily quantifiable protein product (Fig. 1D and E). Once cloned, the candidate enhancer can be cut up with enzymes or subjected to site directed mutagenesis to define its functional components (Fig. 1D and E). In the past, these reporters have included chloramphenicol acetyltransferase (the basis of the CAT-assay), LacZ (the gene that encodes β-galactosidase) and various forms of luciferase [41]. Luciferases, such as firefly luciferase, are considered the most accurate as they can detect changes in gene expression over many orders of magnitude [42]. These reporter plasmids are then transfected into different cell lines and the quantities of reporter protein expressed are assayed biochemically. Reporter assays in cell lines can be carried out rapidly, at relatively little expense and can be easily scaled up [43]. However, one major disadvantage is that the cell lines often used may not provide the context appropriate for activating many enhancers. Thus, the current trend of producing “high throughput platforms” to functionally characterise enhancer regions on a whole genome level using cultured cell lines, although generating huge amounts of data, may have a limited ability to shed light on the role of context-dependent enhancers within the human brain. The use of transgenic animals has helped to overcome many of the disadvantages of monoculture cell analysis.
Reporter plasmids containing reporter genes such as LacZ or, more recently, different derivatives of the Green Fluorescent Protein (GFP), are used to make transgenic zebrafish [44] or microinjected into the pronuclei of single cell mouse embryos [45] (Fig. 1D). Although the use of these models is labour intensive and is less amenable to “scaling up”, they do provide a critical glimpse into the tissue specificity of enhancers whose activity may be undetectable in monoculture cell lines.

The benefits of CRISPR/Cas9

With the development of CAS9/CRISPR technology, it is now possible to rapidly delete enhancer regions from the mouse genome by microinjecting CAS9 mRNA or protein with single guide RNA (sgRNA) into the cytoplasm of 1-cell mouse embryos, a much easier process than pronuclear injection of reporter plasmids [46]. Although care must be taken with the possibility of generating “off-target” events, CAS9/CRISPR technology has largely superseded the previously widely used method of knocking out/in genes using embryonic stem (ES) cell targeting, which was time consuming and expensive [47]. Briefly, the cytoplasm of single-cell mouse embryos is injected with either CAS9 protein (pre-incubated with sgRNA) or CAS9 mRNA and at least two sgRNA molecules, and the embryos are allowed to develop to the 2-cell stage. These embryos are then oviduct transferred into a pseudopregnant host female mouse where they develop into pups. Although more challenging, CAS9/CRISPR technology can also be used to introduce human allelic variants into the mouse genome, allowing a functional comparison of the effects of allelic variants on behaviour and health in-vivo [48] (Fig. 1D). This approach relies on the cell's own homology directed repair (HDR) mechanisms, which are attracted to cut DNA. Hence, in the presence of a repair template, usually a 100 bp strand of DNA which is co-injected with the CAS9-sgRNA complex, the cell will attempt to repair the CAS9 cut strand using a “Trojan Horse” repair template. The problem with this approach is that non-homologous end joining (NHEJ), which competes against HDR within the cell, is a much more active process in somatic cells, with the result that only 10% of repairs within the cell are HDR directed [49]. Thus, although this is a very useful method of introducing allelic variants, more development to encourage HDR over NHEJ repair pathways is required before CRISPR/CAS9 can persuasively compete against ES targeting in the short term [50].

Analysis of CRISPR/CAS9 enhancer knockouts

Whilst the main benefit of enhancer CRISPR knockouts is the ability to behaviourally test mice to assess the effects of deleting the enhancer on ethanol intake or co-morbidities such as anxiety, appetite and depression (Fig. 1D), another benefit of these experiments is in determining the effects of these enhancer knockouts on the expression of downstream genes which may also be involved in modulating ethanol intake. Thus, brain tissues can be recovered from these mice, subjected to RNA-seq analysis and compared to the expression of genes in wild type mice to determine which genes are modulated by the deleted enhancer. Although RNA-seq is able to determine the effects of deleting enhancers on the expression of the whole genome, the resolution of the technique at the cellular level is poor and relies on the fine dissection abilities of the operator. However, a recently developed technique called single cell RNA-seq (scRNA-seq) [51] allows the operator to define specific cell types based on the transcriptomes of individual cells. Briefly, disaggregated cells from dissected tissues are segregated, either by fluorescence activated cell sorting (FACS) or using droplet-based methods, into individual aqueous compartments in a lipid suspension. Within these compartments, cells are lysed and mRNA is converted to cDNA prior to tagging with a barcode primer unique to each compartment. The cDNA is then recovered and combined for sequencing [51]. Sequences can then be separated computationally by barcode and the data are displayed as principal component analyses where cells are categorised based on their transcriptomes (Fig. 1D). In the case of neuronal tissues, which are composed of heavily interdigitated cells, a refinement of this technology allows for the recovery of individual nuclei, which contain between 20 and 50% of the total cell mRNA and which can then be sorted and analysed in place of whole cells [52].
This technique can also be used to analyse frozen tissues where the integrity of the cells has been compromised by the freezing process. Thus, in combination with CAS9/CRISPR technology, it is now possible to identify which genes are regulated by enhancers, at the cellular level, on a genome wide scale and to finally deduce the maximum distance over which enhancers can influence promoter activity.
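
The computational step in scRNA-seq, in which pooled sequences are assigned back to their compartment of origin, can be sketched as follows. This is a minimal illustration under simplifying assumptions: each read is assumed to have already been reduced to a (barcode, gene) pair, and the barcodes and reads are hypothetical. Real pipelines also handle sequencing errors in barcodes, PCR duplicates and alignment, none of which is modelled here.

```python
from collections import Counter, defaultdict

def demultiplex(reads):
    """Group (barcode, gene) read pairs into one gene-count table per barcode."""
    cells = defaultdict(Counter)
    for barcode, gene in reads:
        cells[barcode][gene] += 1
    return cells

# Hypothetical pooled reads: each carries the barcode of the compartment
# (and hence the cell or nucleus) in which its cDNA was tagged.
reads = [
    ("AAAC", "Gal"),
    ("AAAC", "Gal"),
    ("TTTG", "Cnr1"),
    ("AAAC", "Cnr1"),
    ("TTTG", "Cnr1"),
]

profiles = demultiplex(reads)
for barcode, counts in sorted(profiles.items()):
    print(barcode, dict(counts))
```

The resulting per-barcode count tables are the transcriptomes on which cells are subsequently clustered and categorised.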

Conserved enhancers that modulate alcohol intake

Based on previous observations that many enhancer elements have been highly conserved through evolution, we tested the hypothesis that highly conserved sequences next to the coding sequences of genes known to control ethanol intake could represent context-dependent enhancers with a critical role in controlling ethanol intake [40]. We first explored the role of an enhancer sequence within the cannabinoid-1 receptor (CB1) gene (CNR1), that we called ECR1, which had been conserved since the last common ancestor of humans and fish (400 million years) [53]. This sequence was of interest as it also contained a SNP that had been associated with addictive behaviours [54,55] and alcohol abuse [56,57]. Our initial analyses using reporter assays in primary cell lines suggested that the ECR1 sequence acted as an enhancer sequence whose presence influenced the activity of the promoter region of the CNR1 gene [53]. Initially, we were able to demonstrate that the allelic variants of the ECR1 enhancer had a differential effect on the activity of the CNR1 promoter and responded differently to signal transduction agonists [58,59]. Based on these observations we undertook a functional analysis of the ECR1 enhancer in mice by deleting this enhancer from the mouse genome using CRISPR-Cas9 technology. Initial examination of these animals showed not only that the expression of the Cnr1 gene had been significantly reduced in parts of the brain that included the hippocampus, but also that the hypothermia response to CB1 agonism had been significantly reduced, consistent with a reduction in Cnr1 expression [58,60]. Subsequent analysis of these animals also demonstrated a significant reduction in ethanol intake and an altered anxiety phenotype in male and female ECR1KO mice [58,59]. To the best of our knowledge, these studies represent the first evidence that conserved enhancer regions play a role in influencing ethanol intake and anxiety.
We also explored the regulation of the GAL gene that encodes the galanin neuropeptide and has been associated with ethanol intake [61,62]. Genetic studies in humans had identified genotypes around the GAL locus that had associations with excess ethanol intake [63]. In a similar manner to ECR1, we used comparative genomics to identify a highly conserved polymorphic region of DNA that we called GAL5.1, that had also been conserved in birds and reptiles (350 million years) and lay 42 kb from the human GAL gene. We isolated this DNA sequence from human DNA and used it to produce transgenic reporter mice that expressed the βgal marker protein in cells of the hypothalamus and amygdala that also expressed galanin [64]. Subsequent luciferase analysis of human polymorphic variants of GAL5.1 in primary hypothalamic cell culture demonstrated a significant difference in the strength of this enhancer [64]. Based on these observations we examined the association of GAL5.1 polymorphic variants with ethanol abuse in the UK Biobank cohort (n = 115,865) and demonstrated a significant association between the GG genotype of this enhancer, ethanol intake and anxiety in men [65] (Fig. 1B). Intriguingly, CRISPR deletion of GAL5.1 (GAL5.1KO) almost completely ablated the expression of the GAL gene in all the GAL5.1KO mouse cell types analysed. Most importantly, deleting GAL5.1 produced mice that drank less ethanol whilst males suffered less anxiety mirroring our observations within the UK-Biobank [66]. Taken together, and in light of functional studies from other labs [36], [37], [38], [39], [40] these studies strongly suggest that enhancer regions critical to supporting tissue specific gene regulation can be highly conserved, most probably due to a need to conserve the syntax and the specific DNA binding specificities required to achieve robust levels of tissue-specificity. 
Critically, retrospective analysis of both ECR1 and GAL5.1 using the available ENCODE database failed to identify chromatin marks diagnostic of active enhancers. The most likely reason for this observation was that neither ECR1 nor GAL5.1 was active in the cell lines used by ENCODE. Consequently, our observations agree with previous conclusions that the enhancer status of a given sequence should only be accepted if supported by functional data, preferably derived using in-vivo models [28].

Signal transduction networks and enhancer polymorphisms

One of the major ambitions of medicine is to develop a personalised therapeutic approach to treating conditions such as AUD and anxiety which, in a large proportion of the population, resist current treatments [67]. The mechanisms controlling cell-cell interactions represent an important source of targets for the development of the personalised drug therapies of the future [68]. Receptor activation at the cell surface is followed by a cascading network of signal transduction interactions in the cytoplasm that terminate within the nucleus through the activation of DNA binding proteins[69]. Once activated through processes that include post-translational modifications, these proteins then assemble in a specific order on enhancer elements within the genome and recruit other factors that remodel the chromatin thereby controlling the transcriptional gene response [70]. For this cascade of events to unfold in an appropriate manner and to generate an appropriate transcriptional response, the interaction between activated transcription factors and their target enhancers is critical [71]. Since the protein components of signal transduction mechanisms have been so strongly conserved through evolution, it is highly likely that the plasticity that generate differences in drug response resides at the level of enhancer variance [72]. It is therefore important to understand the effects of enhancer variance on enhancer response to signal transduction activation (Fig. 1D and E)[70]. To identify the cell signalling networks that control tissue specific activity of the GAL5.1 enhancer, we exposed primary hypothalamic cell cultures transfected with reporter plasmids under the control of either the GAL5.1 or ECR1 enhancers. In the case of GAL5.1, we demonstrated that neither protein kinase A or MAPkinases could significantly affect its activity in primary hypothalamic cells. 
However, when these cells were exposed to an agonist of protein kinase C (PKC) signalling, we observed a highly significant increase in activity that was replicated in neuroblastoma cells [65]. Further dissection of the mechanisms governing the PKC response demonstrated that co-transfection of cells with a plasmid expressing the EGR1 transcription factor further boosted the response of GAL5.1 to PKC agonists, and chromatin immunoprecipitation (ChIP) showed that EGR1 bound a single highly conserved consensus sequence within GAL5.1 [66]. Our most interesting observation came when we repeated these experiments with an allelic variant of GAL5.1. GAL5.1 hosts two common polymorphisms in perfect linkage disequilibrium (LD), rs2513280 (C/G) and rs2513281 (A/G), which give a major allele (GG, 70–80%) and a minor allele (CA, 20–30%). Intriguingly, the CA allele demonstrated a significantly reduced response to PKC agonism and EGR1 expression [65]. Considering that the CA haplotype was also protective against anxiety and alcohol abuse in men, this difference suggests a direction for the development of anti-anxiety drugs, based on PKC antagonism, that may also play a role in reducing alcohol abuse in men carrying the GG haplotype.
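
As a brief illustration of what "perfect linkage disequilibrium" means for the two GAL5.1 polymorphisms, the following minimal Python sketch computes the standard r² statistic from haplotype frequencies. The function name and the exact frequencies are assumptions made for illustration: 0.75 and 0.25 are simply mid-range values taken from the 70–80% and 20–30% ranges quoted above, and the second scenario, which includes recombinant haplotypes, is entirely hypothetical.

```python
def ld_r_squared(hap_freqs):
    """Return r^2 between two biallelic SNPs.

    hap_freqs maps two-letter haplotypes to frequencies; the first letter
    is the allele at rs2513280 and the second the allele at rs2513281.
    """
    if abs(sum(hap_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Use the alleles of the first listed haplotype as the reference alleles.
    ref1, ref2 = next(iter(hap_freqs))
    p1 = sum(f for h, f in hap_freqs.items() if h[0] == ref1)
    p2 = sum(f for h, f in hap_freqs.items() if h[1] == ref2)
    p12 = hap_freqs.get(ref1 + ref2, 0.0)
    d = p12 - p1 * p2                          # disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

# Assumed mid-range frequencies: only the GG and CA haplotypes exist.
perfect_ld = ld_r_squared({"GG": 0.75, "CA": 0.25})

# Hypothetical contrast: recombinant GA and CG haplotypes are also present.
partial_ld = ld_r_squared({"GG": 0.70, "CA": 0.20, "GA": 0.05, "CG": 0.05})

print(perfect_ld, partial_ld)
```

When only two haplotypes are present, r² equals 1, so genotyping either SNP fully predicts the other; any recombinant haplotype lowers r² below 1.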

Effects of environment on enhancer methylation and activity

The human genome is subject to a form of biochemical modification called DNA-methylation, which is altered by environmental influence [73]. The best characterised form of methylation occurs on the 5th carbon of the cytosine ring (5mC) of the CpG dinucleotide and has a significant influence on gene expression [73]. 5mC is initially deposited by the DNMT3A and DNMT3B proteins and is maintained by another protein called DNMT1 [73]. De-methylation can occur due to a failure of DNMT1 to replicate methylation following cell division (passive de-methylation), or through active de-methylation that involves the stepwise oxidation of 5mC by the Ten-eleven translocation proteins (TET1–3) to form 5-hydroxymethylcytosine (5hmC) and a number of other forms (5fC and 5caC) [74]. Much is known about the effects of 5mC on the activity of promoter regions, where high levels of methylation within the CpG islands of many promoters are associated with reduced promoter activity due to binding of the methyl-DNA binding proteins MBD1, MBD2, MBD4 and MeCP2 [75]. However, much less is known about the role of DNA methylation in enhancer elements [76]. Previous studies of the roles of 5mC in enhancer activity have explored the effects of environmental factors such as early life stress (ELS) on 5mC levels within an enhancer region that controls the expression of the AVP gene, which encodes the arginine vasopressin neuropeptide [77]. These elegant experiments showed that ELS-induced hypomethylation of the AVP enhancer resulted in elevated levels of AVP expression in later life, which could then be associated with increased depression-like behaviours in ELS-exposed animals [77]. These experiments drew a direct link between an environmental stimulus, changes in 5mC levels in an enhancer and changes in behaviour in later life.
To determine a possible role for DNA-methylation in the activity of the GAL5.1 enhancer, we exposed pregnant wild-type mice to a standard low-fat diet or to a choice of high-fat (60% calories from fat) and low-fat diets. We observed that levels of methylation of GAL5.1 were significantly elevated in male animals that were exposed to a maternal high-fat diet in utero [65]. In addition, we demonstrated that methylation of GAL5.1 repressed the activity of the GAL5.1 enhancer even when stimulated by PKC agonism or transfection of active EGR1 transcription factor [65]. These studies support the hypothesis that one of the ways that environmental conditions affect health is through epigenetic modification of enhancers [76]. If we also consider that the GG haplotype of GAL5.1 contains a CpG site lacking in the CA haplotype, we can also see how enhancers might serve as a nexus between genetics and environment. Considering the known impact of DNA methylation on regulatory activity, gene expression and, hence, phenotype, could the involvement of the environment in altering enhancer function through DNA methylation impact on the veracity of GWAS data? Thus, if human phenotypes can be changed through the methylation of enhancer elements, could this mask or exaggerate the presentation of disease phenotypes such as depression, anxiety or addiction? The role of DNA-methylation in enhancer activity is further complicated by observations that 5hmC, a product of the active removal of 5mC, seems to have a different effect on enhancer activity from that of 5mC, in that 5hmC is associated with active enhancers [78], [79], [80]. These findings are further complicated by the fact that the usual method of analysing CpG methylation in the genome is bisulfite sequencing, which is unable to differentiate between 5mC and 5hmC [81].
Our studies also show that there is considerable variation in levels of enhancer methylation between different tissues, such that 5mC/5hmC levels in the amygdala and hypothalamus are twice what they are in the hippocampus, raising questions about the relevance of peripheral blood-based DNA methylation data in understanding DNA-methylation levels in the central nervous system [65]. Clearly, a great deal more research is required to understand the effects of DNA methylation on enhancer activity before we can truly understand the roles and influences of enhancers on health and disease.
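
The limitation of bisulfite sequencing noted above can be made concrete with a toy model. The minimal Python sketch below assumes the standard bisulfite chemistry, in which unmodified cytosine is deaminated and read as T whereas both 5mC and 5hmC are protected and read as C. The six-base sequence, the lookup table and the function name are hypothetical and are not drawn from the GAL5.1 enhancer.

```python
# Toy read-out table for standard bisulfite sequencing (assumed chemistry):
# unmodified C is converted and sequenced as T; 5mC and 5hmC both stay as C.
BISULFITE_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}

def bisulfite_read(bases):
    """Return the sequence a sequencer would report after bisulfite treatment."""
    return "".join(BISULFITE_READOUT.get(base, base) for base in bases)

# Hypothetical six-base fragment containing two CpG sites, in three states.
unmethylated = ["A", "C", "G", "T", "C", "G"]
methylated = ["A", "5mC", "G", "T", "5mC", "G"]
hydroxymethylated = ["A", "5hmC", "G", "T", "5hmC", "G"]

read_unmethylated = bisulfite_read(unmethylated)
read_methylated = bisulfite_read(methylated)
read_hydroxymethylated = bisulfite_read(hydroxymethylated)

print(read_unmethylated, read_methylated, read_hydroxymethylated)
```

In this model the methylated and hydroxymethylated fragments yield identical reads, so a bisulfite-derived "methylation level" at a CpG is really the sum of 5mC and 5hmC, two marks that appear to have different associations with enhancer activity.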

Enhancers as the regulatory basis of co-morbidities

The relationship between enhancers and genes is not straightforward: one enhancer may influence the expression of many genes, and the expression of one gene may be under the influence of many enhancers [82]. Indeed, changes in the relationships between the more plastic regulatory genome and the relatively fixed coding genome in vertebrates, such as enhancer co-option and loss, are likely to have been the major driving force in human evolution [83]. Moreover, it is well established that the maintenance of synteny blocks (regions where the same genes are clustered in the genomes of diverse vertebrates) reflects the need for many enhancers and genes to interdigitate [83,84]. A further complication is that a single enhancer may drive the expression of individual genes in many different cell types. For example, the GAL5.1 enhancer is active in the periventricular nucleus, medial nucleus and arcuate nucleus of the hypothalamus and in the medial amygdala [66]. This may explain the fact that deletion of GAL5.1 not only reduced ethanol intake in both sexes, but also decreased fat intake in both sexes and anxiety in male animals [65,66]. Accordingly, given the ability of individual enhancers to drive expression in many tissues, and to affect many behaviours, we should not be surprised that any polymorphism or change in DNA methylation that affects an individual enhancer may result in co-morbidities such as obesity, alcohol abuse and chronic anxiety. Again, more analysis of enhancers, their relationships to gene expression and the effects of polymorphisms and DNA-methylation on their activities is essential to understand the basis of human disease susceptibilities.

Conclusions

Following the sequencing of the human genome, numerous promises of huge advances in our understanding of health and disease, and the subsequent production of new therapeutic technologies, raised hopes for the understanding and treatment of chronic disorders including alcohol abuse within 10 years. However, 22 years have elapsed and the promised benefits, which largely justified the sequencing of the human genome, have yet to materialise. Yet, it is striking to see how far our understanding of the human genome has progressed. We now know that, disappointingly for many, the majority of what we need to know about the basis of human health and disease is hidden in a portion of the genome previously dismissed as “Junk DNA”. In other words, our analysis of the human genome, to date, has given us a better perspective on what we need to know. Attempts to understand the non-coding genome, as typified by ENCODE, although dismissed by some as misguided and a lesson in showboating [22], succeeded in producing a great deal of very useful data that will continue to be analysed for decades to come. However, it is clear that our biggest challenge is to design strategies that take account of the context-dependency displayed by many enhancers. Unfortunately, there is unlikely to be a high tech “quick fix” in this regard, in which one or two markers of enhancer function identify and characterise all context-dependant human enhancers using cell lines alone. Instead, we are in for a “long haul” in which identifying functional enhancers, characterising the mechanisms regulating their context-dependency, and determining how they can be affected by genetic and environmental changes necessitate the continued use of genetic manipulation of whole animal models such as zebrafish and mouse.
But, thanks to our ability to rapidly engineer the genomes of vertebrate models such as mice and the rapid development of single cell sequencing technologies [85], we are in a much better position to develop a greater understanding of the role of context-dependant enhancers in normal development and health than we were even ten years ago. Only by understanding the mechanisms that modulate the context-dependency of gene expression, and by determining the effects of polymorphisms and environment on these mechanisms, will we succeed in understanding the molecular basis of mental health issues such as AUD and its co-morbidities, and in devising therapies to treat them.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

82 in total

Review 1. Enhancer identification through comparative genomics.

Authors: Axel Visel; James Bristow; Len A Pennacchio
Journal: Semin Cell Dev Biol Date: 2007-01-05 Impact factor: 7.727

2. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers.

Authors: Emma K Farley; Katrina M Olson; Wei Zhang; Daniel S Rokhsar; Michael S Levine
Journal: Proc Natl Acad Sci U S A Date: 2016-05-06 Impact factor: 11.205

Review 3. Enhancer DNA methylation: implications for gene regulation.

Authors: Allegra Angeloni; Ozren Bogdanovic
Journal: Essays Biochem Date: 2019-12-20 Impact factor: 8.000

4. In vivo sequence requirements of the SV40 early promotor region.

Authors: C Benoist; P Chambon
Journal: Nature Date: 1981-03-26 Impact factor: 49.962

Review 5. Decoding enhancers using massively parallel reporter assays.

Authors: Fumitaka Inoue; Nadav Ahituv
Journal: Genomics Date: 2015-06-10 Impact factor: 5.736

6. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons.

Authors: G G Loots; R M Locksley; C M Blankespoor; Z E Wang; W Miller; E M Rubin; K A Frazer
Journal: Science Date: 2000-04-07 Impact factor: 47.728

Review 7. Revealing the architecture of gene regulation: the promise of eQTL studies.

Authors: Yoav Gilad; Scott A Rifkin; Jonathan K Pritchard
Journal: Trends Genet Date: 2008-07-01 Impact factor: 11.639

8. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE.

Authors: Dan Graur; Yichen Zheng; Nicholas Price; Ricardo B R Azevedo; Rebecca A Zufall; Eran Elhaik
Journal: Genome Biol Evol Date: 2013 Impact factor: 3.416

9. Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK Biobank (N=112 117).

Authors: T-K Clarke; M J Adams; G Davies; D M Howard; L S Hall; S Padmanabhan; A D Murray; B H Smith; A Campbell; C Hayward; D J Porteous; I J Deary; A M McIntosh
Journal: Mol Psychiatry Date: 2017-07-25 Impact factor: 15.992

Review 10. Corticotropin Releasing Factor Binding Protein as a Novel Target to Restore Brain Homeostasis: Lessons Learned From Alcohol Use Disorder Research.

Authors: Dallece E Curley; Ashley E Webb; Douglas J Sheffler; Carolina L Haass-Koffler
Journal: Front Behav Neurosci Date: 2021-11-29 Impact factor: 3.558