
Non-coding genetic variation in regulatory elements determines thrombosis and hemostasis phenotypes.

Luca Stefanucci, Mattia Frontini

Abstract

Since the early days of genome-wide association studies (GWAS), it has been clear that, in all diseases and traits studied, most associated genetic variants are likely to exert their effect on gene expression, mainly by altering the function of regulatory elements. At the same time, the field of gene expression regulation broadened its boundaries, from a one-to-one relationship between regulatory elements and genes to include genome organization, long-range DNA interactions, and epigenetics. Next-generation sequencing has introduced genome-wide approaches that have greatly improved our understanding of the general principles of gene expression. However, elucidating how these principles apply at each individual genomic locus still requires painstaking experimental work, in which several independent lines of evidence are needed; this work is often helped by rare genetic variants in individuals with rare diseases. This review focuses on the non-coding features of the genome involved in transcriptional regulation that, when altered, lead to known cases of inherited (familial) thrombotic and hemostatic phenotypes, emphasizing the role of enhancers and super-enhancers.
© 2022 The Authors. Journal of Thrombosis and Haemostasis published by Wiley Periodicals LLC on behalf of International Society on Thrombosis and Haemostasis.

Keywords:  endothelial cells; gene regulation; hemostasis; megakaryocytes; super-enhancer; thrombosis

Year: 2022; PMID: 35514262; PMCID: PMC9540108; DOI: 10.1111/jth.15754

Source DB: PubMed; Journal: J Thromb Haemost; ISSN: 1538-7836; Impact factor: 16.036


INTRODUCTION

How genes and genetic polymorphisms influence human traits, and consequently cause diseases, has been a central question in biology and medicine since the inception of genetics. The technological developments of the last three decades have profoundly advanced the understanding of this topic. Genome-wide association studies (GWAS) and whole or targeted genome sequencing have identified thousands of common and rare variants that influence human traits and diseases. The majority of these trait-modifying polymorphisms are located in the 98% of the genome that does not encode proteins (i.e., the non-coding genome), implying that they do not alter a protein's amino acid sequence. Instead, these variants are thought to act by regulating the causal genes. Expression quantitative trait loci (eQTL) studies have confirmed the regulatory nature of some of these variants, where sufficient statistical power (sample size and effect size) was available. Colocalization analysis overlays information from independent sources and traits (e.g., GWAS and eQTL) and tests whether their signals are consistent with a shared causal variant. This approach is used to connect variants to genes and phenotypes, to identify molecular and cellular phenotypes (e.g., transcript levels) that are relevant to more complex traits (e.g., a GWAS-associated disease), and to determine the mechanism by which GWAS variants influence the phenotype. Colocalization of GWAS and eQTL variants in tissues implicated in thrombosis and hemostasis has been reported in various studies. Among others, rs1175170 was identified as a regulator of RGS18 transcription in platelets, linking this gene to arterial thrombosis. Colocalization can also be strengthened using additional chromatin features. For instance, Downes and colleagues identified rs10886430, in a GRK5 intron, as a regulator of platelet activation through the protease-activated receptor-1 pathway. The alternative allele at rs10886430 alters GATA1 and MEIS1 binding sites in a megakaryocyte-specific enhancer (Table 1) and changes GRK5 expression levels.
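Colocalization can be computed in several ways. As one illustration, the sketch below follows the approximate Bayes factor approach popularized by the coloc R package: each SNP receives a Wakefield approximate Bayes factor for each trait, and these are combined into posterior probabilities for five hypotheses (H0, no association; H1, GWAS only; H2, eQTL only; H3, two distinct causal variants; H4, one shared causal variant). It assumes at most one causal variant per trait; the priors (p1, p2, p12), the prior effect-size standard deviation of 0.15, and the input format (aligned lists of beta and standard error) are assumptions, and the function names are hypothetical.

```python
import math

# Assumed prior standard deviation of the true effect size (coloc's usual
# choice for quantitative traits); all names below are illustrative.
PRIOR_SD = 0.15


def log_abf(beta, se, prior_sd=PRIOR_SD):
    """Wakefield log approximate Bayes factor for a single SNP."""
    v = se ** 2                  # sampling variance of the effect estimate
    w = prior_sd ** 2            # prior variance of the true effect
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(gwas, eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0-H4, one causal variant per trait.

    gwas, eqtl: lists of (beta, se) tuples for the same SNPs, in the same order.
    """
    l1 = [log_abf(b, s) for b, s in gwas]
    l2 = [log_abf(b, s) for b, s in eqtl]
    n = len(l1)
    # H3 sums over all pairs of *different* SNPs (O(n^2), fine for a sketch).
    pairs = [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + logsumexp(l1),
        "H2": math.log(p2) + logsumexp(l2),
        "H3": math.log(p1) + math.log(p2) + (logsumexp(pairs) if pairs else -math.inf),
        "H4": math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),
    }
    total = logsumexp(list(log_h.values()))
    return {h: math.exp(v - total) for h, v in log_h.items()}


# Hypothetical summary statistics for three SNPs in one locus.
gwas = [(0.40, 0.05), (0.05, 0.05), (0.025, 0.05)]
eqtl = [(0.35, 0.05), (0.04, 0.05), (0.015, 0.05)]
print(coloc_posteriors(gwas, eqtl))
```

A high posterior for H4 supports a shared causal variant, whereas a high posterior for H3 indicates that the two signals share a region but not a variant. Working in log space avoids overflow when association signals are strong.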
TABLE 1

Summary of the non-coding regulatory variants discussed in this review

Variant | Gene | Phenotype | Possible mechanism | PMID
rs1175170 | RGS18 | Platelet aggregation | Alteration of GATA1 and NFE2 binding site | 34131117
rs10886430 | GRK5 | Platelet activation | Alteration of GATA1 and MEIS1 binding site | 34581777
GenBank: GQ246945 | PLAU | Gain-of-function platelet-dependent fibrinolysis | Gene duplication leads to enhancer hijacking | 20007542, 32663239
CTCF3 (GRCh37: 4:155539849_155540258del) | FGA, FGB, FGG | Reduction in fibrinogen levels | Loss of a CTCF binding site and consequent loss of local chromatin interactions | 30039577
CTCF4 (GRCh37: 4:155543772_155544212del) | FGA, FGB, FGG | Reduction in fibrinogen levels | Loss of a CTCF binding site and consequent loss of local chromatin interactions | 30039577
GRCh37: X:154230198–154252817 | F8 | Elevated FVIII levels and familial thrombophilia | Duplication of the F8 gene promoter increases F8 transcript levels | 33275657
GRCh38: 10:27042550_28567796_dup-inv-dup | WAC-ANKRD26 fusion | Familial thrombocytopenia | Gain of function and cryptic ANKRD26 TSS | 33857290
rs9349379 | EDN1 | Increased risk of coronary artery disease, migraine headache, cervical artery dissection, fibromuscular dysplasia, and hypertension | Increased EDN1 expression via alteration of the enhancer within the third intron of PHACTR1 | 28753427
GRCh37: 1:145399075_145594214del | RBM8A | Thrombocytopenia and absent radii (TAR) syndrome | Reduced function of the RBM8A promoter | 22366785
rs12041331 | PEAR1 | Lower platelet function on aspirin; risk factor for cardiovascular events | Minor allele leads to loss of methylation and reduced PEAR1 expression | 27313330

Abbreviations: CTCF, CCCTC-binding factor; FVIII, factor VIII; PMID, PubMed reference number; TSS, transcription start site.

In parallel, understanding of the role of the non-coding genome has grown exponentially. The last decade has been crucial for untangling the structure, regulation, and function of the genome, a field of study generally referred to as functional genomics (Figure 1). For instance, we now know that the spatio-temporal control of gene expression results from a dynamic and unique combination of DNA topology and regulatory element activity, to the point that cell identities are defined more granularly by their chromatin features than by their gene expression patterns, and that most of the genome has some sort of regulatory function in one cell type or another.
FIGURE 1

Chromatin structure, genomic features, and technologies widely adopted in functional genomics studies to characterize regulatory variants, cognate genes, and their effect on transcription. Several technologies can identify genetic variants and their location in regulatory regions. To associate regulatory variants with their cognate genes, further technologies are needed to investigate chromatin structure in a cell-type-specific fashion. To narrow the search to regulatory regions, technologies such as ChIP-Seq and ATAC-Seq report on chromatin function via histone post-translational modifications and chromatin accessibility. The effect of regulatory variants on transcription can be estimated with MPRA and/or other technologies. FISH, fluorescence in situ hybridization; GAM, genome architecture mapping; MPRA, massively parallel reporter assays; qPCR, quantitative polymerase chain reaction; SNP, single nucleotide polymorphism; SPRITE, split-pool recognition of interactions by tag extension.

GENOME ARCHITECTURE AND TRANSCRIPTION

Multicellular organisms derive all cell types, with vastly different functions, by using different parts of the information contained in the genome. Evolutionarily, this has been achieved through intergenic regions that structure and control gene expression. The higher-order organization of the genome serves two purposes: (1) to separate active regions from inactive ones, called A and B compartments, respectively; and (2) to connect regulatory regions to their genes in a manner that avoids spurious gene activation. This is achieved by anchoring DNA to the nuclear lamina and/or via DNA looping (Figure 1). Some loops are implicated in the three-dimensional organization of the genome, while others are directly involved in transcriptional regulation by bringing promoters and enhancers together. In interphase, loops are mainly organized by architectural factors such as the CCCTC-binding factor (CTCF), the cohesin complex, and other DNA-binding factors.
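A and B compartments are usually inferred from chromatin conformation data. One widely used approach, introduced with the first Hi-C maps, computes the correlation matrix of a distance-normalized (observed/expected) contact matrix and splits bins by the sign of its first principal component. The pure-Python sketch below follows that idea under stated assumptions: the input is already an observed/expected matrix with no constant rows, power iteration stands in for a full eigendecomposition, and the arbitrary sign of the component is oriented with a per-bin activity proxy such as gene density. The function names and the toy matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leading_eigenvector(m, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary, non-symmetric start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def call_compartments(oe, activity):
    """Label each bin 'A' or 'B' from an observed/expected contact matrix.

    activity: per-bin proxy (e.g., gene density) used only to orient the sign.
    """
    corr = [[pearson(r1, r2) for r2 in oe] for r1 in oe]
    pc1 = leading_eigenvector(corr)
    if sum(p * a for p, a in zip(pc1, activity)) < 0:
        pc1 = [-p for p in pc1]  # make positive values track the active proxy
    return ["A" if p > 0 else "B" for p in pc1]


# Hypothetical 6-bin example: bins 0, 2, 4 contact each other preferentially.
oe = [[1.5 if (i % 2) == (j % 2) else 0.5 for j in range(6)] for i in range(6)]
print(call_compartments(oe, activity=[1, 0, 1, 0, 1, 0]))
```

The correlation step turns "which bins contact which" into "which bins share a contact profile", so bins in the same compartment end up with the same sign on the first component.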

TOPOLOGICALLY ASSOCIATING DOMAINS

Sub-chromosomal regions that can be considered, to some extent, functional DNA units are called topologically associating domains (TADs; Figure 1). TADs impose spatial constraints on the movement of DNA, increasing the probability of interaction between regulatory regions and their cognate genes. Their sizes in the human genome vary, but average around one megabase (Mb; i.e., 10^6 base pairs). TADs were first observed in all-versus-all chromatin conformation capture (3C) experiments and then confirmed using microscopy approaches. These topological domains are mostly conserved across cell types and contain, to some extent, all the genomic features required for physiological gene expression (e.g., enhancers, promoters, and genes). Smaller-scale structures, often referred to as sub-TADs, are observed within TADs. These are highly dynamic structures that vary considerably between cell types and are mainly driven by promoter-enhancer interactions. The interactions occurring within TADs are crucial for gene expression, but also for correctly structuring TAD and sub-TAD domains. TAD boundaries are enriched in features such as regulatory elements and genes. However, TAD boundaries differ in their ability to insulate: while some exert a robust insulating effect, others do not and allow interactions between different domains. Disruption of strong boundaries, either by deletion or by chromosomal rearrangement, as well as the formation of new ones, may alter gene expression and have pathological sequelae. For example, tandem duplication of the plasminogen activator, urokinase (PLAU) gene together with one of the enhancers of VCL disrupts the sub-TAD organization of this region on chromosome 10, resulting in PLAU overexpression in platelets and Quebec platelet disorder (Table 1). This phenomenon is known as enhancer hijacking and, in this case, results in dominant platelet-dependent fibrinolysis. Similarly, expression of the fibrinogen gene cluster (FGA, FGB, FGG) is controlled by four nearby enhancers: CNC12, PFE2, E3, and E4. A CTCF binding site lies at the edge of this gene cluster. Removing the CTCF binding site closest to FGG rearranges the TAD, reducing FGB and FGG expression and consequently halving the amount of fibrinogen secreted by hepatic cells (Table 1). Whereas TADs provide this structural framework, enhancers and promoters directly orchestrate the transcriptional process by establishing a permissive chromatin environment and recruiting the machinery necessary for gene expression (Figure 1).
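The effect of boundary loss can be made concrete with a toy model. The sketch below treats TAD boundaries as sorted genomic positions and assumes, as a deliberate simplification, that an enhancer can only contact promoters within its own domain; real boundaries insulate to different degrees, as noted above. Deleting a boundary merges two domains and can expose a gene to an enhancer from which it was previously insulated. Gene names, coordinates, and function names are hypothetical and do not model the PLAU/VCL or fibrinogen loci.

```python
from bisect import bisect_right


def domain_index(pos, boundaries):
    """Index of the domain containing pos; boundaries is a sorted list of positions."""
    return bisect_right(boundaries, pos)


def reachable_genes(enhancer_pos, gene_tss, boundaries):
    """Genes whose TSS lies in the same domain as the enhancer (toy assumption)."""
    d = domain_index(enhancer_pos, boundaries)
    return sorted(g for g, tss in gene_tss.items()
                  if domain_index(tss, boundaries) == d)


# Hypothetical locus: one boundary at 500 kb separates GENE_A from GENE_B.
gene_tss = {"GENE_A": 100_000, "GENE_B": 900_000}
enhancer = 300_000
print(reachable_genes(enhancer, gene_tss, [500_000]))  # ['GENE_A']
print(reachable_genes(enhancer, gene_tss, []))         # ['GENE_A', 'GENE_B']
```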

TRANSCRIPTION FACTORS

This permissive state is established by transcription factors (TFs) that recognize specific DNA sequences. Pioneer TFs can bind to DNA in the presence of nucleosomes and recruit remodeling complexes that displace them, creating open chromatin and thus allowing other TFs to bind to their motifs, or transcription factor binding sites (TFBSs). This process occurs throughout development, from fertilization, through the three embryonic germ layers, down to the mature postmitotic cell types that form the different tissues and organs, sometimes with different TFs of the same family binding to the same site in a relay as differentiation proceeds. Once TFs are bound to regulatory elements, the nearby nucleosomes are post-translationally modified with marks of active chromatin, and recruitment of the transcriptional machinery begins. The impact of DNA variants on TF binding was recognized early on, when variants associated with common diseases were found to be significantly enriched in open chromatin, that is, where TFs are bound. Similar enrichments have been observed for platelet-related traits in megakaryocyte enhancers. The consequences of genetic variation in regulatory elements span a wide range, from extremely small to very large effects. Small effects, often due to common variants, shift the observed trait by a fraction of a standard deviation, whereas large effects, usually associated with rare variants (minor allele frequency <0.1), drive the trait into the pathological spectrum. An example of a common variant altering the phenotype of interest is rs9349379, located in the third intron of PHACTR1 and associated with five vascular diseases, including coronary heart disease (Table 1). This variant lies within a regulatory element that controls the expression of endothelin 1 (EDN1), located 600 kilobases (kb) away. As an example of a large effect in the megakaryocyte/platelet axis, two rare variants critically alter TFBSs and gene expression: (1) rs139428292, located in the 5′ UTR of RBM8A, and (2) a previously unknown polymorphism in the first intron of the same gene. When either is present in compound heterozygosity with a 1q21.1 deletion, the individual is affected by thrombocytopenia and absent radii (TAR) syndrome (Table 1).
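The idea that a single-nucleotide change can create or destroy a TFBS can be illustrated with a simple consensus-motif scan. The sketch below searches both strands for the GATA consensus (A/T)GATA(A/G), written in IUPAC code as WGATAR, and compares a reference and an alternative allele. The sequences are hypothetical and do not reproduce the context of any variant in Table 1; real analyses typically score position weight matrices (for example, from JASPAR) rather than exact consensus matches, and the function names are illustrative.

```python
import re

# IUPAC nucleotide codes needed for simple consensus motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_hits(seq, consensus):
    """(start, strand) of every match of an IUPAC consensus on both strands."""
    core = "".join(IUPAC[c] for c in consensus)
    pattern = re.compile(f"(?=({core}))")  # lookahead also reports overlapping sites
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    # Convert reverse-strand matches back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - len(consensus), "-") for m in pattern.finditer(rc)]
    return sorted(hits)


def allele_effect(ref_seq, alt_seq, consensus="WGATAR"):
    """Number of motif matches for the reference and alternative sequences."""
    return len(motif_hits(ref_seq, consensus)), len(motif_hits(alt_seq, consensus))


# Hypothetical 13-bp windows differing at one position (T -> C inside GATA).
ref = "CCTTAGATAAGGC"
alt = "CCTTAGACAAGGC"
print(allele_effect(ref, alt))
```

Here the alternative allele removes the only GATA match, which is the kind of change that, inside an active enhancer, can reduce TF occupancy and alter expression of the cognate gene.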

ENHANCERS AND PROMOTERS

Enhancers regulate gene expression mainly by coming into close proximity with the gene promoter and contributing to the recruitment of the necessary protein complexes, a model referred to as activity by contact. There are also examples in which the enhancer needs to be located away from the promoter of the gene it regulates. In both cases, enhancer positioning helps establish a local conformation that favors transcription. Enhancer and promoter aberrations, either in quality or in quantity, can cause human disease. For instance, a form of familial thrombophilia has been identified in two independent families carrying a tandem duplication of part of the F8 gene (exon 1 and intron 1; Table 1). Simioni and colleagues showed that the increased level of factor VIII (FVIII) is due to the duplication of a regulatory region in the first intron of F8. Duplication of this enhancer increases the amount of relevant TFs localized near the F8 promoter and, as a consequence, inflates the amount of FVIII produced by hepatocytes. Wahlster and colleagues used long-read sequencing to identify a paired-duplication inversion involving WAC and ANKRD26 (Table 1) that prevents ANKRD26 silencing and consequently results in thrombocytopenia. Active enhancers and promoters are labeled by several post-translational modifications of the histones in nearby nucleosomes. Among these, histone H3 lysine 27 acetylation (H3K27ac) or histone H3 lysine 122 acetylation (H3K122ac), together with histone H3 lysine 4 monomethylation (H3K4me1), label enhancers, whereas H3K27ac together with H3K4 trimethylation (H3K4me3) labels promoters.
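The histone-mark combinations listed above can be expressed as a simple rule-based classifier. This is a minimal sketch assuming that each mark has already been called as present or absent for each element; checking the promoter rule first is an assumption (H3K4me1 can also flank active promoters), and genome-wide chromatin-state tools learn such combinations from data rather than applying fixed rules. The function name is hypothetical.

```python
def classify_element(marks):
    """Label a regulatory element from the histone marks detected on it.

    Encodes the combinations described in the text; promoter marks are
    checked first (an assumption), since H3K4me1 can also flank promoters.
    """
    marks = {m.upper() for m in marks}
    acetylated = bool(marks & {"H3K27AC", "H3K122AC"})
    if "H3K27AC" in marks and "H3K4ME3" in marks:
        return "active promoter"
    if acetylated and "H3K4ME1" in marks:
        return "active enhancer"
    return "unclassified"


for example in ({"H3K27ac", "H3K4me1"}, {"H3K122ac", "H3K4me1"},
                {"H3K27ac", "H3K4me3"}, {"H3K4me1"}):
    print(sorted(example), "->", classify_element(example))
```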

SUPER ENHANCERS

Soon after genome-wide studies of chromatin modifications became widely available, it was noted that H3K27ac is not equally distributed across all enhancers, and that a number of enhancers are located closer to each other than expected by chance. Enhancers located less than 12.5 kb from each other can be grouped, as constituents, into super-enhancers (SEs), also called stretch enhancers (Figure 2). SEs have some distinguishing properties. (1) They account for the large majority of the signal for H3K27ac and some other regulatory proteins (e.g., Med1 and p300; Figure 2). (2) On average, genes connected to SEs are expressed at higher levels than genes linked to the same number of regulatory regions as the SEs' constituents but located more than 12.5 kb apart (and therefore not qualifying as SEs; Figure 2). (3) SEs play a pivotal role in regulating genes that orchestrate cell fate decisions during stem cell differentiation.
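The 12.5 kb stitching rule and the ranking step described in Figure 2 can be sketched as follows. This is a simplified version of the procedure popularized by the ROSE software, under stated assumptions: signals are already background-corrected read counts, promoter-proximal regions are not excluded (a step often applied in practice), and the tangent point is found by rescaling rank and signal to [0, 1] and taking the point where rescaled signal minus rescaled rank is smallest, which is where a line of slope 1 touches a convex curve. The class and function names are hypothetical.

```python
from dataclasses import dataclass

STITCH_DISTANCE = 12_500  # bp; maximum gap between constituents (from the text)


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    signal: float            # e.g., H3K27ac (or Med1) ChIP-Seq reads in the region
    n_constituents: int = 1


def stitch(enhancers, max_gap=STITCH_DISTANCE):
    """Merge enhancers on the same chromosome that lie less than max_gap apart."""
    stitched = []
    for e in sorted(enhancers, key=lambda r: (r.chrom, r.start)):
        last = stitched[-1] if stitched else None
        if last is not None and last.chrom == e.chrom and e.start - last.end < max_gap:
            last.end = max(last.end, e.end)
            last.signal += e.signal
            last.n_constituents += e.n_constituents
        else:
            # Copy so the caller's enhancer objects are not modified.
            stitched.append(Region(e.chrom, e.start, e.end, e.signal, e.n_constituents))
    return stitched


def call_super_enhancers(regions):
    """Regions ranked right of the slope-1 tangent point (needs >= 2 distinct signals)."""
    ranked = sorted(regions, key=lambda r: r.signal)
    n = len(ranked)
    lo, hi = ranked[0].signal, ranked[-1].signal
    xs = [i / (n - 1) for i in range(n)]                 # rank rescaled to [0, 1]
    ys = [(r.signal - lo) / (hi - lo) for r in ranked]   # signal rescaled to [0, 1]
    cutoff = min(range(n), key=lambda i: ys[i] - xs[i])  # tangent point of slope 1
    return ranked[cutoff + 1:]


# Hypothetical enhancer calls: five isolated typical enhancers on chr1,
# one on chr2, and three strong enhancers within 12.5 kb of each other.
enhancers = [
    Region("chr1", 1_000, 2_000, 10), Region("chr1", 50_000, 51_000, 12),
    Region("chr1", 100_000, 101_000, 8), Region("chr1", 150_000, 151_000, 11),
    Region("chr1", 200_000, 201_000, 9), Region("chr1", 300_000, 301_000, 60),
    Region("chr1", 305_000, 306_000, 70), Region("chr1", 315_000, 316_000, 50),
    Region("chr2", 1_000, 2_000, 13),
]
for se in call_super_enhancers(stitch(enhancers)):
    print(se)
```

In this toy input, only the stitched chr1 cluster (three constituents) lies to the right of the tangent point; the isolated enhancers remain typical enhancers.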
FIGURE 2

Super-enhancers (SEs; colored in red): definition via ChIP-Seq experiments and biological characteristics. Typical enhancers (TEs; colored in green) are aggregated into SEs if the distance between them is less than 12.5 kb. In ChIP-Seq experiments, SEs are characterized by a larger number of sequencing reads (H3K27ac, Med1, p300). When regions are ranked by their H3K27ac (or Med1) ChIP-Seq signal, SEs are those lying to the right of the transition point (i.e., the point at which a straight line with slope 1 is tangent to the curve).

In endothelial cells, the transcription factor ERG plays an essential role in establishing SEs, and variants associated with cardiovascular diseases are enriched in ERG TFBSs located in endothelial SEs. In megakaryocytes, SE constituents are physically connected and regulate genes implicated in several cellular processes. For platelet traits (i.e., mass, count, mean volume, and distribution width), genetic variants harbored in SEs influence the expression of genes implicated in the archetypal functions of these cells (response to wounding/wound healing, coagulation, hemostasis, platelet degranulation, actin cytoskeleton remodeling, and regulation of body fluid levels). This evidence indicates that genetic variation in these genomic regions plays a key role in determining how each individual responds to pro-coagulant stimuli. Notably, while each set of SEs defines the identity of a cell type, the majority of SE constituents are already specified as open chromatin early in development. For example, of the 1067 megakaryocyte SEs, only 24 have a fully open chromatin profile in hematopoietic progenitors. Hence, the final set of SEs is fully established by controlling the opening of about 2100 constituents in the mature cells.
The 1067 SEs are connected to more than 3300 genes, and while there are several simple one-to-one relationships between SEs and genes, more complex relationships also exist, reflecting the constraints on degrees of freedom imposed by the DNA itself and by the organization of RNA polymerase II factories. These interactions are likely not all occurring at the same time and/or in every cell, as different transcription-supporting conformations might occur; only single-cell data could provide a definitive answer. For instance, the VWF-CD9 locus is controlled by three SEs, each contacting the promoters of both genes, which are also in contact with each other. A genetic variant linked to platelet traits, rs2363877, lies in one of the SEs and influences which of the two genes is preferentially transcribed: the minor allele favors VWF expression at the expense of CD9. Moreover, some of these interactions might be implicated in the silencing of VWF, whose expression, at least in endothelial cells, is controlled by a stochastic bistable switch mediated by DNA methylation. DNA methylation plays an important role in hematopoiesis, determining permissive cell fates by controlling the accessibility of regulatory elements. The same mechanism also controls the expression of genes implicated in platelet reactivity and cardiovascular disease, such as PEAR1 (Table 1). Overall, the last decade has yielded a wealth of knowledge that has established several genome-wide principles of how gene expression is organized. However, it is less clear how these principles apply to individual genes, and orthogonal lines of evidence, obtained with painstaking laboratory work, are still required to determine the effects of specific regulatory sequences.
The introduction of mid- and high-throughput measurements of functional phenotypes will soon increase the number of discoveries linking genotypes, especially rare variants, with phenotypes, including hemostasis and thrombosis, and with diseases; one day there will be enough data to bypass the requirement for laboratory validation.

CONFLICT OF INTEREST

The authors have no conflicts of interest to disclose.

AUTHOR CONTRIBUTIONS

LS and MF discussed and wrote the manuscript together.
REFERENCES (79 in total)

1.  Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants.

Authors:  Stephen C J Parker; Michael L Stitzel; D Leland Taylor; Jose Miguel Orozco; Michael R Erdos; Jennifer A Akiyama; Kelly Lammerts van Bueren; Peter S Chines; Narisu Narisu; Brian L Black; Axel Visel; Len A Pennacchio; Francis S Collins
Journal:  Proc Natl Acad Sci U S A       Date:  2013-10-14       Impact factor: 11.205

2.  Disease genomics: Transitioning from association to causation with eQTLs.

Authors:  Dorothy Clyde
Journal:  Nat Rev Genet       Date:  2017-03-27       Impact factor: 53.242

3.  Latent enhancers activated by stimulation in differentiated cells.

Authors:  Renato Ostuni; Viviana Piccolo; Iros Barozzi; Sara Polletti; Alberto Termanini; Silvia Bonifacio; Alessia Curina; Elena Prosperini; Serena Ghisletti; Gioacchino Natoli
Journal:  Cell       Date:  2013-01-17       Impact factor: 41.582

4.  A map of the cis-regulatory sequences in the mouse genome.

Authors:  Yin Shen; Feng Yue; David F McCleary; Zhen Ye; Lee Edsall; Samantha Kuan; Ulrich Wagner; Jesse Dixon; Leonard Lee; Victor V Lobanenkov; Bing Ren
Journal:  Nature       Date:  2012-08-02       Impact factor: 49.962

5.  Platelet function is modified by common sequence variation in megakaryocyte super enhancers.

Authors:  Romina Petersen; John J Lambourne; Biola M Javierre; Luigi Grassi; Roman Kreuzhuber; Dace Ruklisa; Isabel M Rosa; Ana R Tomé; Heather Elding; Johanna P van Geffen; Tao Jiang; Samantha Farrow; Jonathan Cairns; Abeer M Al-Subaie; Sofie Ashford; Antony Attwood; Joana Batista; Heleen Bouman; Frances Burden; Fizzah A Choudry; Laura Clarke; Paul Flicek; Stephen F Garner; Matthias Haimel; Carly Kempster; Vasileios Ladopoulos; An-Sofie Lenaerts; Paulina M Materek; Harriet McKinney; Stuart Meacham; Daniel Mead; Magdolna Nagy; Christopher J Penkett; Augusto Rendon; Denis Seyres; Benjamin Sun; Salih Tuna; Marie-Elise van der Weide; Steven W Wingett; Joost H Martens; Oliver Stegle; Sylvia Richardson; Ludovic Vallier; David J Roberts; Kathleen Freson; Lorenz Wernisch; Hendrik G Stunnenberg; John Danesh; Peter Fraser; Nicole Soranzo; Adam S Butterworth; Johan W Heemskerk; Ernest Turro; Mikhail Spivakov; Willem H Ouwehand; William J Astle; Kate Downes; Myrto Kostadima; Mattia Frontini
Journal:  Nat Commun       Date:  2017-07-13       Impact factor: 14.919

6.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

7.  On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE.

Authors:  Dan Graur; Yichen Zheng; Nicholas Price; Ricardo B R Azevedo; Rebecca A Zufall; Eran Elhaik
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

8.  Polycomb contraction differentially regulates terminal human hematopoietic differentiation programs.

Authors:  A Lorzadeh; C Hammond; F Wang; D J H F Knapp; J Ch Wong; J Y A Zhu; Q Cao; A Heravi-Moussavi; A Carles; M Wong; Z Sharafian; J Steif; M Moksa; M Bilenky; P M Lavoie; C J Eaves; M Hirst
Journal:  BMC Biol       Date:  2022-05-13       Impact factor: 7.364

9.  Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript.

Authors:  Lara Wahlster; Jeffrey M Verboon; Leif S Ludwig; Susan C Black; Wendy Luo; Kopal Garg; Richard A Voit; Ryan L Collins; Kiran Garimella; Maura Costello; Katherine R Chao; Julia K Goodrich; Stephanie P DiTroia; Anne O'Donnell-Luria; Michael E Talkowski; Alan D Michelson; Alan B Cantor; Vijay G Sankaran
Journal:  J Exp Med       Date:  2021-06-07       Impact factor: 17.579

10.  Stratification of TAD boundaries reveals preferential insulation of super-enhancers by strong boundaries.

Authors:  Yixiao Gong; Charalampos Lazaris; Theodore Sakellaropoulos; Aurelie Lozano; Prabhanjan Kambadur; Panagiotis Ntziachristos; Iannis Aifantis; Aristotelis Tsirigos
Journal:  Nat Commun       Date:  2018-02-07       Impact factor: 14.919


Review 1.  Non-coding genetic variation in regulatory elements determines thrombosis and hemostasis phenotypes.

Authors:  Luca Stefanucci; Mattia Frontini
Journal:  J Thromb Haemost       Date:  2022-05-23       Impact factor: 16.036
