Juan Mulero Hernández, Jesualdo Tomás Fernández-Breis.
Abstract
Gene regulation operates as a network that involves both genetic sequences and proteins, with multiple levels of regulation and multiple mechanisms. Transcription is the main control mechanism for most genes, and the downstream steps refine the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies focus on the contribution of enhancers to the development of diseases and on their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next-generation sequencing techniques. This work has generated a large volume of data, which has been deposited in a growing number of public repositories. In this article, we analyze the content of the public repositories that contain information about human enhancers, with the aim of determining whether the knowledge generated by scientific research is stored in those databases in a way that can be computationally exploited. The analysis is based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, that most types of enhancers are not represented in the databases, and that there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Keywords: Bioinformatics; Biological databases; Enhancers; Gene regulation; Human
Year: 2022 PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Traditional chromatin loop model (A). The enhancer physically interacts with the promoter through chromatin flexibility and mechanisms like the loop extrusion model. In this way, the enhancer can provide molecular elements that increase transcription of the target gene. Alternatively, in the hub model, the spatial proximity of multiple regulatory sequences allows the recruitment of high concentrations of molecular elements that can generate a network and a microenvironment, even phase-separated, that increases the transcription of target genes. This model can explain phenomena like the regulation of multiple genes by the same sequence (B) and the regulation of the same gene by multiple sequences (C). Modified from www.addgene.org and [42].
Fig. 2. Proposed model for the representation of enhancers. Each enhancer is located in a region of the genome and belongs to one or more classifications of enhancers, which may differ according to the biological sample in question, because enhancers are sequence specific. The identification of the enhancer is supported by evidence derived from the methodology used and must have a bibliographic reference that allows the information to be verified. As regulatory sequences, enhancers regulate genes, and this regulation is also supported by evidence. In addition, enhancers can be enriched with information of interest, such as their link to diseases or the TFBS that compose the sequence.
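The model described in Fig. 2 can be sketched as a small set of Python data classes. This is a minimal illustration, not the authors' implementation: all class names, field names, and example values (coordinates, gene symbol, sample labels, reference strings) are hypothetical placeholders chosen to mirror the entities named in the caption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Evidence:
    """Support for a claim: the methodology used plus a verifiable reference."""
    method: str      # hypothetical label, e.g. "reporter assay"
    reference: str   # bibliographic reference (placeholder string here)


@dataclass
class Classification:
    """An enhancer type assigned in the context of one biological sample."""
    enhancer_type: str
    biosample: str
    evidence: List[Evidence]


@dataclass
class GeneRegulation:
    """A regulatory link from the enhancer to a target gene, with its evidence."""
    gene: str
    evidence: List[Evidence]


@dataclass
class Enhancer:
    """A genomic region with classifications, target genes, and enrichments."""
    chrom: str
    start: int
    end: int
    assembly: str
    classifications: List[Classification] = field(default_factory=list)
    targets: List[GeneRegulation] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)  # optional enrichment
    tfbs: List[str] = field(default_factory=list)      # optional enrichment

    def types_in(self, biosample: str) -> Set[str]:
        """Return the enhancer types recorded for the given biological sample."""
        return {c.enhancer_type for c in self.classifications
                if c.biosample == biosample}


# Placeholder data only: none of these values come from a real database.
chromatin = Evidence(method="chromatin features", reference="PMID:placeholder")
reporter = Evidence(method="reporter assay", reference="PMID:placeholder")

example = Enhancer(
    chrom="chr1", start=1000, end=2000, assembly="GRCh38",
    classifications=[
        Classification("super-enhancer", "sample_A", [chromatin]),
        Classification("active enhancer", "sample_A", [chromatin, reporter]),
        Classification("poised enhancer", "sample_B", [chromatin]),
    ],
    targets=[GeneRegulation("GENE_A", [reporter])],
)

print(sorted(example.types_in("sample_A")))
```

Attaching the biological sample to each classification, rather than to the enhancer itself, reflects the caption's point that the same region can belong to different types in different samples.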
Fig. 3. Classification of the main types of enhancers found in the literature. The characteristics of enhancers do not have a homogeneous profile. For this reason, we find different classifications in the literature, which have been compiled in this figure. Each classification is based on different properties, so an enhancer can belong to several types at the same time.
Fig. 4. Enhancers can be identified through different methodologies, each following a strategy or approach that determines the level of evidence for the sequence. We distinguish two main types of evidence. Methods based on chromatin features rely on sequence features, that is, correlated properties that are not a direct measure of enhancer activity. Reporter-based methods measure enhancer activity directly, but the interpretation of their results can be complex.
Fig. 5. As with the identification of enhancers, the determination of enhancer-promoter interactions (EPI) can follow different strategies, which provide the level of evidence for the regulatory relationship. Two main groups are again distinguished. Experimental methods determine the relationship directly. Computational methods make predictions and can follow two main approaches: supervised methods generate a model from a training set, while unsupervised methods lack this set.
Brief description of the databases included in this study.
| Database | Focus | Description |
| --- | --- | --- |
| CancerEnD | Diseases | Set of enhancers for TCGA cancer types |
| dbInDel | Mutations | Enhancer-associated insertion and deletion variants |
| dbSUPER | SE general annotation | Super-enhancers archive |
| ENdb | Diseases | A manually curated database of experimentally supported enhancers for human and mouse |
| EnDisease 2.0 | Diseases | A manually curated database for enhancer-disease associations |
| EnhancerAtlas 2.0 | General annotation | General annotation of enhancers in different human biosamples and other species |
| EnhancerDB | General annotation | General annotation of enhancers in different human biosamples |
| EnhFFL | Feed-forward loops (FFL) with enhancers | A database of enhancer mediated feed-forward loops for human and mouse |
| Ensembl Regulatory Build v105 | General annotation | Set of regions of the genome that probably are involved in gene regulation |
| ETph | Pig-human homology | General enhancers and their targets in pig and human |
| FANTOM5 | Transcribed enhancers | Transcription-capable enhancers |
| FOCS | EPI | Method for inferring enhancer-promoter interactions, together with the predicted set |
| GeneHancer 4.8 (UCSC) | General annotation | Integration of enhancer sequences to generate a consensus set |
| HACER | Transcribed enhancers | Transcription-capable enhancers |
| HEDD | Diseases | Human enhancers with a focus on their links to diseases |
| HeRA | Transcribed enhancers | Transcription-capable enhancers |
| RAEdb | Enhancers identified by reporter assays | Enhancers identified by high-throughput reporter assays |
| SCREEN V3 | General annotation | Set of regions of the genome that probably are involved in gene regulation |
| Roadmap epigenomics | General annotation | Genome annotation in states |
| SEA 3.0 | SE general annotation | Super-enhancers archive |
| SEanalysis | Biological networks with SE | Super-enhancers associated with regulatory networks |
| SEdb 1.03 | SE general annotation | Super-enhancers archive |
| RefSeq GRCh38.p13 | General annotation | Annotation of functional elements in the reference genome |
| TiED | General annotation | Identification and annotation of active and transcribed enhancers in 10 tissues |
| VISTA Enhancer | General annotation | Validated enhancers with transgenic mice |
Type of enhancers hosted by each database. The types of enhancers not included in this table are not covered by any database included in this study.
| Type of enhancer | Databases |
| --- | --- |
| Enhancers (without classification) | CancerEnD, dbInDel, ENdb, EnDisease 2.0, EnhancerAtlas 2.0, EnhancerDB, ETph, FOCS, GeneHancer 4.8, HEDD, RAEdb, RefSeq GRCh38.p13, VISTA Enhancer |
| Super-enhancers | dbSUPER, ENdb, EnhFFL, SEA 3.0, SEanalysis, SEdb |
| Typical enhancers | EnhFFL, SEA 3.0, SEanalysis, SEdb |
| Constituent enhancers | dbSUPER, SEanalysis, SEdb |
| Epromoters | RAEdb |
| Proximal enhancers | SCREEN V3 |
| Distal enhancers | SCREEN V3 |
| Active enhancers | Ensembl Regulatory Build v105, Roadmap, TiED |
| Primed enhancers | Ensembl Regulatory Build v105, Roadmap |
| Poised enhancers | Ensembl Regulatory Build v105, Roadmap |
| Inactive enhancers | Ensembl Regulatory Build v105 |
| Transcribed enhancers | FANTOM5, HACER, HeRA, TiED |
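The table above maps each enhancer type to the databases that host it. A short Python sketch shows how that mapping could be inverted to ask which types a given database covers, or which databases cover several types at once. The dictionary is transcribed from the table; the function and variable names are illustrative assumptions, not part of any database's API.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Transcribed from the table above (type of enhancer -> hosting databases).
TYPE_TO_DATABASES: Dict[str, List[str]] = {
    "Enhancers (without classification)": [
        "CancerEnD", "dbInDel", "ENdb", "EnDisease 2.0", "EnhancerAtlas 2.0",
        "EnhancerDB", "ETph", "FOCS", "GeneHancer 4.8", "HEDD", "RAEdb",
        "RefSeq GRCh38.p13", "VISTA Enhancer",
    ],
    "Super-enhancers": ["dbSUPER", "ENdb", "EnhFFL", "SEA 3.0", "SEanalysis", "SEdb"],
    "Typical enhancers": ["EnhFFL", "SEA 3.0", "SEanalysis", "SEdb"],
    "Constituent enhancers": ["dbSUPER", "SEanalysis", "SEdb"],
    "Epromoters": ["RAEdb"],
    "Proximal enhancers": ["SCREEN V3"],
    "Distal enhancers": ["SCREEN V3"],
    "Active enhancers": ["Ensembl Regulatory Build v105", "Roadmap", "TiED"],
    "Primed enhancers": ["Ensembl Regulatory Build v105", "Roadmap"],
    "Poised enhancers": ["Ensembl Regulatory Build v105", "Roadmap"],
    "Inactive enhancers": ["Ensembl Regulatory Build v105"],
    "Transcribed enhancers": ["FANTOM5", "HACER", "HeRA", "TiED"],
}


def invert(mapping: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build the reverse index: database -> set of enhancer types it hosts."""
    by_database: Dict[str, Set[str]] = defaultdict(set)
    for enhancer_type, databases in mapping.items():
        for database in databases:
            by_database[database].add(enhancer_type)
    return dict(by_database)


def databases_covering(*enhancer_types: str) -> Set[str]:
    """Return the databases that host every one of the requested types."""
    sets = [set(TYPE_TO_DATABASES[t]) for t in enhancer_types]
    return set.intersection(*sets) if sets else set()


DATABASE_TO_TYPES = invert(TYPE_TO_DATABASES)

# Which databases host super-, typical, and constituent enhancers together?
print(sorted(databases_covering(
    "Super-enhancers", "Typical enhancers", "Constituent enhancers")))

# Which classified types does one database host?
print(sorted(DATABASE_TO_TYPES["TiED"]))
```

A reverse index of this kind makes it easy to check that no single database in the table covers all listed types.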
Fig. 6. Coverage of the different items in the 25 biological databases analyzed with information about human enhancers.
Databases classified by the experimental evidence supporting the sequences that they contain.
| CancerEnD | X | ||||||
| dbInDel | X | ||||||
| dbSUPER | X | X | |||||
| ENdb | X | X | X | X | X | X | |
| EnDisease 2.0 | |||||||
| EnhancerAtlas 2.0 | X | X | X | X | X | ||
| EnhancerDB | X | X | X | X | X | ||
| EnhFFL | X | X | X | ||||
| Ensembl Regulatory Build v105 | X | X | X | X | |||
| ETph | X | X | |||||
| FANTOM5 | X | ||||||
| FOCS | X | X | |||||
| GeneHancer 4.8 (UCSC) | X | X | X | X | |||
| HACER | X | ||||||
| HEDD | X | X | |||||
| HeRA | X | X | |||||
| RAEdb | X | ||||||
| Roadmap epigenomics | X | ||||||
| SCREEN V3 | X | ||||||
| SEA 3.0 | X | X | |||||
| SEanalysis | X | X | X | ||||
| SEdb 1.03 | X | X | X | ||||
| RefSeq GRCh38.p13 | X | X | |||||
| TiED | X | X | X | ||||
| VISTA Enhancer | X | X |
Experimental approach used in the identification of EPI, which constitutes the evidence of the regulatory relationship between sequences.
| CancerEnD | X | ||||
| dbInDel | X | ||||
| dbSUPER | X | ||||
| ENdb | X | ||||
| EnhancerAtlas 2.0 | X | ||||
| EnhancerDB | X | ||||
| EnhFFL | X | ||||
| ETph | X | ||||
| FANTOM5 | X | ||||
| FOCS | X | ||||
| GeneHancer 4.8 (UCSC) | X | ||||
| HACER | X | X | X | ||
| HEDD | X | ||||
| HeRA | X | ||||
| SEA 3.0 | X | ||||
| SEanalysis | X | X | |||
| SEdb 1.03 | X | X | |||
| TiED | X | ||||
| VISTA Enhancer | X |