SEdb: a comprehensive human super-enhancer database.

Yong Jiang¹, Fengcui Qian¹, Xuefeng Bai¹, Yuejuan Liu¹, Qiuyu Wang¹, Bo Ai¹, Xiaole Han¹, Shanshan Shi¹, Jian Zhang¹, Xuecang Li¹, Zhidong Tang¹, Qi Pan¹, Yuezhu Wang¹, Fan Wang¹, Chunquan Li¹.

Abstract

Super-enhancers are important for controlling and defining the expression of cell-specific genes. With research on human disease and biological processes, human H3K27ac ChIP-seq datasets are accumulating rapidly, creating an urgent need to collect and process these data comprehensively and efficiently. More importantly, many studies have shown that super-enhancer-associated single nucleotide polymorphisms (SNPs) and transcription factors (TFs) strongly influence human disease and biological processes. Here, we developed a comprehensive human super-enhancer database (SEdb, http://www.licpathway.net/sedb) that aims to provide a large number of available resources on human super-enhancers. The database is annotated with the potential functions of super-enhancers in gene regulation. The current version of SEdb documents a total of 331 601 super-enhancers from 542 samples. In particular, unlike existing super-enhancer databases, we manually curated and classified 410 available H3K27ac samples from >2000 ChIP-seq samples from NCBI GEO/SRA. Furthermore, SEdb provides detailed genetic and epigenetic annotation information on super-enhancers. This information includes common SNPs, motif changes, expression quantitative trait loci (eQTLs), risk SNPs, transcription factor binding sites (TFBSs), CRISPR/Cas9 target sites and DNase I hypersensitivity sites (DHSs) for in-depth analyses of super-enhancers. SEdb will help elucidate super-enhancer-related functions and find potential biological effects.

Year: 2019 PMID: 30371817 PMCID: PMC6323980 DOI: 10.1093/nar/gky1025

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Super-enhancers are large clusters of transcriptionally active enhancers enriched in enhancer-associated chromatin characteristics (1). Compared to typical enhancers, super-enhancers are larger, exhibit higher transcription factor density (2,3), and are frequently associated with key lineage-specific genes that control cell state and differentiation in somatic cells (4). In cancer cells, super-enhancers drive the expression of critical oncogenes such as CACNA1H (5), LMO1 (6), RARA (7) and TAL1 (8), suggesting that cancer cells generate super-enhancers at oncogenes that are involved in tumor pathogenesis (9). Mack et al. discovered 15 important super-enhancers in ependymoma; in the absence of any one of these 15 super-enhancers, the survival rate of ependymoma cancer cells was reduced by at least 50% (5). In neuroblastomas, super-enhancer-associated TF networks may mediate lineage differentiation during normal development, leading to epigenetic regulation of neuroblastoma and intratumoral heterogeneity (10). A large number of disease-associated sequence variations are preferentially enriched in super-enhancers of disease-related cell types (11). For example, disease-associated SNPs for autoimmune diseases such as rheumatoid arthritis are often located in super-enhancer regions (12). The causal SNP rs539846, which is localized to a super-enhancer in intron 3 of B cell lymphoma 2-modifying factor, influences chronic lymphocytic leukemia susceptibility by altering a conserved RELA-binding motif (13). Oldridge et al. found that oncogenic dependence in tumor cells arises from a polymorphism within a super-enhancer element in the first intron of LMO1 that directly regulates LMO1 expression (6). Together, these studies demonstrate the importance of super-enhancers in addressing key issues associated with cancer biology and cell differentiation.
These studies highlight the important and widespread utility of super-enhancers in biological and medical research. Previous studies showed that the histone H3K27ac mark is an efficient and robust means of super-enhancer demarcation (1,7,14). Several super-enhancer databases have been developed, such as dbSUPER (15) and SEA (16), and these databases are effective data sources for super-enhancer investigation. However, existing databases provide only basic information about super-enhancers, such as their genome location, cell or tissue types and associated genes (17). Moreover, with the rapid development of human epigenetics studies, human H3K27ac ChIP-seq datasets are accumulating, and the effective collection and processing of these data are urgently needed. More importantly, a number of studies show that super-enhancer-associated SNPs and TFs strongly influence human disease and biological processes (6,11,13). Follow-up studies of super-enhancers largely depend on subsequent reliable regulatory annotation (1). Therefore, building a human super-enhancer database is necessary to integrate, analyze, and reveal the regulatory mechanisms of super-enhancers and to accelerate research and discovery of their functions. To this end, we developed a comprehensive human super-enhancer database (SEdb, http://www.licpathway.net/sedb). SEdb focuses on providing a large number of available resources on human super-enhancers and annotates their potential cell-specific functions in gene regulation. The current version of SEdb documents a total of 331 601 super-enhancers from 542 samples, including samples from NCBI GEO/SRA (18,19), ENCODE (20), Roadmap (20,21) and GGR (Genomics of Gene Regulation Project) (20). Furthermore, SEdb provides detailed genetic and epigenetic information about super-enhancers including common SNPs, motif changes, eQTLs, risk SNPs, TFBSs, CRISPR/Cas9 target sites, DHSs and enhancers.
The database supports the display of SNP effects on regulatory motifs for performing in-depth analyses of super-enhancers. SEdb is a comprehensive human super-enhancer database that integrates storage, browsing, annotation and analysis functions. It could become a powerful platform for mining the functions of super-enhancers and finding relevant patterns among them.

DATA SOURCE AND PROCESSING

Identification of super-enhancers

In SEdb, we collected 542 publicly available human H3K27ac samples covering more than 240 tissues and cell types. To ensure the quality of super-enhancer identification, each H3K27ac sample collected by SEdb must contain H3K27ac ChIP-seq data and the corresponding input control sequencing data. First, we integrated H3K27ac ChIP-seq data from NCBI GEO/SRA (18,19), ENCODE (20), Roadmap (20,21) and GGR (20) (Figure 1). Notably, we downloaded the data for ENCODE, Roadmap and GGR from the ENCODE/Roadmap website (www.encodeproject.org). In the process of screening NCBI GEO/SRA data, we did not consider samples that appeared in ENCODE, Roadmap or GGR, to prevent duplication. Furthermore, all data from ENCODE, Roadmap, GGR and GEO/SRA were further de-duplicated by manual screening according to the unique GEO/SRA series number. Second, to identify super-enhancers, Bowtie (v0.12.9) (1,22) was used to align ChIP-seq reads to the hg19 reference genome downloaded from UCSC Genome Bioinformatics (23). Third, for the sequence alignment file (in .sam format) generated by Bowtie, MACS14 (v1.4.2) (24) was used to identify enhancer enrichment regions. Fourth, the ROSE (9) algorithm was used to identify super-enhancer regions with the command ‘python ROSE_main.py -g hg19 -i *******.gff -c *******_input.sort.bam -r *******_cas.sort.bam -o ******* -s 12500’. In the recognition process, H3K27ac peaks within ±1 kb of transcription start sites were excluded, and the remaining enhancers were stitched together when they lay within 12 500 bp of one another; the stitched enhancers were then ranked according to H3K27ac ChIP-seq occupancy. Finally, a threshold was determined according to the geometric inflection point of the ranked curve to distinguish between enhancers and super-enhancers (1,9). These steps identified 331 601 super-enhancers and 1 992 738 super-enhancer elements in the samples.
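The stitching and thresholding steps described above can be illustrated with a minimal Python sketch. It is not the ROSE code: the function names, the tuple layout and the toy numbers are assumptions made for illustration, and the cutoff is a simplified stand-in that picks the point where a line of slope 1 touches the ranked signal curve after both axes are scaled to [0, 1].

```python
from typing import List, Tuple

# Hypothetical peak record: (chromosome, start, end, H3K27ac signal)
Peak = Tuple[str, int, int, float]


def stitch(peaks: List[Peak], max_gap: int = 12_500) -> List[Peak]:
    """Merge peaks on the same chromosome whose gap is at most max_gap bp.

    The signal of a stitched region is the sum of its constituent signals.
    """
    stitched: List[Peak] = []
    for chrom, start, end, signal in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_chrom, prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def slope_one_cutoff(signals: List[float]) -> float:
    """Return the signal at the inflection point of the ranked curve.

    Regions are ranked by increasing signal, rank and signal are scaled to
    [0, 1], and the point minimising (scaled signal - scaled rank) is where a
    slope-1 line is tangent to the curve. Assumes at least two regions and a
    positive maximum signal.
    """
    ranked = sorted(signals)
    n = len(ranked)
    top = ranked[-1]
    best = min(range(n), key=lambda i: ranked[i] / top - i / (n - 1))
    return ranked[best]


if __name__ == "__main__":
    toy_peaks = [
        ("chr1", 100, 200, 1.0),
        ("chr1", 5_000, 5_100, 2.0),    # 4.8 kb from the previous peak: stitched
        ("chr1", 30_000, 30_100, 4.0),  # 24.9 kb away: kept separate
        ("chr2", 100, 200, 3.0),
    ]
    regions = stitch(toy_peaks)
    cutoff = slope_one_cutoff([r[3] for r in regions])
    print(regions, cutoff)
```

Under this sketch, stitched regions whose signal exceeds the returned cutoff would be called super-enhancers and the remainder typical enhancers.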

Figure 1.

Database content and construction. SEdb calculated super-enhancers based on H3K27ac ChIP-seq data. Genetic and epigenetic annotations were collected or calculated, including common SNPs, eQTLs, risk SNPs, TFBSs, CRISPR/Cas9 target sites, DHSs, enhancers, motif changes and LD SNPs. Users query super-enhancers in three ways: tissue-category-based query, gene-based query and sample-based advanced query. SEdb includes analytical tools and a personalized genome browser to discover potential biological effects of super-enhancers. DHS: DNase I hypersensitivity site, TFBS: transcription factor binding site, eQTL: expression quantitative trait locus.

Annotation of super-enhancers

To mine the deeper functions of super-enhancers, we provided genetic and epigenetic annotations for each super-enhancer including common SNPs, motif changes, eQTLs, risk SNPs, TFBSs, CRISPR/Cas9 target sites, DHSs and enhancers. We used BEDTools (v2.25.0) (25) to annotate corresponding information to super-enhancers and displayed details of the annotation using interactive tables.
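As an illustration of what this interval-based annotation amounts to, the following Python sketch counts SNPs that fall inside each super-enhancer. It does not call BEDTools; the function name, the 0-based half-open (BED-style) coordinates and the toy data are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED
Snp = Tuple[str, int]          # (chromosome, 0-based position)


def count_snps_per_region(regions: List[Region], snps: List[Snp]) -> Dict[Region, int]:
    """Count SNP positions p with start <= p < end for every region."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    counts: Dict[Region, int] = {}
    for chrom, start, end in regions:
        pts = positions.get(chrom, [])
        # Binary search on the sorted positions gives the number inside [start, end)
        counts[(chrom, start, end)] = bisect_left(pts, end) - bisect_left(pts, start)
    return counts


if __name__ == "__main__":
    super_enhancers = [("chr1", 100, 200), ("chr2", 0, 50)]
    common_snps = [("chr1", 100), ("chr1", 150), ("chr1", 200), ("chr2", 49), ("chr3", 5)]
    print(count_snps_per_region(super_enhancers, common_snps))
```

The same pattern applies to any point or interval feature (eQTLs, TFBSs, DHSs) once it is expressed in the same coordinate system as the super-enhancers.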

Common SNPs/Linkage disequilibrium SNPs/Risk SNPs

We obtained 38 063 729 common SNPs from dbSNP release 150 (26). For common SNPs, linkage disequilibrium (LD) was calculated using the phased genotype information accompanying the 1000 Genomes Project phase 3 (27). We used VCFTools (v0.1.13) (28) to filter out SNPs with a minor allele frequency (MAF) less than 0.05. We used plink (v1.9) (29) to calculate, for SNPs with MAF >0.05, the SNPs in LD (r2 = 0.8) in each of five super-populations (African, Admixed American, East Asian, European and South Asian). For risk SNPs, genome-wide association study (GWAS) results were obtained from a table curated by the GWAS Catalog (30) and from the GWASdb v2.0 (31) collection, which provide functional annotations for SNPs and insertion/deletion variants in human diseases/traits.
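For readers unfamiliar with the two quantities used here, the following Python sketch computes MAF and the pairwise LD statistic r2 from phased haplotypes encoded as 0 (reference allele) and 1 (alternate allele). It is a textbook calculation for illustration, not the VCFTools or plink implementation, and the function names and toy haplotypes are assumptions.

```python
from typing import List


def minor_allele_frequency(haplotypes: List[int]) -> float:
    """Frequency of the rarer allele among 0/1-encoded haplotypes."""
    alt_freq = sum(haplotypes) / len(haplotypes)
    return min(alt_freq, 1.0 - alt_freq)


def ld_r2(hap_a: List[int], hap_b: List[int]) -> float:
    """Squared correlation r2 between two SNPs across the same phased haplotypes.

    r2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB.
    Returns 0.0 when either SNP is monomorphic.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    return 0.0 if denom == 0 else d * d / denom


if __name__ == "__main__":
    snp1 = [1, 1, 0, 0, 1, 0, 0, 0]
    snp2 = [1, 1, 0, 0, 1, 0, 0, 0]  # identical pattern: perfect LD
    snp3 = [1, 0, 1, 0, 1, 0, 1, 0]
    snp4 = [1, 1, 0, 0, 1, 1, 0, 0]  # independent of snp3
    print(minor_allele_frequency(snp1), ld_r2(snp1, snp2), ld_r2(snp3, snp4))
```

A SNP pair would count as being in LD under the threshold quoted above when the computed r2 reaches 0.8 within a given super-population.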

Motif changes

To annotate the effects of mutations on motifs, position weight matrices were collected from TRANSFAC (32) and JASPAR (33). The R package atSNP (34) was used to calculate binding affinities of mutations to motifs. For SNPs with MAF >0.05 of 1000 Genomes Project phase 3 (27) located in super-enhancer regions, a 30-bp region upstream and downstream of SNPs was calculated. After calculation, SEdb included 254 545 586 motif changes.
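The idea of a motif change can be sketched as the difference in best motif score between the reference and alternate alleles. The Python example below is a simplified log-odds calculation on the forward strand with a uniform background; it is not the atSNP method, which additionally assesses statistical significance, and the toy position weight matrix, function names and sequences are assumptions.

```python
import math
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one base -> probability mapping per motif position


def window_score(pwm: Pwm, seq: str) -> float:
    """Log2-odds score of a sequence window against a uniform 0.25 background."""
    return sum(math.log2(col[base] / 0.25) for col, base in zip(pwm, seq))


def best_score(pwm: Pwm, seq: str, snp_index: int) -> float:
    """Best score over all motif-width windows that contain the SNP position."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(window_score(pwm, seq[i:i + width]) for i in range(first, last + 1))


def motif_change(pwm: Pwm, ref_seq: str, snp_index: int, alt_base: str) -> float:
    """Alternate-minus-reference best score; negative values suggest motif loss."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return best_score(pwm, alt_seq, snp_index) - best_score(pwm, ref_seq, snp_index)


def preferring(base: str) -> Dict[str, float]:
    """Toy PWM column with probability 0.7 for one base and 0.1 for the others."""
    return {b: (0.7 if b == base else 0.1) for b in "ACGT"}


if __name__ == "__main__":
    toy_pwm = [preferring("T"), preferring("G"), preferring("A")]  # prefers 'TGA'
    # The SNP at index 3 changes G to C and disrupts the 'TGA' match
    print(motif_change(toy_pwm, "AATGACC", 3, "C"))
```

In practice the flanking sequence would be the 30-bp window on each side of the SNP mentioned above, and both strands would be scanned.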

eQTLs

Human eQTL datasets were downloaded and merged from GTEx v5.0 (35,36), HaploReg (37) and PancanQTL (38). Data from GTEx v5.0 and HaploReg mainly included pairs of eQTL-gene relationships in different tissues. PancanQTL data included pairs of eQTL-gene relationships for different cancers in TCGA (https://tcga-data.nci.nih.gov/tcga). We mapped eQTL SNPs to super-enhancer regions, and the genes regulated by these SNPs were provided as potential target genes of the associated super-enhancers.

DHSs

DHS annotation was downloaded from UCSC (23) and ENCODE (20), comprising 69 860 705 DHSs from 293 samples. We matched DNase I hypersensitivity data to the super-enhancers of a sample/cell type in the database when such data were available for that sample/cell type.

Enhancers

To annotate more enhancers within super-enhancer regions, we downloaded predicted enhancers from three major genome/epigenome annotation projects. We obtained 14 867 092 enhancers from ENCODE (20) and Roadmap (21) predicted by the ChromHMM (39) method, and 65 423 enhancers from FANTOM5 (40,41) predicted by cap analysis of gene expression (CAGE) (42).

TFBSs

TFBSs were downloaded from UCSC (23), including 5 797 266 TFBSs.

CRISPR/Cas9 target sites

CRISPR/Cas9 target sites can be used to induce precise cleavage of endogenous genomic loci in biological cells (43). The CRISPR/Cas9 target site annotation displays DNA sequence that targets the CRISPR RNA sequence and transcribed region within 200 bp of the genomic regions (23,44). CRISPR/Cas9 target sites were downloaded from UCSC (23), predicted by the tool CRISPOR (43), which helps to design, evaluate, and clone guide sequences for the CRISPR/Cas9 system.

Super-enhancer-associated genes

Read density of H3K27ac ChIP-seq data around the TSS can be used to estimate which genes in a cell type are expressed (9). We therefore identified the closest active genes of super-enhancers using H3K27ac ChIP-seq data and gene proximity, following the algorithm of Young et al. (45). Transcripts were ranked by H3K27ac read density in the ±1 kb region around the TSS of each transcript in each sample. Each transcript was then assigned to its gene, duplicate genes were removed, and the top two-thirds of the ranked genes were considered active. Finally, the closest active genes of super-enhancers were obtained according to their proximities (45–47). The closest active genes of super-enhancers were used as the default in the gene-based query interface to query super-enhancers according to super-enhancer-associated genes. In addition, we used five other strategies to obtain super-enhancer-associated genes. Three of the five were ROSE (9) methods for predicting associated genes: overlap, proximal and closest. The other two were the enhancer target–gene algorithms Lasso (48) and PreSTIGE (49), for which we directly downloaded the relevant results. When the enhancers of a sample were located in super-enhancer regions, the target genes of those enhancers were considered to be associated genes of the super-enhancer. The super-enhancer-associated genes obtained from all six strategies are provided in SEdb and can be used in the gene-based query interface.
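The closest-active-gene strategy described above can be sketched in Python as follows. This is an illustrative reading of the text rather than the published implementation: the function names, the single-chromosome simplification and all identifiers and numbers in the toy data are hypothetical.

```python
from typing import Dict, Set


def active_genes(transcript_density: Dict[str, float],
                 transcript_to_gene: Dict[str, str]) -> Set[str]:
    """Rank transcripts by H3K27ac density around the TSS, collapse them to
    genes (first occurrence wins) and keep the top two-thirds of genes."""
    ranked = sorted(transcript_density, key=transcript_density.get, reverse=True)
    genes = []
    seen: Set[str] = set()
    for transcript in ranked:
        gene = transcript_to_gene[transcript]
        if gene not in seen:
            seen.add(gene)
            genes.append(gene)
    keep = len(genes) * 2 // 3  # integer arithmetic avoids rounding surprises
    return set(genes[:keep])


def closest_active_gene(se_center: int, gene_tss: Dict[str, int], active: Set[str]) -> str:
    """Return the active gene whose TSS is nearest to the super-enhancer centre.

    Assumes all genes lie on the same chromosome as the super-enhancer and
    that at least one of them is active.
    """
    candidates = {gene: tss for gene, tss in gene_tss.items() if gene in active}
    return min(candidates, key=lambda gene: abs(candidates[gene] - se_center))


if __name__ == "__main__":
    density = {"tx1": 50.0, "tx2": 40.0, "tx3": 5.0, "tx4": 30.0}
    tx2gene = {"tx1": "G1", "tx2": "G1", "tx3": "G2", "tx4": "G3"}
    tss = {"G1": 10_000, "G2": 52_000, "G3": 80_000}
    active = active_genes(density, tx2gene)
    # G2 is nearest to position 50 000 but is not active, so G3 is reported
    print(active, closest_active_gene(50_000, tss, active))
```

The point of the activity filter is visible in the toy data: the physically nearest gene is skipped because its TSS shows little H3K27ac signal.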

DATABASE USE AND ACCESS

A search interface for retrieving super-enhancers

The top navigation bar of SEdb is designed to help users access the database features (Figure 2A). SEdb provides a variety of query methods, including tissue-category-based, gene-based and sample-based advanced queries. In the tissue-category-based query, users can query the super-enhancers for all samples of a particular type of tissue. In the gene-based query, users can query a gene of interest and SEdb will return all super-enhancers that match the super-enhancer–gene relationship for all samples. In the sample-based advanced query, users define the scope of the super-enhancer query by specifying the sample and genome location of interest. Brief information on the search results is displayed in a table on the results page (Figure 2B). The interactive table describes the super-enhancer's ID coded by SEdb (SE ID), genome location, size, rank, number of constituent elements of the super-enhancer, visualization and statistics of annotation in the region. For each sample, super-enhancers and typical enhancers can be downloaded from the results page. The results page also displays search parameters, sample description information, and usage parameters for the software. Users click ‘SE ID’ for details about the super-enhancer. In addition to general information about the super-enhancer (Figure 2C), SEdb lists more detailed annotation information including common SNPs, eQTLs, risk SNPs, TFBSs, CRISPR/Cas9 target sites, DHSs and enhancers (Figure 2D). For example, for common SNPs, SEdb provides the number of common SNPs in the super-enhancer region and details about common SNPs within the super-enhancer, and computes the effect of SNPs on regulatory motifs (Figure 2D). Genes potentially associated with the super-enhancer are provided through six different identification strategies (Figure 2E). SEdb provides details and analysis of each element of the super-enhancer (Figure 2F). Detailed information for the elements is viewed by clicking ‘Detail’.
A detailed page of the element includes annotation and summary information. SEdb provides links to other super-enhancers identified in samples that overlap with this super-enhancer, viewed in detail by clicking ‘SE ID’. This is equivalent to regional enrichment analysis (Figure 2G).

Figure 2.

The main functions and usage of SEdb. (A) Top navigation bar helps users access the functions of this database. (B) Table of search results including super-enhancer ID coded by SEdb (SE ID), Chr, Start, End, Size, Rank, Element, Common SNP, eQTL, Risk SNP, TFBS, CRISPR/Cas9 target site and Visualization (genome browser). (C) Overview of super-enhancer. (D) Detailed interactive table of annotation information. (E) Genes potentially associated with super-enhancers are provided through six identification strategies, with a network diagram of their relationships. (F) Interactive table of super-enhancer elements, related annotation information and analysis tools. (G) Other super-enhancers identified in samples overlapping the super-enhancer. (H) Analysis of super-enhancers related to a query gene via relationships between super-enhancers and associated genes under different strategies. (I) Analysis of a common SNP, super-enhancers it appears in, and annotation of the common SNP. (J) Browse the sample details. (K) Visualization of genome browser for genetic and epigenetic information. (L) Data download. (M) Quantitative statistics of data sources in SEdb. (N) Overlap analysis tool that uses the super-enhancers provided in SEdb to annotate the user-submitted regions.

User-friendly browsing of samples

The ‘Data-Browse’ page is an interactive and alphanumerically sortable table that allows users to quickly search for samples and customize filters using ‘Data sources’, ‘Biosample type’, ‘Tissue type’ and ‘Biosample name’ (Figure 2J). Users use the ‘Show entries’ drop-down menu to change the number of records per page. To view super-enhancers for a given sample, users click on ‘Sample ID’.

Online analysis tools

Using the ‘Gene-SE analysis’ tool, users submit a gene and analyze the super-enhancers associated with it through the super-enhancer–gene relationships identified under six strategies (9,45,48,49): the closest active gene, ROSE overlap, ROSE proximal, ROSE closest, Lasso and PreSTIGE. The analysis can be run with or without restricting it to specified samples (Figure 2H). SEdb also links to external resources including NCBI Gene (50,51), GeneCards (52), UniProt (53) and Wikipedia (https://www.wikipedia.org). With the ‘SNP-SE analysis’ tool, users submit a common SNP and find the super-enhancers in which it appears, the annotation information of those super-enhancers and the LD SNPs in five super-populations (Figure 2I). With the ‘Overlap analysis’ tool, users submit a ‘bed’ file and identify super-enhancers that overlap the submitted regions by setting the percentage of overlap (Figure 2N). For search results and super-enhancer elements, SEdb also supports external analysis tools such as GREAT (54) and Galaxy (55).
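The percentage-of-overlap criterion can be made concrete with a small Python sketch. How SEdb defines the percentage is not stated in the text; the example assumes it is the fraction of the submitted region covered by a super-enhancer, and the function names, default threshold and coordinates are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def overlap_fraction(query: Region, target: Region) -> float:
    """Fraction of the query region covered by the target region."""
    if query[0] != target[0]:
        return 0.0
    overlap = min(query[2], target[2]) - max(query[1], target[1])
    return max(0, overlap) / (query[2] - query[1])


def overlapping_super_enhancers(query: Region,
                                super_enhancers: List[Region],
                                min_fraction: float = 0.5) -> List[Region]:
    """Super-enhancers covering at least min_fraction of the query region."""
    return [se for se in super_enhancers if overlap_fraction(query, se) >= min_fraction]


if __name__ == "__main__":
    submitted = ("chr1", 1_000, 2_000)
    catalog = [
        ("chr1", 1_500, 5_000),  # covers half of the submitted region
        ("chr1", 1_900, 3_000),  # covers one tenth
        ("chr2", 1_000, 2_000),  # different chromosome
    ]
    print(overlapping_super_enhancers(submitted, catalog, min_fraction=0.5))
```

Measuring the fraction relative to the super-enhancer instead, or requiring a reciprocal overlap, would only change the denominator in `overlap_fraction`.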

Personalized genome browser and data visualization

To help users view proximity information of super-enhancers in genomes, we developed a personalized genome browser using JBrowse (56) with useful tracks (Figure 2K). Users can see the proximity of super-enhancers to nearby genes, genome segments, SNPs, common SNPs, risk SNPs, DHSs, enhancers, conserved TFBSs, TFBSs identified by ChIP-seq and conservation scores. SEdb displays pie charts of the chromosome distribution of super-enhancers and histograms of annotation statistics (Figure 2C). Relationships between super-enhancers and associated genes are displayed using a D3 network visualization plugin (Figure 2E). SEdb also provides links for visualizing data in the UCSC genome browser (23) by adding custom tracks.

CGI interface

For data sharing, a CGI program was built for SEdb. Users such as website developers provide a genome location and use the SEdb website to determine the super-enhancers that overlap with the location. Data obtained from the feedback is displayed directly on the platform.

Data download and statistics

SEdb provides downloads of super-enhancers and super-enhancer elements for each sample in bed and csv formats (Figure 2L). Considering that typical enhancers may also be of importance to users, typical enhancers are also provided on the download page of the database. In addition, the database supports packaged downloads of all enhancers, including the super-enhancer and typical enhancer sets. Search results of interest to users can also be exported. The ‘Statistics’ page presents the statistics of SEdb in both numerical and graphical form (Figure 2M). In addition, sample information for DHSs and enhancers is provided.

Data submission

Regular updates are critical to the sustainability of a database. Users can share their H3K27ac data with SEdb. To ensure data quality control, we recommend that users submit the GEO/SRA series number for the raw data on the ‘Submit’ page. For data from other sources, the submitter needs to provide an accessible URL for the raw fastq files and the sample information for the submitted data (see the ‘Submit’ page of SEdb for the detailed process). We will update the data dynamically according to the number of submitted samples, to ensure the timely release of the data.

SYSTEM DESIGN AND IMPLEMENTATION

The current version of SEdb was developed using MySQL 5.7.17 (http://www.mysql.com) and runs on a Linux-based Apache Web server (http://www.apache.org). We used PHP 7.0 (http://www.php.net) for server-side scripting. We designed and built the interactive interface using Bootstrap v3.3.7 (https://v3.bootcss.com) and jQuery v2.1.1 (http://jquery.com). We used ECharts (http://echarts.baidu.com) (57) and D3 (https://d3js.org) as graphical visualization frameworks, and JBrowse (http://jbrowse.org) as the genome browser framework. We recommend using a modern web browser that supports the HTML5 standard such as Firefox, Google Chrome, Safari, Opera or IE 9.0+ for the best display. The SEdb database is freely available to the research community at http://www.licpathway.net/sedb. Users are not required to register or log in to access features in the database.

DISCUSSION

The emerging importance of super-enhancers in human diseases and biological processes, coupled with their exquisite tissue-specificity, raises the need for comprehensive catalogs of human super-enhancers. The existing databases, including dbSUPER and SEA, are based mainly on data from the ENCODE/Roadmap projects and integrate other research results. These databases contain significantly fewer human ChIP-seq samples than are available in NCBI GEO/SRA. Therefore, we created SEdb, a comprehensive human super-enhancer database with a large number of human samples. SEdb integrated 542 human H3K27ac samples from NCBI GEO/SRA, ENCODE, Roadmap and GGR, and calculated 331 601 super-enhancers based on these data. In particular, we manually curated and classified 410 available H3K27ac samples from >2000 ChIP-seq samples from NCBI GEO/SRA. To ensure the quality of super-enhancer identification, each H3K27ac sample collected by SEdb must contain H3K27ac ChIP-seq data and the corresponding input control sequencing data. Furthermore, a sample, as well as its cell type, is included in the database only if super-enhancers were successfully identified in the sample from the H3K27ac ChIP-seq and corresponding input control sequencing data. The number of samples in SEdb is more than 5-fold the number in dbSUPER. SEA has been updated to version 2.0, which, compared to the previous version, supports ChIP-seq samples from more species, including humans. However, SEdb contains 353 more human H3K27ac samples than SEA v2.0. SEdb provides a user-friendly interface to search, browse, analyze and visualize information about super-enhancers, together with rich annotations, element information and useful analysis tools and visualizations. Table 1 compares the information and functions of SEdb with those of other super-enhancer databases and shows the advantages of SEdb.
SEdb provides: (i) comprehensive genetic and epigenetic annotation of super-enhancers including common SNPs, motif changes, eQTLs, risk SNPs, TFBSs, CRISPR/Cas9 target sites, DHSs and enhancers, and user-friendly displays with interactive tables; (ii) online analysis tools such as ‘Gene-SE analysis’, ‘SNP-SE analysis’, ‘Overlap analysis’ and analysis tools from external links such as GREAT and Galaxy for search results; (iii) a customized genome browser for user-friendly visualizing of genomic context information of super-enhancers and links for visualizing data in the UCSC genome browser by adding custom tracks; (iv) user-friendly browsing of samples; (v) a CGI interface that can be easily used and quickly generate super-enhancers that overlap with the user-submitted genome location; (vi) detailed internal information on super-enhancer elements, including related annotations and related analysis tools; (vii) overlapping contacts with other super-enhancers in different samples.

Table 1.

Comparison of SEdb with other databases that are based on human super-enhancer-related data and functions (20 June 2018)

Function type	Data type/Specific function	SEdb	dbSUPER	SEA v2.0
Interaction table /annotation	Number of human samples	542	102	189
	Number of human super-enhancers	331 601	69 205	164 398
	Strategies of super-enhancer associated genes^a	6	3	3
	Common SNP	✓
	Motif changed	✓
	eQTL	✓
	Risk SNP	✓
	TFBS	✓		✓
	CRISPR/Cas9 target site	✓		✓
	DHS	✓
	Enhancer^b	✓
	LD SNP	✓
Genome browser	Super-enhancers	✓		✓
	Super-enhancer elements	✓
	Genome segments	✓
	SNP	✓
	Common SNP	✓		✓
	Risk SNP	✓		✓
	TFBS conserved	✓
	TFBS by ChIP-seq	✓		✓
	CRISPR/Cas9 target site	✓		✓
	DHS	✓
	Enhancer	✓
	Conservative score	✓		✓
Analysis functions	Gene-SE analysis	✓
	SNP-SE analysis	✓
	Overlap analysis	✓	✓	✓
	Region analysis^c	✓	✓	✓
Data browse	Simple browse^d	✓	✓	✓
	Browse based on samples classification^e	✓
	Alphanumerically sortable table	✓
CGI tool	Genome location overlap^f	✓
Other functions	Super-enhancer element annotation	✓
	Overlap with other super-enhancers in different samples	✓

^a Super-enhancer-associated genes obtained by different strategies or algorithms.

^b ChromHMM method or CAGE to predict enhancers.

^c External link to GREAT and Galaxy.

^d Simple browser function of super-enhancer samples.

^e Classification of samples including Data sources, Biosample type, Tissue type and Biosample name.

^f Quickly generate super-enhancers that overlap with the user-submitted genome location.

SEdb is the super-enhancer database with the largest number of human super-enhancers and samples and the most comprehensive annotation information about super-enhancers. Because sequence variants in super-enhancer regions increase the risk of common human diseases, detailed genetic information on super-enhancers such as risk SNPs, eQTLs and motif changes is provided in SEdb. The current version of SEdb mainly focuses on super-enhancers identified from H3K27ac data through the ROSE algorithm, though typical enhancers can also be identified by ROSE. We therefore also provided general information on typical enhancers in SEdb, considering that these may be of interest to users of this database. However, given that the number of typical enhancers in a sample is much greater than that of super-enhancers, no further detailed annotation of typical enhancers is provided in SEdb. In future versions, we will keep the data up to date and provide more annotation information for super-enhancers and, in particular, for typical enhancers, together with more practical analysis tools. We believe that SEdb can promote research on super-enhancers and the discovery of more of their potential biological effects.

53 in total

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. Core transcriptional regulatory circuitry in human embryonic stem cells.

Authors: Laurie A Boyer; Tong Ihn Lee; Megan F Cole; Sarah E Johnstone; Stuart S Levine; Jacob P Zucker; Matthew G Guenther; Roshan M Kumar; Heather L Murray; Richard G Jenner; David K Gifford; Douglas A Melton; Rudolf Jaenisch; Richard A Young
Journal: Cell Date: 2005-09-23 Impact factor: 41.582

3. CAGE: cap analysis of gene expression.

Authors: Rimantas Kodzius; Miki Kojima; Hiromi Nishiyori; Mari Nakamura; Shiro Fukuda; Michihira Tagami; Daisuke Sasaki; Kengo Imamura; Chikatoshi Kai; Matthias Harbers; Yoshihide Hayashizaki; Piero Carninci
Journal: Nat Methods Date: 2006-03 Impact factor: 28.547

4. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

5. GREAT improves functional interpretation of cis-regulatory regions.

Authors: Cory Y McLean; Dave Bristor; Michael Hiller; Shoa L Clarke; Bruce T Schaar; Craig B Lowe; Aaron M Wenger; Gill Bejerano
Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908

6. BEDTools: a flexible suite of utilities for comparing genomic features.

Authors: Aaron R Quinlan; Ira M Hall
Journal: Bioinformatics Date: 2010-01-28 Impact factor: 6.937

7. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors: Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal: Genome Biol Date: 2009-03-04 Impact factor: 13.583

8. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors: V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Entrez Gene: gene-centered information at NCBI.

Authors: Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. Model-based analysis of ChIP-Seq (MACS).

Authors: Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal: Genome Biol Date: 2008-09-17 Impact factor: 13.583
