René Dreos, Giovanna Ambrosini, Rouayda Cavin Périer, Philipp Bucher.
Abstract
We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced part of the Eukaryotic Promoter Database (EPD) that was described in more detail in a previous NAR Database Issue. EPD is a long-established database of experimentally characterized eukaryotic Pol II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists that complements the older corpus of manually compiled promoter entries in EPD. This new part is derived exclusively from next-generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.
Year: 2014 PMID: 25378343 PMCID: PMC4383928 DOI: 10.1093/nar/gku1111
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Current contents of EPDnew
| Organism | Assembly | Promoters | Genes | Gene catalog |
|---|---|---|---|---|
| H. sapiens | hg19 | 23 360 | 16 599 (89%) | UCSC known Genes (Mar 2009) |
| M. musculus | mm9 | 21 239 | 17 565 (90%) | UCSC known Genes (Mar 2011) |
| D. melanogaster | dm3 | 15 073 | 12 603 (92%) | ENSEMBL 70 |
| D. rerio | danRer7 | 10 728 | 10 235 (43%) | ENSEMBL 75 |
| C. elegans | ce6 | 7 120 | 6 363 (32%) | WormBase (WS220) |
Source data
| EPDnew database | Source data: type, reference or source repository | # of libraries | Total tags (millions) |
|---|---|---|---|
| | CAGE from ENCODE/RIKEN, downloaded from UCSC genome browser database ( | 148 | 3841 |
| | CAGE from FANTOM5 ( | 339 | 6236 |
| | CAGE from modENCODE ( | 57 | 646 |
| | CAGE from Nepal | 12 | 65 |
| | GRO-cap from Kruesi | 8 | 236 |
Current contents of the MGA repository (# of samples)
| Data type | Human | Mouse | Fly (a) | Worm (b) | Fish (c) | Yeast (d) |
|---|---|---|---|---|---|---|
| ChIP-Seq | 4738 | 523 | 220 | 2 | 9 | 46 |
| RNA-seq (e) | 160 | 339 | 63 | 19 | 12 | |
| DNase, FAIRE, etc. | 973 | | | | | |
| DNA methylation | 12 | 4 | | | | |
| Annotations (f) | 20 | 10 | 3 | 1 | 1 | 1 |
| Sequence-derived (g) | 13 | 3 | 1 | | 4 | |
| Total | 5916 | 879 | 287 | 22 | 26 | 46 |
(a) D. melanogaster.
(b) C. elegans.
(c) D. rerio (zebrafish).
(d) Saccharomyces cerevisiae.
(e) Only TSS mapping data.
(f) Includes features derived from primary data such as published ChIP-Seq peak lists.
(g) E.g. genome conservation scores, SNPs, etc.
Figure 1. EPD analysis and selection tools. (a) TATA-box occurrence profile in human promoters. This picture was obtained by following the OProf link from the human EPDnew home page and then selecting the TATA-box weight matrix from the ‘promoter motifs’ menu on the OProf input form. (b) Distribution of H3K4me3-marked nucleosomes around human promoters. The figure is based on MNase-processed ChIP-Seq data from (19), stored in the MGA repository and accessible via a pull-down menu on the ChIP-Cor input form. (c) BED file containing genomic TSS coordinates of human promoters that contain a match to the TATA-box weight matrix between positions −35 and −20 relative to the TSS. This list was generated with the program FindM. (d) Genomic TSS coordinates of a promoter subset enriched in CAGE tags from the lymphoblastoid cell line GM12878. This list was obtained by following the ‘ChIP-Cor’ link from the human EPDnew home page and then selecting the specific CAGE tag library as target feature via the pull-down menu on the input form. On the results page, the ‘Enriched Feature Extraction Option’ was used to select those promoters that contain at least 100 CAGE tags between positions −50 and +50 relative to the TSS given in EPDnew.
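
Panels (c) and (d) both select promoters by what falls inside a strand-aware window around the TSS. The following is a minimal Python sketch of that idea, not the implementation used by FindM or ChIP-Cor. The function names `tss_window` and `select_enriched`, the tuple layouts, and the example coordinates are hypothetical. It also assumes that a relative position r maps to genomic coordinate TSS + r on the plus strand and TSS − r on the minus strand, and that only tags on the promoter's own strand are counted.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def tss_window(tss, strand, rel_from, rel_to):
    """Return the inclusive genomic (start, end) covering relative positions
    rel_from..rel_to around a TSS.

    Assumption: relative position r maps to tss + r on '+' and tss - r on '-'.
    """
    if strand == "+":
        return tss + rel_from, tss + rel_to
    # On the minus strand, upstream lies at higher genomic coordinates.
    return tss - rel_to, tss - rel_from


def select_enriched(promoters, tags, rel_from=-50, rel_to=50, min_tags=100):
    """Keep promoters with at least min_tags tags inside the window.

    promoters: iterable of (name, chrom, tss, strand)  (hypothetical layout)
    tags:      iterable of (chrom, position, strand), one entry per tag
    """
    # Sort tag positions per (chromosome, strand) so each window
    # can be counted with two binary searches.
    positions = defaultdict(list)
    for chrom, pos, strand in tags:
        positions[(chrom, strand)].append(pos)
    for plist in positions.values():
        plist.sort()

    selected = []
    for name, chrom, tss, strand in promoters:
        start, end = tss_window(tss, strand, rel_from, rel_to)
        plist = positions.get((chrom, strand), [])
        n_tags = bisect_right(plist, end) - bisect_left(plist, start)
        if n_tags >= min_tags:
            selected.append((name, n_tags))
    return selected


# Made-up example data, for illustration only.
promoters = [("P1", "chr1", 1000, "+"), ("P2", "chr1", 5000, "-")]
tags = [("chr1", 990 + i % 20, "+") for i in range(120)]
tags += [("chr1", 5010, "-")] * 30

print(select_enriched(promoters, tags))
```

With the defaults, the call mirrors the threshold described for panel (d): at least 100 tags between −50 and +50. The window for panel (c) would correspond to `tss_window(tss, strand, -35, -20)`, with motif matches taking the place of tags.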