René Dreos, Giovanna Ambrosini, Romain Groux, Rouaïda Cavin Périer, Philipp Bucher.
Abstract
We present an update of the Eukaryotic Promoter Database EPD (http://epd.vital-it.ch), focusing on the EPDnew division, which contains comprehensive organism-specific transcription start site (TSS) collections automatically derived from next-generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap), the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository, which contains promoter-relevant chromatin profiling data, and on improvements to the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.
Year: 2016 PMID: 27899657 PMCID: PMC5210552 DOI: 10.1093/nar/gkw1069
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic representation of the EPDnew development and production pipeline. (A) Download of authoritative gene catalogs and primary TSS mapping data from public databases, data repositories and consortium websites. (B) Quality control (QC) of incoming data (e.g. read mapping efficiency, contaminations, etc.). (C) Data passing QC are reformatted and incorporated into the MGA repository. (D) Selection of a subset of TSS mapping experiments for generating a new organism-specific TSS collection. (E) Input data for a new module of EPDnew. (F) Organism-specific automatic database assembly pipeline tailored to the input data; see (3) for a detailed description of the human EPDnew assembly pipeline. (G) Preliminary or final TSS collection. (H) Manual sanity checks of individual, randomly selected promoter entries using the corresponding entry viewer; see Figure 2D for an example of an entry view. (I) Automatic quality evaluation of the TSS collections as a whole by motif enrichment tests; see Figure 2A for an example and ref (22) for an explanation of the method. (L) Feedback is collected from quality evaluation steps H and I. This may lead to the exclusion, replacement or addition of source data sets, or to modifications (e.g. program parameter fine-tuning) of the computational database generation pipeline. Note that the development of a final, publicly released EPDnew module typically involves several evaluation-modification cycles.
Figure 2. EPDnew analysis and tools. (Instructions on how to generate these figures via the EPD web server are given in Supplementary Data.) (A) TATA-box (continuous lines) and Initiator (Inr, dotted lines) occurrence profiles in three H. sapiens promoter databases. This picture was obtained with OProf from the SSA program package for two EPDnew versions (3 and 4) and for a list of gene starts from the UCSC Gene list, which was used as input for the generation of the EPDnew collections. (B) Distribution of nucleosomes around S. cerevisiae, H. sapiens and D. rerio promoters. The figure is based on MNase-seq data from (13–15) and was made with the ChIP-Cor tool from the ChIP-Seq server (5). The MNase-seq data are stored in the MGA repository and are directly accessible via a pull-down menu from the ChIP-Cor input form. This comparative analysis shows the differences between the three organisms in the position of the N+1 nucleosome (+40 from the TSS for S. cerevisiae and +120 for H. sapiens and D. rerio), in the distance between two consecutive nucleosomes (+160 in S. cerevisiae and +180 in H. sapiens and D. rerio) and in the length of the nucleosome-free region. (C) An example of ChIP-Extract output used to study nucleosome maps around S. cerevisiae promoters. Each row in the matrix represents a promoter, and each column gives the counts of MNase-seq reads found at a specific distance from the TSS. In this example the ordering option in ChIP-Extract has been turned on, which orders rows according to their similarity with the average signal (shown in Figure 2B). This simple procedure shows that some yeast promoters do not have the expected chromatin organization. Data are from (15). (D) An example of the D. rerio EPD Hub visualized at the UCSC Genome Browser for the promoter of the ccni gene. The single-experiment CAGE tracks show the promoter shifting from a maternally induced TSS (top lines) to a zygotic-specific TSS (bottom lines) (21).
The blue icon near the bottom shows the TSS assignment of the corresponding promoter entry in EPDnew. Note that the two narrow TSS clusters are represented by only one promoter entry since they are too close to each other. The minimum distance requirement for two separate alternative promoters in EPDnew is 100 bp. In such cases, the EPDviewer provides essential information to users interested in the very details of the transcription initiation patterns.
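The row ordering described for panel C can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the similarity measure, which is an assumption: the caption does not state which measure ChIP-Extract uses. The function names and the toy count matrix are hypothetical and are not taken from ChIP-Extract or its output.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return num / den if den else 0.0


def order_by_similarity(matrix):
    """Return row indices sorted from most to least similar to the column-mean profile."""
    # Average signal across all promoters, one value per position (column).
    average = [mean(col) for col in zip(*matrix)]
    # Similarity of each promoter (row) to that average signal.
    scores = [pearson(row, average) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy counts: 4 promoters (rows) x 5 positions relative to the TSS (columns).
counts = [
    [9, 1, 0, 1, 8],  # depleted around the TSS, like the average profile
    [8, 2, 1, 2, 9],
    [1, 7, 9, 7, 1],  # atypical: reads concentrated over the TSS
    [7, 2, 0, 3, 7],
]
order = order_by_similarity(counts)
print(order)
```

With this kind of ordering, promoters whose read distribution departs from the average profile end up at the bottom of the matrix, which is how atypical chromatin organization becomes visible in a heat map.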
Current contents of EPDnew
| Organism, version | Promoters, genes (a) | TSS libraries (b) | Chromatin data MNase–DNase (c) | ChIP-seq samples histones–PIC–TFs (d) |
|---|---|---|---|---|
| | 25 503, 17 785 (95%) | 1088 | 23–998 | 2231–491–3794 |
| | 21 239, 17 565 (90%) | 339 | 4–0 | 174–60–384 |
| | 15 073, 12 603 (92%) | 57 | 6–23 | 29–12–189 |
| | 10 728, 10 235 (43%) | 12 | 4–4 | 12–3–1 |
| | 7120, 6363 (32%) | 8 | 6–6 | 2–1–3 |
| | 6493, 5712 (53%) | 16 | 0–0 | 0–0 |
| | 10 229, 10 177 (37%) | 1 | 0–0 | 0–0–32 |
| | 17 081, 15 828 (59%) | 8 | 0–0 | 8–0–0 |
| | 5117, 5110 (88%) | 19 | 1–27 | 0–8–17 |
| | 3440, 3438 (67%) | 1 | 8–8 | 6–0–51 |

(a) The percentage of gene coverage is given in parentheses.
(b) CAGE, GRO-cap and TSS-seq samples used to build the respective database.
(c) MNase-seq and DNase-seq samples present in the MGA repository.
(d) ChIP-seq samples present in the MGA repository for histone marks and variants (such as H3K4me3, H2A.Z, H3), components of the PIC (such as Pol-II, TFIID, TFIIB, TBP, etc.) and transcription factors.