Katherine T Decker, Ye Gao, Kevin Rychel, Tahani Al Bulushi, Siddharth M Chauhan, Donghyuk Kim, Byung-Kwan Cho, Bernhard O Palsson.
Abstract
The transcriptional regulatory network in prokaryotes controls global gene expression mostly through transcription factors (TFs), which are DNA-binding proteins. Chromatin immunoprecipitation (ChIP) with DNA sequencing methods can identify TF binding sites across the genome, providing a bottom-up, mechanistic understanding of how gene expression is regulated. ChIP provides indispensable evidence toward the goal of acquiring a comprehensive understanding of cellular adaptation and regulation, including condition-specificity. ChIP-derived data's importance and labor-intensiveness motivate its broad dissemination and reuse, which is currently an unmet need in the prokaryotic domain. To fill this gap, we present proChIPdb (prochipdb.org), an information-rich, interactive web database. This website collects public ChIP-seq/-exo data across several prokaryotes and presents them in dashboards that include curated binding sites, nucleotide-resolution genome viewers, and summary plots such as motif enrichment sequence logos. Users can search for TFs of interest or their target genes, download all data, dashboards, and visuals, and follow external links to understand regulons through biological databases and the literature. This initial release of proChIPdb covers diverse organisms, including most major TFs of Escherichia coli, and can be expanded to support regulon discovery across the prokaryotic domain.
Year: 2022 PMID: 34791440 PMCID: PMC8728212 DOI: 10.1093/nar/gkab1043
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of proChIPdb's sources, pipeline, and content. Gray arrows show the flow of data. Data sources (upper left) include the European Nucleotide Archive, Gene Expression Omnibus (GEO), PubMed, the Sequence Read Archive, and in-house data (which have been posted to GEO). These data enter the processing pipeline (lower) as the file types indicated by the blue flow arrows. Processing proceeds from left to right along the gray arrows, with green rectangles indicating data types and files and black text indicating processing steps and tools. Towards the right of the pipeline, data feed through the green arrows into the proChIPdb site (upper right), which consists of a binding site table, a genome viewer, and a feature visualization panel.
Key statistics for the datasets underlying proChIPdb. Separate columns correspond to the ‘Escherichia coli’ and ‘All Other Organisms’ dataset pages, alongside total counts across proChIPdb.
| Dataset description | ChIP-seq and ChIP-exo data for Escherichia coli | ChIP-seq and ChIP-exo data for all other organisms on proChIPdb | Total proChIPdb data |
|---|---|---|---|
| Number of organisms | 1 | 13 | 14 |
| Number of unique strains | 4 | 14 | 18 |
| Number of TF pages | 65 | 35 | 100 |
| Number of samples: ChIP-exo | 184 (92.9%) | 28 (38.4%) | 212 (78.2%) |
| Number of samples: ChIP-seq | 14 (7.1%) | 45 (61.6%) | 59 (21.8%) |
| Number of samples: total count | 198 | 73 | 271 |
| Percent of TFs with curated TFBSs available | 92.3% (60 of 65) | 85.7% (30 of 35) | 90.0% (90 of 100) |
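The percentages in the table follow directly from the raw sample counts. As a minimal sketch (not part of proChIPdb itself), the Python below recomputes the ChIP-exo and ChIP-seq shares from the counts reported above; the dictionary layout and the helper names `share` and `summarize` are illustrative assumptions.

```python
# Sample counts copied from the table above: (ChIP-exo, ChIP-seq) per dataset page.
samples = {
    "Escherichia coli": (184, 14),
    "All other organisms": (28, 45),
}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def summarize(exo, seq):
    """Summarize one column of the table: total samples and per-method shares."""
    total = exo + seq
    return {"exo_pct": share(exo, total), "seq_pct": share(seq, total), "total": total}


summary = {name: summarize(*counts) for name, counts in samples.items()}

# Pool the counts from both dataset pages to get the overall column.
total_exo = sum(counts[0] for counts in samples.values())
total_seq = sum(counts[1] for counts in samples.values())
summary["Total"] = summarize(total_exo, total_seq)

for name, row in summary.items():
    print(f"{name}: {row['total']} samples, "
          f"{row['exo_pct']}% ChIP-exo, {row['seq_pct']}% ChIP-seq")
```

The recomputed values agree with the table, which also makes the contrast explicit: ChIP-exo dominates the E. coli data, while ChIP-seq accounts for most samples from the other organisms.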
Figure 2. TF dashboard webpage for the E. coli K-12 MG1655 Fur regulator. (A) Metadata and relevant links. (B) Binding site table with a curated list of binding peaks. Each tab contains the TFBSs for a unique condition (DPD versus Fe supplementation in this example). (C) Embedded Integrative Genomics Viewer (igv.js) component with annotation tracks and genome-wide peaks from raw data. (D–G) Feature visuals, which contain tabs for the various additional plots. (D) The active tab shows a histogram of binding peak widths. (E) A scatterplot of peak positions relative to their closest downstream gene. (F) The consensus binding sequence motifs. (G) Venn diagram comparing proChIPdb-identified target genes versus literature regulon genes.
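The quantities behind panels D, E, and G can be illustrated with a short Python sketch. This is not the proChIPdb implementation: the `Peak` and `Gene` records, the coordinates, the gene names, and the definition of "closest downstream gene" (the nearest gene start at or beyond the peak center, measured along that gene's own strand) are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical records; field names and coordinates are made up for illustration.
Peak = namedtuple("Peak", "start end")
Gene = namedtuple("Gene", "name tss strand")  # tss: coordinate where the gene starts


def peak_width(peak):
    """Width of a binding peak in base pairs (the quantity histogrammed in panel D)."""
    return peak.end - peak.start


def closest_downstream_gene(peak, genes):
    """Return (gene, distance) for the nearest gene start downstream of the peak
    center, measured along each gene's strand, or None if no gene qualifies."""
    center = (peak.start + peak.end) // 2
    best = None
    for gene in genes:
        # Positive distance means the gene start lies downstream of the peak center.
        dist = gene.tss - center if gene.strand == "+" else center - gene.tss
        if dist >= 0 and (best is None or dist < best[1]):
            best = (gene, dist)
    return best


def venn_counts(chip_targets, literature_regulon):
    """Sizes of the three Venn regions compared in panel G."""
    a, b = set(chip_targets), set(literature_regulon)
    return {"chip_only": len(a - b), "shared": len(a & b), "literature_only": len(b - a)}


# Toy data (assumed, not taken from proChIPdb).
peaks = [Peak(100, 140), Peak(900, 960)]
genes = [Gene("geneA", 200, "+"), Gene("geneB", 50, "-"), Gene("geneC", 1000, "+")]

widths = [peak_width(p) for p in peaks]
nearest = [closest_downstream_gene(p, genes) for p in peaks]
overlap = venn_counts({"geneA", "geneB", "geneC"}, {"geneB", "geneC", "geneD"})
```

Other definitions of "downstream" are possible, for example restricting to genes on one strand or measuring from the peak edge rather than its center, so the distances here should be read as one reasonable convention rather than the one the site uses.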