Ivan Yevshin, Ruslan Sharipov, Semyon Kolmykov, Yury Kondrakhin, Fedor Kolpakov.
Abstract
The current version of the Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org) contains information about: (i) transcription factor binding sites (TFBSs) and transcription coactivators identified by ChIP-seq experiments for Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Arabidopsis thaliana; (ii) regions of open chromatin and TFBSs (DNase footprints) identified by DNase-seq; (iii) unmappable regions where TFBSs cannot be identified due to repeats; (iv) potential TFBSs for both human and mouse, predicted using position weight matrices from the HOCOMOCO database. Raw ChIP-seq and DNase-seq data were obtained from ENCODE and SRA and processed uniformly. ChIP-seq peaks were called using four different methods: MACS, SISSRs, GEM and PICS. Peaks for the same factor and peak calling method, but from different experimental conditions (cell line, treatment, etc.), were merged into clusters. To reduce noise, the clusters obtained with different peak calling methods were then merged into meta-clusters, which were considered to be non-redundant TFBS sets. In addition, extended quality control was applied to all ChIP-seq data. A web interface to GTRD was developed using the BioUML platform. It supports browsing and displaying information, advanced search, and an integrated genome browser.
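The two-level merging described in the abstract (peaks into clusters, clusters into meta-clusters) can be illustrated with a minimal sketch. The abstract does not state the merging criterion or how noise is filtered, so this example assumes simple coordinate overlap on half-open intervals and performs no filtering; the `Peak` type and all function names are hypothetical and are not part of GTRD or BioUML.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Interval = Tuple[int, int]  # (start, end), half-open


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    factor: str  # transcription factor name
    caller: str  # peak caller name, e.g. "MACS"


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals (assumed criterion: any coordinate overlap)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_clusters(peaks: List[Peak]) -> Dict[Tuple[str, str, str], List[Interval]]:
    """Merge peaks that share factor, peak caller and chromosome into clusters."""
    groups: Dict[Tuple[str, str, str], List[Interval]] = defaultdict(list)
    for p in peaks:
        groups[(p.factor, p.caller, p.chrom)].append((p.start, p.end))
    return {key: merge_intervals(ivs) for key, ivs in groups.items()}


def build_meta_clusters(
    clusters: Dict[Tuple[str, str, str], List[Interval]]
) -> Dict[Tuple[str, str], List[Interval]]:
    """Merge clusters from all peak callers for each (factor, chromosome)."""
    groups: Dict[Tuple[str, str], List[Interval]] = defaultdict(list)
    for (factor, _caller, chrom), ivs in clusters.items():
        groups[(factor, chrom)].extend(ivs)
    return {key: merge_intervals(ivs) for key, ivs in groups.items()}


# Toy input: coordinates are invented for illustration only.
peaks = [
    Peak("chr1", 100, 200, "CTCF", "MACS"),
    Peak("chr1", 150, 250, "CTCF", "MACS"),
    Peak("chr1", 240, 300, "CTCF", "GEM"),
    Peak("chr1", 1000, 1100, "CTCF", "GEM"),
]
clusters = build_clusters(peaks)
meta_clusters = build_meta_clusters(clusters)
```

In this toy input, the two overlapping MACS peaks collapse into one cluster, which then joins the overlapping GEM cluster in a single meta-cluster, while the isolated GEM cluster stays separate. A real pipeline would additionally need a rule for discarding weakly supported regions, which this sketch leaves out.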
Year: 2019 PMID: 30445619 PMCID: PMC6323985 DOI: 10.1093/nar/gky1128
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The content of the GTRD database and its derived informational resources.
Data statistics for human and mouse TFs and their respective binding sites predicted with position weight matrices taken from the HOCOMOCO database
| Species | Number of TFs | Number of TFBSs |
|---|---|---|
| Human | 402 | 445249948 |
| Mouse | 358 | 366668327 |
Comparison statistics for GTRD and other databases based on ChIP-seq data
| Database | TF ChIP-seq samples*, total | TF ChIP-seq samples*, human | TFs, total | TFs, human | ChIP-seq peak callers | Meta-cluster approach |
|---|---|---|---|---|---|---|
| GTRD v18.06 | 17485** | 7239** | 2399 | 852 | MACS, SISSRs, GEM, PICS | Yes |
| ChIP-Atlas | 19414** | 8368** | 1929** | 820** | MACS2 | No |
| Cistrome DB | 20408** | 11348** | Unknown | Unknown | MACS2 | No |
| ReMap 2018 | 2829** | 2829** | 485** | 485** | MACS2 | Yes (CRMs) |
| ENCODE | 3684 | 2489 | Unknown | Unknown | SPP, GEM, PeakSeq, MACS | No |
| ChIPBase | 4290 | 2498 | Unknown | Unknown | >10 in total, but no uniform pipeline, each ChIP-seq is processed by different peak caller | No |
| Factorbook | 1007 | 837 | 167** | 51** | None | No |
| NGS-QC | 22398 | 11597 | Unknown | Unknown | None | No |
*The number of ChIP-seq samples cannot be compared directly between databases because the definition of a sample may differ.
**These numbers include non-TF ChIP-seq samples and non-TF proteins in addition to TF-related ones.