Ivan Yevshin, Ruslan Sharipov, Tagir Valeev, Alexander Kel, Fedor Kolpakov.
Abstract
GTRD (Gene Transcription Regulation Database; http://gtrd.biouml.org) is a database of transcription factor binding sites (TFBSs) identified by ChIP-seq experiments for human and mouse. Raw ChIP-seq data were obtained from ENCODE and SRA and uniformly processed: (i) reads were aligned using Bowtie2; (ii) ChIP-seq peaks were called using the peak callers MACS, SISSRs, GEM and PICS; (iii) peaks for the same factor and peak caller, but different experimental conditions (cell line, treatment, etc.), were merged into clusters; (iv) such clusters for different peak callers were merged into metaclusters, which were considered non-redundant sets of TFBSs. In addition to genomic location, the sets contain structured information about cell lines and experimental conditions extracted from the descriptions of the corresponding ChIP-seq experiments. A web interface to GTRD was developed using the BioUML platform. It provides: (i) browsing and display of information; (ii) advanced search options, e.g. searching for TFBSs near a specified gene or for all genes potentially regulated by a specified transcription factor; (iii) an integrated genome browser that visualizes the GTRD data (read alignments, peaks, clusters and metaclusters) together with gene structures from the Ensembl database and binding sites predicted using position weight matrices from the HOCOMOCO database.
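Steps (iii) and (iv) of the processing can be illustrated with a minimal sketch. The abstract does not specify how peaks are merged, so the sketch assumes plain overlap-based merging of intervals on the same chromosome; the `Peak` and `Region` types, the function names and the experiment identifiers are hypothetical and are not part of GTRD or BioUML.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    caller: str      # peak caller name, e.g. "MACS" or "GEM"
    experiment: str  # hypothetical experiment identifier


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    support: frozenset  # labels of the inputs merged into this region


def merge_overlapping(intervals):
    """Merge (chrom, start, end, label) tuples that overlap or touch.

    Assumption: simple overlap-based merging; the real GTRD rule may differ.
    """
    merged = []
    for chrom, start, end, label in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end, labels = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end), labels | {label})
        else:
            merged.append((chrom, start, end, {label}))
    return [Region(c, s, e, frozenset(labels)) for c, s, e, labels in merged]


def build_clusters(peaks):
    """Step (iii): per peak caller, merge peaks from different experiments."""
    by_caller = defaultdict(list)
    for p in peaks:
        by_caller[p.caller].append((p.chrom, p.start, p.end, p.experiment))
    return {caller: merge_overlapping(iv) for caller, iv in by_caller.items()}


def build_metaclusters(clusters):
    """Step (iv): merge the clusters of all peak callers into metaclusters."""
    pooled = [
        (r.chrom, r.start, r.end, caller)
        for caller, regions in clusters.items()
        for r in regions
    ]
    return merge_overlapping(pooled)


# Toy peaks for a single transcription factor (coordinates are made up).
peaks = [
    Peak("chr1", 100, 200, "MACS", "exp_a"),
    Peak("chr1", 150, 260, "MACS", "exp_b"),
    Peak("chr1", 180, 240, "GEM", "exp_a"),
    Peak("chr1", 900, 950, "GEM", "exp_b"),
]
clusters = build_clusters(peaks)
metaclusters = build_metaclusters(clusters)
for region in metaclusters:
    print(region.chrom, region.start, region.end, sorted(region.support))
```

Each metacluster keeps the set of peak callers that support it, which is one simple way to distinguish sites found by several callers from sites found by only one.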
Year: 2016 PMID: 27924024 PMCID: PMC5210645 DOI: 10.1093/nar/gkw951
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison of databases that are based on ChIP-seq data
| Database | Source of human and mouse data | Number of samples (TF-related)* | Number of TFs | Number of ChIP-seq peak callers used | Metacluster approach | Uniform data processing | Genome browser |
|---|---|---|---|---|---|---|---|
| ChIPBase | GEO, ENCODE | total 3549; human 2498; mouse 1036; rat 15 | 252 TFs and non-TFs for 10 species | >10 in total, but no uniform pipeline; each ChIP-seq dataset is processed by a different peak caller | No | No | Self-developed: deepView genomeView |
| Cistrome DB | GEO, SRA, ENA, ENCODE | total 10 276 (TF + non-TF); human 5774; mouse 4502; rat 0 | 260 TFs and non-TFs | 1 (MACS2) | No | Yes | UCSC genome browser |
| ENCODE | ENCODE | total 1448; human 1254; mouse 194; rat 0 | 295 TFs and non-TFs for human, 52 TFs and non-TFs for mouse | 5 (SPP, GEM, PeakSeq, MACS, Hotspot/Hotspot2) | No | Yes | Self-developed: UCSC genome browser and WashU epigenome browser |
| Factorbook | ENCODE | total 1007; human 837; mouse 170; rat 0 | 167 TFs, co-factors and chromatin remodeling factors for human, 51 for mouse | None | No | No | No |
| GTRD | GEO, SRA, ENCODE | total 5078; human 2955; mouse 2107; rat 16 | 476 human and 257 mouse sequence-specific TFs, corresponding to 542 TFClass classes | 4 (MACS, SISSRs, GEM, PICS) | Yes | Yes | Self-developed |
| ChIP-Atlas | SRA | total 10 774; human 5914; mouse 4860; rat 0 | 699 human and 502 mouse TFs and others | 1 (MACS2) | No | Yes | IGV |
| GeneProf | SRA, ENCODE, literature | total 1692; human 693; mouse 999; rat 0 | 133 human and 131 mouse TFs | 1 (MACS) | No | Yes | Self-developed: based on GenomeGraphs |
| NGS-QC | GEO | total 6672; human 4234; mouse 2438; rat 0 | unknown | None | No | Yes | No |
*The numbers of ChIP-seq samples cannot be compared directly between databases because the definition of a sample may differ.
Figure 1. Reconstruction of the human USF1 TFBS in the neighborhood of the PIGR gene using the GTRD six-step workflow. From bottom to top: Step 1: reduction of raw data to FASTQ format; Step 2: read alignment for nine datasets (a to i; only the reads of the last one are shown, for demonstration purposes); Step 3: ChIP-seq peaks (with centers marked) identified by four peak callers for the nine datasets a-i; Step 4: peak clusters calculated for each peak caller's results; Step 5: metacluster calculated on the basis of the four clusters; Step 6: USF1 TFBS identified using the respective PWM from the HOCOMOCO database; K: USF1 TFBS known from the literature (26,27); G: part of the PIGR gene structure.
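Step 6 of the workflow, locating a binding site with a position weight matrix (PWM), can be sketched as a log-odds scan over the sequence under a metacluster. Everything below is illustrative: the count matrix is a toy example and not a HOCOMOCO matrix, the sequence is made up, the uniform background of 0.25 and the pseudocount are assumptions, and only the forward strand is scanned.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(sequence, pwm):
    """Return (score, offset) of the best-scoring forward-strand window, or None."""
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Toy count matrix whose consensus is CACGTG (illustrative values only).
toy_counts = [{"C": 10}, {"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}, {"G": 10}]
toy_pwm = pwm_from_counts(toy_counts)

sequence = "TTGACACGTGATCC"  # made-up sequence standing in for a metacluster region
score, offset = best_hit(sequence, toy_pwm)
print(offset, sequence[offset:offset + len(toy_pwm)], round(score, 2))
```

A practical scan would also score the reverse complement and apply a score threshold chosen for the matrix in use; both are omitted here to keep the sketch short.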
Figure 2. Infocard for the reconstructed USF1 TFBS from Figure 1. This information is displayed by clicking on a metacluster in the genome browser.