Shenglin Mei, Qian Qin, Qiu Wu, Hanfei Sun, Rongbin Zheng, Chongzhi Zang, Muyuan Zhu, Jiaxin Wu, Xiaohui Shi, Len Taing, Tao Liu, Myles Brown, Clifford A Meyer, X Shirley Liu.
Abstract
Chromatin immunoprecipitation, DNase I hypersensitivity and transposase-accessibility assays combined with high-throughput sequencing enable the genome-wide study of chromatin dynamics, transcription factor binding and gene regulation. Although rapidly accumulating publicly available ChIP-seq, DNase-seq and ATAC-seq data are a valuable resource for the systematic investigation of gene regulation processes, a lack of standardized curation, quality control and analysis procedures has hindered extensive reuse of these data. To overcome this challenge, we built the Cistrome database, a collection of ChIP-seq and chromatin accessibility data (DNase-seq and ATAC-seq) published before January 1, 2016, including 13 366 human and 9953 mouse samples. All the data have been carefully curated and processed with a streamlined analysis pipeline and evaluated with comprehensive quality control metrics. We have also created a user-friendly web server for data query, exploration and visualization. The resulting Cistrome DB (Cistrome Data Browser), available online at http://cistrome.org/db, is expected to become a valuable resource for transcriptional and epigenetic regulation studies.
Year: 2016 PMID: 27789702 PMCID: PMC5210658 DOI: 10.1093/nar/gkw983
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of Cistrome DB data sources and web-interface features. Cistrome DB collects publicly available ChIP-seq, DNase-seq and ATAC-seq data from the Gene Expression Omnibus (GEO), the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics. Metadata are manually curated and annotated with PubMed information. All data are processed by a streamlined analysis pipeline and stored in a MySQL relational database. Cistrome DB provides methods to query and visualize data. Users can search by keyword or select by term. Detailed metadata annotations, analysis results and quality control (QC) metrics are presented for each sample. Data can be explored in more detail using the Cistrome analysis pipeline (Cistrome AP) and visualized using the UCSC and WashU genome browsers.
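The keyword and term-based search over curated sample metadata described in Figure 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `samples` table, its column names, the helper functions `build_demo_db` and `search_samples`, and the four placeholder rows are hypothetical and do not reflect the actual Cistrome DB schema or contents. An in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sample-metadata table.

    The schema is illustrative only; it is NOT the real Cistrome DB layout.
    SQLite is used here as a stand-in for MySQL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE samples (
               id        INTEGER PRIMARY KEY,
               species   TEXT,
               assay     TEXT,
               factor    TEXT,
               cell_type TEXT
           )"""
    )
    # Placeholder rows invented for the demo, not real database records.
    rows = [
        (1, "Homo sapiens", "ChIP-seq", "ESR1", "MCF-7"),
        (2, "Mus musculus", "ChIP-seq", "POU5F1", "Embryonic Stem Cell"),
        (3, "Homo sapiens", "ATAC-seq", None, "GM12878"),
        (4, "Mus musculus", "DNase-seq", None, "Liver"),
    ]
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search_samples(conn, keyword=None, species=None, assay=None):
    """Return sample ids matching a free-text keyword and/or exact terms."""
    clauses, params = [], []
    if keyword:
        # Free-text search: substring match on factor or cell type.
        clauses.append("(factor LIKE ? OR cell_type LIKE ?)")
        pattern = f"%{keyword}%"
        params.extend([pattern, pattern])
    if species:
        # Term selection: exact match on a controlled vocabulary value.
        clauses.append("species = ?")
        params.append(species)
    if assay:
        clauses.append("assay = ?")
        params.append(assay)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT id FROM samples" + where + " ORDER BY id"
    # Parameters are bound by the driver rather than formatted into the SQL.
    return [row[0] for row in conn.execute(sql, params)]
```

With the placeholder rows above, `search_samples(build_demo_db(), species="Homo sapiens")` returns `[1, 3]`. Combining a substring match for free-text keywords with exact matches for selected terms is one plausible way to support both search modes mentioned in the caption; the actual web server may work differently.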
Figure 2. Database content. (A) Growth statistics of ChIP-seq and chromatin accessibility data. (B) Statistics of processed ChIP-seq and chromatin accessibility (CA) data in Cistrome DB. (C) Statistics of transcription factor and histone modification types. (D) Example of quality control metrics in Cistrome DB. (E) Batch sample visualization through the WashU browser showing the co-binding pattern between master transcription factors in embryonic stem cells.