Iros Barozzi1, Pranami Bora2, Marco J Morelli2.
Abstract
DNase I is an enzyme that preferentially cleaves DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints") that correspond to molecules bound to the DNA. Integrating footprint positions close to transcription start sites with motif analysis can reveal regulatory interactions between specific transcription factors (TFs) and genes. However, this inference relies heavily on the accuracy of the footprint call and on the sequencing depth of the DNase-seq experiment. Using ENCODE data, we comprehensively evaluate the performance of two recent footprint callers (Wellington and DNaseR) and one metric (the Footprint Occupancy Score, or FOS), and assess the consequences of different footprint calls on the reconstruction of TF-TF regulatory networks. We rate Wellington as the method of choice among those tested: not only are its predictions the most accurate, but the properties of the inferred networks are also robust to sequencing depth.
Keywords: DNase-seq; bioinformatics tools and databases; comparison of methods; footprinting; gene regulatory networks
Year: 2014 PMID: 25177346 PMCID: PMC4133688 DOI: 10.3389/fgene.2014.00278
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. (A) Receiver Operating Characteristic (ROC) curves for the predictions provided by the binding motifs alone. (B–D) ROC curves for the sets of footprints obtained by DNaseR and Wellington, and for the set used in Neph et al. (2012c). (E) Area Under the Curve (AUC) corresponding to the ROC curves of (A–D). Wellington consistently scores better than all the other methods. (F) Running times for DNaseR and Wellington on chromosome 19, for different significance thresholds.
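To make the AUC summary concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from scored predictions. It uses the standard rank-based (Mann-Whitney) formulation and is not taken from the paper; the function name `roc_auc` and the toy scores and labels are hypothetical and only illustrate the calculation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: one number per candidate site; higher means more confident
            that the site is bound (assumption of this sketch).
    labels: 1 for a true binding site, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # AUC equals the probability that a random positive outranks a
    # random negative, counting ties as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate sites, three of them truly bound.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of sites, which is fine for illustration; a genome-wide evaluation would more likely use a sort-based implementation from an established library.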
Figure 2. Heatmaps summarizing the comparison among the TF-TF networks reconstructed with the sets of footprints obtained with DNaseR, Neph, and Wellington in three different cell lines (K562, SkMC, HepG2). Networks obtained by running Wellington on 30% and 70% subsamples of aligned reads are also included. (A) Edge-to-edge correlation: DNaseR networks cluster separately; networks obtained with Wellington and Neph separate according to the cell type of origin. (B) The rank correlation of the betweenness centrality (a measure quantifying how many times a node lies on the shortest paths between two other nodes) for the different networks shows a comparable pattern, except that in this case the K562 and HepG2 networks show a much higher positive correlation with each other than with SkMC.
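The comparison in panel (B) can be sketched as follows: compute the betweenness centrality of every node in two networks, then rank-correlate the two centrality vectors over the shared nodes. This is an illustrative sketch only. It assumes directed, unweighted networks and Spearman's coefficient as the rank correlation, neither of which is stated in the text above, and the function names and toy networks are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj: dict mapping each node to an iterable of its successors
         (directed, unweighted; an assumption of this sketch).
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def centrality_correlation(net_a, net_b):
    """Rank-correlate betweenness over the nodes present in both networks."""
    bc_a, bc_b = betweenness(net_a), betweenness(net_b)
    shared = sorted(set(bc_a) & set(bc_b))
    return spearman([bc_a[n] for n in shared], [bc_b[n] for n in shared])


# Hypothetical toy networks over four placeholder TFs.
net_1 = {"TF1": ["TF2"], "TF2": ["TF3"], "TF3": ["TF4"], "TF4": []}
net_2 = {"TF1": ["TF2", "TF3"], "TF2": ["TF3"], "TF3": ["TF4"], "TF4": []}
rho = centrality_correlation(net_1, net_2)
```

Repeating `centrality_correlation` for every pair of networks would fill a symmetric matrix that could then be drawn as a heatmap; for networks of realistic size, an established graph library is likely the more practical choice.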