| Literature DB >> 25953800 |
Hamed S Najafabadi1, Mihai Albu1, Timothy R Hughes2.
Abstract
UNLABELLED: Current methods for motif discovery from chromatin immunoprecipitation followed by sequencing (ChIP-seq) data often identify non-targeted transcription factor (TF) motifs, and are even further limited when peak sequences are similar due to common ancestry rather than common binding factors. The latter aspect particularly affects a large number of proteins from the Cys2His2 zinc finger (C2H2-ZF) class of TFs, as their binding sites are often dominated by endogenous retroelements that have highly similar sequences. Here, we present recognition code-assisted discovery of regulatory elements (RCADE) for motif discovery from C2H2-ZF ChIP-seq data. RCADE combines predictions from a DNA recognition code of C2H2-ZFs with ChIP-seq data to identify models that represent the genuine DNA binding preferences of C2H2-ZF proteins. We show that RCADE is able to identify generalizable binding models even from peaks that are exclusively located within the repeat regions of the genome, where state-of-the-art motif finding approaches largely fail.Entities:
Mesh:
Substances:
Year: 2015 PMID: 25953800 PMCID: PMC4547615 DOI: 10.1093/bioinformatics/btv284
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1.RCADE workflow and benchmarking results. (A) RCADE starts by predicting a set of motifs from the target C2H2-ZF protein sequence, using a previously published bacterial-one-hybrid assay-based recognition code, or B1H-RC (Najafabadi ), which are evaluated against the ChIP-seq peak sequences to identify significantly enriched motifs, and are then iteratively optimized. (B) Benchmarking workflow for evaluation of RCADE. The peak sequences were divided into two sets of ERE-overlapping and non-ERE peaks. The ERE-overlapping peaks for each protein were used for motif discovery using RCADE, and the motifs were validated using non-ERE peaks. (C,D) Validation results for 18 ERE-binding proteins. The arrows show the improvement in the AUROC of RCADE motifs compared with seed B1H-RC motifs. (E) Example motifs for two proteins that show the largest difference between RCADE and MEME validation results. The top-scoring MEME motif is shown for each protein, followed by the top-scoring motif that is directly predicted from protein sequence using the B1H-RC, and the RCADE optimized motif. The Pearson similarity of the B1H-RC and RCADE motifs was calculated as described previously (Najafabadi )