Irene M Kaplow, Abhimanyu Banerjee, Chuan Sheng Foo.
Abstract
BACKGROUND: Many transcription factors (TFs), such as multi-zinc-finger (ZF) TFs, have multiple DNA binding domains (DBDs), and deciphering the DNA binding motifs of individual DBDs is a major challenge. One example of such a TF is CCCTC-binding factor (CTCF), a TF with eleven ZFs that plays a variety of roles in transcriptional regulation, most notably anchoring DNA loops. Previous studies found that CTCF ZFs 3-7 bind CTCF's core motif and ZFs 9-11 bind a specific upstream motif, but the motifs of ZFs 1-2 have yet to be identified.
Keywords: Binding strength; CTCF; Deep neural network; Motif; Mutated transcription factor; Zinc finger
Year: 2022 PMID: 35410161 PMCID: PMC9004084 DOI: 10.1186/s12864-022-08486-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Using Differential Peak Prediction to Identify Motifs of Different DNA Binding Domains. To identify the motif of a DBD, we train a deep convolutional neural network to predict whether a TF ChIP-seq peak is preserved or significantly weaker in a dataset from that TF with a mutated DBD relative to a dataset from the wild-type TF. We then use DeepLIFT followed by TF-MoDISco to identify the motifs that the neural network learned. The browser tracks in this figure are pooled-replicate fold-change bigWigs from wild-type CTCF and from CTCF with ZF 1 mutated, both from [11], visualized using the WashU Epigenome Browser [35] with assembly mm10 [36]. The motif logo is the motif we discovered when interpreting the model for the ZF 1 mutant.
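The pipeline in Fig. 1 starts from two inputs: a binary label per peak (preserved versus significantly weaker in the mutant) and a one-hot encoded peak sequence that the network's convolutional filters scan. The Python sketch below illustrates both under stated assumptions. The 2-fold cutoff in the hypothetical `label_peak` helper stands in for whatever differential test the authors used, the toy sequence and filter are not from the paper, and the sketch does not reproduce DeepLIFT or TF-MoDISco.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as length-4 indicator rows (A, C, G, T); other characters become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def label_peak(wt_signal, mutant_signal, min_fold_drop=2.0):
    """Hypothetical labelling rule (an assumption, not the paper's test):
    1 if the peak is at least `min_fold_drop` times weaker in the mutant
    than in wild type, otherwise 0 (treated as preserved)."""
    return 1 if wt_signal >= min_fold_drop * mutant_signal else 0


def conv1d_scan(encoded, weights):
    """Valid, stride-1 cross-correlation of a (width x 4) filter along the
    sequence: what a single filter in a CNN's first convolutional layer
    computes, with bias and activation omitted."""
    width = len(weights)
    return [
        sum(
            encoded[start + i][c] * weights[i][c]
            for i in range(width)
            for c in range(4)
        )
        for start in range(len(encoded) - width + 1)
    ]


peak_seq = "ACCGT"           # toy sequence, not a real CTCF peak
toy_filter = one_hot("CCG")  # toy filter that rewards an exact "CCG"
print(conv1d_scan(one_hot(peak_seq), toy_filter))    # [1.0, 3.0, 1.0]
print(label_peak(10.0, 3.0), label_peak(10.0, 8.0))  # 1 0
```

A trained network learns many such filters from the labelled peaks. Attribution methods such as DeepLIFT then ask which input bases drove the prediction.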
Fig. 2 Performance of Neural Networks. a We compared the performance of our neural networks to that of logistic regression models whose features were the hit scores of the motifs from [11]. We also compared both sets of models to logistic regression models whose only features were the top TF-MoDISco motif hit scores. Performance was measured by the area under the precision-recall curve (AUPRC). b We aggregated the hypothetical contribution scores of the seqlets corresponding to the motifs from DeepLIFT followed by TF-MoDISco to visualize the TF-MoDISco motifs. The box indicates the discovered downstream motif, and the underlined part indicates the weak putative motif for ZF 2. The TF-MoDISco motif for ZF 1 has a G or a T at a position where the other TF-MoDISco motifs have a G (indicated by the first arrow) and a G or an A at a position where the other TF-MoDISco motifs have a G (indicated by the second arrow). The TF-MoDISco motif for ZF 8 emphasizes a downstream nucleotide in the upstream motif (indicated by the arrow).
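Fig. 2a compares models by AUPRC. The pure-Python sketch below shows one common estimator, average precision. It takes the precision at the rank of each positive example and then averages those values. The function name and the toy labels and scores are illustrative, and the record does not state which AUPRC implementation the authors used.

```python
def average_precision(labels, scores):
    """Estimate AUPRC as average precision.

    labels: 0/1 ground truth (here 1 = peak significantly weaker in the mutant).
    scores: model outputs; higher means more confident the label is 1.
    Tied scores keep their input order (Python's sort is stable), a
    simplification that can shift the estimate slightly when ties exist.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / n_pos


labels = [1, 0, 1, 0]                   # toy ground truth
cnn_scores = [0.9, 0.8, 0.7, 0.1]       # toy network outputs
baseline_scores = [0.2, 0.9, 0.6, 0.4]  # toy logistic-regression outputs
print(average_precision(labels, cnn_scores))       # about 0.833
print(average_precision(labels, baseline_scores))  # 0.5
```

AUPRC suits this task better than ROC AUC when one class (for example, weakened peaks) is much rarer than the other, because precision directly penalizes false positives among the top-ranked peaks.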
Fig. 3 Comparisons of Discovered Downstream Motif to Other CTCF Data. a We compared our TF-MoDISco motifs from the mutants of ZFs 1 and 2 to aggregated reads from CTCF HT-SELEX cycle 4 and to computationally predicted motifs of CTCF’s DBDs from the RCADE2 model, which was trained on in vitro B1H ZF binding data. b We compared motif hits of the core motif followed by the discovered downstream motif in reads from CTCF HT-SELEX cycle 0 to those in reads from cycle 4. c We compared the strength of the core motif followed by the discovered downstream motif in HeLa CTCF peaks from [28] to its strength in HeLa peaks of CTCF’s alternative isoform from the same study and in HeLa CTCF peaks from ENCODE [39].
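Fig. 3b compares how often the core-plus-downstream motif occurs in HT-SELEX reads from cycle 0 versus cycle 4. The sketch below shows one way to run that kind of comparison. It scans each read on both strands with a log-odds position weight matrix and reports the fraction of reads whose best hit passes a threshold. The 3-bp toy motif, pseudocount, uniform background, threshold, and reads are all hypothetical. A real analysis would use the full composite motif and the actual SELEX reads.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (columns A, C, G, T) into log2-odds
    scores against a uniform background."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append([math.log2((c + pseudocount) / total / background) for c in row])
    return matrix


def reverse_complement(seq):
    """Reverse complement; non-ACGT characters become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def best_hit(seq, matrix):
    """Highest motif score over every window on both strands, or -inf if the
    read is shorter than the motif. Windows with non-ACGT characters are skipped."""
    width = len(matrix)
    best = float("-inf")
    for strand in (seq.upper(), reverse_complement(seq.upper())):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in BASES for b in window):
                continue
            score = sum(matrix[i][BASES.index(b)] for i, b in enumerate(window))
            best = max(best, score)
    return best


def fraction_with_hit(reads, matrix, threshold):
    """Fraction of reads whose best motif score meets the threshold."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if best_hit(read, matrix) >= threshold)
    return hits / len(reads)


toy_counts = [[0, 10, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]  # hypothetical "CCG" motif
pwm = log_odds_matrix(toy_counts)
cycle0 = ["AAAAA", "TTTTT", "ACCGA", "GGGGG"]  # toy unselected reads
cycle4 = ["CCGTT", "AACGG", "ACCGA", "TTTTT"]  # toy selected reads
print(fraction_with_hit(cycle0, pwm, 5.0), fraction_with_hit(cycle4, pwm, 5.0))  # 0.25 0.75
```

Both strands are scanned because SELEX reads have no fixed orientation relative to the motif. A rise in the hit fraction from cycle 0 to cycle 4 is the kind of enrichment that would indicate selection for the motif.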
Fig. 4 Proposed Motif for CTCF Based on Findings from Interpreting Neural Networks