Tom Whitington, Martin C Frith, James Johnson, Timothy L Bailey.
Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) allows researchers to determine the genome-wide binding locations of individual transcription factors (TFs) at high resolution. This information can be interrogated to study various aspects of TF behaviour, including the mechanisms that control TF binding. Physical interaction between TFs comprises one important aspect of TF binding in eukaryotes, mediating tissue-specific gene expression. We have developed an algorithm, spaced motif analysis (SpaMo), which is able to infer physical interactions between the given TF and TFs bound at neighbouring sites at the DNA interface. The algorithm predicts TF interactions in half of the ChIP-seq data sets we test, with the majority of these predictions supported by direct evidence from the literature or evidence of homodimerization. High resolution motif spacing information obtained by this method can facilitate an improved understanding of individual TF complex structures. SpaMo can assist researchers in extracting maximum information relating to binding mechanisms from their TF ChIP-seq data. SpaMo is available for download and interactive use as part of the MEME Suite (http://meme.nbcr.net).
Year: 2011 PMID: 21602262 PMCID: PMC3159476 DOI: 10.1093/nar/gkr341
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Algorithm procedure and output. Step 1. The strongest match to the specified primary motif is identified in each ChIP-seq peak region genomic sequence. Each sequence is centred on the motif occurrence and trimmed to a consistent length. Step 2. A library of secondary motifs is considered. For a given secondary motif, the processed sequences are scanned to identify the strongest match in each sequence, and the displacements from the primary hit to the secondary hit are recorded. Output. Same-strand and opposite-strand histograms are produced. For the example output shown, the primary and secondary motifs are Gata6 and Ebox, respectively, and the input ChIP-seq data set is human GATA1 in the K562 cell line. The same-strand displacement histogram indicates a clear enrichment of sequences with a secondary–primary displacement of −8 bp.
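The two steps in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SpaMo implementation: the toy scoring matrices built by `make_pwm`, the `margin` parameter, the forward-strand-only scan, and the definition of displacement as secondary start minus primary start are all hypothetical simplifications. An opposite-strand histogram would additionally scan the reverse complement.

```python
from collections import Counter

def make_pwm(consensus, hit=2.0, miss=-2.0):
    """Toy log-odds matrix (hypothetical values, not a real Gata6 or Ebox motif)."""
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]

def best_hit(seq, pwm, skip=None):
    """Return (score, start) of the strongest forward-strand match, or None.

    `skip` is an optional (start, end) interval that a hit may not overlap.
    """
    w = len(pwm)
    best = None
    for i in range(len(seq) - w + 1):
        if skip and i < skip[1] and i + w > skip[0]:
            continue
        score = sum(col[seq[i + j]] for j, col in enumerate(pwm))
        if best is None or score > best[0]:
            best = (score, i)
    return best

def displacement_histogram(seqs, primary, secondary, margin):
    """Count secondary-minus-primary start offsets on the same strand."""
    counts = Counter()
    for seq in seqs:
        # Step 1: strongest primary match, then trim to a consistent length.
        hit = best_hit(seq, primary)
        if hit is None:
            continue
        p = hit[1]
        lo, hi = p - margin, p + len(primary) + margin
        if lo < 0 or hi > len(seq):
            continue  # not enough flanking sequence to trim consistently
        trimmed = seq[lo:hi]  # the primary hit now starts at index `margin`
        # Step 2: strongest secondary match that does not overlap the primary.
        sec = best_hit(trimmed, secondary, skip=(margin, margin + len(primary)))
        if sec is not None:
            counts[sec[1] - margin] += 1
    return counts

primary = make_pwm("AGATAA")
secondary = make_pwm("CAGCTG")
seqs = ["TTTTCAGCTGTTAGATAATTTTTTTTTT"] * 3
hist = displacement_histogram(seqs, primary, secondary, margin=10)
print(sorted(hist.items()))  # [(-8, 3)]
```

In this toy input the secondary site starts 8 bp upstream of the primary site in every sequence, so all counts fall in the −8 bin, mirroring the kind of peak described for the GATA1 example.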
Positive predictive value of top predictions
| TF/tissue | Primary motif | Secondary motif | Likely partner | Lowest P-value | Evid. |
|---|---|---|---|---|---|
| Esrrb/ESC | C Esrrb | C Esrrb | Esrrb | 4.23 × 10^−56 | S ( |
| STAT1/HeLa Stim. | C Stat3 | J YY1 | YY1 | 1.52 × 10^−29 | |
| GABP/Jurkat | U Gabpa i | U Fhl1 | ? | 7.95 × 10^−28 | |
| cFos/Gm12878 | C NFYA | J NFYA | C/EBP | 2.87 × 10^−23 | S ( |
| cFos/K562 | C NFYA | U Cbf1 b | ? | 8.62 × 10^−21 | |
| Jund/Gm12878 | U Jundm2 ii | U Irf4 i | Irf4 | 2.02 × 10^−16 | |
| GATA1/K562b | U Gata6 i | C Ebox | SCL | 2.76 × 10^−16 | ( |
| cJun/K562 | U Jundm2 ii | J SPIB | PU.1 | 3.49 × 10^−16 | ( |
| cFos/K562 | U Jundm2 ii | J SPIB | PU.1 | 9.24 × 10^−14 | ( |
| Tcfcp2l1/ESC | C Tcfcp2l1 | C Tcfcp2l1 | Tcfcp2l1 | 9.24 × 10^−14 | S |
| GATA1/G1EER4 | U Gata6 i | U Ascl2 i | SCL | 1.32 × 10^−10 | ( |
| STAT1/HeLa Stim. | C Stat3 | J YY1 | YY1 | 9.70 × 10^−10 | |
| Srebp1a/Hepg2 | C Srebp | U Rsc30 | ? | 3.58 × 10^−8 | |
| Klf4/ESC | U Klf7 i | U Zfp740 i | Klf4 | 4.35 × 10^−7 | S |
| Nfe2/K562 | C Nfe2 | U Jundm2 ii | Nfe2 | 1.08 × 10^−5 | S |
| cMyc/K562 | J Mycn | J bZIP910 | ? | 6.30 × 10^−5 | |
| Sox2/ESC | C Oct4 | U Sry ii | Sox2 | 1.33 × 10^−4 | ( |
| Tcf4/Hct116 | U Tcf3 i | U Jundm2 ii | c-Jun | 3.12 × 10^−4 | ( |
| SRF/Jurkat | U Srf i | J ETS1 | SAP-1 | 3.99 × 10^−4 | ( |
| E2F1/ESC | J E2F1 | J YY1 | YY1 | 9.39 × 10^−4 | ( |
For each input data set that yielded one or more results at a P-value threshold of 0.001, the single most significant result is presented. In the first column, the TF, tissue and reference for the ChIP-seq data set are given. The ‘primary motif’ indicates the motif used during the first step of the algorithm. The ‘secondary motif’ indicates the motif found to exhibit the significant spacing. Summary names are provided for both motifs, where ‘J’ indicates a JASPAR (15) motif, ‘U’ indicates a Uniprobe (16) motif and ‘C’ indicates a custom motif. Corresponding sequence logos (29) are shown in Supplementary Table S4. The ‘Likely partner’ column indicates the TF that we manually assigned to the secondary motif, with ‘?’ indicating we could not assign a likely partner. The P-value corresponds to the single most significant spacing interval. The ‘Evid.’ column states evidence validating the given prediction, with references indicating literature confirmation, and ‘S’ indicating that the primary and secondary motifs are highly similar.
Classes of motif spacing
In the first column, the genome assembly, TF, tissue and reference for the input ChIP-seq data set are given. For ‘Primary motif’ and ‘Secondary motif’ columns, the sequence logos and summary names are provided. Same strand and opposite strand displacement histograms are shown in columns three and four. The X-axis of each histogram shows the motif displacement value. The Y-axis shows the number of sequences that exhibited the given secondary–primary motif displacement value, and is scaled linearly with the origin corresponding to zero. The ‘Sig. Interval’ specifies the displacement value and strand for the single most significant interval, with ‘Opp.’ indicating opposite strand. The corrected P-value of that interval is given. The ‘Evid.’ column is described in Table 1. ‘#’: the cited studies demonstrate that GATA1 and Tcfe2a (Tcf3; E2A; E47) form at least two distinct DNA-binding complexes. While neither of these complexes corresponds to our predicted ‘U Gata6 i’/‘J Hand1::Tcfe2a’ motif spacing, they do support our predicted association between GATA1 and Tcfe2a. The reverse complement of the ‘C NFYA’ motif is shown in row 5 in order to exhibit similarity with the secondary motif ‘J NFYA’. Literature evidence is as follows: 1 = (12), 2 = (34), 3 = (35).
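One plausible way to turn a displacement histogram into a corrected P-value for its most significant interval is sketched below. The caption does not state the statistical test, so the choices here are assumptions made for illustration: a one-sided binomial test against a uniform null over `n_positions` possible displacements, followed by a Bonferroni-style correction. The function names are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def most_significant_spacing(hist, n_positions):
    """Return (displacement, corrected P-value) for the most enriched bin.

    hist: {displacement: count}; must be non-empty.
    n_positions: number of displacements considered possible (assumed
    equally likely under the null, which is a simplification).
    """
    n = sum(hist.values())
    p_null = 1.0 / n_positions
    best_d, best_p = min(
        ((d, binomial_tail(k, n, p_null)) for d, k in hist.items()),
        key=lambda item: item[1],
    )
    # Bonferroni-style correction for testing every possible displacement.
    return best_d, min(1.0, best_p * n_positions)

# Hypothetical counts: 30 of 35 sequences share a displacement of -8.
example = {-8: 30, 3: 2, 10: 1, -20: 2}
spacing, corrected_p = most_significant_spacing(example, n_positions=100)
```

With these made-up counts the −8 bin is selected and its corrected P-value falls far below the 0.001 threshold used in Table 1.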
Discovery of multiple distinct spacings for a single TF
See Table 2 caption for explanation of columns. 1: This observation is supported by evidence from ref. (40).
Figure 2. Ternary complex structure elucidation. (A) Displacement histograms for GABP/CREB1, with corresponding predicted GABP/CREB1/DNA ternary complex structure. The distance indicated by the red dotted line is 6.8 Å. This is the minimum distance between any pair of GABP and CREB1 atoms at this estimated contact point. (B) Displacement histograms for SRF/ETS, with corresponding known SRF-ELK1 ternary complex structure (PDB accession 1K6O).
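The minimum inter-protein distance quoted in panel (A) is the smallest Euclidean distance over all pairs of atoms drawn from the two proteins. A minimal sketch follows, assuming atoms are given as (x, y, z) tuples in Å. The coordinates below are invented for illustration and are not taken from the predicted GABP/CREB1 structure.

```python
from math import dist

def min_interatomic_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in A and any atom in B."""
    return min(dist(a, b) for a in atoms_a for b in atoms_b)

# Hypothetical coordinates in angstroms.
protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protein_b = [(1.0, 0.0, 6.8), (10.0, 10.0, 10.0)]
closest = min_interatomic_distance(protein_a, protein_b)
```

This brute-force scan is quadratic in the number of atoms, which is adequate for a single contact region. A spatial index would be the usual choice for whole-structure comparisons.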