Niels Aerts, Suzanne de Bruijn, Hilda van Mourik, Gerco C Angenent, Aalt D J van Dijk.
Abstract
BACKGROUND: Correct flower formation requires highly specific temporal and spatial regulation of gene expression. In Arabidopsis thaliana the majority of the master regulators that determine flower organ identity belong to the MADS-domain transcription factor family. The canonical DNA binding motif for this transcription factor family is the CArG-box, which has the consensus CC(A/T)6GG. However, a comprehensive analysis of MADS-domain binding patterns has not yet been performed.
Keywords: CArG-box; ChIP-seq; MADS-domain proteins; Sequence conservation; Transcription factor binding specificity
Year: 2018 PMID: 29940855 PMCID: PMC6019531 DOI: 10.1186/s12870-018-1348-8
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Fig. 1 CArG-box-like binding motifs for MADS-domain proteins involved in flower formation. Logos represent CArG-box motifs found by MEME. (a) AG, (b) AP1, (c) AP3, (d) FLC, (e) PI, (f) SEP3, (g) SOC1, (h) SVP
Summary of the analyzed datasets
| Protein | Number of peaks | Number of reads in sample file | Number of reads in control file |
|---|---|---|---|
| AGAMOUS (AG) | 897 | 26,754,529 | 33,740,022 |
| APETALA1 (AP1) | 789 | 33,454,823 | 47,828,731 (a) |
| APETALA3 (AP3) | 1237 | 31,863,205 | 29,265,976 |
| FLOWERING LOCUS C (FLC) | 59 | 18,810,650 | 19,800,993 |
| PISTILLATA (PI) | 2156 | 27,679,860 | 29,265,976 |
| SEPALLATA3 (SEP3) | 4447 | 40,853,093 | 47,828,731 (a) |
| SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1) | 301 | 31,448,718 | 35,116,752 |
| SHORT VEGETATIVE PHASE (SVP) | 445 | 22,114,548 | 54,952,456 |
(a) AP1 and SEP3 have the same control file
Fig. 2 Enrichment of CArG-box variants in peak centers. A peak center is defined as the 250 bp upstream and downstream of the peak summit. (a) Frequency of peak centers containing different CArG-box variants divided by the frequency in random 500 bp stretches in the Arabidopsis thaliana genome. Black, relative frequency of CArG-box variant in all Arabidopsis promoters. (b) Kernel density plot of positions of different CArG-box variants in peak centers of SEP3 relative to peak summits
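The ratio described in panel (a) can be illustrated with a minimal Python sketch. It assumes that "frequency" means the fraction of sequences containing at least one match, uses only the perfect CArG-box CC(A/T)6GG from the abstract, and scans the forward strand only. The function names and toy sequences are hypothetical and are not taken from the study.

```python
import re

# Perfect CArG-box as given in the abstract: CC(A/T)6GG.
PERFECT_CARG = re.compile(r"CC[AT]{6}GG")


def fraction_with_motif(seqs, pattern):
    """Fraction of sequences that contain at least one match."""
    if not seqs:
        raise ValueError("no sequences given")
    hits = sum(1 for s in seqs if pattern.search(s.upper()))
    return hits / len(seqs)


def relative_enrichment(peak_centers, background, pattern):
    """Frequency in peak centers divided by frequency in the background."""
    bg = fraction_with_motif(background, pattern)
    if bg == 0:
        raise ValueError("motif absent from background; ratio undefined")
    return fraction_with_motif(peak_centers, pattern) / bg


# Toy sequences (hypothetical, far shorter than real 500 bp peak centers).
peaks = ["GGCCAAATTTGGAC", "TTCCTTTTAAGGTT", "ACGTACGTACGT", "CCATATATGGAA"]
background = ["ACGTACGTACGT", "GGCCAAATTTGGAC", "TTTTTTTTTTTT", "GGGGGGGGGGGG"]
print(relative_enrichment(peaks, background, PERFECT_CARG))
```

In practice the background would be a large sample of random 500 bp genomic stretches or promoter sequences, as the caption describes, rather than a handful of strings.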
Summary of secondary motifs found (a)
| Dataset | GA/CT-rich motif | G-box | TCP type I motif | TCP type II motif | WRKY-like motif |
|---|---|---|---|---|---|
| AG | Yes | Yes | No | No | No |
| AP1 | Yes | No | No | Yes | No |
| AP3 | Yes | Yes | No | No | No |
| FLC | No | No | No | No | No |
| PI | Yes | Yes | No | No | No |
| SEP3 | Yes | Yes | Yes | Yes | Yes |
| SOC1 | Yes | No | No | Yes | No |
| SVP | No | Yes | No | No | No |
(a) Sequence logos of the motifs summarized in this table can be found in Additional file 10: Figure S4 (GA/CT-rich motif), Additional file 11: Figure S5 (G-box) and Additional file 12: Figure S6 (TCP type I and II). The WRKY-like motif is defined as GTTGACTTT
Fig. 3 Non-CArG-box motifs. (a) Relative enrichment of peak centers containing a secondary motif compared to the promoter background. The frequency of peak centers containing a secondary motif was calculated and divided by a background frequency. A peak center is defined as the 250 bp upstream and downstream of a peak summit. G-box: CACGTG, TCP class I: GGNCCCAC, TCP class II: GGGNCC(A/G)C. (b) Enrichment of a WRKY-like motif (GTTGACTTT) in SEP3 peaks. (c) Kernel density plot of positions of the perfect CArG-box and the WRKY-like motif in the peak center compared to the peak summit
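The motif notation used in these captions (N for any base, (A/G) for alternatives, a trailing digit for repeats) maps directly onto regular expressions. The following Python sketch shows one possible translation. The helper names are hypothetical, and scanning both strands is an assumption, since the caption does not say how strands were handled.

```python
import re


def motif_to_regex(motif):
    """Translate the caption notation into a regular expression.

    N means any base, (A/G) means A or G, and a number directly after
    a group, as in (A/T)6, repeats that group.
    """
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            bases = motif[i + 1:close].split("/")
            out.append("[" + "".join(bases) + "]")
            i = close + 1
        elif ch.isdigit():
            j = i
            while j < len(motif) and motif[j].isdigit():
                j += 1
            out.append("{" + motif[i:j] + "}")
            i = j
        elif ch == "N":
            out.append("[ACGT]")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_motif(seq, motif):
    """True if the motif occurs on either strand (an assumption here)."""
    pattern = re.compile(motif_to_regex(motif))
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


print(motif_to_regex("GGGNCC(A/G)C"))
print(contains_motif("TTGGACCCACTT", "GGNCCCAC"))
```

The G-box CACGTG is its own reverse complement, so strand handling only matters for the asymmetric TCP and WRKY-like motifs.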
Fig. 4 Significantly overrepresented 3′ extensions of the CArG-box core. Enrichment of extensions of 3 nucleotides is calculated as the frequency of the extension after the CC(A/T)6GGN-core in ChIP-seq peaks divided by the expected frequency of the extension based on the frequencies of nucleotides in the ChIP-seq peaks. All extensions are depicted for which at least one dataset is significant at p < 0.05. For visualization purposes, all extensions that are enriched relative to what is expected from nucleotide frequencies, but are not significant, are set to 1. Note that a similar analysis, but for CArG-box-like sequences picked up by MEME-ChIP, is presented in Additional file 15: Table S7
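As a rough illustration of the observed-over-expected calculation in this caption, the Python sketch below counts the three nucleotides that follow each CC(A/T)6GGN match and divides their observed frequency by the product of the single-nucleotide frequencies in the same sequences. Reading "extension" as the three bases immediately after the N is an interpretation, the function name is hypothetical, only the forward strand is scanned, and the significance test mentioned in the caption is not reproduced.

```python
import re
from collections import Counter

# Core CC(A/T)6GGN followed by a captured 3-nt extension. The lookahead
# keeps overlapping core occurrences from being skipped.
CORE = re.compile(r"(?=CC[AT]{6}GG[ACGT]([ACGT]{3}))")


def extension_enrichment(peaks):
    """Observed/expected frequency for each 3-nt extension after the core."""
    peaks = [p.upper() for p in peaks]

    # Single-nucleotide frequencies over all peak sequences.
    base_counts = Counter(b for p in peaks for b in p if b in "ACGT")
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no nucleotides in input")
    base_freq = {b: base_counts[b] / total for b in "ACGT"}

    # Observed extension counts.
    ext_counts = Counter(m.group(1) for p in peaks for m in CORE.finditer(p))
    n_ext = sum(ext_counts.values())

    result = {}
    for ext, count in ext_counts.items():
        # Expected frequency assumes independent positions.
        expected = base_freq[ext[0]] * base_freq[ext[1]] * base_freq[ext[2]]
        result[ext] = (count / n_ext) / expected
    return result


# Toy input (hypothetical): one core followed by N = T and extension ACG.
print(extension_enrichment(["CCAAAAAAGGTACG"]))
```

With real data the ratios would be computed per dataset and then tested for significance before plotting.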
Fig. 5 Conservation of CArG-boxes in ChIP-seq peaks among Arabidopsis thaliana ecotypes. For each position in each CArG-box, mutational entropy was divided by an average background entropy to give a mutation index (blue), averaged over all motif occurrences. This was also done for the subset of 428 perfect CArG-boxes (red). Positions for which the difference in mutation index between perfect CArG-boxes and all CArG-boxes was statistically significant are indicated with an asterisk
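The mutation index in this caption can be sketched in Python as follows. The sketch assumes that "mutational entropy" is the Shannon entropy of the alleles observed across ecotypes at one position, which the caption does not state explicitly. Function names and toy alleles are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the alleles observed at one position."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mutation_index(motif_columns, background_columns):
    """Per-position entropy divided by the mean background entropy."""
    bg = [column_entropy(col) for col in background_columns]
    bg_mean = sum(bg) / len(bg)
    if bg_mean == 0:
        raise ValueError("background shows no variation")
    return [column_entropy(col) / bg_mean for col in motif_columns]


def average_over_occurrences(indices):
    """Average mutation indices position by position across occurrences."""
    return [sum(vals) / len(vals) for vals in zip(*indices)]


# Toy data (hypothetical): each string holds the alleles of four ecotypes
# at one position. The first motif position is invariant, the second varies.
motif = ["AAAA", "AATT"]
background = ["AATT", "AAAA"]
print(mutation_index(motif, background))
```

An index below 1 would indicate a position that varies less among ecotypes than the background, which is how conservation of the motif would show up.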
Fig. 6 Five models that explain the occurrence of different binding motifs in MADS-domain protein ChIP-seq data analyzed in the present study. (a) The MADS-domain protein binds to a CArG-box. (b) The MADS-domain protein binds to another transcription factor, which binds DNA at a motif specific for that transcription factor. (c) Same as (a), but a binding site for another transcription factor lies close by, either by chance or as part of an enhanceosome, so both the CArG-box and the other motif occur in the ChIP-seq peak. (d) The MADS-domain protein needs another transcription factor for binding to a motif that is a hybrid between a CArG-box and the motif for the other transcription factor. (e) The motif is competitively bound by the MADS-domain protein and another protein and is therefore a hybrid between a CArG-box and the motif of the other transcription factor