Arumugam Srinivasan, Rakesh K Mishra.
Abstract
Chromatin domain boundary elements prevent inappropriate interactions between distant or closely spaced regulatory elements and restrict enhancers and silencers to their correct target promoters. Despite this general role and their expected frequent occurrence genome-wide, no tool based on DNA sequence analysis exists to identify boundary elements. Here, we report the chromatin domain Boundary Element Search Tool (cdBEST), which identifies boundary elements by using known recognition sequences of boundary-interacting proteins and looking for 'motif clusters'. Using cdBEST, we identified boundary sequences across 12 Drosophila species. Of the 4576 boundary sequences identified in the Drosophila melanogaster genome, >170 are repetitive in nature and have sequence homology to transposable elements. Analysis of such sequences across the 12 Drosophila genomes showed that the occurrence of repetitive sequences in the context of boundaries is a common feature of drosophilids. Using a variety of genome organization criteria, and an experimental test of a subset of the cdBEST boundaries in an enhancer-blocking assay, we show that 80% of them indeed function as boundaries in vivo. These observations highlight the role of cdBEST in improving our understanding of chromatin domain boundaries in Drosophila and set the stage for comparative analysis of boundaries across closely related species.
Year: 2012 PMID: 22287636 PMCID: PMC3378885 DOI: 10.1093/nar/gks045
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 4. Predicted boundary elements function as enhancer blockers in Drosophila S2 cells. (A) The enhancer-blocking assay vector, NPG, showing the neo resistance gene, the PE enhancer, the GFP reporter gene and the test DNA insertion site. If the test DNA blocks enhancer-promoter communication, the stably transfected cells contain fewer GFP-positive cells. (B) Flow cytometry was used to determine the number of GFP-positive cells. For each test DNA, the percentage of GFP-positive cells was calculated and plotted relative to that of the NPG vector transfection. Filled black boxes indicate strong enhancer-blocking activity, half-filled boxes indicate moderate activity and empty boxes indicate weak or no blocking activity.
Motif frequency and fold enrichment in boundary compared to whole genome
| S. No. | Motif (a) | Motif sequence (b) | Boundary | Boundary-level occurrence (c) | Whole-genome frequency (c, d) | Fold enrichment in boundaries |
|---|---|---|---|---|---|---|
| 1 | BEAF | CGATA | SCS′ and BE28 | 9.15 | 1.292 | 7.09 |
| 2 | Zw5 | GCTGMG | SCS | 5.03 | 0.963 | 5.23 |
| 3 | GAF | GAGAG | Fab-7 | 4.89 | 1.344 | 3.64 |
| 4 | Su(Hw)-M1 | YRYTGCATAYYY | – | – | 0.022 | 156.55 (e) |
| | Su(Hw)-M2 | YWGCMTACTTHY | (2L-203) (f) | 3.47 | 0.022 | 156.55 |
| 5 | Elba | MCAATAAG | Fab-7 and Fab-8 | 0.99 | 0.069 | 14.21 |
| 6 | CTCF-M1 | MHRGRKGKCGCY | Fab-8 | 2.49 | 0.016 | 150.91 |
| | CTCF-M2 | YAGRKGKCGC | Fab-8 | 1.25 | 0.020 | 61.58 |
| | CTCF-M3 | RRCGCCMYCYRKY | Fab-8 | 1.25 | 0.008 | 165.67 |
| 7 | | CCAATTGG | Fab-7 | 1.63 | 0.022 | 73.02 |
(a) Motif names are defined based on the binding protein for the purpose of computer searching. M1, M2 and M3 are alternative or additional binding motifs of the protein. (b) IUPAC code. (c) Occurrence per kb. (d) The whole genome used here includes only the euchromatic regions (X, 2L, 2R, 3L, 3R and 4) of release 4.1. (e) Assigned based on the value obtained for the Su(Hw)-M2 motif, as both have similar genomic frequencies. (f) 2L-203, 3.09, 3.28, 2L-203, X-103 and y-45.
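The motif sequences above are written in IUPAC degenerate code, and the frequencies are occurrences per kb. As a minimal sketch of how such a table could be derived, and not the cdBEST implementation itself, the Python below expands an IUPAC motif into a regular expression, counts matches, and computes a fold enrichment. The function names, the toy sequence and the choice to search both strands are assumptions made for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. M = A/C pairs with K = T/G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    # A lookahead lets overlapping matches be counted.
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def motif_positions(seq, motif):
    """Sorted start positions of the motif on either strand (assumption)."""
    seq = seq.upper()
    hits = {m.start() for m in motif_regex(motif).finditer(seq)}
    rc = reverse_complement(motif)
    if rc != motif:
        hits |= {m.start() for m in motif_regex(rc).finditer(seq)}
    return sorted(hits)


def occurrences_per_kb(seq, motif):
    return len(motif_positions(seq, motif)) / len(seq) * 1000


def fold_enrichment(boundary_per_kb, genome_per_kb):
    return boundary_per_kb / genome_per_kb


# Toy sequence (hypothetical) containing one Zw5 match, GCTGAG.
print(motif_positions("TTGCTGAGTT", "GCTGMG"))
# BEAF row of the table: 9.15 per kb in boundaries, 1.292 per kb genome-wide.
print(fold_enrichment(9.15, 1.292))
```

Ratios recomputed from the rounded table values can differ slightly from the published fold-enrichment column, presumably because the published figures were derived from unrounded frequencies.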
Five boundary types and prediction criteria for new boundaries
| Boundary type | Motif cluster | Mapping criteria: specific feature | Mapping criteria: motif/gap (a) | Mapping criteria: score |
|---|---|---|---|---|
| 1. | | ≥2 kinds of motifs | 8/90 | 60 |
| 2. | | ≥1 CTCF motif(s) | 2/90 | 75 |
| 3. | | Two Zw5 motifs (b) | 8/90 | 37 |
| 4. | | Six BEAF motifs (c) | 6/90 | 43 |
| 5. | | ≥2 Su(Hw) motifs | 2/125 | 313 |
Motifs in bold are the predominant/experimentally tested motifs in a particular boundary type; numbers in brackets indicate their occurrences. (a) The motif/gap combination gives the total number of motifs required and the allowed average gap. (b) Here two high-affinity motifs (a Zw5 motif flanked by the next Zw5 motif with a maximum allowed gap of 13 bases) are required. (c) Here two high-affinity motifs (a BEAF motif flanked by the next BEAF motif with a maximum allowed gap of 16 bases) are required.
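The motif/gap criterion can be illustrated with a short Python sketch. It assumes that the "gap" is measured between consecutive motif start positions and that a cluster is any run of the required number of hits whose average spacing stays within the limit; the record does not spell out either point, nor how the score column is computed, so no score is calculated here. The function name `find_clusters` is hypothetical.

```python
def find_clusters(positions, min_motifs, max_avg_gap):
    """Find regions holding at least `min_motifs` hits with small average gap.

    positions   -- motif start coordinates (bp) on one sequence
    min_motifs  -- motifs required, e.g. 8 for a criterion written "8/90"
    max_avg_gap -- allowed average gap in bp, e.g. 90 for "8/90"

    Gap is taken as start-to-start distance (an assumption).
    Returns merged (first_hit, last_hit) coordinate pairs.
    """
    if min_motifs < 2:
        raise ValueError("an average gap needs at least two motifs")
    positions = sorted(positions)
    clusters = []
    for i in range(len(positions) - min_motifs + 1):
        first = positions[i]
        last = positions[i + min_motifs - 1]
        avg_gap = (last - first) / (min_motifs - 1)
        if avg_gap > max_avg_gap:
            continue
        if clusters and first <= clusters[-1][1]:
            # Window overlaps the previous cluster: extend it.
            clusters[-1] = (clusters[-1][0], last)
        else:
            clusters.append((first, last))
    return clusters


# Hypothetical hit positions tested against a "2/90" style criterion.
print(find_clusters([10, 50, 400, 1000, 1040], min_motifs=2, max_avg_gap=90))
```

Merging overlapping windows is a design choice of this sketch, so that a long run of closely spaced motifs is reported once rather than as many overlapping candidates.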
Figure 1. cdBEST analysis of the boundaries in the Drosophila Bithorax Complex. A 320 kb region of chromosome 3R, which contains the BX-Complex, is drawn to scale. The upper yellow panel shows the in vivo binding profiles of various boundary proteins [plotted using data from a recent study (64)]. The known boundaries were mapped and are shown as red boxes. The lower panel shows annotated genes and cdBEST-predicted boundaries with boundary numbers (corresponding to the chr3R prediction). Dashed vertical lines show the alignment of the cdBEST predictions against the in vivo binding profiles of boundary-interacting proteins.
Whole genome analysis for boundary elements using cdBEST
| Chromosome arm | Size (bp) | No. of boundaries | Boundary frequency (per 100 kb) | No. of genes | Gene density (genes/100 kb) | Average domain size (kb) (a) | Average genes per domain |
|---|---|---|---|---|---|---|---|
| 2L | 23 011 544 | 784 | 3.41 | 2766 | 12.0 | 29.4 | 3.5 |
| 2R | 21 146 708 | 830 | 3.92 | 3088 | 14.6 | 25.5 | 3.7 |
| 3L | 24 543 557 | 793 | 3.23 | 2848 | 11.6 | 31.0 | 3.6 |
| 3R | 27 905 053 | 953 | 3.42 | 3547 | 12.7 | 29.3 | 3.7 |
| 4 | 1 351 857 | 52 | 3.85 | 90 | 6.7 | 26.0 | 1.7 |
| X | 22 422 827 | 1164 | 5.19 | 2314 | 10.3 | 19.3 | 2.0 |
| Whole genome | 120 381 546 | 4576 | 3.80 | 14 653 | 12.2 | 26.3 | 3.2 |
(a) Domain size was calculated by dividing the chromosome size by the number of boundaries.
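The derived columns of this table follow directly from the chromosome size, boundary count and gene count. The Python sketch below recomputes them from the table's own figures; the dictionary layout and the function name `summarize` are illustrative choices, not part of cdBEST.

```python
# (size in bp, number of boundaries, number of genes), copied from the table.
ARMS = {
    "2L": (23_011_544, 784, 2766),
    "2R": (21_146_708, 830, 3088),
    "3L": (24_543_557, 793, 2848),
    "3R": (27_905_053, 953, 3547),
    "4": (1_351_857, 52, 90),
    "X": (22_422_827, 1164, 2314),
}


def summarize(size_bp, boundaries, genes):
    """Recompute the derived columns for one chromosome arm."""
    per_100kb = size_bp / 100_000
    return {
        "boundary_freq_per_100kb": boundaries / per_100kb,
        "gene_density_per_100kb": genes / per_100kb,
        # Footnote (a): chromosome size divided by number of boundaries.
        "avg_domain_size_kb": size_bp / boundaries / 1000,
        "avg_genes_per_domain": genes / boundaries,
    }


# Whole-genome row: sum each column over the six arms.
TOTAL = tuple(sum(column) for column in zip(*ARMS.values()))

for arm, values in {**ARMS, "Whole genome": TOTAL}.items():
    row = summarize(*values)
    print(arm, {name: round(value, 2) for name, value in row.items()})
```

Summing the arms reproduces the whole-genome totals given in the table (120 381 546 bp, 4576 boundaries, 14 653 genes).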
Transposon-associated multicopy boundary elements in D. melanogaster
| S. No. | Predicted boundary | Number of copies | Associated transposon | Predominant motif(s) |
|---|---|---|---|---|
| 1 | X_52 | 39 | Doc | GAF, Elba |
| 2 | 2L_14 | 23 | blood | BEAF |
| 3 | 2R_83 | 8 | Rt1a | CTCF, GAF |
| 4 | 4_2/4_3 | 12 | GATE | BEAF, CTCF |
| 5 | X_1143 | 7 | G-element | BEAF |
| 6 | 2L_768 | 6 | Rt1b | CTCF |
| 7 | 4_48 | 5 | TART-A | BEAF, GAF |
| 8 | 2L_86 | 5 | mdg3 | BEAF |
| 9 | X_921 | 5 | 297 | GAF, F7M |
Figure 2. Boundaries and their repetitive nature in 12 Drosophila species. Four data series (boundaries, repetitive boundaries, genome sizes and repeat contents) are plotted on a logarithmic scale for all 12 Drosophila species. The repetitive-boundaries curve closely follows the repeat content of the genomes, indicating a strong positive correlation between them (i.e. genomes with higher repeat content are more likely to have a higher number of repetitive boundaries).
Figure 3. Predicted boundary elements mark the borders of Polycomb-mediated repressed domains. Parts A and B show two representative regions of chromosome 3R of the Drosophila genome. Upper panels show the predicted boundaries and annotated gene transcripts, with scale. Lower panels show the binding profiles (ChIP/input ratio) for H3K27me3, PC, PSC and E(Z) proteins obtained from a previously published ChIP-chip study (37).