Hugo J Parker, Paul Piccinelli, Tatjana Sauka-Spengler, Marianne Bronner, Greg Elgar.
Abstract
BACKGROUND: Gene regulation through cis-regulatory elements plays a crucial role in development and disease. A major aim of the post-genomic era is to be able to read the function of cis-regulatory elements through scrutiny of their DNA sequence. Whilst comparative genomics approaches have identified thousands of putative regulatory elements, our knowledge of their mechanism of action is poor and very little progress has been made in systematically decoding them.
Year: 2011 PMID: 22208168 PMCID: PMC3261376 DOI: 10.1186/1471-2164-12-637
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Meis2 CNEs from zebrafish and lamprey drive equivalent expression patterns in zebrafish and lamprey embryos. A, multiple alignment of orthologous genomic regions containing the gene c15orf41 (blue peak), downstream of meis2, revealing CNEs (red peaks). Human, zebrafish and lamprey sequences are aligned with the fugu sequence as a baseline. Zebrafish CNE 329X is translocated in the current zebrafish genome assembly and so does not appear in this alignment. B-M, orthologous elements from lamprey (B-G) and zebrafish (H-M) drive similar GFP expression patterns in the nervous system of zebrafish embryos at 54 hpf: element 3285 in the cranial ganglia (arrows) and primary neurons of the hindbrain and spinal cord (arrowhead) (B, H); 3288 in neurons of the hindbrain posterior to rhombomere (r) 4 (C, I), as determined by comparison with r3r5 RFP expression (D, J); 3299 in the anterior hindbrain - r2-4 for the lamprey homolog (E, F) and r3-4 plus the corresponding neural crest for the zebrafish homolog (K, L); 329X in the hindbrain and neurons of the midbrain (G, M). N-O, embryonic day 14-15 lamprey embryos transgenic for lamprey elements 3285 (N) and 3299 (O) show GFP expression in the cranial ganglia (arrowheads) and anterior hindbrain respectively, consistent with their expression in zebrafish (3285: B, 3299: E). P-Q, dorsal views of the head of lamprey (P) and zebrafish (Q) embryos transgenic for lamprey element 3299. cg: cranial ganglia; hb: hindbrain; mb: midbrain; nc: neural crest; sc: spinal cord.
Figure 2. CNE 3299 harbours essential conserved Pbx-Hox and Meis binding-sites. A, multiple sequence alignment of a region of CNE 3299 from human, zebrafish and lamprey highlighting conserved Pbx-Hox (blue box) and Meis (green box) binding-site motifs. The specific sites mutated in elements sub1 and sub2 are indicated below the alignment. B-C, compared to wild-type 3299 expression (B), mutating the first Pbx-Hox and Meis motif cluster (sub1) results in the loss of reporter expression in the neural crest (arrow) and broader expression in the hindbrain (arrowhead) (C). nc: neural crest.
Figure 3. Pbx-Hox motifs in CNEs strongly resemble verified PBX-HOX binding-sites. Position frequency logos generated from Gnathostome alignments (based on 712 conserved human TGATNNAT motifs in 4529 CNE alignments), Human CNEs (generated from the CONDOR CNE set using Cis-finder [41]) and from previous studies [36] (Literature). The relative base frequencies at positions 5 and 6, and 9 and 10, in CNEs, are in good agreement with known functional Pbx-Hox binding sites, supporting a strong KR consensus.
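As a rough illustration of what a position frequency logo summarises, the sketch below tabulates per-position base frequencies from a set of aligned, equal-length motif instances. The function name `position_frequencies` and the four example sites are hypothetical and are not taken from the paper's data; the logos in the figure were generated with other tools (Cis-finder is named for the human CNE set).

```python
from collections import Counter

def position_frequencies(sites):
    """Return a list (one entry per position) of {base: relative frequency}.

    `sites` is a list of aligned, equal-length DNA strings. This is an
    illustrative helper, not the method used to draw the published logos.
    """
    if not sites:
        raise ValueError("need at least one site")
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):          # iterate over alignment columns
        counts = Counter(column)
        total = sum(counts.values())
        matrix.append({base: n / total for base, n in counts.items()})
    return matrix

# Hypothetical 10-bp instances of a TGATNNAT core plus two flanking bases.
example_sites = ["TGATTGATGG", "TGATAAATTA", "TGATCCATGG", "TGATTTATTG"]
pfm = position_frequencies(example_sites)
for pos in (9, 10):                     # 1-based positions 9 and 10 (the "KR" positions)
    print(pos, pfm[pos - 1])
```

In this toy input, position 9 contains only G or T (IUPAC K) and position 10 only A or G (IUPAC R), which is the kind of bias a KR consensus describes.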
Frequency of KR motifs, compared to shuffled versions, in different test sets
| Motif | Human:Shark CNEs (from [13]) | | | | | | |
|---|---|---|---|---|---|---|---|
| TGTANNATKR | 171 | 188 | 388 | 65 | 52 | 12 | 3 |
| GTATNNATKR | 150 | 168 | 279 | 54 | 39 | 9 | 2 |
| GTTANNATKR | 150 | 178 | 325 | 79 | 65 | 8 | 2 |
| TTGANNATKR | 200 | 245 | 447 | 80 | 64 | 9 | 1 |
| TTAGNNATKR | 167 | 238 | 398 | 74 | 55 | 7 | 0 |
| ATGTNNATKR | 259 | 297 | 452 | 86 | 72 | 20 | 1 |
| ATTGNNATKR | 233 | 297 | 436 | 74 | 61 | 20 | 2 |
| AGTTNNATKR | 215 | 254 | 431 | 85 | 68 | 9 | 0 |
| TAGTNNATKR | 147 | 154 | 297 | 54 | 42 | 10 | 4 |
| TATGNNATKR | 176 | 198 | 365 | 74 | 60 | 10 | 3 |
| GATTNNATKR | 274 | 315 | 419 | 97 | 74 | 11 | 0 |
| TGATNNTAKR | 106 | 143 | 314 | 65 | 50 | 6 | 1 |
| TGTANNTAKR | 142 | 151 | 421 | 82 | 60 | 14 | 1 |
| GTATNNTAKR | 59 | 73 | 195 | 41 | 34 | 1 | 0 |
| GTTANNTAKR | 105 | 108 | 253 | 50 | 33 | 5 | 0 |
| TTGANNTAKR | 162 | 205 | 385 | 72 | 62 | 10 | 0 |
| TTAGNNTAKR | 73 | 97 | 235 | 41 | 31 | 0 | 0 |
| ATGTNNTAKR | 103 | 124 | 376 | 64 | 55 | 3 | 0 |
| ATTGNNTAKR | 136 | 158 | 305 | 57 | 42 | 6 | 1 |
| AGTTNNTAKR | 85 | 121 | 320 | 64 | 50 | 5 | 1 |
| TAGTNNTAKR | 66 | 69 | 198 | 37 | 27 | 1 | 0 |
| TATGNNTAKR | 84 | 94 | 345 | 80 | 62 | 5 | 1 |
| GATTNNTAKR | 144 | 177 | 292 | 58 | 42 | 2 | 0 |
| | 165.38 | 196.58 | 353.54 | 70.58 | 55.46 | 8.33 | 1.25 |
| | 102.75 | 122.06 | 93.73 | 24.74 | 20.92 | 5.52 | 1.67 |
| 1.57 | |||||||
| | 5.68E-05 | 6.16E-05 | 3.30E-03 | 1.00E-04 | 2.00E-04 | N/S | 3.00E-04 |
Enrichment analysis for Pbx-Hox KR motifs, relative to shuffled versions (retaining G+C content for each binding site), within different sets of CNEs. CNEs from the VISTA enhancer browser (EB) and zebrafish cneBrowser (CB) sets have also been grouped according to annotated expression in the hindbrain (HB), branchial arches (BA) or cranial nerves (CN). All sequences are human except the Zebrafish cneBrowser set. N/S = not significant
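The shuffled motifs in the table appear to be rearrangements of the first four bases (TGAT) and of the AT dinucleotide, with the NN spacer and the KR tail left in place, which keeps G+C content unchanged. A minimal sketch of generating such variants and counting IUPAC motif matches follows. It assumes the real motif is TGATNNATKR, counts overlapping matches, and scans the forward strand only; none of these choices is confirmed by the text above, and the helper names are hypothetical.

```python
import re
from itertools import permutations

# IUPAC codes needed for this motif family (K = G or T, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "K": "[GT]", "R": "[AG]"}

def motif_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlaps are counted."""
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")

def count_motif(motif, sequences):
    """Count motif occurrences on the forward strand (an assumption)."""
    pattern = motif_regex(motif)
    return sum(len(pattern.findall(seq.upper())) for seq in sequences)

def shuffled_variants(real="TGATNNATKR"):
    """Permute the 4-bp half-site and the 2-bp core; keep NN and KR fixed."""
    half, spacer, core, tail = real[:4], real[4:6], real[6:8], real[8:]
    variants = set()
    for p in set(permutations(half)):
        for c in set(permutations(core)):
            variants.add("".join(p) + spacer + "".join(c) + tail)
    variants.discard(real)              # the unshuffled motif is not a control
    return sorted(variants)

variants = shuffled_variants()
print(len(variants))                    # number of distinct shuffled motifs
toy_cnes = ["ccTGATTTATGGaa", "TGATTTATCC"]   # hypothetical sequences
print(count_motif("TGATNNATKR", toy_cnes))
```

Under these assumptions the generator yields 23 variants, the same number of shuffled motifs as rows listed in the table.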
Figure 4. Pbx-Hox motifs correlate with segment-specific hindbrain and pharyngeal arch reporter expression. A-R, zebrafish elements from the lamprey (A-J, M, O, Q) and jawed vertebrate (K, L, N, P, R) CNE sets drive GFP expression in the hindbrain and pharyngeal arches. Elements: Evi1_40224 (A, B), Tshz3_43509 (C, D), NR2F2_27254 (E, F), Pax2_217 (G, dorsal view: H), ZNF503_32799 (I, J), Nkx6-1_4281 (K, L), Tshz3_24804 (M), Pax9_2099 (N), Tshz3_24805-6 (O), FoxP1_886 (P), Tshz3_24807 (Q), BCL11A_2554 (R). Expression in the hindbrain is often restricted to certain rhombomeres, as shown by comparison with r3r5 RFP expression (B, D, F, H, J, L). Tshz3_24807 drives expression in the trunk musculature (Q). Elements show temporal variation in reporter expression, expressing most strongly at 24-30 hpf (C, D), 48-54 hpf (A, B, E, F, I, J, Q) or 72-78 hpf (G, H, K, L, M, N, O, P, R). hb: hindbrain; pa: pharyngeal arches; m: muscle.
Frequency of KR motifs in CNEs at different gene loci
| Gene | # KR motifs in test set | Length of CNE seq for locus (kb) | # hits per kb | # hits in control set (mean) | Standard deviation | z-score | p-value |
|---|---|---|---|---|---|---|---|
| ZNF503 | 36 | 27.781 | 1.30 | 3.18 | 1.76 | 18.62 | 0.00E+00 |
| TSHZ3 | 30 | 23.323 | 1.29 | 3.09 | 1.77 | 15.23 | 0.00E+00 |
| IRX5 | 27 | 37.059 | 0.73 | 5.39 | 2.33 | 9.29 | 0.00E+00 |
| IRX2 | 21 | 23.981 | 0.88 | 3.10 | 1.80 | 9.95 | 0.00E+00 |
| TSHZ1 | 16 | 10.351 | 1.55 | 1.63 | 1.32 | 10.93 | 0.00E+00 |
| PBX3 | 16 | 17.886 | 0.89 | 1.89 | 1.35 | 10.44 | 0.00E+00 |
| HOXD9 | 16 | 17.77 | 0.90 | 2.19 | 1.44 | 9.59 | 0.00E+00 |
| NR2F2 | 16 | 18.99 | 0.84 | 2.52 | 1.59 | 8.49 | 0.00E+00 |
| NR2F1 | 16 | 25.655 | 0.62 | 3.72 | 1.84 | 6.67 | 2.53E-11 |
| MEIS2 | 16 | 24.553 | 0.65 | 3.42 | 1.91 | 6.59 | 4.49E-11 |
| ZFHX1B | 13 | 23.275 | 0.56 | 3.13 | 1.72 | 5.73 | 9.86E-09 |
| SALL3 | 12 | 11.405 | 1.05 | 1.43 | 1.21 | 8.76 | 0.00E+00 |
| FOXP1 | 12 | 15.857 | 0.76 | 1.73 | 1.24 | 8.25 | 2.22E-16 |
| MAF | 11 | 7.334 | 1.50 | 1.15 | 1.10 | 8.95 | 0.00E+00 |
| NKX6-1 | 10 | 6.853 | 1.46 | 0.82 | 0.92 | 9.94 | 0.00E+00 |
Details are shown for the 15 gene loci from the CONDOR CNE set with the highest number of Pbx-Hox KR motifs in their CNEs, showing enrichment relative to shuffled CNE sets (Methods and Additional File 6). For each gene locus, the number of Pbx-Hox KR motifs in the associated CNEs is given. The number of Pbx-Hox KR motifs per kb of CNE sequence for each locus (column 4) is calculated by dividing the number of Pbx-Hox KR motifs in the CNEs of that locus (column 2) by the total combined length of the CNEs in that locus (column 3). Control sets were generated by zero-order Markov shuffling of CNEs at each locus in 1000 randomisations (Methods). Some gene loci also contain other genes besides the one after which they are named; for instance, the IRX5 locus contains Irx3, Irx5 and Irx6.
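The per-locus statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the control is a simple mononucleotide shuffle of each sequence (one reading of "zero-order Markov shuffling"), the p-value is a two-sided normal tail (an assumption; the Methods are not reproduced here), the motif is taken to be TGATNNATKR on the forward strand, and all function names are hypothetical.

```python
import math
import random
import re
import statistics

# Assumed KR motif: TGAT, two free bases, AT, then K (G/T) and R (A/G).
KR_MOTIF = re.compile(r"(?=TGAT[ACGT]{2}AT[GT][AG])")

def count_hits(sequences):
    """Count (possibly overlapping) forward-strand motif hits."""
    return sum(len(KR_MOTIF.findall(s.upper())) for s in sequences)

def hits_per_kb(hits, total_length_kb):
    """Column 4 = column 2 / column 3."""
    return hits / total_length_kb

def z_and_p(observed, control_mean, control_sd):
    """z-score against the control distribution and a two-sided normal p-value."""
    z = (observed - control_mean) / control_sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def shuffled_control(sequences, n_iter=1000, seed=0):
    """Mean and SD of hit counts over n_iter base-composition-preserving shuffles."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        shuffled = []
        for s in sequences:
            bases = list(s.upper())
            rng.shuffle(bases)          # permute bases within each sequence
            shuffled.append("".join(bases))
        counts.append(count_hits(shuffled))
    return statistics.mean(counts), statistics.stdev(counts)

# Worked example using the rounded ZNF503 values from the table.
print(round(hits_per_kb(36, 27.781), 2))
z, p = z_and_p(36, 3.18, 1.76)
print(round(z, 2), p)
```

Because the table reports rounded means and standard deviations, a z-score recomputed this way differs slightly from the listed one (about 18.65 here versus 18.62 in the table).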