Bernard Fongang¹,², Fanping Kong¹, Surendra Negi¹, Werner Braun¹, Andrzej Kudlicki¹,²,³
Abstract
The homeobox encodes a DNA-binding domain found in transcription factors that regulate key developmental processes. The most notable examples of homeobox-containing genes are the Hox genes, which are arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base-pair DNA fragment comprising the homeobox. We demonstrate that homeobox DNA has a characteristic 3-base-pair periodicity in its hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved even though synonymous mutations would disrupt it, which raises the possibility that evolutionary selective pressure acts on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such an element may have important consequences for understanding how these genes are regulated.
Year: 2016 PMID: 27739488 PMCID: PMC5064350 DOI: 10.1038/srep35415
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The average GC content of the coding sequences in the different regions of Hox genes.
| Species | GC, 5′ region | GC, homeobox | GC, 3′ region | p-value (5′ > homeobox) | p-value (3′ > homeobox) |
|---|---|---|---|---|---|
| Human | 0.654 ± 0.058 | 0.524 ± 0.057 | 0.624 ± 0.068 | 1.6e-12 | 6.6e-7 |
| Mouse | 0.639 ± 0.056 | 0.528 ± 0.052 | 0.605 ± 0.066 | 1.3e-14 | 3.7e-6 |
| Fly | 0.619 ± 0.021 | 0.542 ± 0.031 | 0.606 ± 0.036 | 4.0e-4 | 2.7e-2 |
The GC content within the homeobox is significantly lower than outside it, a pattern conserved between vertebrates and invertebrates. Coding regions 3′ of the homeobox were included in the averages only if they were at least 60 bp long. The last two columns give the significance (t-test p-value) of each difference.
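The comparison in the table above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. The source only says "t-test", so using Welch's unequal-variance variant is our assumption. The names `gc_content`, `gc_values` and `welch_t` are hypothetical helpers. The 60-bp minimum for 3′ regions maps onto the `min_len` parameter. A p-value would then come from a t distribution with `df` degrees of freedom (for example from a statistics library); that step is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_values(regions: list[str], min_len: int = 0) -> list[float]:
    """GC content of each region, skipping regions shorter than min_len bp."""
    return [gc_content(r) for r in regions if len(r) >= min_len]


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-gene GC fractions, for illustration only (not the paper's data).
five_prime = [0.66, 0.61, 0.70, 0.64, 0.68]
homeobox = [0.52, 0.49, 0.55, 0.51, 0.54]
t, df = welch_t(five_prime, homeobox)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive `t` means the first group (here the 5′ region) is richer in GC than the homeobox, which is the direction tested in the table.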
Figure 1. The periodicity of the hydroxyl radical cleavage (HRC) pattern within the homeobox.
(A) The HRC pattern of the mouse HoxB4 homeobox coding sequence (red) has a period of three base pairs (the dotted line is a harmonic oscillation with a 3-bp period; note that the two curves remain in phase throughout). (B) The periodic signature is absent from other regions of the gene. (C) Periodograms of the HRC of the homeobox (red), of a coding region adjacent to the homeobox (green), and of a simulated DNA sequence that encodes the same homeodomain protein sequence with different codons (blue). The highly significant peak is present only in the actual homeobox. (D) The 3-bp periodic HRC signature (HRC3) is more prevalent in homeobox DNA than in other coding sequences. Median periodograms of HRC for the homeoboxes of mouse Hox genes (red), all homeobox genes (green), regions outside the homeobox in homeotic genes (blue: the 180 bp adjacent on the 5′ side; teal: between the homeobox and the 3′ end), and randomly chosen coding sequences (dark blue). (E) Histograms of the periodicity score at T = 3 base pairs.
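The periodograms in panels C to E can be illustrated with a short sketch. It assumes an HRC profile is already available as one number per base pair; how that profile is predicted or measured is outside the sketch. It also normalizes the periodogram ordinate by the profile variance. That normalization is our assumption: we chose it because, for Gaussian white noise, it makes the ordinate at a Fourier frequency approximately Exp(1)-distributed. The source does not spell out its exact definition of the periodicity score. The function name `normalized_periodogram` is hypothetical.

```python
import cmath
import math
from statistics import mean, pvariance


def normalized_periodogram(profile: list[float], period: float) -> float:
    """Periodogram ordinate of `profile` at `period` (in bp), divided by its variance.

    Assumption: for Gaussian white noise this value at a Fourier frequency
    is approximately Exp(1)-distributed, so exp(-value) acts as a p-value.
    """
    n = len(profile)
    mu = mean(profile)
    var = pvariance(profile, mu)
    if var == 0:
        raise ValueError("profile has zero variance")
    freq = 1.0 / period  # cycles per base pair
    # Discrete Fourier sum of the mean-centred profile at a single frequency.
    s = sum(
        (x - mu) * cmath.exp(-2j * math.pi * freq * k)
        for k, x in enumerate(profile)
    )
    return abs(s) ** 2 / (n * var)


# Toy 180-bp "profile": an exact 3-bp oscillation (synthetic, not real HRC data).
profile = [math.cos(2 * math.pi * k / 3) for k in range(180)]
print(normalized_periodogram(profile, 3))  # strong peak at T = 3
print(normalized_periodogram(profile, 4))  # essentially zero at T = 4
```

For a 180-bp homeobox, a 3-bp period corresponds to exactly 60 cycles. It is therefore a Fourier frequency, and no interpolation between frequency bins is needed. Under this normalization a pure 3-bp cosine of length N scores N/2, while white noise scores about 1 on average.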
The HRC3 signature in mouse and human Hox genes.
| Gene | Mouse PHRC3 | Mouse p-value | Mouse GC | Human PHRC3 | Human p-value | Human GC |
|---|---|---|---|---|---|---|
| HOXA1 | 5.540 | 3.90E-03 | 0.54 | 5.995 | 2.50E-03 | 0.55 |
| HOXA2 | 3.540 | 2.90E-02 | 0.50 | 3.771 | 2.30E-02 | 0.49 |
| HOXA3 | 10.684 | 2.30E-05 | 0.61 | 10.635 | 2.40E-05 | 0.60 |
| HOXA4 | 10.599 | 2.50E-05 | 0.57 | 12.757 | 2.90E-06 | 0.58 |
| HOXA5 | 5.079 | 6.20E-03 | 0.49 | 4.617 | 9.90E-03 | 0.49 |
| HOXA6 | 10.689 | 2.30E-05 | 0.57 | 11.736 | 8.00E-06 | 0.61 |
| HOXA7 | 14.362 | 5.80E-07 | 0.59 | 15.833 | 1.30E-07 | 0.61 |
| HOXA9 | 7.268 | 7.00E-04 | 0.53 | 7.464 | 5.70E-04 | 0.51 |
| HOXA10 | 8.846 | 1.40E-04 | 0.50 | 9.232 | 9.80E-05 | 0.51 |
| HOXA11 | 5.735 | 3.20E-03 | 0.46 | 6.545 | 1.40E-03 | 0.46 |
| HOXA13 | | | 0.47 | | | 0.45 |
| HOXB1 | 6.128 | 2.20E-03 | 0.56 | 4.615 | 9.90E-03 | 0.53 |
| HOXB2 | 6.449 | 1.60E-03 | 0.59 | 7.125 | 8.10E-04 | 0.59 |
| HOXB3 | 11.396 | 1.10E-05 | 0.59 | 13.075 | 2.10E-06 | 0.60 |
| HOXB4 | 13.465 | 1.40E-06 | 0.60 | 11.062 | 1.60E-05 | 0.60 |
| HOXB5 | 8.118 | 3.00E-04 | 0.56 | 11.100 | 1.50E-05 | 0.60 |
| HOXB6 | 11.421 | 1.10E-05 | 0.59 | 11.195 | 1.40E-05 | 0.59 |
| HOXB7 | 11.473 | 1.00E-05 | 0.56 | 11.789 | 7.60E-06 | 0.57 |
| HOXB8 | 7.122 | 8.10E-04 | 0.54 | 6.490 | 1.50E-03 | 0.54 |
| HOXB9 | 7.762 | 4.30E-04 | 0.48 | 7.240 | 7.20E-04 | 0.47 |
| HOXB13 | 5.420 | 4.40E-03 | 0.54 | 6.938 | 9.70E-04 | 0.57 |
| HOXC4 | 7.682 | 4.60E-04 | 0.53 | 7.102 | 8.20E-04 | 0.52 |
| HOXC5 | 11.849 | 7.10E-06 | 0.51 | 11.590 | 9.30E-06 | 0.51 |
| HOXC6 | 8.671 | 1.70E-04 | 0.56 | 9.285 | 9.30E-05 | 0.57 |
| HOXC8 | 4.261 | 1.40E-02 | 0.48 | 4.004 | 1.80E-02 | 0.47 |
| HOXC9 | 9.494 | 7.50E-05 | 0.49 | 10.092 | 4.10E-05 | 0.49 |
| HOXC10 | 4.892 | 7.50E-03 | 0.42 | 4.892 | 7.50E-03 | 0.42 |
| HOXC11 | 5.699 | 3.30E-03 | 0.46 | 7.817 | 4.00E-04 | 0.47 |
| HOXC12 | 9.925 | 4.90E-05 | 0.51 | 7.857 | 3.90E-04 | 0.52 |
| HOXC13 | 3.359 | 3.50E-02 | 0.56 | 4.255 | 1.40E-02 | 0.57 |
| HOXD1 | 4.137 | 1.60E-02 | 0.45 | 3.541 | 2.90E-02 | 0.46 |
| HOXD3 | 12.070 | 5.70E-06 | 0.59 | 8.328 | 2.40E-04 | 0.57 |
| HOXD4 | 6.594 | 1.40E-03 | 0.51 | 6.046 | 2.40E-03 | 0.50 |
| HOXD8 | | | 0.46 | | | 0.44 |
| HOXD9 | 5.634 | 3.60E-03 | 0.50 | 5.397 | 4.50E-03 | 0.49 |
| HOXD10 | 8.345 | 2.40E-04 | 0.48 | 7.509 | 5.50E-04 | 0.47 |
| HOXD11 | 6.070 | 2.30E-03 | 0.45 | 6.389 | 1.70E-03 | 0.45 |
| HOXD12 | 9.766 | 5.70E-05 | 0.50 | 8.711 | 1.60E-04 | 0.50 |
| HOXD13 | 3.192 | 4.10E-02 | 0.46 | | | 0.45 |
Columns give the gene name and, for mouse and for human, the HRC3 amplitude PHRC3, its significance (p-value), and the GC content of the homeobox, for the 39 Hox genes.
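A quick arithmetic check suggests how the two numeric columns relate. In the rows checked below, p ≈ exp(−PHRC3) holds to within the rounding of the printed p-values. The thresholds quoted for the species table follow the same rule: exp(−3) ≈ 0.0498 and exp(−6) ≈ 0.00248. This is our inference from the numbers, not a definition stated in the source. It is what one would expect if PHRC3 were a variance-normalized periodogram ordinate with an Exp(1) null distribution. The helper names `implied_p_value` and `relative_differences` are hypothetical.

```python
import math

# (PHRC3, reported p-value) for a few mouse genes, copied from the table above.
MOUSE_ROWS = {
    "HOXA1": (5.540, 3.90e-03),
    "HOXA3": (10.684, 2.30e-05),
    "HOXA7": (14.362, 5.80e-07),
    "HOXB4": (13.465, 1.40e-06),
    "HOXD13": (3.192, 4.10e-02),
}


def implied_p_value(p_hrc3: float) -> float:
    """p-value implied by an amplitude, assuming p = exp(-PHRC3) (an Exp(1) tail)."""
    return math.exp(-p_hrc3)


def relative_differences(rows: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Relative gap between each reported p-value and exp(-PHRC3)."""
    return {
        gene: abs(implied_p_value(amp) - reported) / reported
        for gene, (amp, reported) in rows.items()
    }


for gene, rel in relative_differences(MOUSE_ROWS).items():
    print(f"{gene}: relative difference {rel:.1%}")

# The significance thresholds used for the species table, under the same relation.
print(f"exp(-3) = {implied_p_value(3):.4f}, exp(-6) = {implied_p_value(6):.5f}")
```

The remaining gaps of a few percent at most are consistent with the p-values being printed to two significant figures.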
Figure 2. Periodograms of HRC in the homeoboxes of fly Hox genes.
The same pattern as in mammalian genes is present, revealing evolutionary conservation of the HRC3 periodic structural signature.
The HRC3 signature in Hox genes of metazoan species.
| Species | Type | Hox genes (N_All) | PHRC3 > 3 | % | PHRC3 > 6 | % | Median PHRC3 | Median p-value |
|---|---|---|---|---|---|---|---|---|
| | S | 8 | 7 | 88 | 7 | 86 | 10.1 | 4.23 × 10⁻⁵ |
| | S | 8 | 7 | 88 | 7 | 86 | 5.2 | 0.00552 |
| | U | 9 | 5 | 56 | 0 | 0 | 4.1 | 0.01663 |
| | V | 47 | 45 | 96 | 14 | 30 | 4.29 | 0.01373 |
| | V | 29 | 16 | 55 | 10 | 34 | 4.95 | 0.00707 |
| | V | 39 | 36 | 92 | 27 | 69 | 7.24 | 0.00814 |
| | V | 39 | 37 | 95 | 25 | 64 | 7.27 | 0.0005 |
| | M | 8 | 5 | 62 | 1 | 12 | 3.48 | 0.0309 |
Columns give the species; the organism type; the number of Hox genes considered; the absolute and relative numbers of Hox genes with a significant (PHRC3 > 3; p < 0.05) or highly significant (PHRC3 > 6; p < 0.00248) HRC3 signature; the median HRC3 amplitude PHRC3; and the median significance for the species. Organism types (column 2): S, segmented invertebrate; V, vertebrate; M, mollusk; U, unsegmented invertebrate. Detailed information on individual genes is provided in Supplementary Table S4.
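The summary columns can be reproduced from per-gene amplitudes. In the sketch below, the mouse PHRC3 values from the Hox gene table are counted against the two thresholds, and their median is taken. Two choices are our own assumptions: the helper name `summarize`, and the convention that a gene with no listed value (HOXA13 and HOXD8 in mouse) counts toward N_All but falls below both thresholds. Under that convention the result matches the last vertebrate (V) row: 39 genes, 37 (95%) above 3, 25 (64%) above 6, and a median PHRC3 of about 7.27.

```python
from statistics import median

# Mouse PHRC3 values from the Hox gene table, in table order (A, B, C, D clusters).
# None marks HOXA13 and HOXD8, for which no value is listed.
MOUSE_PHRC3 = [
    5.540, 3.540, 10.684, 10.599, 5.079, 10.689, 14.362, 7.268, 8.846, 5.735, None,
    6.128, 6.449, 11.396, 13.465, 8.118, 11.421, 11.473, 7.122, 7.762, 5.420,
    7.682, 11.849, 8.671, 4.261, 9.494, 4.892, 5.699, 9.925, 3.359,
    4.137, 12.070, 6.594, None, 5.634, 8.345, 6.070, 9.766, 3.192,
]


def summarize(values, significant=3.0, highly_significant=6.0):
    """Summary columns of the species table, computed from per-gene PHRC3 values.

    Assumption: genes without a value count toward N_All and are treated as 0.0,
    i.e. below both thresholds.
    """
    filled = [0.0 if v is None else v for v in values]
    n_all = len(filled)
    n_sig = sum(v > significant for v in filled)
    n_high = sum(v > highly_significant for v in filled)
    return {
        "N_All": n_all,
        "PHRC3 > 3": n_sig,
        "% (> 3)": round(100 * n_sig / n_all),
        "PHRC3 > 6": n_high,
        "% (> 6)": round(100 * n_high / n_all),
        "Median PHRC3": median(filled),
    }


print(summarize(MOUSE_PHRC3))
```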
Figure 3. The GO biological processes significantly enriched in mouse genes containing the HRC3 signature (PHRC3 ≥ 10).
Note the high prevalence of processes associated with development.
Transcription factor binding sites (TFBS) from the ENCODE project whose ChIP-seq peaks do or do not overlap HRC3 loci in the human genome (assembly hg19).
| TFBS | Binding Sites | HRC3 Overlap | HRC3 Ratio | SIM Fold | SIM Min | SIM Median | SIM Mean | SIM Max |
|---|---|---|---|---|---|---|---|---|
| Examples of DNA-binding proteins significantly enriched in HRC3 loci | | | | | | | | |
| | 14818 | 2028 | 0.1368 | 6.0864 | 283 | 333.5 | 333.2 | 388 |
| | 19205 | 2121 | 0.1104 | 5.9445 | 299 | 358.5 | 356.8 | 406 |
| | 5772 | 598 | 0.1036 | 8.0236 | 49 | 74 | 74.53 | 110 |
| | 8399 | 794 | 0.0945 | 6.7345 | 84 | 118.5 | 117.9 | 157 |
| | 10390 | 945 | 0.0909 | 6.2582 | 121 | 149 | 151 | 198 |
| | 17247 | 1494 | 0.0866 | 5.8087 | 212 | 256.5 | 257.2 | 308 |
| | 13613 | 1131 | 0.0830 | 6.8173 | 132 | 166.5 | 165.9 | 206 |
| | 13061 | 1034 | 0.0791 | 5.7797 | 137 | 177.5 | 178.9 | 217 |
| | 17997 | 1392 | 0.0773 | 5 | 219 | 278 | 278.4 | 327 |
| | 12943 | 963 | 0.0744 | 5.3233 | 139 | 180 | 180.9 | 217 |
| | 16981 | 1254 | 0.0738 | 5.0240 | 206 | 247 | 249.6 | 305 |
| | 8485 | 625 | 0.0736 | 4.4770 | 97 | 141 | 139.6 | 184 |
| | 1110 | 81 | 0.0729 | 8.8621 | 2 | 9 | 9.14 | 18 |
| | 5352 | 382 | 0.0713 | 4.4444 | 59 | 85 | 85.95 | 117 |
| | 6537 | 459 | 0.0702 | 4.6181 | 63 | 99 | 99.39 | 143 |
| | 162209 | 4733 | 0.0291 | 3.1590 | 1385 | 1496 | 1498 | 1594 |
| Examples of TFs with no enrichment of HRC3 in binding loci | | | | | | | | |
| | 131528 | 1047 | 0.0079 | 1.0335 | 918 | 1012 | 1013 | 1114 |
| | 32419 | 239 | 0.0073 | 1.2513 | 154 | 191.5 | 191 | 234 |
| | 47076 | 346 | 0.0073 | 1.0282 | 285 | 336 | 336.5 | 401 |
| | 84087 | 618 | 0.0073 | 1.0363 | 540 | 596.5 | 596.3 | 650 |
| | 2806 | 19 | 0.0067 | 0.5521 | 21 | 34 | 34.41 | 54 |
| | 89906 | 604 | 0.0067 | 0.9596 | 532 | 630 | 629.4 | 707 |
| | 791 | 5 | 0.0063 | 0.8605 | 0 | 5.5 | 5.81 | 13 |
| | 4574 | 26 | 0.0056 | 0.8373 | 15 | 31 | 31.05 | 47 |
| | 40866 | 227 | 0.0055 | 0.8156 | 225 | 277 | 278.3 | 323 |
| | 4087 | 16 | 0.0039 | 0.644 | 13 | 24 | 24.82 | 43 |
For each TFBS, the table gives the total number of peaks (Binding Sites) and the number of peaks overlapping HRC3 loci (HRC3 Overlap). The HRC3 Ratio is the fraction of the TFBS ChIP-seq peaks that overlap HRC3 loci (HRC3 Overlap / Binding Sites). To assess significance, the peaks of each TFBS were shuffled 100 times, and for each shuffle the number of peaks overlapping the HRC3 loci was computed with bedtools. SIM Min, SIM Median, SIM Mean and SIM Max summarize these 100 simulated overlaps, and SIM Fold is the observed overlap divided by SIM Mean. The simulated overlap (SIM Median) therefore represents the overlap expected by chance. TFs with statistically significant enrichment (top part of the table) have an HRC3 Overlap well above their SIM Median; in every row of that part, it even exceeds SIM Max. The complete list of the 161 TFBSs is presented in Table S7.
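The shuffle-based test described above can be sketched without bedtools. This is a simplified, single-chromosome stand-in for the authors' bedtools workflow, not a reimplementation of it. Peaks are moved to uniformly random positions while keeping their lengths, and a peak counts as overlapping if it shares at least one base with an HRC3 locus. Intervals are half-open `(start, end)` pairs. The names `count_overlapping`, `shuffle_peaks` and `enrichment`, and the toy coordinates, are hypothetical. The output keys mirror the table columns, with SIM Fold taken as HRC3 Overlap / SIM Mean, which agrees with the tabulated values.

```python
import bisect
import random
from statistics import mean, median

Interval = tuple[int, int]  # half-open [start, end) on a single chromosome


def count_overlapping(peaks: list[Interval], loci: list[Interval]) -> int:
    """Count peaks overlapping at least one locus.

    `loci` must be sorted by start and non-overlapping, so their ends are sorted too.
    """
    starts = [s for s, _ in loci]
    n = 0
    for ps, pe in peaks:
        i = bisect.bisect_left(starts, pe) - 1  # last locus starting before the peak ends
        if i >= 0 and loci[i][1] > ps:  # ...and ending after the peak starts
            n += 1
    return n


def shuffle_peaks(peaks: list[Interval], chrom_len: int, rng: random.Random) -> list[Interval]:
    """Move every peak to a uniformly random position, preserving its length."""
    shuffled = []
    for s, e in peaks:
        length = e - s
        new_s = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_s, new_s + length))
    return shuffled


def enrichment(peaks, loci, chrom_len, n_shuffles=100, seed=0):
    """Observed overlap with HRC3 loci versus the overlaps of shuffled peak sets."""
    rng = random.Random(seed)
    loci = sorted(loci)
    observed = count_overlapping(peaks, loci)
    sims = [
        count_overlapping(shuffle_peaks(peaks, chrom_len, rng), loci)
        for _ in range(n_shuffles)
    ]
    sim_mean = mean(sims)
    return {
        "Binding Sites": len(peaks),
        "HRC3 Overlap": observed,
        "HRC3 Ratio": observed / len(peaks),
        "SIM Fold": observed / sim_mean if sim_mean > 0 else float("inf"),
        "SIM Min": min(sims),
        "SIM Median": median(sims),
        "SIM Mean": sim_mean,
        "SIM Max": max(sims),
    }


# Toy data on a hypothetical 1-Mb chromosome: 100 HRC3 loci of 180 bp every 10 kb,
# and 50 peaks placed inside the first 50 loci.
loci = [(k * 10_000, k * 10_000 + 180) for k in range(100)]
peaks = [(k * 10_000 + 50, k * 10_000 + 150) for k in range(50)]
result = enrichment(peaks, loci, chrom_len=1_000_000)
print(result)
```

Because the toy peaks sit inside the loci, the observed overlap (50) is far above every simulated value, which mirrors the pattern in the enriched block of the table.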