Albin Sandelin, Peter Bailey, Sara Bruce, Pär G Engström, Joanna M Klos, Wyeth W Wasserman, Johan Ericson, Boris Lenhard.
Abstract
BACKGROUND: Evolutionarily conserved sequences within or adjoining orthologous genes often serve as critical cis-regulatory regions. Recent studies have identified long, non-coding genomic regions that are perfectly conserved between human and mouse, termed ultra-conserved regions (UCRs). Here, we focus on UCRs that cluster around genes involved in early vertebrate development; genes conserved over 450 million years of vertebrate evolution.
Year: 2004 PMID: 15613238 PMCID: PMC544600 DOI: 10.1186/1471-2164-5-99
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Over-representation of protein domains in genes flanking UCRs. Bonferroni-corrected and uncorrected Fisher Exact Test p-values are shown for the 16 most over-represented InterPro domains. Typical transcription factor domains (DNA binding domains) are indicated in bold. A full list of all InterPro domains with P-values is given in [Additional file 3].
| Domain | InterPro ID | Uncorrected P-value | Bonferroni-corrected P-value |
|---|---|---|---|
| | IPR000047 | 6.40E-20 | 5.36E-17 |
| | IPR001356 | 1.60E-12 | 1.34E-09 |
| | IPR001827 | 1.37E-10 | 1.15E-07 |
| | IPR001523 | 2.39E-05 | 2.00E-02 |
| | IPR001092 | 2.40E-05 | 2.01E-02 |
| | IPR000327 | 3.06E-05 | 2.56E-02 |
| | IPR003654 | 3.08E-05 | 2.58E-02 |
| | IPR001766 | 6.15E-05 | 5.15E-02 |
| | IPR001628 | 7.45E-05 | 6.23E-02 |
| | IPR000536 | 1.06E-04 | 8.86E-02 |
| | IPR000910 | 1.81E-04 | 1.51E-01 |
| | IPR001723 | 2.63E-04 | 2.20E-01 |
| | IPR003068 | 7.62E-04 | 6.38E-01 |
| | IPR001781 | 1.10E-03 | 9.18E-01 |
| | IPR000003 | 1.28E-03 | 1.07E+00 |
| FN_III | IPR003961 | 2.57E-03 | 2.15E+00 |
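The over-representation test behind the table can be illustrated with a minimal sketch. It assumes a one-sided Fisher exact test (the upper tail of the hypergeometric distribution) and a Bonferroni correction computed as the raw P-value times the number of domains tested. The function names and the `cap` option are hypothetical, and the source does not state which tail or software was used. In the table, the two P-value columns differ by a roughly constant factor (about 837), and the last two corrected values exceed 1, which suggests the product was not capped at 1.

```python
from math import comb


def fisher_over_representation(k, n, K, N):
    """One-sided Fisher exact P-value for over-representation.

    N: total genes, K: genes carrying the domain,
    n: genes flanking UCRs, k: UCR-flanking genes carrying the domain.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p, m, cap=True):
    """Bonferroni-adjusted P-value for m tests (capping at 1 is an assumption)."""
    adjusted = p * m
    return min(adjusted, 1.0) if cap else adjusted


# Toy numbers, not taken from the paper: 3 of 4 flanking genes carry a
# domain found in 3 of 10 genes overall.
p_raw = fisher_over_representation(k=3, n=4, K=3, N=10)
p_adj = bonferroni(p_raw, m=837, cap=False)
```

With `cap=False` the adjusted value can exceed 1, as in the last two rows of the table; with `cap=True` it is reported as 1.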
Figure 1. Spatial correlation of transcription factor gene families to UCRs in the human genome. A. Cumulative distribution of distances to the closest UCR for selected subsets of genes. Distance is measured to the closer end of the transcript mapping (either 3' or 5'). Most major classes of transcription factors lie closer to UCRs than random genes do. B, C. Occurrence of UCRs around selected subsets of genes. These plots summarize the distribution of distances to all UCRs on the same chromosome for each gene in the subset. UCRs are visibly over-represented up to 300 kb from homeobox genes and up to 150 kb from C2H2 zinc finger genes.
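The distance measure in panel A can be sketched as follows. This is an illustration only, assuming each UCR is reduced to a single coordinate on the same chromosome as the gene and that the list of UCR coordinates is sorted. The function names are hypothetical and the coordinates in the example are invented.

```python
from bisect import bisect_left, bisect_right


def distance_to_closest_ucr(gene_start, gene_end, ucr_positions):
    """Distance from the closer transcript end to the nearest UCR.

    ucr_positions must be a sorted list of UCR coordinates (assumption:
    one point per UCR). Returns None if the list is empty.
    """
    best = None
    for end_coord in (gene_start, gene_end):
        i = bisect_left(ucr_positions, end_coord)
        # Only the neighbours on either side of the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(ucr_positions):
                d = abs(ucr_positions[j] - end_coord)
                best = d if best is None else min(best, d)
    return best


def cumulative_fraction(distances, thresholds):
    """Fraction of genes whose distance is <= each threshold (empirical CDF)."""
    ordered = sorted(distances)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]


ucrs = [1000, 5000, 20000]
genes = [(6000, 8000), (30000, 40000), (900, 1100)]
distances = [distance_to_closest_ucr(s, e, ucrs) for s, e in genes]
curve = cumulative_fraction(distances, [500, 1000, 20000])
```

Plotting `curve` against the thresholds for each gene subset would give one cumulative line per subset, in the manner of panel A.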
Figure 2. Genomic distributions of UCRs and transcription factor genes. A. The distribution of UCRs on human chromosome 2 is shown in yellow, and total gene density along the chromosome is shown in blue (top track). Note the lack of correlation between gene density and UCR density. Positions of homeobox domain-containing genes are marked in red and generally coincide with local maxima of UCR density. The remaining UCR density peaks coincide with genes for transcription factors belonging to structural classes other than homeobox. B. Close-up of a UCR cluster coinciding with the HoxD gene cluster. The HoxD cluster coincides with one of the larger UCR density peaks on chromosome 2 and is associated with nine UCRs. UCR locations are shaded in yellow.
Figure 3. Genomic landscape surrounding the most prominent UCR clusters in the human genome. UCRs were counted by sliding a 500 kb window along the chromosomes. Overlapping UCR-containing windows were merged into a single cluster span. Each panel shows a 4 Mb region around the corresponding UCR cluster. The cluster span coordinates correspond to the human genome NCBI build 33 (UCSC hg15, April 2003). Transcription factor genes are colored according to structural class. UCR clusters are visibly correlated with transcription factor genes; other developmental regulators that do not contain any of the probed protein domains were located manually (boxed), such as the autism susceptibility gene (chromosome 7, number 37) and the DACH gene (chromosome 13, number 10). The numbers correspond to annotations in [Additional files 6 and 7]. The figure was created with the help of the Bio::Graphics Perl library [27].
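The window-and-merge procedure described in this caption might look like the sketch below. Only the 500 kb window size comes from the caption. The step size, the minimum UCR count per window, the half-open window convention, and the function name are all assumptions made for illustration.

```python
from bisect import bisect_left


def ucr_clusters(ucr_positions, chrom_length, window=500_000,
                 step=100_000, min_count=1):
    """Merge overlapping UCR-containing windows into cluster spans.

    Returns a list of (start, end, n_ucrs) tuples. Windows are half-open
    intervals [start, start + window). step and min_count are assumed values.
    """
    positions = sorted(ucr_positions)

    def count(lo, hi):
        # Number of UCR coordinates in [lo, hi).
        return bisect_left(positions, hi) - bisect_left(positions, lo)

    spans = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        if count(start, end) >= min_count:
            if spans and start < spans[-1][1]:
                spans[-1][1] = end          # overlaps the previous span: extend it
            else:
                spans.append([start, end])  # start a new cluster span
    return [(s, e, count(s, e)) for s, e in spans]


# Invented coordinates on a hypothetical 3 Mb chromosome.
clusters = ucr_clusters([100_000, 150_000, 2_000_000], chrom_length=3_000_000)
```

Ranking the resulting spans by `n_ucrs` is one plausible way to pick the most prominent clusters, although the caption does not say how prominence was defined.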
Figure 4. Sets of UCRs sharing high sequence similarity are involved in the regulation of related genes: the case of the Iroquois gene clusters. Four similarly positioned UCRs are located within the two Iroquois gene clusters on chromosomes 5 and 16. Block arrows indicate significant sequence similarity. The arrow width is inversely proportional to the BLASTN alignment E-value. There are additional shorter blocks of similarity between the two three-gene clusters; however, most UCRs have diverged between the two clusters while remaining conserved across vertebrates.