Simon C Lovell, Xiting Li, Nimmi R Weerasinghe, Kathryn E Hentges.
Abstract
BACKGROUND: With the completion of the whole genome sequence for many organisms, investigations into genomic structure have revealed that gene distribution is variable, and that genes with similar function or expression are located within clusters. This clustering suggests that there are evolutionary constraints that determine genome architecture. However, as most of the evidence for constraints on genome evolution comes from studies on yeast, it is unclear how much of this prior work can be extrapolated to mammalian genomes. Therefore, in this work we set out to examine the constraints on regions of the mammalian genome containing conserved gene clusters.
Year: 2009 PMID: 19909546 PMCID: PMC2779822 DOI: 10.1186/1471-2164-10-521
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The correlation between microsynteny and density of orthologs of disease-related genes on mouse autosomes. The relationship between conserved microsynteny (black) and density of mouse genes orthologous to human disease-related genes (blue) is shown for all mouse autosomes. The percentage of genes with conserved microsynteny and the percentage with disease-related orthologs are calculated for a 20 Mb sliding window, offset by steps of 5 Mb. At each position the Z-score is plotted at the center of the sliding window. Pearson's correlation coefficient and P-values for the analysis of co-localization of conserved microsynteny and disease gene orthologs are given for each chromosome. The results of a randomization of the disease genes (red) demonstrate that there is no correlation between microsynteny (black) and random assignment of gene status.
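The sliding-window calculation described in the Figure 1 legend can be sketched as follows. This is a minimal illustration, not the authors' code: the gene record layout (`pos_mb` plus boolean flags), the function names, the choice to skip gene-free windows, and the use of the population standard deviation over one chromosome's windows are all assumptions made for the example.

```python
from statistics import mean, pstdev

def sliding_windows(chrom_length_mb, window_mb=20, step_mb=5):
    """Return (start, end) pairs for fixed-size windows offset by step_mb."""
    windows = []
    start = 0
    while start + window_mb <= chrom_length_mb:
        windows.append((start, start + window_mb))
        start += step_mb
    return windows

def window_percentages(genes, chrom_length_mb, flag, window_mb=20, step_mb=5):
    """Percentage of genes per window whose `flag` field is true.

    `genes` is a list of dicts with a 'pos_mb' key plus boolean flag keys
    (a hypothetical layout chosen for this sketch).
    Returns (window_center_mb, percentage) pairs.
    """
    results = []
    for start, end in sliding_windows(chrom_length_mb, window_mb, step_mb):
        in_window = [g for g in genes if start <= g["pos_mb"] < end]
        if not in_window:
            continue  # assumption: windows without genes are skipped
        pct = 100.0 * sum(1 for g in in_window if g[flag]) / len(in_window)
        results.append(((start + end) / 2, pct))
    return results

def z_scores(values):
    """Standardize values against their own mean and population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

# Toy chromosome (invented data): 40 Mb long, six genes.
toy_genes = [
    {"pos_mb": 2, "conserved": True},
    {"pos_mb": 8, "conserved": True},
    {"pos_mb": 14, "conserved": False},
    {"pos_mb": 22, "conserved": False},
    {"pos_mb": 31, "conserved": False},
    {"pos_mb": 38, "conserved": True},
]
pcts = window_percentages(toy_genes, 40, "conserved")
for (center, pct), z in zip(pcts, z_scores([p for _, p in pcts])):
    print(f"center={center:.1f} Mb  conserved={pct:.1f}%  Z={z:+.2f}")
```

Running the same function with a disease-ortholog flag would give the second curve; the two Z-score series can then be compared window by window.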
Variation of microsynteny and disease gene density on all mouse autosomes.
| Chromosome | Total genes | Genes with conserved microsynteny | Percentage conserved | Disease-gene orthologs | Percentage disease |
|---|---|---|---|---|---|
| 1 | 1186 | 486 | 41% | 98 | 8% |
| 2 | 1846 | 696 | 38% | 123 | 7% |
| 3 | 996 | 379 | 38% | 82 | 8% |
| 4 | 1289 | 591 | 46% | 95 | 7% |
| 5 | 1219 | 428 | 35% | 100 | 8% |
| 6 | 1128 | 418 | 37% | 91 | 8% |
| 7 | 1867 | 650 | 35% | 111 | 6% |
| 8 | 1051 | 364 | 35% | 89 | 8% |
| 9 | 1219 | 481 | 39% | 86 | 7% |
| 10 | 978 | 321 | 33% | 75 | 8% |
| 11 | 1674 | 734 | 44% | 122 | 7% |
| 12 | 651 | 213 | 33% | 49 | 8% |
| 13 | 786 | 199 | 25% | 66 | 8% |
| 14 | 772 | 279 | 36% | 55 | 7% |
| 15 | 780 | 374 | 48% | 65 | 8% |
| 16 | 658 | 237 | 36% | 46 | 7% |
| 17 | 1019 | 369 | 36% | 67 | 7% |
| 18 | 495 | 192 | 39% | 48 | 10% |
| 19 | 725 | 318 | 44% | 50 | 7% |
| Total for all autosomes | 20339 | 7729 | 38% | 1518 | 7% |
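
The percentage columns follow directly from the counts. The short check below recomputes them from the table's own numbers; the variable and function names are illustrative, and rounding to the nearest whole percent is an assumption about how the table was prepared.

```python
# (chromosome, total genes, conserved-microsynteny genes, disease-gene orthologs)
# Counts copied from the table above.
ROWS = [
    ("1", 1186, 486, 98), ("2", 1846, 696, 123), ("3", 996, 379, 82),
    ("4", 1289, 591, 95), ("5", 1219, 428, 100), ("6", 1128, 418, 91),
    ("7", 1867, 650, 111), ("8", 1051, 364, 89), ("9", 1219, 481, 86),
    ("10", 978, 321, 75), ("11", 1674, 734, 122), ("12", 651, 213, 49),
    ("13", 786, 199, 66), ("14", 772, 279, 55), ("15", 780, 374, 65),
    ("16", 658, 237, 46), ("17", 1019, 369, 67), ("18", 495, 192, 48),
    ("19", 725, 318, 50),
]

def percent(part, whole):
    """Whole-number percentage (assumed rounding rule)."""
    return round(100 * part / whole)

total_genes = sum(r[1] for r in ROWS)
total_conserved = sum(r[2] for r in ROWS)
total_disease = sum(r[3] for r in ROWS)

for chrom, n, cons, dis in ROWS:
    print(f"chr{chrom}: conserved {percent(cons, n)}%, disease {percent(dis, n)}%")
print(f"all autosomes: {total_genes} genes, "
      f"conserved {percent(total_conserved, total_genes)}%, "
      f"disease {percent(total_disease, total_genes)}%")
```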
Comparison of regions with Z>1 microsynteny to sequence-based synteny blocks.
| Mouse Micro-synteny Interval | Dog Sequence Synteny Blocks | Rat Sequence Synteny Blocks | Human Sequence Synteny Blocks |
|---|---|---|---|
| Chr1 60 - 90 Mb | 60 - 80 Mb = Chr37: 14.6 - 32.7 Mb, | 60 - 85 Mb = Chr9: 58.5 - 84.4 Mb, | 60 - 90 Mb = Chr2: 203.6 - 234.6 Mb |
| Chr1 120 - 145 Mb | 120 - 132 Mb = Chr19: 31.4 - 43.3 Mb | 120 - 145 Mb = Chr13: 30.9 - 57.8 Mb | 120 - 132 Mb = Chr2: 125.6 - 138.4 Mb, |
| Chr1 155 - 175 Mb | 155 - 168 Mb = Chr7: 19.4 - 33.9 Mb, | 155 - 175 Mb = Chr13: 68.2 - 89.3 Mb | |
| Chr2 45 - 85 Mb | 45 - 53 Mb = Chr19: 52.6 - 56.7 Mb, | 45 - 85 Mb = Chr3: 25.6 - 68.4 Mb | 45 - 84 Mb = Chr2: 145.2 - 188.3 Mb, |
| Chr2 115 - 140 Mb | 115 - 127 Mb = Chr30: 7.1 - 19.7 Mb, | 115 - 140 Mb = Chr3: 101.6 - 128.3 Mb | 115 - 126 Mb = Chr15: 36.8 - 51.2 Mb, |
| Chr3 15 - 35 Mb | 15 - 16 Mb = Chr24: 22.4 - 22.5 Mb, | 15 - 35 Mb = Chr2: 87.0 - 122.2 Mb | 15 - 16 Mb = Chr20: 1.5 - 1.6 Mb, |
| Chr3 45 - 65 Mb | 45 - 65 Mb = Chr2: 132.8 - 155.1 Mb | 45 - 52 Mb = Chr4: 134.0 - 140.8 Mb, | |
| Chr3 55 - 75 Mb | 55 - 57 Mb = Chr25: 7.6 - 8.0 Mb, | 55 - 75 Mb = Chr2: 144.4 - 166.1 Mb | |
| Chr4 30 - 65 Mb | 30 - 35 Mb = Chr12: 52.6 - 49.6 Mb, | 30 - 65 Mb = Chr5: 48.2 - 82.0 Mb | |
| Chr4 105 - 125 Mb | 105 - 107 Mb = Chr5: 57.1 - 58.9 Mb, | 105 - 125 Mb = Chr5: 127.3 - 144.8 Mb | |
| Chr6 0 - 20 Mb | 0 - 13 Mb = Chr14: 21.6 - 30.2 Mb | 0 - 20 Mb = Chr4: 28.1 - 44.7 Mb | 0 - 20 Mb = Chr7: 92.7 - 117.8 Mb |
| Chr9 40 - 70 Mb | 40 - 54 Mb = Chr5: 13.7 - 27.7 Mb, | 40 - 70 Mb = Chr8: 43.2 - 74.6 Mb | |
| Chr11 60 - 90 Mb | 60 - 90 Mb = Chr10: 46.4 - 78.6 Mb | | |
| Chr11 100 - 120 Mb | 100 - 120 Mb = Chr10: 89.0 - 109.7 Mb | 100 - 104 Mb = Chr17: 39.6 - 45.2 Mb, | |
| Chr12 55 - 105 Mb | 55 - 72 Mb = Chr8: 15.7 - 31.0 Mb, | 55 - 105 Mb = Chr6: 74.3 - 127.9 Mb | 55 - 72 Mb = Chr14: 34.2 - 52.3 Mb, |
| Chr15 65 - 104 Mb | 65 - 76 Mb = Chr13: 31.6 - 41.2 Mb, | 65 - 104 Mb = Chr7: 103.1 - 142.6 Mb | 65 - 76 Mb = Chr8: 132.9 - 146.2 Mb |
| Chr17 65 - 95 Mb | 65 - 72 Mb = Chr9: 103.9 - 110.8 Mb, | 65 - 66 Mb = Chr5: 109.2 - 110.0 Mb, | |
| Chr19 30 - 65 Mb | 30 - 34 Mb = Chr26: 38.4 - 41.9 Mb, | 30 - 65 Mb = Chr1: 233.7 - 267.8 Mb | 30 - 32 Mb = Chr10: 51.9 - 54.5 Mb, |
Sequence-based synteny blocks were identified from the Ensembl genome browser. The intervals are listed according to positions in the mouse genome, with the positions of synteny in the other genomes listed following the "=" sign. The specific region of synteny in the other species is listed following the chromosome number and a ':'. Forward to forward alignments are shown in normal text; forward to reverse alignments are shown in italics.
Figure 2. Conserved genes and disease genes are not randomly distributed throughout the mouse genome. Panel A: Conserved genes are found to have conserved genes as neighbors more often than expected if gene position were random. The results of 10,000 randomization trials are shown in the histogram, while the observed value (number of conserved genes with at least one conserved neighboring gene) is shown with the red line. Panel B: Disease genes neighbor other disease genes more often than expected by chance. The results of 10,000 randomization trials are shown in the histogram, while the observed value (number of disease genes with at least one disease gene as a neighbor) is shown with the red line.
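A randomization test of the kind summarized in Figure 2 might look like the sketch below. It assumes genes are represented as an ordered list of 0/1 flags along a single chromosome, that "neighbor" means the immediately adjacent gene on either side, and that the null distribution comes from shuffling the flags among gene positions; the legend does not specify these details, and the function names are hypothetical.

```python
import random

def count_with_flagged_neighbor(flags):
    """Count flagged genes that have at least one flagged adjacent gene."""
    n = len(flags)
    count = 0
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        left = i > 0 and flags[i - 1]
        right = i < n - 1 and flags[i + 1]
        if left or right:
            count += 1
    return count

def neighbor_randomization_test(flags, trials=10_000, seed=0):
    """Compare the observed count with counts from shuffled gene labels.

    Returns (observed, null_counts, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = count_with_flagged_neighbor(flags)
    shuffled = list(flags)
    null_counts = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        null_counts.append(count_with_flagged_neighbor(shuffled))
    # Add-one correction keeps the estimated p-value above zero.
    as_extreme = sum(1 for c in null_counts if c >= observed)
    p_value = (1 + as_extreme) / (trials + 1)
    return observed, null_counts, p_value

# Toy example (invented data): 10 flagged genes in one run among 40 genes.
toy_flags = [1] * 10 + [0] * 30
observed, null_counts, p_value = neighbor_randomization_test(toy_flags)
print(observed, max(null_counts), p_value)
```

The list `null_counts` corresponds to the histogram in each panel, and `observed` to the red line.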
Figure 3. There is a significant correlation between microsynteny and density of disease-gene orthologs over the mouse genome as a whole. Panel A: The number of conserved genes plotted against the number of disease genes for 20 Mb sliding windows of the mouse genome. Note the fit with the regression line (Pearson's R = 0.90, P < 1 × 10⁻⁶). Panel B: The relationship between the proportion of genes with conserved microsynteny (number of conserved genes per window/total genes per window) and the proportion of genes with disease orthologs (number of disease-related genes/total genes per window). The correlation for the whole mouse genome is significant (Pearson's R = 0.40, P < 4.0 × 10⁻⁴).
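The two panels differ only in whether per-window counts are divided by the total number of genes in the window before Pearson's R is computed. A minimal sketch of both calculations is given here; the `(total, conserved, disease)` tuple layout and the toy numbers are invented for illustration and are not data from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

def counts_and_proportions(windows):
    """Split (total, conserved, disease) windows into Panel A and B inputs.

    Windows with zero genes are dropped, since a proportion is undefined.
    """
    kept = [w for w in windows if w[0] > 0]
    conserved_counts = [c for _, c, _ in kept]
    disease_counts = [d for _, _, d in kept]
    conserved_props = [c / t for t, c, _ in kept]
    disease_props = [d / t for t, _, d in kept]
    return conserved_counts, disease_counts, conserved_props, disease_props

# Hypothetical windows: (total genes, conserved genes, disease genes).
toy_windows = [(100, 40, 8), (200, 90, 18), (50, 15, 3), (150, 50, 12)]
cc, dc, cp, dp = counts_and_proportions(toy_windows)
print("counts:      R =", round(pearson_r(cc, dc), 3))
print("proportions: R =", round(pearson_r(cp, dp), 3))
```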
Robustness of correlation to variations in window size.
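| Window size | 20 Mb | 10 Mb | 5 Mb | 2 Mb | 1 Mb |
|---|---|---|---|---|---|
| Actual annotations | R = 0.40 | R = 0.293 | R = 0.256 | R = 0.157 | R = 0.154 |
| Randomized annotations | R = 0.02 | R = 0.018 | R = 0.058 | R = 0.046 | R = 0.035 |
| Actual annotations, annotation-free windows omitted | | | | R = 0.11 | R = 0.11 |
| Randomized annotations, annotation-free windows omitted | | | | R = 0.012 | R = 0.020 |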
Adjusting the window size does not eliminate the correlation between regions of conserved microsynteny and regions of high density of disease gene orthologs. Randomization of gene annotations does eliminate the correlation at all window sizes except 1 Mb. However, when windows with no annotated genes are omitted from the analysis (bottom two rows), the correlation for actual annotations is improved at both 2 Mb and 1 Mb window sizes, and is eliminated for random annotations. As the genome does not have any windows lacking gene annotation for windows of 20 Mb, 10 Mb, or 5 Mb, the correlation values could not be recalculated to omit annotation-free windows at those window sizes. R = Pearson's R.
Number of windows in mouse genome at varying window sizes.
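| Window size | 20 Mb | 10 Mb | 5 Mb | 2 Mb | 1 Mb |
|---|---|---|---|---|---|
| Total windows | 465 | 1912 | 3718 | 4897 | 24541 |
| Windows with no genes | 0 | 0 | 0 | 237 | 2761 |
| Windows with no conserved genes | 0 | 22 | 244 | 1211 | 9505 |
| Windows with no disease genes | 2 | 92 | 634 | 2160 | 14598 |
| Windows with neither conserved nor disease genes | 0 | 16 | 166 | 987 | 8243 |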
The number of windows with no annotations of genes, conserved genes, or disease orthologs is shown. Note that the bottom row shows the number of windows lacking both conserved and disease genes, which are a subset of the number of windows with either no disease genes or no conserved genes.
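The subset relationship noted in this legend can be made concrete with a small counting sketch. The per-window dictionary layout and the toy values are hypothetical; only the logic of the four categories follows the legend.

```python
def count_unannotated(windows):
    """Count windows lacking each kind of annotation.

    `windows` is a list of dicts with integer 'genes', 'conserved' and
    'disease' counts (a layout assumed for this sketch).
    """
    return {
        "no_genes": sum(1 for w in windows if w["genes"] == 0),
        "no_conserved": sum(1 for w in windows if w["conserved"] == 0),
        "no_disease": sum(1 for w in windows if w["disease"] == 0),
        "neither": sum(
            1 for w in windows if w["conserved"] == 0 and w["disease"] == 0
        ),
    }

# Toy windows (invented data).
toy_windows = [
    {"genes": 12, "conserved": 5, "disease": 1},
    {"genes": 4, "conserved": 0, "disease": 1},
    {"genes": 7, "conserved": 2, "disease": 0},
    {"genes": 3, "conserved": 0, "disease": 0},
    {"genes": 0, "conserved": 0, "disease": 0},
]
counts = count_unannotated(toy_windows)
print(counts)

# A window with no genes necessarily has neither kind of annotation, and a
# window with neither kind is counted in both single-category rows.
assert counts["no_genes"] <= counts["neither"]
assert counts["neither"] <= min(counts["no_conserved"], counts["no_disease"])
```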