Walter Salzburger, Dirk Steinke, Ingo Braasch, Axel Meyer.
Abstract
The evolution of genome size, as well as the structure and organization of genomes, is among the key questions of genome biology. Here we show, based on a comparative analysis of 30 genomes, that there is generally a tight correlation between the number of genes per chromosome and the length of the respective chromosome in eukaryotic genomes. The surprising exceptions to this pattern are placental mammalian genomes. We identify the number and, more importantly, the uneven distribution among chromosomes of gene deserts, i.e., long (>500 kb) stretches of DNA that do not encode genes, as the main contributing factor for the observed anomaly of eutherian genomes. Gene-rich placental mammalian chromosomes have smaller proportions of gene deserts and vice versa. We show that the uneven distribution of gene deserts is a derived character state of eutherians. The functional and evolutionary significance of this particular feature of eutherian genomes remains to be explained.
Year: 2009 PMID: 19568804 PMCID: PMC2746894 DOI: 10.1007/s00239-009-9251-4
Source DB: PubMed Journal: J Mol Evol ISSN: 0022-2844 Impact factor: 2.395
Organisms used in this study and information about their genomes^a
| Taxon | Assembly size (Mb) | Deserts (n) | Deserts (%) | 18S GenBank accession number |
|---|---|---|---|---|
| | 3175.6 | 949 | 36.80 | AC183378 |
| | 3047.0 | 915 | 38.33 | NR_003286 |
| | 2863.7 | 810 | 30.44 | CN805008 |
| | 2654.9 | 895 | 34.26 | NR_003278 |
| | 2718.9 | 743 | 23.19 | X01117 |
| | 2445.1 | 552 | 20.14 | DQ287955 |
| | 2422.9 | 149 | 4.26 | DQ222453 |
| | 2367.1 | 573 | 23.13 | AJ311673 |
| | 3431.4 | 1098 | 27.77 | AJ311676 |
| | 1843.0 | 161 | 8.37 | AJ311679 |
| | 1031.9 | 200 | 16.13 | AF173612 |
| | 1277.1 | 129 | 7.10 | XR_045186 |
| | 724.2 | 29 | 6.06 | AB105163 |
| | 400.9 | 0 | 0.00 | DW607648 |
| | 217.4 | 5 | 1.17 | AJ270032 |
| | 938.1 | 0 | 0.00 | AB013017 |
| | 228.2 | 0 | 0.00 | AM157179 |
| | 120.4 | 0 | 0.00 | EU188739 |
| | 100.3 | 0 | 0.00 | AY284652 |
| | 91.2 | 0 | 0.00 | U13929 |
| | 29.4 | 0 | 0.00 | NT_166520 |
| | 12.3 | 0 | 0.00 | AF114470 |
| | 12.1 | 0 | 0.00 | J01353 |
| | 1979.4 | 0 | 0.00 | NC_008332 |
| | 119.2 | 0 | 0.00 | X16077 |
| | 370.8 | 0 | 0.00 | AY120865 |
| | 303.1 | 0 | 0.00 | AF321270 |
| | 13.2 | 0 | 0.00 | DQ007077 |
| | 32.8 | 0 | 0.00 | NC_007268 |
| | 22.9 | 0 | 0.00 | NC_004325 |
Deserts (n): number of gene deserts (>500 kb) in a given genome; Deserts (%): size fraction of gene deserts (>500 kb) in a given genome; 18S acc. no.: GenBank accession number of the 18S sequence
^a GenBank accession numbers of 18S sequences used for regression equation mapping are also given. Data were obtained from GenBank. Note that no information on the size of intergenic regions was available for A. thaliana, A. fumigatus, L. major, O. sativa, and Z. mays. Also note that assembly size does not necessarily equal actual genome size.
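The two desert columns of the table can be illustrated with a minimal Python sketch. It assumes that a gene desert is any stretch longer than 500 kb between annotated genes, including the stretches at the chromosome ends. The function names, the half-open coordinate convention, and the toy coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Threshold from the text: gene deserts are stretches longer than 500 kb.
DESERT_MIN_BP = 500_000

Interval = Tuple[int, int]  # (start, end) in bp, half-open (an assumption)


def find_gene_deserts(genes: List[Interval], chrom_length: int,
                      min_len: int = DESERT_MIN_BP) -> List[Interval]:
    """Return gene-free gaps longer than min_len on one chromosome."""
    deserts = []
    cursor = 0  # right edge of the region covered by genes seen so far
    for start, end in sorted(genes):
        if start - cursor > min_len:
            deserts.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping or nested genes
    # Assumption: the gap after the last gene also counts.
    if chrom_length - cursor > min_len:
        deserts.append((cursor, chrom_length))
    return deserts


def desert_fraction(deserts: List[Interval], chrom_length: int) -> float:
    """Share of the chromosome length that lies in gene deserts."""
    return sum(end - start for start, end in deserts) / chrom_length


# Hypothetical 3 Mb chromosome with four genes.
toy_genes = [(100_000, 150_000), (200_000, 260_000),
             (1_000_000, 1_050_000), (2_900_000, 2_950_000)]
toy_deserts = find_gene_deserts(toy_genes, 3_000_000)
toy_fraction = desert_fraction(toy_deserts, 3_000_000)
print(len(toy_deserts), round(100 * toy_fraction, 2))
```

Summing the counts and lengths over all chromosomes of a genome would give values comparable to the Deserts (n) and Deserts (%) columns, under the assumptions stated above.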
Fig. 1. The relationship between the number of genes per chromosome (NG) and chromosome length (LC) shows a strong correlation in nonmammals, irrespective of genome size, chromosome number, and taxonomy. In noneutherian genomes, the slope of the trend line can be interpreted as a measure of genome compactness. The small R² value in zebrafish (Danio rerio; R² = 0.37) can be explained by two outlier chromosomes, and that of medaka (Oryzias latipes; R² = 0.48) by the relatively equal size of its chromosomes and the variance in their number of genes. The somewhat smaller R² value observed in Caenorhabditis elegans (R² = 0.65) is due to the relatively even length of its chromosomes (Nelson et al. 2004), which makes it difficult to test for a linear relationship between NG and LC.
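The trend line and R² value described in this caption can be sketched as an ordinary least-squares fit of NG on LC. This is a minimal illustration in plain Python, not the authors' pipeline. The function name `fit_ng_on_lc` and the per-chromosome values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_ng_on_lc(lc: Sequence[float],
                 ng: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line ng = slope * lc + intercept, plus R^2."""
    n = len(lc)
    mean_lc = sum(lc) / n
    mean_ng = sum(ng) / n
    sxx = sum((x - mean_lc) ** 2 for x in lc)
    syy = sum((y - mean_ng) ** 2 for y in ng)
    sxy = sum((x - mean_lc) * (y - mean_ng) for x, y in zip(lc, ng))
    slope = sxy / sxx                     # genes per unit length
    intercept = mean_ng - slope * mean_lc
    r_squared = sxy ** 2 / (sxx * syy)    # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical genome: chromosome lengths in Mb and gene counts.
toy_lc = [10.0, 20.0, 30.0, 40.0]
toy_ng = [100.0, 210.0, 290.0, 400.0]
slope, intercept, r_squared = fit_ng_on_lc(toy_lc, toy_ng)
print(slope, intercept, r_squared)
```

With lengths in Mb, the slope reads as genes per Mb, which matches the caption's interpretation of the slope as a measure of genome compactness. Note that the fit is undefined when all chromosomes have exactly the same length, and becomes unstable when lengths are nearly equal, which is the difficulty the caption mentions for C. elegans.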
Fig. 2. The correlation between NG and LC is weak in genomes of placental mammals. In general, larger chromosomes also tend to have more genes in mammals; however, many chromosomes deviate significantly from a constant NG/LC ratio, rendering the genome-wide trend much weaker in placental mammalian genomes than in all other genomes. The highest R² value in a mammal was found for rat (R² = 0.70), whose genome contains the smallest relative fraction of gene deserts.
Fig. 3. Gene deserts counterbalance the number of genes on mammalian chromosomes. The sum of the proportion of genes per chromosome plus the proportion of gene deserts per chromosome is plotted against chromosome length.
Results from the partial regression analysis^a
| Feature | ||||||||
|---|---|---|---|---|---|---|---|---|
| LINEs | 0.330 | 0.260 | 0.156 | 0.081 | 0.149 | −0.441* | 0.537* | 0.132 |
| SINEs | 0.322 | 0.832* | 0.666* | 0.3562* | ||||
| LTRs | −0.0218 | 0.034 | 0.001 | 0.452 | 0.121 | −0.095 | −0.620 | −0.2172 |
| DNA transp. | −0.233 | −0.303 | 0.001 | 0.132 | 0.164 | 0.140 | 0.395 | 0.305 |
| Simple repeats | −0.132 | −0.156 | 0.127 | −0.374 | 0.041 | −0.339* | −0.542* | 0.186 |
| Gene deserts | 0.533* | 0.883* | 0.353 | 0.603* |
^a The partial regression coefficients for the respective contributions to NG/LC are given for LINEs, SINEs, LTRs, DNA transposons, simple repeats, and gene deserts. The highest coefficient for each genome is shown in bold, and significant values (p < 0.01) are marked with an asterisk.
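To show what a partial regression coefficient expresses, the following sketch computes standardized partial regression coefficients for the simplified case of only two predictors, using the textbook formula β₁ = (r_y1 − r_y2·r_12) / (1 − r_12²). The table above is based on six features, so this is a reduced illustration and not the authors' analysis. All names and values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_betas(y: Sequence[float], x1: Sequence[float],
                  x2: Sequence[float]) -> Tuple[float, float]:
    """Standardized partial regression coefficients of y on x1 and x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2  # zero if x1 and x2 are perfectly collinear
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical per-chromosome values (arbitrary units): two features and
# a response built as response = 2 * feature_a + feature_b.
feature_a = [-1.0, -1.0, 1.0, 1.0]
feature_b = [-1.0, 1.0, -1.0, 1.0]
response = [2 * a + b for a, b in zip(feature_a, feature_b)]
beta_a, beta_b = partial_betas(response, feature_a, feature_b)
print(beta_a, beta_b)
```

Each coefficient describes the contribution of one feature with the other held constant. In the toy data, `feature_a` receives the larger coefficient because it was given twice the weight when the response was constructed.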