Leng Han, Bing Su, Wen-Hsiung Li, Zhongming Zhao.
Abstract
BACKGROUND: CpG islands, which are clusters of CpG dinucleotides in GC-rich regions, are considered gene markers and represent an important feature of mammalian genomes. Previous studies of CpG islands have largely focused on specific loci or on a single genome. To date, there seems to be no comparative analysis of CpG islands and their density at the DNA sequence level among mammalian genomes, or of their correlations with other genome features.
Year: 2008 PMID: 18477403 PMCID: PMC2441465 DOI: 10.1186/gb-2008-9-5-r79
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
CpG islands and other genomic features in ten mammalian genomes
| Species | Genome size (Gb)* | Number of chromosome pairs | Number of arms† | Genome GC content (%) | Genome ObsCpG/ExpCpG | Number of CGIs | CGI density (/Mb) | Average CGI length (bp) | CGI GC content (%) | CGI ObsCpG/ExpCpG |
| Human | 2.85 | 23 | 82 | 40.9 | 0.236 | 37,531 | 13.2 | 1,089 | 62.0 | 0.743 |
| Chimpanzee | 2.75 | 24 | 84 | 40.7 | 0.233 | 35,845 | 13.0 | 1,011 | 60.3 | 0.761 |
| Macaque | 2.65 | 21 | 84 | 40.7 | 0.245 | 39,498 | 14.9 | 957 | 60.8 | 0.749 |
| Mouse | 2.48 | 20 | 40 | 41.7 | 0.192 | 20,458 | 8.2 | 1,043 | 60.6 | 0.756 |
| Rat | 2.48 | 21 | 64 | 41.9 | 0.220 | 19,568 | 7.9 | 1,004 | 59.7 | 0.758 |
| Dog | 2.31 | 39 | 80 | 41.0 | 0.244 | 58,327 | 25.3 | 1,102 | 62.2 | 0.753 |
| Cow | 2.29 | 30 | 62 | 41.9 | 0.236 | 36,729 | 16.0 | 1,023 | 61.2 | 0.740 |
| Horse | 2.03 | 32 | 92 | 41.0 | 0.285 | 33,135 | 16.3 | 937 | 59.2 | 0.749 |
| Opossum | 3.34 | 9 | 24 | 37.6 | 0.129 | 24,938 | 7.5 | 919 | 60.8 | 0.698 |
| Platypus‡ | 0.41 | 26 | NA | 43.3 | 0.296 | 14,686 | 35.9 | 929 | 56.8 | 0.785 |
*The nucleotides marked as 'N' were not included in the analysis. †Number of arms in a female. ‡Incomplete genome sequences (only 19 partially assembled chromosomes). NA, not available.
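The CGI density column can be cross-checked against the genome size and CGI count columns. The sketch below assumes that density is simply the number of CGIs divided by the non-N genome size in Mb; the source does not spell out the formula, and the function name `cgi_density_per_mb` is illustrative. Because the sizes in the table are rounded to 0.01 Gb, the derived values agree with the reported ones only to within about 0.1 per Mb.

```python
# Values copied from the table above:
# species -> (genome size in Gb, number of CGIs, reported CGI density per Mb)
TABLE1 = {
    "Human": (2.85, 37531, 13.2),
    "Chimpanzee": (2.75, 35845, 13.0),
    "Macaque": (2.65, 39498, 14.9),
    "Mouse": (2.48, 20458, 8.2),
    "Rat": (2.48, 19568, 7.9),
    "Dog": (2.31, 58327, 25.3),
    "Cow": (2.29, 36729, 16.0),
    "Horse": (2.03, 33135, 16.3),
    "Opossum": (3.34, 24938, 7.5),
    "Platypus": (0.41, 14686, 35.9),
}


def cgi_density_per_mb(size_gb, n_cgis):
    """CGIs per megabase, ASSUMING density = count / genome size (1 Gb = 1,000 Mb)."""
    return n_cgis / (size_gb * 1000.0)


# Derived density for every species, keyed by common name.
derived = {
    species: cgi_density_per_mb(size_gb, n_cgis)
    for species, (size_gb, n_cgis, _reported) in TABLE1.items()
}

for species, (_size, _count, reported) in TABLE1.items():
    print(f"{species:<11} derived={derived[species]:6.2f}  reported={reported:6.1f}")
```

The small residual differences (for example in dog and platypus) are consistent with rounding of the genome sizes rather than with a different definition of density.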
Figure 1. Correlations between CGI density and genomic features in nine mammalian genomes. The platypus chromosomes were excluded because of incomplete genome sequence data and chromosome data. (a) CGI density (per Mb) versus number of chromosome pairs. (b) CGI density (per Mb) versus log10(chromosome size). The Y chromosomes were excluded because of insufficient data. (c) CGI density (per Mb) versus chromosome GC content (%). (d) CGI density (per Mb) versus chromosome ObsCpG/ExpCpG.
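The genome-level relationship in panel (a) can be approximately re-derived from the species table above. The sketch below assumes a Pearson correlation (the excerpt does not name the statistic) and uses the rounded table values for the nine genomes other than platypus, so the result should be read as a consistency check, not as a reproduction of the authors' analysis. The helper `pearson_r` is illustrative.

```python
import math

# Nine mammalian genomes (platypus excluded), values from the species table:
# human, chimpanzee, macaque, mouse, rat, dog, cow, horse, opossum.
chromosome_pairs = [23, 24, 21, 20, 21, 39, 30, 32, 9]
cgi_density = [13.2, 13.0, 14.9, 8.2, 7.9, 25.3, 16.0, 16.3, 7.5]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(chromosome_pairs, cgi_density)
print(f"r = {r:.2f}")
```

With these inputs the coefficient comes out close to the 0.88 listed for chromosome pairs under TJ (9 genomes) in the summary table of correlations.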
CGI densities in chromosomes with different sizes in nine mammalian genomes
| Chromosome size (Mb) | Number of chromosomes | CGI density/Mb ± SD |
| <25 | 5 | 29.7 ± 17.7 |
| 25-50 | 35 | 24.0 ± 13.2 |
| 50-75 | 47 | 21.7 ± 11.3 |
| 75-100 | 43 | 14.7 ± 7.4 |
| 100-150 | 49 | 11.7 ± 4.6 |
| 150-200 | 26 | 9.7 ± 2.6 |
| >200 | 14 | 9.4 ± 3.6 |
| Total | 219 | 16.4 ± 10.5 |
SD, standard deviation.
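The Total row can be checked against the size bins. The sketch below assumes each bin's value is an unweighted mean over the chromosomes in that bin, so the overall mean is the chromosome-count-weighted mean of the bin means. The standard deviation is not recombined here, because that would also need the between-bin spread.

```python
# (number of chromosomes, mean CGI density per Mb) for each size bin in the table above
bins = [
    (5, 29.7),    # <25 Mb
    (35, 24.0),   # 25-50 Mb
    (47, 21.7),   # 50-75 Mb
    (43, 14.7),   # 75-100 Mb
    (49, 11.7),   # 100-150 Mb
    (26, 9.7),    # 150-200 Mb
    (14, 9.4),    # >200 Mb
]

total_chromosomes = sum(n for n, _ in bins)
# Weight each bin mean by the number of chromosomes it summarizes.
overall_mean = sum(n * mean for n, mean in bins) / total_chromosomes

print(total_chromosomes, round(overall_mean, 1))
```

Both numbers agree with the Total row (219 chromosomes, mean density 16.4 per Mb).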
Correlation between CGI density and genomic features in different human genomic regions
| Genomic feature | Gene-associated CGIs (24,228), r | Gene-associated CGIs, P | Intergenic CGIs (13,026), r | Intergenic CGIs, P | Intragenic CGIs (12,136), r | Intragenic CGIs, P | TSS CGIs (11,192), r | TSS CGIs, P |
| Log10(chromosome size) | -0.54 | 3.9 × 10^-3 | -0.55 | 3.4 × 10^-3 | -0.55 | 3.1 × 10^-3 | -0.51 | 7.0 × 10^-3 |
| GC content | 0.88 | 1.7 × 10^-8 | 0.87 | 2.9 × 10^-8 | 0.85 | 1.9 × 10^-7 | 0.91 | 5.4 × 10^-10 |
| ObsCpG/ExpCpG | 0.92 | 1.5 × 10^-10 | 0.91 | 8.3 × 10^-10 | 0.92 | 2.5 × 10^-10 | 0.91 | 1.0 × 10^-9 |
Summary of correlations between CGI density and genomic features
| Algorithm | Genomic feature | r | P | Shown in figure |
| TJ (9 genomes) | Chromosome pairs | 0.88 | 7.9 × 10^-4 | 1a |
| | Log10(chromosome size) | -0.51 | 2.6 × 10^-16 | 1b |
| | Chromosome GC content | 0.65 | 3.5 × 10^-28 | 1c |
| | Chromosome ObsCpG/ExpCpG | 0.75 | 2.8 × 10^-41 | 1d |
| | Chromosome arms | 0.62 | 0.037 | |
| | Genome size | -0.53 | 0.073* | |
| | Genomic GC content | 0.24 | 0.27* | |
| | Genomic ObsCpG/ExpCpG | 0.63 | 0.035 | |
| TJ (9 genomes, intergenic CGIs) | Chromosome pairs | 0.79 | 0.005 | S2a |
| | Log10(chromosome size) | -0.55 | 7.3 × 10^-19 | S2b |
| | Chromosome GC content | 0.39 | 8.6 × 10^-10 | S2c |
| | Chromosome ObsCpG/ExpCpG | 0.67 | 3.7 × 10^-30 | S2d |
| TJ (10 genomes) | Chromosome pairs | 0.58 | 0.039 | S1a |
| | Log10(chromosome size) | -0.70 | 2.6 × 10^-37 | S1b |
| | Chromosome GC content | 0.64 | 3.7 × 10^-29 | S1c |
| | Chromosome ObsCpG/ExpCpG | 0.89 | 1.5 × 10^-81 | S1d |
| GF (9 genomes) | Chromosome pairs | 0.92 | 2.0 × 10^-4 | S5a |
| | Log10(chromosome size) | -0.63 | 1.3 × 10^-25 | S5b |
| | Chromosome GC content | 0.72 | 3.2 × 10^-37 | S5c |
| | Chromosome ObsCpG/ExpCpG | 0.81 | 2.4 × 10^-53 | S5d |
| CpGcluster (9 genomes) | Chromosome pairs | 0.81 | 0.004 | S6a |
| | Log10(chromosome size) | -0.52 | 1.6 × 10^-16 | S6b |
| | Chromosome GC content | 0.21 | 0.001 | S6c |
| | Chromosome ObsCpG/ExpCpG | 0.61 | 5.5 × 10^-24 | S6d |
*Insignificant correlation. GF, Gardiner-Garden and Frommer's algorithm; TJ, Takai and Jones' algorithm.
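The GF and TJ algorithms both classify a sequence window by its length, GC content and ObsCpG/ExpCpG ratio. The sketch below illustrates those three quantities under stated assumptions: the ratio uses the common definition (number of CpG × length) / (number of C × number of G), and the thresholds are the commonly cited ones (GF: 200 bp, 50% GC, ratio 0.60; TJ: 500 bp, 55% GC, ratio 0.65). This excerpt states neither the formula nor the parameters, and the function names are illustrative; the code checks a single window and is not a genome scanner.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G). ASSUMED definition."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (n_c * n_g)


# ASSUMED thresholds: (minimum length in bp, minimum GC fraction, minimum Obs/Exp).
# Published descriptions differ on strict versus non-strict inequalities; >= is used here.
CRITERIA = {"GF": (200, 0.50, 0.60), "TJ": (500, 0.55, 0.65)}


def meets_criteria(seq, algorithm="TJ"):
    """True if this single window satisfies the chosen algorithm's three thresholds."""
    min_len, min_gc, min_ratio = CRITERIA[algorithm]
    return (
        len(seq) >= min_len
        and gc_content(seq) >= min_gc
        and obs_exp_cpg(seq) >= min_ratio
    )


cpg_rich = "CG" * 300   # 600 bp, all CpG
at_rich = "AT" * 300    # 600 bp, no G or C
print(meets_criteria(cpg_rich, "TJ"), meets_criteria(at_rich, "TJ"))
```

Under these assumptions the stricter TJ thresholds admit fewer and longer islands than GF, which is one reason the table reports correlations separately for each algorithm.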
Correlation between CGI density and recombination rate in human, mouse and rat
| Species | Window size (Mb) | r | P |
| Human | 1 | 0.18 | 1.1 × 10^-22 |
| | 5 | 0.33 | 5.9 × 10^-16 |
| | 10 | 0.40 | 1.7 × 10^-12 |
| Mouse | 5 | 0.24 | 3.6 × 10^-7 |
| | 10 | 0.33 | 8.0 × 10^-8 |
| Rat | 5 | 0.17 | 8.1 × 10^-5 |
| | 10 | 0.26 | 1.7 × 10^-5 |
The detailed distributions are shown in Additional data file 4. Human recombination rate data measured with a 1 Mb window were based on the deCODE genetic map and downloaded from the UCSC Genome Browser [30]. Recombination rate data measured with 5 Mb and 10 Mb windows were prepared by Jensen-Seaman et al. [31] and downloaded from the associated supplementary material website.
Figure 2. Correlation between CGI density and recombination rate (cM/Mb) in the human genome; a 5 Mb window was used.
Figure 3. Distribution of CGI density (per Mb) on human chromosome 8. The data indicate a trend of higher CGI density in telomeric regions.
CpG islands and other genomic features in non-mammalian genomes
| Species | Genome length (Mb)* | Number of chromosome pairs | Genome GC content (%) | Genome ObsCpG/ExpCpG | Number of CGIs | CGI density (/Mb) | Average CGI length (bp) | CGI GC content (%) | CGI ObsCpG/ExpCpG |
| Chicken† | 985 | 39 | 41.4 | 0.248 | 22,623 | 23.0 | 1,098 | 60.0 | 0.844 |
| Microchromosome | 167 | 20 | 45.7 | 0.305 | 8,634 | 51.7 | 1,040 | 60.4 | 0.810 |
| Macrochromosome | 674 | 6 | 40.0 | 0.219 | 10,125 | 15.0 | 1,138 | 59.6 | 0.863 |
| Lizard | 1,742 | 18 | 40.4 | 0.296 | 45,171 | 25.9 | 899 | 56.8 | 0.728 |
| Tetraodon | 187 | 21 | 45.9 | 0.601 | 30,175 | 161.6 | 1,013 | 56.7 | 0.782 |
| Stickleback | 391 | 21 | 44.5 | 0.662 | 61,768 | 157.8 | 824 | 55.8 | 0.842 |
| Medaka | 582 | 24 | 40.1 | 0.479 | 21,522 | 37.0 | 746 | 55.8 | 0.784 |
| Zebrafish | 1,524 | 25 | 36.5 | 0.531 | 22,392 | 14.7 | 1,162 | 57.0 | 0.869 |
| Fugu | 351 | 22 | 45.5 | 0.565 | 47,251 | 134.5 | 872 | 56.0 | 0.808 |
*The nucleotides marked as 'N' were not included in the analysis. †Only 30 chromosomes were used in the analysis because chromosomes 29-31 and 33-38 were too small to assemble [39]. The microchromosomes included chromosomes GGA11-28, 32 and W and the macrochromosomes included chromosomes GGA1-5 and Z.
Figure 4. CGI density comparison between mammals and non-mammals. This figure shows the distribution of CGI density (per Mb) versus chromosome GC content (%). (a) Comparison of four groups in mammals. (b) Comparison of mammals, chicken and fish.
Figure 5. Correlation between CGI density and other genetic factors. (a) Significant correlation between CGI density and body temperature. (b) Insignificant correlation between CGI density and lifespan.
Names and sequence information of ten mammals and other vertebrates
| Common name | Species name | Sequence build | Data source |
| Mammal | | | |
| Human | | 35.1 | NCBI [44] |
| Chimpanzee | | 2.1 | NCBI [44] |
| Macaque | | 1.1 | NCBI [44] |
| Mouse | | 34.1 | NCBI [44] |
| Rat | | 4.1 | NCBI [44] |
| Dog | | 2.1 | NCBI [44] |
| Cow | | 3.1 | NCBI [44] |
| Horse | | 1.1 | NCBI [44] |
| Opossum | | 2.1 | NCBI [44] |
| Platypus* | | 1.1 | NCBI [44] |
| Non-mammal vertebrate | | | |
| Chicken† | | 2.1 | NCBI [44] |
| Green anole lizard‡ | | anoCar1 | UCSC [30] |
| Tetraodon | | tetNig1 | UCSC [30] |
| Stickleback | | gasAcu1 | UCSC [30] |
| Medaka | | oryLat1 | UCSC [30] |
| Zebrafish | | danRer5 | UCSC [30] |
| Fugu‡ | | fr2 | UCSC [30] |
*The platypus genome was partially assembled. Only chromosomes 1-7, 10-12, 14, 15, 17, 18, 20, X1-X3, and X5 were available. †Only chromosomes 1-28, 32, W, and Z were available. ‡No assembled chromosomes.