Abstract
BACKGROUND: CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located in the 5' end of genes and considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different mathematical approach from previous traditional algorithms. Their evaluation suggests that CpGcluster provides a much more efficient approach to detecting functional clusters or islands of CpGs.
Year: 2009 PMID: 19232104 PMCID: PMC2652441 DOI: 10.1186/1471-2105-10-65
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics and distribution of CGIs and CGCs in the human (NCBI build 36) and mouse genomes (NCBI build 37)
| | Human CGIs (%ᵃ) | Human CGCs (%ᵃ) | Mouse CGIs (%ᵃ) | Mouse CGCs (%ᵃ) |
|---|---|---|---|---|
| Genome length (bp) | 2.86 × 10⁹ | 2.86 × 10⁹ | 2.61 × 10⁹ | 2.61 × 10⁹ |
| # CGIs/CGCs | 37,729 | 198,702 | 21,326 | 121,885 |
| Coverage (%)ᵇ | 1.44 | 1.90 | 0.85 | 1.48 |
| Length (bp) | 1,090 ± 717 | 273 ± 246 | 1,045 ± 519 | 318 ± 297 |
| GC content (%) | 60.61 ± 5.06 | 63.78 ± 7.50 | 60.0 ± 4.0 | 61.4 ± 10.0 |
| ObsCpG/ExpCpG | 0.717 ± 0.082 | 0.855 ± 0.265 | 0.730 ± 0.093 | 0.949 ± 0.426 |
| Promoter regions | 13,196 (35.0) | 29,156 (14.7) | 10,942 (51.3) | 19,791 (16.2) |
| TSSs | 15,106 (40.0) | 21,741 (10.9) | 12,175 (57.1) | 16,675 (13.7) |
| Genic regions | 24,841 (65.8) | 104,924 (52.8) | 15,541 (72.9) | 63,555 (52.1) |
| Intergenic regions | 12,888 (34.2) | 93,778 (47.2) | 5,785 (27.1) | 58,330 (47.9) |
| 8-bp CGCs | | | | |
| Sub-total | | 232 | | 775 |
| Promoter regions | | 12 (5.2ᶜ) | | 13 (1.7ᶜ) |
| Intergenic regions | | 144 (62.1ᶜ) | | 439 (56.6ᶜ) |

ᵃ Proportion (%) of CGIs or CGCs in the genomic region over the total number of CGIs or CGCs in the genome.
ᵇ Proportion (%) of the genome sequence covered by CGIs or CGCs.
ᶜ Proportion (%) of the CGCs in the promoter regions or intergenic regions among the total 8-bp CGCs.
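
The GC content and ObsCpG/ExpCpG rows in the table can be illustrated with a minimal Python sketch. It assumes the commonly used ratio definition, (number of CpG × sequence length) / (number of C × number of G); the record above does not state which formula the authors applied, so this definition is an assumption. The function names `gc_content` and `obs_exp_cpg` are hypothetical and are not taken from CpGcluster or any other tool.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq: str) -> float:
    """Return the observed/expected CpG ratio of seq.

    Assumed definition: (#CpG * length) / (#C * #G).
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every CpG dinucleotide.
    n_cpg = seq.count("CG")
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    example = "CGCGCGATAT"  # toy sequence, not real genomic data
    print(f"GC content: {gc_content(example):.1f}%")
    print(f"ObsCpG/ExpCpG: {obs_exp_cpg(example):.3f}")
```

On a toy sequence this short the ratio is not biologically meaningful; the sketch only shows how the two per-region statistics would be computed before averaging over all CGIs or CGCs.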
Figure 1. Length distribution of CGIs or CGCs in the human genome. (A) CGIs versus CGCs. (B) For CGCs, promoter regions versus intergenic regions.
Figure 2. Multiple short CGCs embedded in one CGI in the promoter region. Dark box: CGCs identified by CpGcluster. Grey box: CGI identified by Takai and Jones' algorithm. The length of each CGC is labeled below the dark box and the distance between two neighboring CGCs is above the line. The transcription start site (TSS) is marked by an arrow. (A) CAP1. (B) ADAM33.
Figure 3. Distribution of distance between two neighbouring CGCs in the promoter region of a gene.
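
The distances between neighbouring CGCs described in the captions for Figures 2 and 3 could be computed along the lines of the sketch below. It assumes each CGC is given as a half-open `(start, end)` coordinate pair on one chromosome and that distance means the gap from the end of one cluster to the start of the next; the record above does not define the measure, so both are assumptions. The function name `neighbour_distances` is hypothetical.

```python
from typing import List, Tuple


def neighbour_distances(clusters: List[Tuple[int, int]]) -> List[int]:
    """Return the gaps (bp) between consecutive clusters.

    clusters: (start, end) pairs, assumed half-open and non-overlapping.
    The input does not need to be sorted.
    """
    ordered = sorted(clusters)
    # Gap = start of the next cluster minus end of the previous one.
    return [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Toy coordinates for illustration only, not taken from CAP1 or ADAM33.
    toy = [(200, 260), (100, 150), (300, 310)]
    print(neighbour_distances(toy))
```

A histogram of these gaps over all promoter regions would give a distribution of the kind the Figure 3 caption refers to.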