Abstract
BACKGROUND: Developing a model for codon substitutions is essential for the analysis of protein sequences. Recent studies of mutation rates in non-coding regions have shown that CpG mutation rates in the human genome are negatively correlated with the local GC content and with the densities of functional elements. This study aimed to understand the effect of genomic features, namely GC content, gene density, and frequency of CpG islands, on the rates of codon substitution in human chromosomes.
Year: 2011 PMID: 21819607 PMCID: PMC3169530 DOI: 10.1186/1471-2164-12-397
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of data
| Chromosome | GC Content (%) | Gene Density^a | CpG Island Density^a | Chromosome Size (bp) | No. of genes used in this study | No. of codons used in this study |
|---|---|---|---|---|---|---|
| 1 | 41.74 | 8.30 | 33.69 | 249250621 | 1123 | 504737 |
| 2 | 40.24 | 5.39 | 28.03 | 243199373 | 708 | 349066 |
| 3 | 39.69 | 5.68 | 18.20 | 198022430 | 644 | 311774 |
| 4 | 38.25 | 4.39 | 19.13 | 191154276 | 442 | 222017 |
| 5 | 39.52 | 5.24 | 24.77 | 180915260 | 521 | 266213 |
| 6 | 40.33 | 6.23 | 28.13 | 171115067 | 623 | 260600 |
| 7 | 40.75 | 6.45 | 42.13 | 159138663 | 472 | 214102 |
| 8 | 40.18 | 5.39 | 32.47 | 146364022 | 359 | 158173 |
| 9 | 41.31 | 5.94 | 35.58 | 141213431 | 420 | 192744 |
| 10 | 41.58 | 5.99 | 40.98 | 135534747 | 431 | 194136 |
| 11 | 41.57 | 10.24 | 38.98 | 135006516 | 593 | 255208 |
| 12 | 40.81 | 8.10 | 35.40 | 133851895 | 567 | 258243 |
| 13 | 38.53 | 2.95 | 23.29 | 115169878 | 198 | 100596 |
| 14 | 40.89 | 6.18 | 28.97 | 107349540 | 360 | 168500 |
| 15 | 42.20 | 6.91 | 29.00 | 102531392 | 321 | 165732 |
| 16 | 44.79 | 10.58 | 74.48 | 90354753 | 435 | 196533 |
| 17 | 45.54 | 15.48 | 81.09 | 81195210 | 549 | 246160 |
| 18 | 39.79 | 3.89 | 31.89 | 78077248 | 192 | 87095 |
| 19 | 48.34 | 25.28 | 134.13 | 59128983 | 525 | 205475 |
| 20 | 44.13 | 9.04 | 56.61 | 63025520 | 319 | 130636 |
| 21 | 40.84 | 5.17 | 44.32 | 48129895 | 127 | 54345 |
| 22 | 47.99 | 8.77 | 73.68 | 51304566 | 199 | 84980 |
| X | 39.50 | 5.62 | 19.66 | 155270560 | 244 | 90162 |
^a per Mb
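As a quick check on how the chromosomal features in this table relate to one another, the sketch below computes Pearson correlation coefficients between GC content and the two density columns. It is only an illustration built on the values transcribed from the table above and is not part of the original analysis; the names `ROWS` and `pearson_r` are hypothetical.

```python
import math

# (chromosome, GC content %, gene density per Mb, CpG island density per Mb),
# transcribed from the summary table above.
ROWS = [
    ("1", 41.74, 8.30, 33.69), ("2", 40.24, 5.39, 28.03),
    ("3", 39.69, 5.68, 18.20), ("4", 38.25, 4.39, 19.13),
    ("5", 39.52, 5.24, 24.77), ("6", 40.33, 6.23, 28.13),
    ("7", 40.75, 6.45, 42.13), ("8", 40.18, 5.39, 32.47),
    ("9", 41.31, 5.94, 35.58), ("10", 41.58, 5.99, 40.98),
    ("11", 41.57, 10.24, 38.98), ("12", 40.81, 8.10, 35.40),
    ("13", 38.53, 2.95, 23.29), ("14", 40.89, 6.18, 28.97),
    ("15", 42.20, 6.91, 29.00), ("16", 44.79, 10.58, 74.48),
    ("17", 45.54, 15.48, 81.09), ("18", 39.79, 3.89, 31.89),
    ("19", 48.34, 25.28, 134.13), ("20", 44.13, 9.04, 56.61),
    ("21", 40.84, 5.17, 44.32), ("22", 47.99, 8.77, 73.68),
    ("X", 39.50, 5.62, 19.66),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


gc = [row[1] for row in ROWS]
gene_density = [row[2] for row in ROWS]
cpg_island_density = [row[3] for row in ROWS]

r_gc_gene = pearson_r(gc, gene_density)
r_gc_cpg = pearson_r(gc, cpg_island_density)
print(f"GC content vs gene density:       r = {r_gc_gene:.2f}")
print(f"GC content vs CpG island density: r = {r_gc_cpg:.2f}")
```

Because these features vary together across chromosomes, their separate effects are hard to read off single-variable correlations, which is one reason to look at partial coefficients from a multiple regression as well.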
Figure 1. Scatter plot of CpG and non-CpG substitution rates, which are positively correlated with the number of CpG islands per site on the chromosome. Open squares: synonymous C to T substitution rate at CpG sites on the autosomes. Closed squares: synonymous C to T substitution rate at CpG sites on the X chromosome.
Correlation coefficient analysis between ln(substitution rate × 10^8) and chromosomal features
| Type | Grantham Distance | GC Content (%) | Gene Density^a | CpG Island Density^a | Chromosome Size |
|---|---|---|---|---|---|
| CpG to TpG | -0.23** | -0.04** | -0.03** | -0.08** | -0.13** |
| TpG to CpG | -0.39** | 0.09** | 0.09** | 0.02* | -0.25** |
| non-CpG Transition | -0.35** | 0.08** | 0.08** | 0.01** | -0.24** |
| Transversion | -0.23** | 0.06 | 0.07** | -0.02** | -0.32** |
* at 1% level of significance; ** at 0.1% level of significance; ^a per Mb
Partial coefficients of Grantham distance, GC content, gene density, CpG island density, and chromosome size obtained by multiple regression analysis
| Type | Partial Correlation Coefficient ± SD |
|---|---|
| Intercept | |
| CpG to TpG | 5.13 ± 0.40 |
| TpG to CpG | 2.59 ± 0.04 |
| non-CpG Transition | 3.09 ± 0.17 |
| Transversion | 3.84 ± 0.14 |
| Grantham Distance | |
| CpG to TpG | -5.05 × 10^-3 ± 2.29 × 10^-4** |
| TpG to CpG | -9.48 × 10^-3 ± 2.25 × 10^-4** |
| non-CpG Transition | -7.81 × 10^-3 ± 9.79 × 10^-5** |
| Transversion | -3.91 × 10^-3 ± 7.25 × 10^-5** |
| GC Content (%) | |
| CpG to TpG | -7.13 × 10^-2 ± 1.03 × 10^-2** |
| TpG to CpG | -1.52 × 10^-2 ± 1.01 × 10^-2 |
| non-CpG Transition | -3.23 × 10^-2 ± 4.40 × 10^-3** |
| Transversion | -6.18 × 10^-2 ± 3.66 × 10^-3** |
| Gene Density^a | |
| CpG to TpG | 7.21 × 10^-3 ± 1.68 × 10^-3** |
| TpG to CpG | 8.36 × 10^-3 ± 1.62 × 10^-3** |
| non-CpG Transition | 8.14 × 10^-3 ± 6.94 × 10^-4** |
| Transversion | 9.78 × 10^-3 ± 5.55 × 10^-4** |
| CpG Island Density^a | |
| CpG to TpG | -4.23 × 10^-2 ± 6.72 × 10^-3** |
| TpG to CpG | -5.38 × 10^-2 ± 6.46 × 10^-3** |
| non-CpG Transition | -4.64 × 10^-2 ± 2.80 × 10^-3** |
| Transversion | -5.76 × 10^-2 ± 2.25 × 10^-3** |
| Chromosome Size | |
| CpG to TpG | -3.83 × 10^-9 ± 2.64 × 10^-10** |
| TpG to CpG | -4.70 × 10^-9 ± 2.52 × 10^-10** |
| non-CpG Transition | -4.30 × 10^-9 ± 1.08 × 10^-10** |
| Transversion | -5.65 × 10^-9 ± 8.40 × 10^-11** |
* at 1% level of significance; ** at 0.1% level of significance; ^a per Mb
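To make the coefficients in this table concrete, the sketch below plugs the "CpG to TpG" values into a plain additive linear model for ln(rate × 10^8). Several things here are assumptions rather than statements from the source: that the coefficients combine additively in this form, that the predictors are in the same units as the summary table (GC content in %, densities per Mb, chromosome size in bp), and that a Grantham distance of 0 stands for a synonymous change. The function and dictionary names are hypothetical.

```python
import math

# Coefficients copied from the "CpG to TpG" rows of the table above.
CPG_TO_TPG = {
    "intercept": 5.13,
    "grantham": -5.05e-3,
    "gc_percent": -7.13e-2,
    "gene_density": 7.21e-3,
    "cpg_island_density": -4.23e-2,
    "chromosome_size": -3.83e-9,
}


def predict_ln_rate(coef, grantham, gc_percent, gene_density,
                    cpg_island_density, chromosome_size):
    """Evaluate an assumed additive model for ln(rate x 10^8)."""
    return (
        coef["intercept"]
        + coef["grantham"] * grantham
        + coef["gc_percent"] * gc_percent
        + coef["gene_density"] * gene_density
        + coef["cpg_island_density"] * cpg_island_density
        + coef["chromosome_size"] * chromosome_size
    )


# Chromosome 1 features from the summary table; Grantham distance 0 is an
# assumption standing in for a synonymous substitution.
ln_rate_chr1 = predict_ln_rate(
    CPG_TO_TPG,
    grantham=0.0,
    gc_percent=41.74,
    gene_density=8.30,
    cpg_island_density=33.69,
    chromosome_size=249250621,
)

# Undo the ln(rate x 10^8) transformation to get a rate on the original scale.
rate_chr1 = math.exp(ln_rate_chr1) / 1e8
print(f"ln(rate x 10^8) = {ln_rate_chr1:.3f}, rate = {rate_chr1:.3e}")
```

Under this reading, a negative coefficient means that a larger value of the feature lowers the predicted log rate when the other features are held fixed.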
Correlation coefficients squared (r^2) between ln(rate × 10^8) and chromosomal features
| Type | r^2 obtained by single regression to Grantham distance | Adjusted r^2 obtained by multiple regression to Grantham distance, GC content, gene density, CpG island density |
|---|---|---|
| CpG to TpG | 0.051 | 0.089 |
| TpG to CpG | 0.154 | 0.224 |
| non-CpG Transition | 0.122 | 0.182 |
| Transversion | 0.054 | 0.190 |
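The source does not say how the adjusted r^2 values were computed. The sketch below shows the usual definition, which penalizes r^2 for the number of predictors; the function name and the sample values of `n` and `p` are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted r^2 for a model with p predictors fit to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Illustrative numbers only: r^2 = 0.5 with 11 observations and 2 predictors.
example = adjusted_r2(0.5, n=11, p=2)
print(f"adjusted r^2 = {example:.3f}")
```

With many observations and few predictors the adjustment is small, so the adjusted value stays close to the unadjusted r^2.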