Carina F Mugal, Hans Ellegren.
Abstract
BACKGROUND: A major goal in the study of molecular evolution is to unravel the mechanisms that induce variation in the germ line mutation rate and in the genome-wide mutation profile. The rate of germ line mutation is considerably higher for cytosines at CpG sites than for any other nucleotide in the human genome, an increase commonly attributed to cytosine methylation at CpG sites. The CpG mutation rate, however, is not uniform across the genome and, as methylation levels have recently been shown to vary throughout the genome, it has been hypothesized that methylation status may govern variation in the rate of CpG mutation.
Year: 2011 PMID: 21696599 PMCID: PMC3218846 DOI: 10.1186/gb-2011-12-6-r58
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Mean values and variances of the explanatory variables for datasets A, B and C
| Explanatory variable | Dataset A mean | Dataset A variance | Dataset B mean | Dataset B variance | Dataset C mean | Dataset C variance |
|---|---|---|---|---|---|---|
| Non-CpG divergence | 0.0206 | 3.98e-05 | 0.0207 | 3.00e-05 | 0.0203 | 2.93e-05 |
| Methylation level | 0.7723 | 1.17e-02 | 0.7588 | 1.45e-02 | 0.4348 | 4.09e-02 |
| GC content | 0.3961 | 3.28e-03 | 0.4221 | 3.93e-03 | 0.4719 | 5.42e-03 |
| Transcription level | 2.5920 | 4.22e-01 | 2.5770 | 4.00e-01 | 2.6240 | 3.98e-01 |
| Female recombination rate (a) | 0.0721 | 2.03e-01 | 0.0832 | 2.23e-01 | 0.0462 | 3.08e-01 |
| Male recombination rate (a) | -0.3969 | 5.01e-01 | -0.3380 | 4.50e-01 | -0.3551 | 5.27e-01 |
| CpG[o/e] (b) | 0.4168 | 1.85e-02 | 0.4334 | 1.68e-02 | 0.7026 | 6.82e-02 |
Dataset A contains no CpG islands (CGIs) or DNase I hypersensitive sites (DHSs), dataset B contains DHSs but no CGIs, and dataset C contains both CGIs and DHSs. (a) Log-transformed values. (b) Observed versus expected CpG content.
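
The footnote defines CpG[o/e] only as observed versus expected CpG content and does not give a formula. The sketch below assumes the widely used ratio (CpG count × sequence length) / (C count × G count); this choice, the function name `cpg_obs_exp` and the example sequence are illustrative assumptions, not the authors' stated method.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Assumed formula (not stated in the source):
        (number of CpG dinucleotides * length) / (number of C * number of G)
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        # Without both C and G no CpG is expected; report 0 by convention.
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


# Hypothetical 8-bp sequence: 2 CpGs, 2 Cs, 3 Gs -> 2 * 8 / (2 * 3)
example_ratio = cpg_obs_exp("ACGTCGGA")
```

Under this definition, values well below 1 (such as the means of about 0.42 to 0.43 in datasets A and B) indicate CpG depletion relative to base composition.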
Summary of parameter estimates and significance levels of the multivariate generalized linear regression analysis for CpG transition rate
| Explanatory variable | Dataset A estimate | Dataset A -log10(p) | Dataset B estimate | Dataset B -log10(p) | Dataset C estimate | Dataset C -log10(p) |
|---|---|---|---|---|---|---|
| Divergence | 0.1832 | > 15.70 | 0.1574 | > 15.70 | 0.1722 | > 15.70 |
| Methylation level | 0.0385 | 9.50 | 0.1047 | > 15.70 | 0.3454 | > 15.70 |
| GC content | -0.1036 | > 15.70 | -0.1382 | > 15.70 | -0.1849 | > 15.70 |
| Transcription level | -0.0145 | 1.95 | -0.0094 | 3.01 | -0.0114 | 1.46 |
| Female recombination rate | 0.0031 | 0.22 | 0 | 0 | 0.0158 | 2.29 |
| Male recombination rate | -0.0054 | 0.43 | -0.0084 | 2.29 | -0.0167 | 2.42 |
| Explained deviance | 1,243 | | 5,310 | | 8,038 | |
| Residual deviance | 5,894 | | 10,496 | | 3,451 | |
| n (a) | 13,038 | | 21,636 | | 3,871 | |
| AIC (b) | 37,300 | | 78,030 | | 17,172 | |
Dataset A contains no CpG islands (CGIs) or DNase I hypersensitive sites (DHSs), dataset B contains DHSs but no CGIs, and dataset C contains both CGIs and DHSs. (a) Sample size. (b) Akaike Information Criterion.
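
Significance in these tables is reported as -log10(p), so larger values mean smaller P-values. A minimal sketch of the back-conversion follows; the helper name `p_from_neg_log10` is hypothetical, and the values passed in are taken from the CpG transition rate table above.

```python
def p_from_neg_log10(value: float) -> float:
    """Convert a -log10(p) entry back to a P-value."""
    return 10.0 ** (-value)


# Methylation level, dataset A (-log10(p) = 9.50)
print(f"{p_from_neg_log10(9.50):.2e}")   # prints 3.16e-10

# The cap "> 15.70" corresponds to p below roughly 2e-16
print(f"{p_from_neg_log10(15.70):.2e}")  # prints 2.00e-16

# Female recombination rate, dataset A (-log10(p) = 0.22): clearly not significant
print(f"{p_from_neg_log10(0.22):.2f}")
```

As a rule of thumb, an entry above about 1.30 corresponds to p < 0.05.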
Summary of parameter estimates and significance levels of the multivariate generalized linear regression analysis for CpG transversion rate
| Explanatory variable | Dataset A estimate | Dataset A -log10(p) | Dataset B estimate | Dataset B -log10(p) | Dataset C estimate | Dataset C -log10(p) |
|---|---|---|---|---|---|---|
| Divergence | 0.2044 | > 15.70 | 0.1823 | > 15.70 | 0.1414 | > 15.70 |
| Methylation level | 0.0222 | 1.14 | 0.0534 | 14.88 | 0.0781 | > 15.70 |
| GC content | -0.0918 | > 15.70 | -0.1401 | > 15.70 | -0.1076 | > 15.70 |
| Transcription level | -0.0440 | 3.84 | -0.0322 | 8.15 | -0.0336 | 3.64 |
| Female recombination rate | 0.0071 | 0.26 | -0.0084 | 0.83 | 0.0044 | 0.19 |
| Male recombination rate | 0.0022 | 0.07 | -0.0006 | 0.04 | -0.0011 | 0.04 |
| Explained deviance | 356 | | 1,421 | | 519 | |
| Residual deviance | 9,979 | | 15,544 | | 2,986 | |
| n (a) | 13,038 | | 21,636 | | 3,871 | |
| AIC (b) | 21,460 | | 50,718 | | 12,684 | |
Dataset A contains no CpG islands (CGIs) or DNase I hypersensitive sites (DHSs), dataset B contains DHSs but no CGIs, and dataset C contains both CGIs and DHSs. (a) Sample size. (b) Akaike Information Criterion.
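
The explained and residual deviances in the two CpG tables above can be turned into a rough fraction of deviance explained, which makes the datasets easier to compare. The sketch below assumes that explained deviance plus residual deviance equals the null deviance; the source does not state this, and the names `fraction_explained`, `transition` and `transversion` are hypothetical.

```python
def fraction_explained(explained: float, residual: float) -> float:
    """Share of deviance accounted for by the model.

    Assumption (not stated in the source): null deviance = explained + residual.
    """
    return explained / (explained + residual)


# (explained deviance, residual deviance) per dataset, copied from the tables
transition = {"A": (1243, 5894), "B": (5310, 10496), "C": (8038, 3451)}
transversion = {"A": (356, 9979), "B": (1421, 15544), "C": (519, 2986)}

for label, table in (("transition", transition), ("transversion", transversion)):
    for dataset, (explained, residual) in table.items():
        share = fraction_explained(explained, residual)
        print(f"CpG {label} rate, dataset {dataset}: {share:.1%}")
```

Under that assumption, the model accounts for roughly 17%, 34% and 70% of the deviance in CpG transition rate for datasets A, B and C, but only about 3%, 8% and 15% for CpG transversion rate.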
Summary of parameter estimates and significance levels of the multivariate linear regression analysis for the CpG transition/transversion rate ratio (κ)
| Explanatory variable | Dataset A estimate | Dataset A -log10(p) | Dataset B estimate | Dataset B -log10(p) | Dataset C estimate | Dataset C -log10(p) |
|---|---|---|---|---|---|---|
| Divergence | 0.2574 | 15.20 | 0.1450 | 9.54 | 0.2093 | 8.46 |
| Methylation level | 0.1666 | 6.79 | 0.3116 | > 15.70 | 0.8127 | > 15.70 |
| GC content | 0.2478 | 13.98 | 0.0257 | 0.57 | -0.1815 | 5.84 |
| Transcription level | -0.1744 | 7.31 | -0.0294 | 0.69 | 0.0871 | 1.77 |
| Female recombination rate | -0.0321 | 0.48 | 0.0533 | 1.60 | 0.0210 | 0.25 |
| Male recombination rate | 0.0971 | 2.49 | 0.0642 | 2.13 | -0.0130 | 0.14 |
| Explained variance | 3.97% | | 1.73% | | 17.73% | |
| n (a) | 5,550 | | 14,165 | | 3,376 | |
Dataset A contains no CpG islands (CGIs) or DNase I hypersensitive sites (DHSs), dataset B contains DHSs but no CGIs, and dataset C contains both CGIs and DHSs. (a) Sample size.
Summary of parameter estimates and significance levels of the multivariate generalized linear regression analysis for CpH transition rate
| Explanatory variable | Dataset A estimate | Dataset A -log10(p) | Dataset B estimate | Dataset B -log10(p) | Dataset C estimate | Dataset C -log10(p) |
|---|---|---|---|---|---|---|
| Divergence | 0.2559 | > 15.70 | 0.2205 | > 15.70 | 0.2094 | > 15.70 |
| Methylation level | 0 | 0 | 0.0036 | 1.34 | 0.0032 | 0.52 |
| GC content | -0.1366 | > 15.70 | -0.1389 | > 15.70 | -0.1681 | > 15.70 |
| Transcription level | 0.0084 | 3.08 | 0.0065 | 6.09 | 0.0063 | 1.59 |
| Female recombination rate | 0.0048 | 1.15 | 0.0002 | 0.04 | 0.0019 | 0.27 |
| Male recombination rate | 0.0083 | 2.71 | 0.0046 | 2.91 | 0.0014 | 0.19 |
| Explained deviance | 10,790 | | 24,947 | | 6,639 | |
| Residual deviance | 7,407 | | 12,664 | | 2,427 | |
| n (a) | 13,038 | | 21,636 | | 3,871 | |
| AIC (b) | 60,968 | | 113,273 | | 21,181 | |
Dataset A contains no CpG islands (CGIs) or DNase I hypersensitive sites (DHSs), dataset B contains DHSs but no CGIs, and dataset C contains both CGIs and DHSs. (a) Sample size. (b) Akaike Information Criterion.
Summary of parameter estimates and significance levels of the multivariate generalized linear regression analysis for the set of concatenated introns
| Explanatory variable | CpG transition rate estimate | CpG transition rate -log10(p) | CpG transversion rate estimate | CpG transversion rate -log10(p) | CpH transition rate estimate | CpH transition rate -log10(p) |
|---|---|---|---|---|---|---|
| Divergence | 0.1182 | > 15.70 | 0.1549 | > 15.70 | 0.1854 | > 15.70 |
| Methylation level | 0.0413 | 10.21 | 0.0261 | 1.44 | 0.0035 | 0.61 |
| GC content | -0.1229 | > 15.70 | -0.1404 | > 15.70 | -0.1423 | > 15.70 |
| Transcription level | -0.0150 | 2.44 | -0.0320 | 2.82 | 0.0107 | 5.53 |
| Female recombination rate | 0.0059 | 0.61 | 0.0073 | 0.32 | 0.0056 | 1.72 |
| Male recombination rate | -0.0047 | 0.44 | -0.0066 | 0.29 | 0.0085 | 3.57 |
| Explained deviance | 880 | | 333 | | 6,538 | |
| Residual deviance | 2,390 | | 3,614 | | 3,387 | |
| n (a) | 5,454 | | 5,454 | | 5,454 | |
| AIC (b) | 21,263 | | 14,384 | | 30,905 | |
(a) Sample size. (b) Akaike Information Criterion.
Figure 1. Covariation between the observed versus expected CpG content (CpG[o/e]) [...]. The three panels show the covariation for datasets A, B and C, respectively (see text). Each grey dot represents one intronic region. The black dashed line is the linear regression line of a simple linear regression model.
Figure 2. Covariation between CpG transition rate and the observed versus expected CpG content (CpG[o/e]). The three panels show the covariation for datasets A, B and C, respectively. Each grey dot represents one intronic region.
Figure 3. Covariation between CpG transition rate and methylation level for intronic regions of the human genome. The three panels show the covariation for datasets A, B and C, respectively. Each grey dot represents one intronic region. The black dashed line is the linear regression line of a simple linear regression model.
Comparison of the CpG transition rate between the X chromosome and the 22 autosomes
| Levels of the one-way ANOVA | CpG-specific transition rate estimate | CpG-specific transition rate P-value | CpG transition rate estimate | CpG transition rate P-value |
|---|---|---|---|---|
| Intercept | -0.0708 | 1.04e-03** | -0.1730 | 6.82e-15*** |
| Chromosome 1 | 0.0638 | 1.58e-02* | 0.1531 | 1.83e-08*** |
| Chromosome 2 | 0.0565 | 3.32e-02* | 0.1567 | 9.46e-09*** |
| Chromosome 3 | 0.0870 | 1.21e-03** | 0.1809 | 5.97e-11*** |
| Chromosome 4 | 0.0644 | 2.46e-02* | 0.1759 | 2.37e-09*** |
| Chromosome 5 | 0.0437 | 1.20e-01 | 0.1374 | 2.03e-06*** |
| Chromosome 6 | 0.0579 | 4.94e-02* | 0.1666 | 3.86e-08*** |
| Chromosome 7 | 0.0330 | 2.97e-01 | 0.1293 | 6.84e-05*** |
| Chromosome 8 | -0.0036 | 9.08e-01 | 0.1382 | 1.53e-05*** |
| Chromosome 9 | 0.0942 | 2.71e-03** | 0.1890 | 4.94e-09*** |
| Chromosome 10 | 0.0871 | 2.95e-03** | 0.1864 | 6.22e-10*** |
| Chromosome 11 | 0.0988 | 1.44e-03** | 0.1911 | 2.09e-09*** |
| Chromosome 12 | 0.0555 | 6.11e-02 | 0.1345 | 1.03e-05*** |
| Chromosome 13 | 0.0625 | 8.53e-02 | 0.1501 | 5.88e-05*** |
| Chromosome 14 | 0.0005 | 8.88e-01 | 0.0900 | 9.89e-03** |
| Chromosome 15 | 0.0333 | 9.87e-01 | 0.1376 | 2.74e-05*** |
| Chromosome 16 | 0.0630 | 1.13e-01 | 0.1707 | 2.92e-05*** |
| Chromosome 17 | -0.0011 | 9.74e-01 | 0.1144 | 1.33e-03** |
| Chromosome 18 | 0.0862 | 1.92e-02* | 0.1901 | 5.02e-07*** |
| Chromosome 19 | -0.0118 | 8.68e-01 | 0.1157 | 1.15e-01 |
| Chromosome 20 | -0.0077 | 8.46e-01 | 0.0756 | 6.28e-02 |
| Chromosome 21 | 0.1033 | 5.49e-02 | 0.2319 | 2.76e-05*** |
| Chromosome 22 | 0.0288 | 6.00e-01 | 0.1228 | 2.99e-02* |
CpG-specific transition rate is corrected for among-chromosomes variation in divergence, GC content, DNA methylation and transcription level; CpG transition rate is only corrected for variation in GC content, DNA methylation and transcription level among chromosomes. Estimates and P-values of a one-way ANOVA are listed, where the contrasts are based on the X chromosome. P-values are marked with asterisks to highlight their significance level, where single, double and triple asterisks indicate P-values below a threshold of 0.05, 0.01 and 0.001, respectively.
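
The asterisk thresholds and the contrast coding described above can be sketched in a few lines. The sketch assumes treatment-style contrasts, in which the intercept is the X-chromosome level and each chromosome estimate is a difference from it; the source says only that the contrasts are based on the X chromosome, so this reading and the helper names `stars` and `chromosome_level` are assumptions.

```python
def stars(p: float) -> str:
    """Asterisk code for a P-value, using the 0.05 / 0.01 / 0.001 thresholds."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def chromosome_level(intercept: float, estimate: float) -> float:
    """Implied level for an autosome.

    Assumption: the intercept is the X-chromosome level and the estimate
    is the autosome's difference from it.
    """
    return intercept + estimate


# CpG-specific transition rate: intercept (X chromosome) and chromosome 1
x_level = -0.0708
chr1_level = chromosome_level(x_level, 0.0638)

print(f"X: {x_level:.4f}, chromosome 1: {chr1_level:.4f} {stars(1.58e-02)}")
```

Read this way, a positive estimate marks an autosome whose rate lies above that of the X chromosome, and the asterisks mark whether that difference is significant.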
Figure 4. Box plots of the CpG-specific transition rate (see Materials and methods) and CpG transition rate per chromosome. The black dashed line represents the overall mean over all chromosomes.