Ning Jiang, Lin Wang, Jing Chen, Luwen Wang, Lindsey Leach, Zewei Luo.
Abstract
DNA methylation plays a fundamental role in the regulation of gene expression and is widespread in the genomes of eukaryotic species. For example, in higher vertebrates, there is a "global" methylation pattern involving complete methylation of CpG sites genome-wide, except in promoter regions that are typically enriched for CpG dinucleotides, the so-called "CpG islands." Here, we comprehensively examined and compared the distribution of CpG sites within ten model eukaryotic species and linked the observed patterns to the role of DNA methylation in controlling gene transcription. The analysis revealed two distinct but conserved methylation patterns for gene promoters in human and mouse genomes, involving genes with distinct distributions of promoter CpGs and gene expression patterns. Comparative analysis with four other higher vertebrates revealed that the primary regulatory role of the DNA methylation system is highly conserved in higher vertebrates.
Keywords: CpG sites within promoters; comparative phylogenetic analysis; conservation and divergence in DNA methylation; eukaryotes; genome-wide CpG site distribution
Year: 2014 PMID: 25355807 PMCID: PMC4255770 DOI: 10.1093/gbe/evu238
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
GC Content and Distribution of CpG Sites in Vertebrate, Invertebrate, and Plant Genome Features
| Genome feature | Higher vertebrates (mammals, birds) [a]: GC % | Higher vertebrates [a]: expected CpG % [c] | Higher vertebrates [a]: observed CpG % | Lower vertebrate, invertebrates, and plant [b]: GC % | Lower vertebrate, invertebrates, and plant [b]: expected CpG % [c] | Lower vertebrate, invertebrates, and plant [b]: observed CpG % |
|---|---|---|---|---|---|---|
| Genome-wide | 37.95–42.39 | 3.61–4.49 | 0.95–2.08 | 35.44–41.24 | 3.14–4.25 | 3.48–8.11 |
| Intron | 40.37–42.46 | 4.07–4.50 | 1.75–2.21 | 32.14–39.91 | 2.58–3.98 | 2.65–7.35 |
| Exon | 48.88–51.57 | 5.97–6.65 | 5.35–6.78 | 42.42–50.00 | 4.50–6.25 | 5.89–11.69 |
| Transcripts | 48.47–51.72 | 5.87–6.69 | 4.94–6.82 | 42.59–50.10 | 4.54–6.27 | 5.93–11.71 |
| Promoter | 52.21–57.29 | 6.81–8.21 | 7.66–11.98 | 32.42–41.55 | 2.63–4.32 | 4.19–9.08 |
[a] Range among six higher vertebrate species (human, mouse, rat, cow, dog, and chicken).
[b] Range among four lower vertebrate, invertebrate, or plant species (zebrafish, Drosophila, Arabidopsis, Caenorhabditis elegans).
[c] Expected CpG percentage calculated based on the observed GC percentage.
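The footnote does not spell out how the expected CpG percentage is derived from GC content. The tabulated values are consistent with the simple independence model in which C and G are equally frequent, so that the expected CpG fraction is (GC/2)². The sketch below implements that model as an assumption; the function names are illustrative and are not taken from the paper.

```python
def expected_cpg_percent(gc_percent: float) -> float:
    """Expected CpG dinucleotide percentage from GC content.

    Assumption (not stated in the source): C and G are equally frequent
    and neighbouring bases are independent, so P(CpG) = P(C) * P(G).
    """
    c_freq = gc_percent / 100.0 / 2.0  # P(C) under the equal-frequency assumption
    g_freq = c_freq                    # P(G) under the same assumption
    return 100.0 * c_freq * g_freq


def observed_to_expected_ratio(observed_percent: float, expected_percent: float) -> float:
    """Ratio below 1 indicates CpG depletion; above 1 indicates enrichment."""
    return observed_percent / expected_percent


if __name__ == "__main__":
    # Upper end of the promoter GC range for higher vertebrates in the table.
    gc = 57.29
    expected = expected_cpg_percent(gc)
    print(f"GC {gc:.2f}% -> expected CpG {expected:.2f}%")
    # Observed promoter CpG at the upper end of the same range.
    print(f"observed/expected = {observed_to_expected_ratio(11.98, expected):.2f}")
```

Under this model, a ratio well below 1 (as in the genome-wide row for higher vertebrates) indicates CpG depletion, while the promoter row shows observed values at or above expectation.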
Figure: Observed and expected proportions of CpGs across the entire genome or in gene promoters of ten model species.
Observed Proportion and Expected Poisson Probability of Promoters Classified into Each of the Six CpG Density Categories in the Human Genome
| Number of CpG sites per 1,000 bases of promoter sequence | 0–25 | 26–40 | 41–50 | 51–60 | 61–75 | >75 | Total |
|---|---|---|---|---|---|---|---|
| Observed number of promoters | 10,028 | 3,306 | 2,205 | 2,674 | 4,888 | 11,056 | 34,157 |
| Observed proportion of promoters (%) | 29.36 | 9.68 | 6.46 | 7.83 | 14.41 | 32.37 | 100 |
| Expected number of promoters | 5 | 2,272 | 14,155 | 14,493 | 3,187 | 38 | 34,157 |
| Expected proportion of promoters (%) [a] | 0.01 | 6.65 | 41.44 | 42.43 | 9.33 | 0.11 | 100 |
| Pearson’s chi-square statistic | 861.42 | 0.01 | 0.30 | 0.28 | 0.03 | 94.61 | 956.65 |
[a] Expected proportion is calculated based on a Poisson distribution with mean parameter equal to 51.
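As a rough check on the table above, the following sketch bins a Poisson distribution with mean 51 into the six CpG density categories and recomputes the per-category chi-square terms. The chi-square row appears to be reproduced when the observed and expected proportions are used as fractions rather than counts; that reading is an inference from the numbers, not something the source states. The helper names are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_bin_probs(mean: float, upper_bounds: list[int]) -> list[float]:
    """Probabilities of the bins [0..u1], [u1+1..u2], ..., plus an open top bin."""
    probs = []
    lower = 0
    for upper in upper_bounds:
        probs.append(sum(poisson_pmf(k, mean) for k in range(lower, upper + 1)))
        lower = upper + 1
    probs.append(max(0.0, 1.0 - sum(probs)))  # everything above the last bound
    return probs


def chi_square_terms(observed: list[float], expected: list[float]) -> list[float]:
    """Per-category Pearson terms (O - E)^2 / E."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Proportions (%) copied from the table; categories 0-25, 26-40, 41-50, 51-60, 61-75, >75.
observed_pct = [29.36, 9.68, 6.46, 7.83, 14.41, 32.37]
expected_pct = [0.01, 6.65, 41.44, 42.43, 9.33, 0.11]

# Assumption: the statistic is computed on fractions, not on promoter counts.
terms = chi_square_terms([p / 100 for p in observed_pct],
                         [p / 100 for p in expected_pct])
model_probs = poisson_bin_probs(51, [25, 40, 50, 60, 75])

if __name__ == "__main__":
    for pct, term, prob in zip(expected_pct, terms, model_probs):
        print(f"table expected {pct:6.2f}%  Poisson(51) {100 * prob:6.2f}%  chi-square term {term:8.2f}")
    print(f"total chi-square: {sum(terms):.2f}")
```

Almost all of the statistic comes from the two extreme categories, which is the bimodality that motivates separating promoters with low and high CpG density.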
Figure: Proportion of CpG sites in gene promoters across ten model species. The horizontal axis represents the proportion of CpG sites in gene promoters, whereas the vertical axis represents the number of promoters for each model species. Color is used to distinguish HCP (black), LCP (green), and ICP (red).
Figure: Distribution of CpGs with respect to the TSS. The horizontal axis represents the distance from the TSS, whereas the vertical axis represents the CpG fraction.
Conservation of Two Classes of Promoter in Higher Vertebrates
| Species | Conserved HCP (%): Human | Conserved HCP (%): Mouse | Conserved HCP (%): Rat | Conserved HCP (%): Cow | Conserved HCP (%): Dog | Conserved HCP (%): Chicken | Conserved LCP (%): Human | Conserved LCP (%): Mouse | Conserved LCP (%): Rat | Conserved LCP (%): Cow | Conserved LCP (%): Dog | Conserved LCP (%): Chicken |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 7,139 | 97.6 | 97.4 | 96.9 | 88.8 | 85.7 | 2,895 | 86.7 | 87.3 | 79.5 | 89.8 | 42.1 |
| Mouse | 93.7 | 8,097 | 96.7 | 93.7 | 85.8 | 84.7 | 85.1 | 4,365 | 89.9 | 83.2 | 87.4 | 48.2 |
| Rat | 91.9 | 92.6 | 5,634 | 90.4 | 83.4 | 89.7 | 85.0 | 94.6 | 2,596 | 86.9 | 77.9 | 56.3 |
| Cow | 93.6 | 94.1 | 95.2 | 1,536 | 84.5 | 84.3 | 90.6 | 96.0 | 79.3 | 577 | 88.9 | 54.8 |
| Dog | 82.2 | 80.5 | 84.9 | 86.3 | 435 | 80.6 | 81.1 | 90.6 | 87.2 | 97.1 | 187 | 40.0 |
| Chicken | 89.6 | 87.0 | 89.7 | 84.4 | 82.5 | 913 | 47.3 | 53.6 | 51.6 | 53.7 | 33.3 | 251 |
Note.—The diagonal cells show the number of genes with HCP or LCP in each species. The upper and lower triangles show the percentage of genes in the column species also given the same classification for the row species.
Means and Standard Errors for Two Substitution Rate Statistics of Homologous Genes with Conserved Promoter Status
| Species | Conserved HCP: Human | Conserved HCP: Mouse | Conserved HCP: Rat | Conserved HCP: Cow | Conserved HCP: Dog | Conserved HCP: Chicken | Conserved LCP: Human | Conserved LCP: Mouse | Conserved LCP: Rat | Conserved LCP: Cow | Conserved LCP: Dog | Conserved LCP: Chicken |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | | 0.63 ± 0.005 | 0.62 ± 0.006 | 0.63 ± 0.011 | 0.54 ± 0.019 | 0.76 ± 0.008 | | 0.74 ± 0.012 | 0.71 ± 0.016 | 0.72 ± 0.029 | 0.63 ± 0.063 | 0.89 ± 0.028 |
| Mouse | 0.10 ± 0.001 | | 0.21 ± 0.004 | 0.71 ± 0.008 | 0.63 ± 0.040 | 0.91 ± 0.010 | 0.22 ± 0.005 | | 0.34 ± 0.007 | 0.77 ± 0.013 | 0.77 ± 0.040 | 1.05 ± 0.013 |
| Rat | 0.10 ± 0.001 | 0.09 ± 0.001 | | 0.70 ± 0.009 | 0.67 ± 0.080 | 0.90 ± 0.010 | 0.20 ± 0.007 | 0.24 ± 0.005 | | 0.79 ± 0.009 | 0.83 ± 0.041 | 1.03 ± 0.018 |
| Cow | 0.13 ± 0.004 | 0.10 ± 0.003 | 0.11 ± 0.003 | | 0.45 ± 0.046 | 0.83 ± 0.025 | 0.29 ± 0.015 | 0.22 ± 0.007 | 0.19 ± 0.008 | | 0.62 ± 0.015 | 1.01 ± 0.033 |
| Dog | 0.08 ± 0.007 | 0.05 ± 0.004 | 0.06 ± 0.005 | 0.07 ± 0.013 | | 0.78 ± 0.049 | 0.16 ± 0.031 | 0.15 ± 0.018 | 0.16 ± 0.019 | 0.20 ± 0.044 | | 0.99 ± 0.025 |
| Chicken | 0.09 ± 0.003 | 0.09 ± 0.003 | 0.08 ± 0.003 | 0.11 ± 0.009 | 0.07 ± 0.009 | | 0.18 ± 0.020 | 0.15 ± 0.011 | 0.15 ± 0.011 | 0.19 ± 0.023 | 0.13 ± 0.040 | |
Note.—The upper triangles show the rates of nucleotide substitution under the K80 model in promoter regions for paired homologous genes with conserved promoter status (mean ± standard error). The lower triangles show the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) in protein-coding regions for paired homologous genes with conserved promoter status (mean ± standard error).
Figure: Phylogenies of six higher vertebrate species reconstructed either from DNA and protein sequence data (A) or from conservation level of HCP or LCP status in gene promoters (B).
Figure: Methylation and gene expression patterns across 28 human tissues. Methylation levels of CpG sites in all promoters (A), and in HCP and LCP (B), across 28 different human tissues. The average methylation levels with respect to the TSS, with each point representing the average methylation level in an interval of 10 bp (C). The correlation of methylation levels between all pairs of CpG sites in the same promoter, with each point showing the average correlation in 10-bp intervals according to the distance between CpG sites (D). The correlation coefficient between methylation and gene expression level with increasing distance from the TSS (E). Distribution of the number of tissues in which HCP and LCP genes are expressed, with each bar labeled with the corresponding number of expressed housekeeping genes as identified in Zhu et al. (2008) (F).
Conserved and Overrepresented GO Terms for Genes with HCP and LCP in Six Higher Vertebrates
| GO ID | Conservation [a] | Subontology | GO Term Description |
|---|---|---|---|
| Overrepresented among genes with HCP | |||
| 0000122 | 4 | BP | Regulation of transcription from RNA polymerase promoter |
| 0003676 | 4 | MF | Nucleic acid binding |
| 0003677 | 4 | MF | DNA binding |
| 0003723 | 4 | MF | RNA binding |
| 0004672 | 4 | MF | Protein kinase activity |
| 0004930 | 4 | MF | G-protein coupled receptor activity |
| 0005634 | 4 | CC | Nucleus |
| 0005730 | 4 | CC | Nucleolus |
| 0006915 | 4 | BP | Apoptotic process |
| 0016021 | 4 | CC | Integral to membrane |
| 0016301 | 4 | MF | Kinase activity |
| 0043234 | 4 | CC | Protein complex |
| 0043565 | 5 | MF | Sequence-specific DNA binding |
| 0044212 | 4 | MF | Transcription regulatory region DNA binding |
| 0045892 | 4 | BP | Negative regulation of transcription, DNA-dependent |
| 0045893 | 4 | BP | Positive regulation of transcription, DNA-dependent |
| Overrepresented among genes with LCP | |||
| 0004869 | 4 | MF | Cysteine-type endopeptidase inhibitor activity |
| 0004984 | 4 | MF | Olfactory receptor activity |
| 0006955 | 4 | BP | Immune response |
| 0006958 | 5 | BP | Complement activation, classical pathway |
| 0006974 | 4 | BP | Response to DNA damage stimulus |
| 0007596 | 4 | BP | Blood coagulation |
| 0007601 | 4 | BP | Visual perception |
| 0008009 | 4 | MF | Chemokine activity |
| 0008270 | 4 | MF | Zinc ion binding |
| 0009897 | 4 | CC | External side of plasma membrane |
| 0015711 | 4 | BP | Organic anion transport |
| 0032729 | 4 | BP | Positive regulation of interferon-gamma production |
Note.—CC, cellular component; BP, biological process; MF, molecular function.
[a] The number of higher vertebrate species for which the corresponding GO term is overrepresented.