Ingo Bulla, Benoît Aliaga, Virginia Lacal, Jan Bulla, Christoph Grunau, Cristian Chaparro.
Abstract
BACKGROUND: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with detecting DNA methylation, however, have biased the number of methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types based on the higher mutability of methylated CpG.
Keywords: CpG o/e ratio; CpN o/e ratio; DNA methylation; Epigenetics; Kernel density estimation
Year: 2018 PMID: 29587630 PMCID: PMC5870242 DOI: 10.1186/s12859-018-2115-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow. Steps: 1. CpG o/e ratios are calculated for the sequences to be analyzed (in our case dbEST) using CpGoe.pl. 2. Removal of outliers (first step of KDEanalysis.r). 3. Mode detection (second step of KDEanalysis.r)
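The first workflow step can be illustrated with a minimal Python sketch. It assumes the widely used definition o/e = (N_XY / (N_X × N_Y)) × L, where L is the sequence length; the exact normalisation applied by CpGoe.pl is not stated here and may differ. The function name `cpn_oe` and its parameters are hypothetical and are not part of Notos.

```python
from typing import Optional


def cpn_oe(seq: str, dinuc: str = "CG") -> Optional[float]:
    """Observed/expected ratio of one dinucleotide in one sequence.

    Assumed definition: o/e = (N_xy / (N_x * N_y)) * L.
    Returns None when the ratio is undefined (a base is absent).
    """
    seq = seq.upper()
    n_first = seq.count(dinuc[0])
    n_second = seq.count(dinuc[1])
    if n_first == 0 or n_second == 0:
        return None
    # Scan adjacent pairs so that every occurrence of the dinucleotide counts.
    observed = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)
    return observed * len(seq) / (n_first * n_second)


# One ratio per sequence; the distribution of these ratios is what gets analyzed.
sequences = ["ACGTCGAA", "TTGACCATGA", "AATT"]
ratios = [cpn_oe(s) for s in sequences]
```

Passing `dinuc="CA"` gives the corresponding CpA o/e ratio, so the same helper covers other CpN contexts.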
Fig. 2 Step 1: data cleaning of a sample of CpG o/e ratios from the locust Locusta migratoria. The left panel (a) shows the original data. The middle panel (b) displays the data after removal of all values equal to zero. The blue vertical line corresponds to the sample median. Red vertical lines indicate the possible thresholds for excluding outliers and extreme observations. The selected threshold (k=2) is solid; alternative thresholds are dotted. The right panel (c) shows the cleaned data with the sample median and the selected threshold
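A possible reading of this cleaning step is sketched below. The caption does not state how the threshold is computed from k, so the sketch assumes Tukey-style fences, that is, values outside [Q1 − k·IQR, Q3 + k·IQR] are dropped after the zeros have been removed. Both that rule and the name `clean_ratios` are assumptions for illustration, not the documented behaviour of KDEanalysis.r.

```python
from statistics import quantiles


def clean_ratios(ratios, k=2.0):
    """Drop zeros, then drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    The fence rule is an assumption; only the zero removal and the
    multiplier k = 2 are taken from the figure caption.
    """
    nonzero = [r for r in ratios if r > 0]
    q1, _median, q3 = quantiles(nonzero, n=4)  # quartiles of the non-zero values
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [r for r in nonzero if lower <= r <= upper]


# Toy sample: one zero, eleven plausible ratios, one extreme value.
sample = [0.0] + [0.5 + 0.05 * i for i in range(11)] + [10.0]
cleaned = clean_ratios(sample, k=2.0)
```

A larger k keeps more extreme observations, which corresponds to moving the red threshold lines outwards.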
Fig. 3 Step 2: kernel density estimation for samples of CpG o/e ratios from four species. The red line corresponds to the density estimated via KDE. Full vertical blue lines indicate modes with PM ≥ 0.1. Shaded blue areas around the modes correspond to bootstrap confidence intervals with a default level of 95%. From top to bottom, the panels show results for Locusta migratoria (a), Alligator mississippiensis (b), Antheraea mylitta (c), and Citrus clementina (d)
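The mode-detection idea can be sketched in plain Python as follows. This is an illustrative approximation, not the Notos implementation: it evaluates a Gaussian KDE on a grid, reports local maxima as modes, and derives a percentile bootstrap interval for the location of the highest mode. The bandwidth is passed in by hand, the PM ≥ 0.1 filter is omitted because PM is not defined in this excerpt, and all function names are hypothetical.

```python
import math
import random


def kde_on_grid(data, bandwidth, grid_size=200):
    """Evaluate a Gaussian KDE on an evenly spaced grid spanning the data."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    values = [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]
    return grid, values


def find_modes(data, bandwidth, grid_size=200):
    """Return grid positions where the estimated density has a local maximum."""
    grid, values = kde_on_grid(data, bandwidth, grid_size)
    return [grid[i] for i in range(1, grid_size - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def bootstrap_main_mode_ci(data, bandwidth, n_boot=100, level=0.95,
                           grid_size=100, seed=0):
    """Percentile bootstrap interval for the location of the highest mode."""
    rng = random.Random(seed)
    locations = []
    for _ in range(n_boot):
        # Resample with replacement and locate the density maximum again.
        resample = [rng.choice(data) for _ in data]
        grid, values = kde_on_grid(resample, bandwidth, grid_size)
        locations.append(grid[values.index(max(values))])
    locations.sort()
    lower = locations[int((1.0 - level) / 2.0 * n_boot)]
    upper = locations[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lower, upper
```

Bootstrapping only the highest mode keeps the sketch short; per-mode intervals like those shaded in the figure would require matching modes across resamples.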
This table shows the number of modes selected by different approaches and methods for 17 selected species: Silverman's test (2nd column); model-based approaches using the criteria AIC, BIC, and ICL (3rd to 5th columns); and Notos (last column). The maximum number of modes is limited to ten; all mixture models were estimated with the R package mclust
| Species | Silv. | AIC | BIC | ICL | Notos |
|---|---|---|---|---|---|
| | 1 | 10 | 5 | 1 | 2 |
| | 1 | 8 | 8 | 1 | 1 |
| | 1 | 7 | 2 | 1 | 1 |
| | 2 | 6 | 3 | 1 | 2 |
| | 1 | 7 | 4 | 1 | 1 |
| | 4 | 6 | 3 | 1 | 1-2 |
| | 1 | 5 | 1 | 1 | 1 |
| | 2 | 5 | 3 | 1 | 2 |
| | 1 | 8 | 4 | 1 | 1-2 |
| | 1 | 5 | 3 | 1 | 1 |
| | 1 | 8 | 8 | 1 | 1 |
| | 1 | 9 | 5 | 1 | 1 |
| | 2 | 8 | 3 | 1 | 1 |
| | 2 | 9 | 9 | 1 | 2 |
| | 2 | 9 | 6 | 1 | 2 |
| | 1 | 10 | 4 | 1 | 2 |
| | 1 | 10 | 8 | 1 | 1 |
Fig. 4 Examples for model-based clustering and model selection with Gaussian mixtures of CpG o/e ratios. The red line corresponds to the estimated density via KDE. Full vertical blue lines indicate the location of means belonging to each component of the mixture distribution (estimated by the R-package mclust). The top panel (a) shows the model selected by the AIC for Locusta migratoria, while the lowest panel (c) displays the corresponding ICL solution. The middle panel (b) displays the model selected by the BIC for Alligator mississippiensis
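To make the model-based comparison concrete, the following Python sketch fits one-dimensional Gaussian mixtures by expectation-maximisation and scores them with AIC = 2p − 2 log L and BIC = p ln n − 2 log L, where p = 3k − 1 free parameters for k components with unequal variances. It is a simplified stand-in for illustration only: it does not reproduce mclust, does not compute ICL, and its initialisation and variance floor are ad hoc assumptions.

```python
import math


def fit_gmm_1d(data, k, n_iter=200, var_floor=1e-4):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    data = sorted(data)
    n = len(data)
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    pooled = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [pooled] * k
    weights = [1.0 / k] * k

    def weighted_pdfs(x):
        return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = weighted_pdfs(x)
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return sum(math.log(sum(weighted_pdfs(x)) or 1e-300) for x in data)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


def score_models(data, max_k):
    """Return (k, AIC, BIC) for k = 1..max_k; lower values are preferred."""
    scores = []
    for k in range(1, max_k + 1):
        loglik = fit_gmm_1d(data, k)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        scores.append((k, aic(loglik, n_params), bic(loglik, n_params, len(data))))
    return scores
```

Because the AIC penalty does not grow with sample size, it tends to favour more components than BIC on large samples, which is consistent with the higher AIC counts reported in the table.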
Fig. 5 CpN o/e analyzed by Notos for Neurospora crassa. The red line corresponds to the estimated density via KDE. Full vertical blue lines indicate modes with PM ≥ 0.1. Shaded blue areas around the modes correspond to bootstrap confidence intervals with a default level of 95%. The panels show kernels of transcripts for CpG o/e (a) and CpA o/e (b), and for repeats (c and d), respectively. In this case, CpG and CpA o/e ratios were calculated for spliced exons and repeat regions of the N. crassa genome. Both o/e frequency distributions are clearly unimodal, but for the CpA o/e in repeats there is a shift towards 0.5, which is concordant with DNA methylation only in this context (repeats and CpA) in this species