Zhifu Sun, Julie Cunningham, Susan Slager, Jean-Pierre Kocher.
Abstract
Bisulfite treatment-based methylation microarrays (mainly the Illumina Infinium 450K array) and next-generation sequencing (reduced representation bisulfite sequencing, Agilent SureSelect Human Methyl-Seq, NimbleGen SeqCap Epi CpGiant or whole-genome bisulfite sequencing) are commonly used for base-resolution DNA methylome research. Although multiple tools and methods have been developed for data preprocessing and analysis, confusion remains about these platforms, including whether and how the 450K array should be normalized, which platform best fits a researcher's needs, and which statistical models are most appropriate for differential methylation analysis. This review presents the commonly used platforms and compares the pros and cons of each for methylome profiling. We then discuss approaches to study design, data normalization, bias correction and model selection for differentially methylated individual CpGs and regions.
Keywords: DNA methylation; bisulfite sequencing; differential methylation; methylation 450K array; normalization; reduced representation bisulfite sequencing; study design
Year: 2015 PMID: 26366945 PMCID: PMC4790440 DOI: 10.2217/epi.15.21
Source DB: PubMed Journal: Epigenomics ISSN: 1750-192X Impact factor: 4.778
Common base resolution methylation sequencing platforms.
| Feature | WGBS | Agilent SureSelect Human Methyl-Seq | NimbleGen SeqCap Epi CpGiant | RRBS |
| --- | --- | --- | --- | --- |
| Sequence regions | Whole genome; needs at least 1 billion reads | Preselected and designed | Preselected and designed | |
| Genome coverage | Highest (28 million CpGs) | 84 Mb design covering 3.7 million CpGs | 80.5 Mb, ~5.5 million CpG sites | Lowest (8–10% of CpGs) |
| CGI coverage | Intermediate | High | High | High |
| Cost per sample | Most expensive (50-fold, US$5000–7000) | + capture kit cost; two samples/lane | + capture kit cost; four samples/lane | Least expensive (US$400–500 per sample at four samples/lane) |
| Resolution | Single base; quantitative | Single base; quantitative | Single base; quantitative | Single base; quantitative |
| Information | Most comprehensive, both methylated and unmethylated | More in CGI, shores, promoters and known DMRs | More in CGI, shores, promoters and known DMRs | CpG-rich regions such as CGI and promoters |
| DNA input | 10 ng–5 µg† | 3 µg | 1 µg | 100 ng–2 µg |
†Required amount varies depending on protocols.
CGI: CpG island; DMR: Differentially methylated region; RRBS: Reduced representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.
Reduced representation bisulfite sequencing mechanism and flow diagram.
(A) Original DNA with a CCGG motif at both ends. In humans, cytosine methylation occurs at CpG sites (marked with m), and non-CpG cytosines are generally not methylated. The arrows point to the MspI cut sites; MspI cutting is methylation independent. (B) MspI digestion generates DNA fragments with sticky ends. Fragments of a suitable size for sequencing (generally 40–250 bp) are selected. (C) End repair adds a CG (in blue, generally not methylated) from the reaction medium; it is not part of the human sequence and needs to be removed in the analysis step. (D) Bisulfite treatment converts unmethylated cytosine to uracil, whereas methylated cytosine remains cytosine. (E) PCR amplification reads uracil (U) as thymine (T). Amplification is based on the original top and bottom strands, which are no longer complementary and each generate their own offspring sequences. For single-end sequencing, only OT and OB sequences are used; for paired-end sequencing, all four strands are generated. The analysis needs to group them correctly.
OB: Original bottom strand; OBC: Original bottom strand complementary; OT: Original top strand; OTC: Original top strand complementary.
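The steps in panels A, B, D and E can be illustrated with a small in-silico sketch. This is a toy model under stated assumptions, not a production RRBS simulator: it handles only the original top strand, ignores the end-repair bases from panel C, and all function names (`mspi_fragments`, `bisulfite_convert`, `call_cpg_methylation`) are hypothetical names chosen for illustration.

```python
import re

def mspi_fragments(seq, min_len=40, max_len=250):
    """Digest at every CCGG motif (MspI cuts C^CGG) and size-select.

    Only fragments flanked by a cut on both sides are kept.
    """
    seq = seq.upper()
    # Cut position = index just after the first C of each CCGG motif.
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    frags = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [f for f in frags if min_len <= len(f) <= max_len]

def bisulfite_convert(seq, methylated=frozenset()):
    """Top-strand view after bisulfite treatment and PCR.

    Unmethylated C is read as T; C at an index in `methylated` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_cpg_methylation(reference, read):
    """For each CpG in the reference, report True if the read kept its C."""
    return {
        i: read[i] == "C"
        for i in range(len(reference) - 1)
        if reference[i:i + 2] == "CG"
    }

# Toy sequence with two CCGG motifs; the size window is shrunk to fit it.
genome = "AACCGGTTACGTTCCGGAA"
fragment = mspi_fragments(genome, min_len=5, max_len=50)[0]
converted = bisulfite_convert(fragment, methylated={6})
calls = call_cpg_methylation(fragment, converted)
```

Here the fragment `CGGTTACGTTC` has CpGs at indices 0 and 6; only the one marked as methylated survives conversion as C, which is how the methylation state is read out after alignment.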
Comparison of 450K data preprocessing methods/algorithms.
| Raw data processing | Y | N | N | Y | N | N |
| Background/control | Y; optional | Y | N | Y | NA | NA |
| Color bias adjustment | N | Y | Y | Y | N | N |
| Design I and II bias correction | N | Y | Y | N | Y | Peak shift |
| Across sample normalization | N or background control norm | Y | N | Y (Pool/QN) | N | Optional QN |
BMIQ: Beta mixture quantile dilation; IMA: Illumina methylation analyzer; N: No; NA: Not applicable; QN: Quantile normalization; SQN: Subset quantile normalization; SWAN: Subset quantile within array normalization; Y: Yes.
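Quantile normalization (QN) appears in the table as one option for across-sample normalization. A minimal, generic sketch of the idea follows; it is not the implementation used by any of the compared packages, the function name `quantile_normalize` is hypothetical, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(columns):
    """Force every sample (column) onto the same distribution.

    `columns` is a list of equal-length lists, one list per sample.
    Each value is replaced by the mean, across samples, of the values
    sharing its rank.
    """
    n_samples = len(columns)
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the r-th smallest value per sample.
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / n_samples for r in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered by value; ties keep input order.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two samples, three probes each (illustrative values only).
samples = [[0.1, 0.5, 0.9], [0.2, 0.8, 0.4]]
normalized = quantile_normalize(samples)
```

After normalization both samples contain the same set of values, and only the assignment of values to probes differs. Because this forces identical distributions, it can remove real global methylation differences between groups, which is one reason the choice of across-sample normalization for the 450K array is debated.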
Differentially methylated region detection method comparisons.
| Tool | Data type | Language | Statistical method | Output | Notes |
| --- | --- | --- | --- | --- | --- |
| BSmooth | WGBS RRBS? | R | Smooth/t test | DMR | Designed for WGBS, customization needed for RRBS; DMRs detected automatically; no covariates |
| BiSeq | RRBS | R | Smooth/beta regression | DMR | More specifically for targeted RRBS data; identifies DMRs (CpG clusters) automatically; allows covariates |
| methylKit | WGBS RRBS | R | Logistic regression; Fisher's test | DMR; annotation | Fisher's test for a pair of samples and logistic regression for more samples with covariates; tiling window or predefined region for testing |
| Bump hunting | Array RRBS | R | Linear regression | DMR | Only for ratio data; allows covariates; automatic DMR detection |
| MOABS | BS data | C++ | Beta-binomial hierarchical model | DMC/DMR | Groups DMCs into DMRs with a hidden Markov model |
| methylSig | RRBS WGBS | R | Beta-binomial | DMC/DMR | Tiling window for DMRs (default 25 bp, likely too fragmented) |
| RADMeth | WGBS RRBS | C++ | Beta-binomial | DMC/DMR | Merges DMCs into DMRs by a weighted Z test on p-values |
BS: Bisulfite sequencing; DMC: Differentially methylated CpG; DMR: Differentially methylated region; MOABS: Model-based analysis of bisulfite sequencing; RRBS: Reduced representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.
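The simplest case in the table, a Fisher's exact test on a pair of samples over tiling windows, can be sketched as follows. This is an illustrative pure-Python version, not the code of methylKit or any other listed tool (those are R or C++ packages); the function names and the 1000 bp window are assumptions made for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)

def tile_counts(sites, window=1000):
    """Pool (position, methylated, unmethylated) read counts per window."""
    tiles = {}
    for pos, meth, unmeth in sites:
        start = (pos // window) * window
        counts = tiles.setdefault(start, [0, 0])
        counts[0] += meth
        counts[1] += unmeth
    return tiles

def compare_samples(sites_a, sites_b, window=1000):
    """P-value per window covered in both samples."""
    tiles_a = tile_counts(sites_a, window)
    tiles_b = tile_counts(sites_b, window)
    return {
        start: fisher_exact_two_sided(*tiles_a[start], *tiles_b[start])
        for start in sorted(tiles_a.keys() & tiles_b.keys())
    }

# Illustrative counts only: (position, methylated reads, unmethylated reads).
sample_a = [(100, 5, 1), (400, 3, 1), (1500, 3, 3)]
sample_b = [(150, 1, 2), (600, 0, 3)]
p_values = compare_samples(sample_a, sample_b)
```

Pooling read counts per window trades resolution for power, and it treats reads as independent draws. With replicates, that assumption ignores between-sample variation, which is the motivation for the beta-binomial models listed in the table.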