Heesun Kim1, Mikang Sim1, Nayoung Park1, Kisang Kwon1, Junyoung Kim1, Jaebum Kim2.
Abstract
BACKGROUND: DNA methylation is an important epigenetic modification that is known to regulate gene expression. Whole-genome bisulfite sequencing (WGBS) is a powerful method for studying cytosine methylation across a whole genome. However, obtaining methylation profiles from raw WGBS reads is difficult and requires proficiency with a wide range of bioinformatics tools. In addition, recent end-to-end pipelines for DNA methylation analysis do not sufficiently address these difficulties.
Keywords: DNA methylation; Next generation sequencing; Pipeline; Whole-genome bisulfite sequencing
Year: 2022 PMID: 36123620 PMCID: PMC9487059 DOI: 10.1186/s12859-022-04925-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Comparison of methylation analysis pipelines
| Pipeline | Installation | Quality control | Alignment | Methylation calling | DMC/DMR analysis | HMR analysis | Gene function analysis | Reference setting |
|---|---|---|---|---|---|---|---|---|
| msPIPE | Docker, Manual | Cutadapt, Trim Galore!, MultiQC | Bismark, BS-Seeker2 | Bismark, BS-Seeker2 | methylKit, BSmooth | MethylSeekR | g:Profiler | Automatic (a) |
| BAT | Docker, Manual | BAT | segemehl | haarz | metilene | NA | NA | Manual |
| bicycle | Manual, Docker, Live CD | bicycle | bicycle | bicycle | bicycle | NA | NA | Manual |
| ENCODE pipeline | DNAnexus | Trim Galore!, SAMtools, Bismark | Bismark | Bismark | NA | NA | NA | Manual |
| Msuite | Manual | Msuite | Msuite | Msuite | NA | NA | NA | Manual |
| Nextflow methylseq (Bismark) | Nextflow | Trim Galore!, MultiQC | Bismark | Bismark | NA | NA | NA | Automatic (b) |
| Nextflow methylseq (bwa-meth) | Nextflow | Trim Galore!, MultiQC | bwa-meth | MethylDackel | NA | NA | NA | Automatic (b) |
| PiGx BS-seq | GNU Guix | Trim Galore!, MultiQC | Bismark | methylKit | methylKit | NA | NA | Manual |
| snakePipes | Bioconda | Cutadapt, Trim Galore!, Fastp, MultiQC | bwa-meth | MethylDackel | dmrseq, DSS, metilene | NA | NA | Partially automatic (c) |
| wg-blimp | Bioconda, Docker | MultiQC | bwa-meth | MethylDackel | bsseq, camel, metilene | MethylSeekR | NA | Manual |
NA: not available
(a) All required files of a reference can be automatically prepared and set if the data exist in the UCSC Genome Browser database [40]; manual setting is also supported
(b) All required files of a reference can be automatically prepared and set if the data exist in the iGenomes database [48]; manual setting is also supported
(c) All required files of a reference can be automatically prepared and set if the reference is one of five species (human, mouse, zebrafish, fruit fly, and fission yeast); manual setting is also supported
Fig. 1 Overview of the msPIPE workflow. Using WGBS read files and the UCSC assembly name of a reference as input, msPIPE automates the entire DNA methylation analysis, from pre-processing of the input data to methylation analysis. The reference genome sequences and annotation files of the input species are collected from the UCSC Genome Browser. The trimmed reads are mapped to the bisulfite-converted genome sequences, and methylation calls are made. Based on these methylation calls, methylation profiling, hypomethylated region analysis, differential methylation analysis, and functional analysis of methylation-related genes are performed.
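The mapping and calling steps in this workflow rest on a simple principle: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines remain C. The sketch below illustrates that principle only; it is not the implementation used by msPIPE, Bismark, or BS-Seeker2. The function names are hypothetical, and the code assumes a forward-strand read aligned without gaps at a known offset.

```python
def bisulfite_convert(seq: str) -> str:
    """In-silico C-to-T conversion of a forward-strand sequence."""
    return seq.upper().replace("C", "T")


def cytosine_context(ref: str, pos: int) -> str:
    """Classify the reference cytosine at pos as CpG, CHG, or CHH."""
    nxt = ref[pos + 1 : pos + 3].upper()
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CpG"
    if len(nxt) == 2 and nxt[1] == "G":
        return "CHG"
    if len(nxt) == 2:
        return "CHH"
    return "unknown"  # too close to the end of the sequence to classify


def call_methylation(ref: str, read: str, offset: int):
    """Yield (position, context, methylated) for each reference C the read covers."""
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if ref[pos].upper() != "C":
            continue
        if base == "C":    # cytosine survived conversion: methylated
            yield pos, cytosine_context(ref, pos), True
        elif base == "T":  # cytosine was converted: unmethylated
            yield pos, cytosine_context(ref, pos), False
        # any other base is treated as a mismatch and skipped


ref = "ACGTCAGCTTC"
read = "ACGTTAGTTTT"  # only the CpG cytosine at position 1 is retained
calls = list(call_methylation(ref, read, offset=0))
```

Real aligners also handle the reverse strand (G-to-A conversion), paired-end reads, and indels, none of which this sketch attempts.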
Fig. 2 Sample results of msPIPE using the human WGBS dataset. a The read quality and statistics of all processed input samples are reported in the MultiQC HTML file. b The average CpG methylation levels in each genomic context (promoter, gene, exon, intron, and intergenic regions) of the old sperm sample are shown as a bar plot. The methylation levels (%) in the c CpG, d CHG, and e CHH contexts of the old sperm sample are shown as histograms with a bin size of 10%. f The average levels of CpG, CHG, and CHH methylation for each sample. g Genome-wide CpG methylation levels and the UMR and LMR distributions in the old sperm sample are presented as a Circos plot. The red bar plot on the outermost track represents the average methylation level for each 100 kbp bin; bins without data are shown as gray shading. The dot plots on the two inner tracks represent UMRs (light green) and LMRs (light blue). The height of each dot indicates the methylation level of the region. A UMR (or LMR) with an average methylation level of zero is indicated by a red dot.
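The two binning schemes mentioned in this caption (10% bins for the methylation-level histograms and 100 kbp bins for the genome-wide track) can be sketched in a few lines of Python. This is an illustrative sketch, not msPIPE's plotting code; the function names and the `(position, level)` input layout are assumptions.

```python
from collections import defaultdict


def methylation_histogram(levels, bin_size=10):
    """Count methylation levels (0-100 %) in fixed-width bins."""
    n_bins = 100 // bin_size
    counts = [0] * n_bins
    for level in levels:
        # A level of exactly 100 % is placed in the last bin.
        idx = min(int(level // bin_size), n_bins - 1)
        counts[idx] += 1
    return counts


def windowed_average(sites, window=100_000):
    """Average methylation level per genomic window.

    sites: iterable of (position, level) pairs for one chromosome.
    Windows without data are absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, level in sites:
        w = pos // window
        sums[w] += level
        counts[w] += 1
    return {w * window: sums[w] / counts[w] for w in sorted(sums)}


hist = methylation_histogram([0, 5, 10, 95, 100])
track = windowed_average([(10, 80.0), (99_999, 60.0), (250_000, 20.0)])
```

Here the window starting at 100,000 has no covered sites, so it does not appear in `track`, which corresponds to the gray shading described for bins without data.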
Functional enrichment analysis results for 393 differentially methylated genes in human sperm samples
| Source | Term name | Term id | Adjusted p value* |
|---|---|---|---|
| GO:MF | Metal ion binding | GO:0046872 | 8.215E−03 |
| GO:MF | Cation binding | GO:0043169 | 1.289E−02 |
| TF | Factor: SRY; motif: TCAATAMCATTGA | TF:M04557 | 9.270E−10 |
| TF | Factor: SRY; motif: AACAATNNNCATTGTT | TF:M04556 | 7.598E−07 |
| TF | Factor: SRY; motif: AACAATNNNCATTGTT; match class: 1 | TF:M04556_1 | 5.787E−05 |
| TF | Factor: SRY; motif: TCAATAMCATTGA; match class: 1 | TF:M04557_1 | 6.900E−05 |
| TF | Factor: SRY; motif: AACAATANCATTGTT | TF:M04555 | 2.568E−04 |
| TF | Factor: SRY; motif: TTGTTT; match class: 1 | TF:M03854_1 | 8.544E−04 |
| TF | Factor: SRY; motif: AACAATNR; match class: 1 | TF:M08976_1 | 1.131E−03 |
| TF | Factor: SRY; motif: AACAATANCATTGTT; match class: 1 | TF:M04555_1 | 2.291E−03 |
| TF | Factor: SRY; motif: AACAATNR | TF:M08976 | 1.300E−02 |
| TF | Factor: SRY; motif: TTGTTT | TF:M03854 | 1.750E−02 |
*Adjusted p value was calculated by the g:SCS method in g:Profiler