Yadollah Shahryary, Rashmi R Hazarika, Frank Johannes.
Abstract
BACKGROUND: Whole-Genome Bisulfite Sequencing (WGBS) is a Next Generation Sequencing (NGS) technique for measuring DNA methylation at base resolution. Continuing drops in sequencing costs are beginning to enable high-throughput surveys of DNA methylation in large samples of individuals and/or single cells. These surveys can easily generate hundreds or even thousands of WGBS datasets in a single study. The efficient pre-processing of these large amounts of data poses major computational challenges and creates unnecessary bottlenecks for downstream analysis and biological interpretation.
Keywords: DNA methylation; NGS; Pipeline; Single cell; Whole genome bisulfite sequencing
Year: 2020 PMID: 32660416 PMCID: PMC7359584 DOI: 10.1186/s12864-020-06886-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Basic workflow of MethylStar showing the pipeline architecture. The left panel shows a standard BS-Seq workflow; the right panel shows the components of the MethylStar pipeline, integrated as three layers: Python, Shell and R. All steps of the pipeline are parallelized using GNU parallel. MethylStar offers a “Quick run” option (indicated in red) that runs all steps sequentially in one go; alternatively, each component can be executed separately. MethylStar incorporates all pre-processing steps of a standard BS-Seq workflow and generates standard outputs that can be used as input for several downstream analysis tools.
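The two execution ideas in the caption, running every step in sequence for a sample and processing many samples side by side, can be illustrated with a minimal Python sketch. This is not MethylStar's code: the step functions are hypothetical stand-ins that only rename their input, and a standard-library thread pool is swapped in for GNU parallel purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real tools; each one only tags its input
# so the order of execution is visible in the result.
def trim(sample):
    return f"{sample}.trimmed"

def align(reads):
    return f"{reads}.aligned"

def deduplicate(alignment):
    return f"{alignment}.dedup"

def call_methylation(alignment):
    return f"{alignment}.calls"

# Order follows the standard BS-Seq workflow described in the caption.
STEPS = [trim, align, deduplicate, call_methylation]

def quick_run(sample):
    """Run every step in order for one sample (the 'Quick run' idea)."""
    result = sample
    for step in STEPS:
        result = step(result)
    return result

def run_all(samples, jobs=4):
    """Process samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(quick_run, samples))

if __name__ == "__main__":
    print(run_all(["sample1", "sample2"], jobs=2))
```

Running a single component separately would correspond to calling one of the step functions directly instead of `quick_run`.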
Table showing different features of MethylStar as compared to other BS-seq pipelines
| | Methylpy | MethylStar | methylseq | gemBS | Bicycle |
|---|---|---|---|---|---|
| Pipeline Features | |||||
| Multi-threading | |||||
| language | Python | Python, shell, R | Java | C, Python | Java |
| distribution | GitHub, PyPI (Apache license) | GitHub (GNU GPL3) | GitHub (MIT license) | GitHub (GNU GPL3) | GitHub (GNU GPL3) |
| Installation & configuration | pip install, install dependencies | Docker, install dependencies | Docker, Singularity, Conda | Docker, Singularity | Docker |
| User-interface | - | - | - | - | |
| Single/paired-end | |||||
| Input data | Single-cell, WGBS, single-cell NOMe-seq, PBAT | WGBS, Single-cell (PBAT) | WGBS | RRBS, WGBS, PBAT | WGBS |
| Pipe steps | |||||
| adapter trimming | Cutadapt | Trimmomatic | TrimGalore | - | bicycle analyzemethylation |
| alignment | bowtie/bowtie2 | Bismark | Bismark, bwa-meth | gem3 | bicycle align/bowtie/bowtie2 |
| remove PCR duplicates | Picard | Bismark | Bismark, Picard | Bscall | bicycle analyzemethylation |
| methylation calling | | ProcessBismarkAln, Bismark | Bismark, MethylDackel | Bscall | bicycle analyzemethylation, GATK |
| imputation of missing cytosines | - | METHimpute | - | - | - |
| DMR calling | | - | - | - | bicycle analyze differential methylation |
| SNP calling | - | - | - | Bscall | - |
| Alignment QC | - | Bismark | Qualimap | ||
| summary reports | FastQC | Bismark, | |||
| MultiQC, Preseq | |||||
| Methylation visualization | BigWig | BigWig, bedGraph | - | BigWig, bedGraph | BigWig |
Fig. 2 Performance of MethylStar compared with other BS-Seq analysis pipelines (Methylpy, nf-core/methylseq and gemBS) in (a) A. thaliana, (b) maize, (c) H1 cell line and (d) scBS-Seq samples. CPU processing time taken by METHimpute was not included in the benchmarking, as the other pipelines have no equivalent method to compare with. Because of the very long run times observed for the A. thaliana data, Methylpy and methylseq were not considered for speed benchmarking in the maize and H1 cell line samples. All pipelines were run using 32 jobs. (e) Peak memory usage as a function of time for 10 random A. thaliana samples. (f) Time taken by each component of MethylStar. The x-axis shows the individual components of MethylStar; lighter orange dots indicate runs without the parallel implementation and darker orange dots indicate runs with it. The y-axis shows time in minutes. Dot size indicates the peak memory usage in MB of each component.
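Panel (f) reports two quantities per component: elapsed time in minutes and peak memory in MB. The caption does not say how these were measured, so the following is only a sketch of how such a pair of numbers could be collected for a Python callable. The `profile` helper and the `build_list` workload are hypothetical. Note also that `tracemalloc` only sees memory allocated by the Python interpreter, so it would not capture the memory of external tools such as aligners.

```python
import time
import tracemalloc

def profile(func, *args):
    """Return (result, elapsed_minutes, peak_mb) for one call of func.

    Peak memory covers Python-level allocations only (tracemalloc),
    not memory used by child processes or external tools.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed_seconds = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed_seconds / 60, peak_bytes / 1e6

def build_list(n):
    """Hypothetical workload: allocate a list of n integers."""
    data = [0] * n
    return len(data)

if __name__ == "__main__":
    result, minutes, peak_mb = profile(build_list, 1_000_000)
    print(f"result={result} time={minutes:.6f} min peak={peak_mb:.1f} MB")
```

Profiling each pipeline component this way, once with and once without parallel execution, would give the paired points that panel (f) plots.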