Fang Liang, Bixia Tang, Yanqing Wang, Jianfeng Wang, Caixia Yu, Xu Chen, Junwei Zhu, Jiangwei Yan, Wenming Zhao, Rujiao Li.
Abstract
Whole-Genome Bisulfite Sequencing (WGBS) and genome-wide Reduced Representation Bisulfite Sequencing (RRBS) are widely used to study DNA methylation. However, data analysis is complicated, lengthy, and hampered by a lack of seamless analytical pipelines. To address these issues, we developed a convenient, stable, and efficient web service called Web Service for Bisulfite Sequencing Data Analysis (WBSA) to analyze bisulfite sequencing data. WBSA covers not only CpG methylation, which is the most common biochemical modification in eukaryotic DNA, but also non-CG methylation, which has been observed in plants and in human iPS cells, oocytes, neurons, and stem cells. WBSA comprises three main modules: WGBS data analysis, RRBS data analysis, and differentially methylated region (DMR) identification. The WGBS and RRBS modules execute read mapping, methylation site identification, annotation, and advanced analysis, whereas the DMR module identifies actual DMRs and annotates their correlations to genes. WBSA can be used free of charge either online or as a local installation. Executables of the Portable Batch System (PBS) and standalone versions can be downloaded from the website together with the installation instructions. WBSA is available at no charge for academic users at http://wbsa.big.ac.cn.
Year: 2014 PMID: 24497972 PMCID: PMC3907392 DOI: 10.1371/journal.pone.0086707
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of WBSA's WGBS module with six pipelines.
| Functions | WBSA-WGBS | CyMATE | CpG PatternFinder | GBSA | COHCAP | methylKit | BSmooth |
| Read Quality analysis | Y | N | N | N | N | N | Y |
| Filter adaptor & low quality | Y | N | N | N | N | N | N |
| Computation of conversion rate | Y | N | Y | N | N | N | N |
| Alignment | Y | N | N | N | N | N | Y |
| Focus on non-CGs | Y | Y | N | Y | N | N | N |
| Methylation level | Y | Y | Y | Y | Y | Y | Y |
| Methylation distribution | Y | Y | Y | N | N | N | N |
| Relationship of methylation and CpG islands | Y | N | N | Y | Y | Y | N |
| Gene annotation | Y | N | N | Y | N | Y | N |
| Functional analysis of genes with high or low methylation | Y | N | N | N | N | N | N |
| Sequence preference | Y | N | N | N | N | N | N |
| Correlation between methylation and gene expression | Y | N | N | N | Y | N | N |
| Online version | Y | Y | Y | Y | Y | Y | Y |
| Standalone version | Y | N | N | N | N | N | N |
| PBS version | Y | N | N | N | N | N | N |
only support single-end data.
Comparison of WBSA's DMR module with four pipelines.
| Functions | WBSA-DMR | RRBS-analyser | BSmooth | methylKit | QDMR |
| Focus on CGs | Y | Y | Y | Y | Y |
| Focus on non-CGs | Y | Y | N | N | N |
| Correlation between DMR and genes | Y | Y | N | Y | Y |
| The functional analysis of correlative genes | Y | N | N | N | Y |
| More than one method of DMR identification | Y | N | N | N | N |
| Online version | Y | Y | Y | Y | Y |
| Standalone version | Y | N | N | N | N |
| PBS version | Y | N | N | N | N |
Figure 1. Flowchart of data analysis.
a. Flowchart of data analysis for WGBS and RRBS. WGBS and RRBS include four parts as follows: pre-processing of reads and the reference sequence, mapping to the reference genome, mC identification, and methylation annotation. The sequencing reads, reference sequences, and the lambda sequence should be used as input data, and all the results can be previewed and downloaded. b. Flowchart of DMR identification. The DMR analysis module includes DMR identification and annotation.
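The lambda sequence listed among the inputs is commonly an unmethylated spike-in control, which is what makes the conversion-rate computation in the comparison tables possible: every cytosine in lambda should read as thymine after bisulfite treatment. The following is a minimal sketch of that idea, not WBSA's actual implementation. It assumes ungapped, forward-strand alignments, and the function name `conversion_rate` and its input format are hypothetical.

```python
def conversion_rate(reference: str, aligned_reads: list[tuple[int, str]]) -> float:
    """Estimate the bisulfite conversion rate from an unmethylated control.

    reference      -- control sequence (e.g. lambda), forward strand
    aligned_reads  -- (0-based start position, read sequence) pairs; this sketch
                      assumes ungapped forward-strand alignments (an assumption,
                      not something the source specifies)
    """
    converted = 0     # reference C read as T
    unconverted = 0   # reference C still read as C
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            # Only reference cytosines are informative.
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "T":
                converted += 1
            elif base == "C":
                unconverted += 1
            # Any other base is treated as a sequencing error and ignored.
    total = converted + unconverted
    if total == 0:
        raise ValueError("no reference cytosines are covered by the reads")
    return converted / total


if __name__ == "__main__":
    ref = "ACGTCCGA"
    reads = [(0, "ATGTTCGA"), (2, "GTTTGA")]
    print(f"conversion rate: {conversion_rate(ref, reads):.2f}")
```

A real pipeline would also handle reverse-strand reads (G-to-A conversion) and gapped alignments; those are left out here to keep the counting logic visible.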
Figure 2. WBSA system architecture and workflow.
When the user chooses one analysis module from the web page such as WGBS, the user must input several parameters according to the instructions provided. The web process, which was developed using a Struts and Spring framework, will then proceed with the user's request and generate an XML file to store the parameters provided by the user. At the same time, it will insert a record into the database to identify the new job. The workflow monitor process BIG Workflow will continually monitor the job's status from the database and will activate the data processing procedure if a new job is found. When the user previews the result on the web page, the web process will indicate the status of the job and show the appropriate results to the user.
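The submit-and-poll pattern described in this caption can be sketched as follows. This is an illustrative sketch only: WBSA itself uses a Struts/Spring web process and the BIG Workflow monitor, whereas the code below uses Python with an in-memory SQLite database. The table layout, status names, and function names are all hypothetical, and the XML is stored in a database column rather than in a separate file to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical job table (schema is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "module TEXT, params_xml TEXT, status TEXT)"
    )
    conn.commit()


def submit_job(conn: sqlite3.Connection, module: str, params: dict) -> int:
    """Web-process side: serialize the parameters to XML and record a new job."""
    root = ET.Element("job", module=module)
    for name, value in params.items():
        ET.SubElement(root, "param", name=name).text = str(value)
    xml_text = ET.tostring(root, encoding="unicode")
    cur = conn.execute(
        "INSERT INTO jobs (module, params_xml, status) VALUES (?, ?, 'NEW')",
        (module, xml_text),
    )
    conn.commit()
    return cur.lastrowid


def poll_once(conn: sqlite3.Connection, run) -> int:
    """Monitor side: start every NEW job and record its final status."""
    rows = conn.execute(
        "SELECT id, params_xml FROM jobs WHERE status = 'NEW' ORDER BY id"
    ).fetchall()
    for job_id, xml_text in rows:
        conn.execute("UPDATE jobs SET status = 'RUNNING' WHERE id = ?", (job_id,))
        conn.commit()
        # Recover the parameters from the stored XML (values come back as strings).
        params = {p.get("name"): p.text for p in ET.fromstring(xml_text).iter("param")}
        try:
            run(params)
            status = "DONE"
        except Exception:
            status = "FAILED"
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.commit()
    return len(rows)


def job_status(conn: sqlite3.Connection, job_id: int) -> str:
    """What the web page would show when the user previews the result."""
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0]
```

In a running service, `poll_once` would be called in a loop with a sleep interval. Decoupling submission from execution through the database is what lets the web page report job status without blocking on long-running alignments.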
Comparison of mapping times and accuracies among WBSA, BSMAP, and Bismark for simulated WGBS data.
| Read length (bp) | Species | Software | Alignment parameters | Mapping time (hours) | RAM (GB) | Mapped reads (pairs) | Mapped reads (%) | Correctly mapped reads (pairs) | Correctly mapped reads (%) | False positives (pairs) | False positives (%) | False negatives (pairs) | False negatives (%) |
| 80 | Zebrafish | Bismark (v0.8.1) | -q --phred33-quals -n 3 -l 16 | 47.80 | ∼5.5 | 78801150 | 88.26 | 77891346 | 87.25 | 0 | 0 | 5985422 | 6.70 |
| 80 | Zebrafish | BSMAP (v2.74) | -s 16 -v 3 -p 1 -r 1 -R -u | 7.60 | ∼4.3 | 84439556 | 94.58 | 70308940 | 78.75 | 0 | 0 | 347016 | 0.39 |
| 80 | Zebrafish | WBSA | -n 3 -l 16 -k 3 | 24.07 | ∼4.3 | 84776394 | 94.96 | 80698421 | 90.39 | 0 | 0 | 10178 | 0.01 |
| 80 | Rice | Bismark (v0.8.1) | -q --phred33-quals -n 3 -l 16 | 12.57 | ∼1.5 | 21570946 | 87.41 | 21266096 | 86.18 | 0 | 0 | 1871224 | 7.58 |
| 80 | Rice | BSMAP (v2.74) | -s 16 -v 3 -p 1 -r 1 -R -u | 1.18 | ∼1.7 | 23416611 | 94.89 | 20235903 | 82.00 | 0 | 0 | 25559 | 0.10 |
| 80 | Rice | WBSA | -n 3 -l 16 -k 3 | 5.47 | ∼1.2 | 23442162 | 94.99 | 23289124 | 94.37 | 0 | 0 | 8 | 0 |
| 70 | Zebrafish | Bismark (v0.8.1) | -q --phred33-quals -n 3 -l 16 | 40.37 | ∼4.3 | 78160397 | 87.55 | 77067467 | 86.32 | 0 | 0 | 6626175 | 7.42 |
| 70 | Zebrafish | BSMAP (v2.74) | -s 16 -v 3 -p 1 -r 1 -R -u | 11.45 | ∼4.3 | 84383101 | 94.52 | 72790003 | 81.53 | 0 | 0 | 403471 | 0.45 |
| 70 | Zebrafish | WBSA | -n 3 -l 16 -k 3 | 25.45 | ∼4.3 | 84786567 | 94.97 | 84697662 | 94.87 | 0 | 0 | 5 | 0 |
| 70 | Rice | Bismark (v0.8.1) | -q --phred33-quals -n 3 -l 16 | 10.72 | ∼1.2 | 21390366 | 86.68 | 21034061 | 85.24 | 0 | 0 | 2051804 | 8.31 |
| 70 | Rice | BSMAP (v2.74) | -s 16 -v 3 -p 1 -r 1 -R -u | 1.03 | ∼1.8 | 23422665 | 94.92 | 19760196 | 80.07 | 0 | 0 | 19505 | 0.08 |
| 70 | Rice | WBSA | -n 3 -l 16 -k 3 | 4.92 | ∼1.2 | 23442166 | 94.99 | 23121395 | 93.69 | 0 | 0 | 4 | 0 |
| 60 | Zebrafish | Bismark (v0.8.1) | -q --phred33-quals -n 2 -l 14 | 39.77 | ∼5.1 | 77325014 | 86.61 | 76000508 | 85.13 | 0 | 0 | 7461558 | 8.36 |
| 60 | Zebrafish | BSMAP (v2.74) | -s 14 -v 2 -p 1 -r 1 -R -u | 8.05 | ∼4.3 | 84242377 | 94.36 | 70017299 | 78.43 | 0 | 0 | 544228 | 0.61 |
| 60 | Zebrafish | WBSA | -n 2 -l 14 -k 2 | 15.93 | ∼4.3 | 84786571 | 94.97 | 84068061 | 94.16 | 0 | 0 | 1 | 0 |
| 60 | Rice | Bismark (v0.8.1) | -q --phred33-quals -n 2 -l 14 | 9.53 | ∼1.5 | 21158772 | 85.74 | 20741988 | 84.05 | 0 | 0 | 2283398 | 9.25 |
| 60 | Rice | BSMAP (v2.74) | -s 14 -v 2 -p 1 -r 1 -R -u | 0.77 | ∼1.7 | 23412528 | 94.87 | 19161765 | 77.65 | 0 | 0 | 29642 | 0.12 |
| 60 | Rice | WBSA | -n 2 -l 14 -k 2 | 3.94 | ∼1.1 | 23442168 | 94.99 | 22910455 | 92.84 | 0 | 0 | 2 | 0 |
Comparison of mapping times and accuracies among WBSA, BSMAP, and Bismark for actual bisulfite sequencing data.
| Data type | Species | Software | Alignment parameters | Mapping time (hours) | RAM (GB) | Mapped reads (num.) | Mapped reads (%) | Uniquely mapped reads (num.) | Uniquely mapped reads (%) |
| WGBS | Human | Bismark (v0.8.1) | -q --phred33-quals -n 3 -l 16 | 303.9 | ∼10.6 | 166849837 | 37.33 | 153969814 | 34.45 |
| WGBS | Human | BSMAP (v2.74) | -s 16 -v 3 -p 1 -r 1 -R -u | 42.73 | ∼8.0 | 238134054 | 53.28 | 220938793 | 49.43 |
| WGBS | Human | WBSA | -n 3 -l 16 -k 3 | 113.20 | ∼9.2 | 240834825 | 53.88 | 222198832 | 49.71 |
| RRBS | Mouse | Bismark (v0.8.1) | -q --phred33-quals -n 2 -l 14 | 22.65 | ∼9.1 | 17609963 | 85.30 | 12893165 | 62.45 |
| RRBS | Mouse | BSMAP (v2.74) | -s 14 -v 2 -p 1 -r 1 -R -u | 3.93 | ∼6.8 | 12489362 | 60.50 | 9137791 | 44.26 |
| RRBS | Mouse | WBSA | -n 2 -l 14 -k 2 | 5.14 | ∼8.0 | 13250668 | 64.19 | 9533829 | 46.18 |
Comparison of mapping times and accuracies among WBSA, BSMAP, and Bismark for simulated RRBS data.
| Species | Software | Alignment parameters | Mapping time (hours) | RAM (GB) | Mapped reads (num.) | Mapped reads (%) | Correctly mapped reads (num.) | Correctly mapped reads (%) | False positives (num.) | False positives (%) | False negatives (num.) | False negatives (%) |
| Human | Bismark (v0.8.1) | -q --phred33-quals -n 2 -l 14 | 5.54 | ∼10.5 | 10930929 | 67.63 | 10849359 | 67.13 | 795 | 0 | 5303277 | 31.04 |
| Human | BSMAP (v2.74) | -s 14 -v 2 -p 1 -r 1 -R -u | 1.22 | ∼7.5 | 16161772 | 94.58 | 12489088 | 73.09 | 23 | 0 | 71662 | 0.42 |
| Human | WBSA | -n 2 -l 14 -k 2 | 1.42 | ∼6.3 | 16228389 | 94.97 | 12302379 | 72.00 | 264 | 0 | 5286 | 0.03 |
| Mouse | Bismark (v0.8.1) | -q --phred33-quals -n 2 -l 14 | 1.52 | ∼7.1 | 5099599 | 68.3 | 5065633 | 67.87 | 206 | 0.06 | 1990768 | 26.67 |
| Mouse | BSMAP (v2.74) | -s 14 -v 2 -p 1 -r 1 -R -u | 0.28 | ∼6.8 | 7054102 | 94.52 | 5603328 | 75.08 | 5 | 0 | 36064 | 0.48 |
| Mouse | WBSA | -n 2 -l 14 -k 2 | 0.63 | ∼6.1 | 7087675 | 94.97 | 5594941 | 74.97 | 51 | 0.01 | 2537 | 0.03 |
Figure 3. The performance of WBSA compared with a published study.
a. The percentage of methylcytosine identified in each sequence context. b. The methylcytosine density in Chr1. Each dot indicates the methylation density in a 10-kb window. c. Logo plots of sequences proximal to sites of DNA methylation in each sequence context. Logos are presented for all methylcytosines. Three or four bases flanking each methylcytosine context were analyzed to show the local sequence preference. d. Distribution of the methylation level in the CG context. The vertical axis indicates the fraction of methylated CGs at a given methylation level (horizontal axis), where the methylation level is defined as the mCG∶CG ratio at each reference cytosine in the CG context (at least 10× coverage is required).
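The methylation-level definition used in panel d (the mCG∶CG ratio per reference cytosine, restricted to sites with at least 10× coverage) can be illustrated with a short sketch. The input format, the function names, and the equal-width binning are assumptions made for illustration and do not reproduce WBSA's code.

```python
def methylation_levels(sites: dict, min_coverage: int = 10) -> dict:
    """Per-site methylation level = methylated reads / total reads.

    sites -- mapping of position -> (methylated read count, total read count);
             this input layout is hypothetical.
    Sites covered by fewer than min_coverage reads are dropped.
    """
    levels = {}
    for pos, (methylated, total) in sites.items():
        if total >= min_coverage:
            levels[pos] = methylated / total
    return levels


def level_distribution(levels: dict, n_bins: int = 10) -> list:
    """Fraction of sites falling into each equal-width methylation-level bin."""
    counts = [0] * n_bins
    for level in levels.values():
        # A level of exactly 1.0 belongs in the last bin.
        index = min(int(level * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(levels)
    if total == 0:
        return [0.0] * n_bins
    return [count / total for count in counts]


if __name__ == "__main__":
    example_sites = {100: (9, 10), 200: (5, 20), 300: (3, 4)}
    levels = methylation_levels(example_sites)   # site 300 is below 10x coverage
    print(levels)
    print(level_distribution(levels, n_bins=4))
```

The coverage filter matters because a ratio computed from only a few reads takes few distinct values and would distort the distribution toward 0, 0.5, and 1.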
Comparison of WBSA's RRBS module with three pipelines.
| Functions | WBSA-RRBS | SAAP-RRBS | RRBS-analyser | methylKit |
| Read-quality analysis | Y | Y | Y | N |
| Filter adaptor & low quality | Y | Y | Y | N |
| Computation of conversion rate | Y | N | Y | N |
| Alignment | Y | Y | Y | N |
| Focus on non-CGs | Y | N | Y | N |
| Methylation level | Y | Y | Y | Y |
| Methylation distribution | Y | N | Y | N |
| Relationship of methylation and CpG islands | Y | N | N | Y |
| Gene annotation | Y | Y | N | Y |
| Functional analysis of genes with high or low methylation | Y | N | N | N |
| Sequence preference | Y | N | N | N |
| Correlation between methylation and gene expression | Y | N | N | N |
| Online version | Y | Y | N | Y |
| Standalone version | Y | N | Y | N |
| PBS version | Y | N | N | N |