Pavlo Lutsik, Lars Feuerbach, Julia Arand, Thomas Lengauer, Jörn Walter, Christoph Bock.
Abstract
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base-pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome-wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data.
Year: 2011 PMID: 21565797 PMCID: PMC3125748 DOI: 10.1093/nar/gkr312
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. BiQ Analyzer HT workflow. Bisulfite sequencing data are generated either for the entire genome or selectively for a defined set of genomic loci using commercially available high-throughput sequencers (A). To reduce sequencing cost, bisulfite-converted DNA from several samples and/or loci is typically barcoded and combined into a single sequencing run. The multiplexed read data are separated and converted into FASTA or BAM files using vendor-provided software and/or custom scripts (B), before they are loaded into BiQ Analyzer HT (C). Once the data have been loaded, the user sets alignment and quality control parameters (D), runs the analysis, inspects the inferred DNA methylation data (E) and adjusts the parameters until satisfactory results are obtained. Finally, the DNA methylation measurements can be visualized graphically (F) and exported as tab-separated tables (G) for in-depth analysis using spreadsheets such as Excel, statistical software such as R/Bioconductor and biomarker development tools such as MethMarker (H).
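The inference step (E) rests on the general principle of bisulfite sequencing: unmethylated cytosines are read as T after conversion, whereas methylated cytosines remain C. The following is a minimal sketch of that principle, not BiQ Analyzer HT's actual algorithm. It assumes a read already aligned to the top strand of the reference at equal length (gaps written as `-`) and methylation in CpG context only; the function names `call_cpg_methylation` and `conversion_rate` are hypothetical.

```python
from typing import List, Optional


def call_cpg_methylation(reference: str, read: str) -> List[Optional[int]]:
    """Return one call per reference CpG: 1 methylated, 0 unmethylated, None missing.

    Assumption: `read` is aligned to `reference` position by position.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same aligned length")
    ref, rd = reference.upper(), read.upper()
    calls: List[Optional[int]] = []
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            if rd[i] == "C":
                calls.append(1)      # C retained: methylated
            elif rd[i] == "T":
                calls.append(0)      # C converted to T: unmethylated
            else:
                calls.append(None)   # gap, N or mismatch: no call
    return calls


def conversion_rate(reference: str, read: str) -> Optional[float]:
    """Estimate bisulfite conversion as the fraction of non-CpG Cs read as T."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same aligned length")
    ref, rd = reference.upper(), read.upper()
    converted = informative = 0
    for i, base in enumerate(ref):
        if base != "C":
            continue
        # Skip CpG cytosines and the final base, whose context is unknown.
        if i + 1 >= len(ref) or ref[i + 1] == "G":
            continue
        if rd[i] == "T":
            converted += 1
            informative += 1
        elif rd[i] == "C":
            informative += 1
    return converted / informative if informative else None


reference = "ACGTCCGATCGA"
read = "ACGTTTGAT-GA"
print(call_cpg_methylation(reference, read))  # [1, 0, None]
print(conversion_rate(reference, read))       # 1.0
```

Using non-CpG cytosines as the conversion control assumes that they are essentially unmethylated in the sample, which is a common simplification for mammalian DNA.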
Analysis results generated by BiQ Analyzer HT
| Category | Title | Access | Format | Description |
|---|---|---|---|---|
| Tabular | Project summary | GUI | TSV | Basic information summarizing the analysis |
| Tabular | Sample summary | GUI | TSV | DNA methylation summary for each locus in each sample |
| Tabular | Results table | OD | TSV | Alignment quality, estimated bisulfite conversion rate and DNA methylation summary for each sequencing read |
| Tabular | Methylation pattern table | GUI | TSV | DNA methylation patterns for each sequencing read. Columns correspond to DNA methylation sites (typically CpG positions) |
| Tabular | Project results table | GUI | TSV | Combined results table for all samples and loci |
| Graphical | Methylation pattern map | OD, GUI | PNG | Heatmap-style representation of DNA methylation patterns for each sequencing read. Columns correspond to DNA methylation sites |
| Graphical | Methylation profile | OD, GUI | PNG, SVG | Diagram visualizing the frequency of methylated, unmethylated and missing-value observations for each DNA methylation site |
| Graphical | Project methylation heatmap | GUI | PNG | Heatmap of mean DNA methylation levels for each locus in each sample |
| Graphical | Methylation profile heatmap | GUI | PNG | Heatmap of mean DNA methylation levels for each DNA methylation site at a specific locus |
| Sequence | Alignment | OD | FASTA | Multiple alignment of sequencing reads for each locus in each sample |
| Sequence | Filtered reads | OD | FASTA | Sequences of all reads that passed quality filtering |
(a) The data table from which the heatmap is generated can also be exported for follow-up analysis.
OD (‘output directory’): the item is written to the project output directory tree; GUI (‘graphical user interface’): the item can be exported via ‘Save as …’ or ‘Copy to clipboard’ in the corresponding context menu; FASTA: sequencing reads in multiple-sequence text format; PNG: images in Portable Network Graphics format; SVG: images in Scalable Vector Graphics format; TSV: tab-separated value tables.
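The relationship between a methylation pattern table (one row per read, one column per site) and a methylation profile (per-site frequencies of methylated, unmethylated and missing observations) can be sketched as follows. This is an illustration only: the in-memory encoding (1, 0, `None`) and the function name `methylation_profile` are assumptions, not the file layout or code used by BiQ Analyzer HT.

```python
from typing import Dict, List, Optional

Pattern = List[Optional[int]]  # 1 = methylated, 0 = unmethylated, None = missing


def methylation_profile(patterns: List[Pattern]) -> List[Dict[str, Optional[float]]]:
    """Summarize each site (column) across all reads (rows)."""
    if not patterns:
        return []
    n_sites = len(patterns[0])
    if any(len(p) != n_sites for p in patterns):
        raise ValueError("all reads must cover the same number of sites")
    profile: List[Dict[str, Optional[float]]] = []
    for site in range(n_sites):
        column = [p[site] for p in patterns]
        methylated = sum(1 for c in column if c == 1)
        unmethylated = sum(1 for c in column if c == 0)
        missing = sum(1 for c in column if c is None)
        informative = methylated + unmethylated
        profile.append({
            "methylated": methylated,
            "unmethylated": unmethylated,
            "missing": missing,
            # Mean methylation ignores missing values; undefined without data.
            "mean_methylation": methylated / informative if informative else None,
        })
    return profile


# Four hypothetical reads covering three sites.
patterns = [
    [1, 0, None],
    [1, 1, 0],
    [0, 1, None],
    [1, None, None],
]
for site, summary in enumerate(methylation_profile(patterns), start=1):
    print(site, summary)
```

Averaging the per-site means over a locus would give the kind of value shown in a per-locus heatmap, although how the tool itself weights sites and reads is not stated here.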
Performance comparison of software packages for locus-specific analysis of bisulfite sequencing data
| Region | Read count | BiQ Analyzer 2.0: Memory | BiQ Analyzer 2.0: ET | QUMA (a): Memory | QUMA (a): ET | BiQ Analyzer HT: Memory | BiQ Analyzer HT: ET | BiQ Analyzer HT (b): Memory | BiQ Analyzer HT (b): ET |
|---|---|---|---|---|---|---|---|---|---|
| RE1 | 400 | 350 | 300 | NA (c) | 10 | 95 | 30 | 1000 | 6 |
| RE2 | 1054 | 500 | 911 | NA (c) | 25 | 200 | 50 | 1000 | 9 |
| RE3 | 3150 | >1000 | 3455 | NA (c) | 70 | 200 | 95 | 1000 | 16 |
| RE3 (d) | 10 000 | NA (e) | NA (e) | NA (c) | 323 | 300 | 285 | 1500 | 50 |
| RE3 (d) | 100 000 | NA (e) | NA (e) | NA (f) | NA (f) | NA (g) | NA (g) | 3500 | 440 |
| RE3 (d) | 1 000 000 | NA (e) | NA (e) | NA (f) | NA (f) | NA (g) | NA (g) | 10 000 | 1940 |
All tests, except for the cases noted explicitly, were run on a standard laptop with dual-core processor and 2 GB main memory. The values of peak memory usage are given in MB. ET (‘execution time’) denotes the total duration of the analysis in seconds.
(a) The QUMA web-server running on a high-performance machine (8 dual-core processors, 16 GB main memory).
(b) BiQ Analyzer HT running on a high-performance machine (8 dual-core processors, 16 GB main memory).
(c) Memory usage of the web-server does not affect performance for the end user.
(d) The data set was obtained by concatenating multiple copies of the initial set of reads obtained for RE3.
(e) The calculation could not be finished due to an error.
(f) The calculation failed because it exceeded the web-server's maximum read threshold.
(g) Tests with BiQ Analyzer HT for the last two read sets were performed only on the high-performance computer.
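To put the execution times in perspective, they can be converted into reads processed per second. The short calculation below does this for BiQ Analyzer HT on the high-performance machine, assuming that the last column of the table is the ET of that configuration; the variable and function names are illustrative.

```python
# (read count, execution time in seconds) for BiQ Analyzer HT on the
# high-performance machine, taken from the table above.
RUNS = [
    (400, 6),
    (1054, 9),
    (3150, 16),
    (10_000, 50),
    (100_000, 440),
    (1_000_000, 1940),
]


def reads_per_second(read_count: int, seconds: float) -> float:
    """Average throughput over the whole analysis, including any start-up cost."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return read_count / seconds


throughput = [reads_per_second(n, t) for n, t in RUNS]
for (n, t), rate in zip(RUNS, throughput):
    print(f"{n:>9} reads in {t:>5} s: {rate:7.1f} reads/s")
```

In these figures the average rate rises with the read count. Because the larger data sets are concatenated copies of the RE3 reads, the numbers may not carry over to more heterogeneous data.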