
Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.

Felix Krueger, Simon R Andrews

Abstract

SUMMARY: A combination of bisulfite treatment of DNA and high-throughput sequencing (BS-Seq) can capture a snapshot of a cell's epigenomic state by revealing its genome-wide cytosine methylation at single base resolution. Bismark is a flexible tool for the time-efficient analysis of BS-Seq data which performs both read mapping and methylation calling in a single convenient step. Its output discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed.
AVAILABILITY AND IMPLEMENTATION: Bismark is released under the GNU GPLv3+ licence. The source code is freely available from www.bioinformatics.bbsrc.ac.uk/projects/bismark/.

Year:  2011        PMID: 21493656      PMCID: PMC3102221          DOI: 10.1093/bioinformatics/btr167

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Cytosine methylation of DNA serves as an important epigenetic mechanism to control gene expression, silencing or genomic imprinting both during development and in the adult (Law and Jacobsen, 2010). Aberrant methylation has been associated with a variety of diseases, including cancer (Robertson, 2005). Current massively parallel sequencing methods to study DNA methylation include enrichment-based methods such as methylated DNA immunoprecipitation (MeDIP-Seq) or methylated DNA binding domain sequencing (MBD-Seq), as well as direct sequencing of sodium bisulfite-treated DNA (BS-Seq) [methods compared in Harris et al. (2010)]. Bisulfite treatment of DNA leaves methylated cytosines unaffected, while non-methylated cytosines are converted into uracils. Subsequent PCR amplification converts these uracils into thymines. For any given genomic locus, bisulfite treatment and subsequent PCR amplification give rise to four individual strands of DNA, all of which can potentially end up in a sequencing experiment (Supplementary Material). Mapping bisulfite-treated sequences to a reference genome is a significant computational challenge due to the combination of: (i) the reduced complexity of the DNA code; (ii) up to four DNA strands to be analysed; and (iii) the fact that each read can theoretically exist in all possible methylation states. Even though a number of excellent short read mapping tools are available, e.g. Bowtie (Langmead et al., 2009), these do not perform bisulfite mapping themselves.
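The chemistry described above can be illustrated with a minimal Python sketch. It is not part of Bismark; the function names, the strand labels (OT, CTOT, OB, CTOB) and the toy sequence are assumptions made purely for illustration. It shows how one locus gives rise to four distinct strands after bisulfite treatment and PCR.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(strand: str, methylated: set) -> str:
    """Sequence of a strand after bisulfite treatment and PCR.

    Unmethylated C is read as T (C -> U -> T); a C whose 0-based index
    is listed in `methylated` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def four_strands(top: str, meth_top: set, meth_bottom: set) -> dict:
    """Return the four strands that can arise from one locus.

    The labels are hypothetical shorthand: OT/OB are the converted
    original top/bottom strands, CTOT/CTOB their PCR complements.
    `meth_bottom` indexes the bottom strand read 5'->3'.
    """
    bottom = reverse_complement(top)
    ot = bisulfite_convert(top, meth_top)
    ob = bisulfite_convert(bottom, meth_bottom)
    return {
        "OT": ot,
        "CTOT": reverse_complement(ot),
        "OB": ob,
        "CTOB": reverse_complement(ob),
    }


# Toy locus: only the C at index 1 of the top strand is methylated.
strands = four_strands("ACGTCG", meth_top={1}, meth_bottom=set())
for label, seq in strands.items():
    print(label, seq)
```

In this toy case none of the four strands is identical to the reference or to its reverse complement, which is why an unmodified short read aligner cannot place such reads directly.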

2 SOFTWARE DESCRIPTION AND DISCUSSION

Bisulfite libraries are of two distinct types (Chen et al., 2010): in the first scenario the sequencing library is generated in a directional manner, i.e. the actual sequencing reads will correspond to a bisulfite converted version of either the original forward or reverse strand (Lister et al., 2009). In a second scenario, strand specificity is not preserved, which means all four possible bisulfite DNA strands are sequenced at roughly the same frequency (Cokus et al., 2008; Popp et al., 2010). As the strand identity of a bisulfite read is a priori unknown, our bisulfite mapping tool Bismark aims to find a unique alignment by running four alignment processes simultaneously. First, bisulfite reads are transformed into a C-to-T and a G-to-A version (the latter being equivalent to a C-to-T conversion on the reverse strand). Then, each of them is aligned to equivalently pre-converted forms of the reference genome using four parallel instances of the short read aligner Bowtie (Fig. 1A). This read mapping enables Bismark to uniquely determine the strand origin of a bisulfite read. Consequently, Bismark can handle BS-Seq data from both directional and non-directional libraries. Since residual cytosines in the sequencing read are converted in silico into a fully bisulfite-converted form before the alignment takes place, mapping performed in this manner handles partial methylation accurately and in an unbiased manner.
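The mapping strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bismark's implementation: exact substring search stands in for Bowtie, no mismatches are allowed, and the pairing of read version, genome version and orientation in `processes` is one consistent way to enumerate the four alignment processes rather than a detail given in the text. All function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text: str, pattern: str) -> list:
    """All (possibly overlapping) start offsets of pattern in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def map_bisulfite_read(read: str, genome: str):
    """Return the unique hit as (read version, genome version, strand, position).

    Returns None when the read has no hit or more than one hit across
    the four processes. Exact matching is a stand-in for Bowtie.
    """
    # In silico conversion of the read and of the reference.
    ct_read, ga_read = read.replace("C", "T"), read.replace("G", "A")
    ct_genome, ga_genome = genome.replace("C", "T"), genome.replace("G", "A")

    # Four processes; "-" means the read is searched as its reverse complement.
    processes = [
        ("C-to-T read", "C-to-T genome", "+", ct_read, ct_genome),
        ("G-to-A read", "C-to-T genome", "-", reverse_complement(ga_read), ct_genome),
        ("C-to-T read", "G-to-A genome", "-", reverse_complement(ct_read), ga_genome),
        ("G-to-A read", "G-to-A genome", "+", ga_read, ga_genome),
    ]
    hits = [
        (read_version, genome_version, strand, position)
        for read_version, genome_version, strand, query, target in processes
        for position in find_all(target, query)
    ]
    return hits[0] if len(hits) == 1 else None


genome = "AACCGTTAGCGATCAGGT"
# Read from the top strand, positions 2-9, with only the C at position 3 methylated.
print(map_bisulfite_read("TCGTTAGT", genome))
# Fully unmethylated read from the bottom strand covering top positions 6-13.
print(map_bisulfite_read("GATTGTTA", genome))
```

Because both the read and the reference are fully converted before comparison, a methylated and an unmethylated copy of the same read land at the same position, which is the property the paragraph above relies on for unbiased mapping.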
Fig. 1.

Bismark's approach to bisulfite mapping and methylation calling. (A) Reads from a BS-Seq experiment are converted into a C-to-T and a G-to-A version and are then aligned to equivalently converted versions of the reference genome. A unique best alignment is then determined from the four parallel alignment processes [in this example, the best alignment has no mismatches and comes from thread (1)]. (B) The methylation state of positions involving cytosines is determined by comparing the read sequence with the corresponding genomic sequence. Depending on the strand a read mapped against, this can involve looking for C-to-T (as shown here) or G-to-A substitutions.

A similar approach was demonstrated to work well for single-end reads with the tool BS Seeker, which was developed independently of Bismark (Chen et al., 2010). BS Seeker outperformed earlier generation BS-Seq mapping programs such as BSMAP, RMAP-bs or MAQ in terms of mapping efficiency, accuracy and required CPU time. Even though the principle of both tools is similar, Bismark offers a number of advantages over BS Seeker, which are summarized in Table 1. For a test dataset [15 million reads taken from SRR020138 (Lister et al., 2009), trimmed to 50 bp, mapped to the human genome build NCBI36, one mismatch allowed], a direct comparison of the two tools returned a very similar number of alignments in a similar time scale [aligned reads/mapping efficiency/CPU time: 9 633 448/64.2%/42 min (Bismark); 9 664 184/64.4%/29 min (BS Seeker)]. Due to the way Bismark determines uniquely best alignments, it is less likely to report non-unique alignments; however, this comes at the cost of a slightly increased run time (for details see Supplementary Material).
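The mapping efficiencies quoted for the test dataset follow directly from the read counts. The short check below assumes that "15 million reads" means exactly 15 000 000 input reads; the function name is hypothetical.

```python
def mapping_efficiency(aligned_reads: int, total_reads: int) -> float:
    """Percentage of input reads for which an alignment was reported."""
    return 100.0 * aligned_reads / total_reads


TOTAL_READS = 15_000_000  # assumption: "15 million" taken as an exact count

for tool, aligned in {"Bismark": 9_633_448, "BS Seeker": 9_664_184}.items():
    print(f"{tool}: {mapping_efficiency(aligned, TOTAL_READS):.1f}%")
# Bismark: 64.2%
# BS Seeker: 64.4%
```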
Table 1.

Feature comparison of Bismark and BS Seeker

Feature                                          Bismark   BS Seeker
Bowtie instances (directional/non-directional)   4         2/4
Single-end (SE)/paired-end (PE) support          Yes/yes   Yes/no
Variable read length (SE/PE)                     Yes/yes   No/NA
Adjustable insert size (PE)                      Yes       NA
Uses basecall qualities for FastQ mapping        Yes       No
Adjustable mapping parameters                    5         2
Directional/non-directional library support      Yes/yes   Yes/yes(a)

(a) Requires library to be constructed with an initial sequence tag (Cokus et al., 2008). NA: not available.

Many previous BS-Seq programs were solely mapping applications, which meant that extracting the underlying methylation data required a lot of post-processing and computational knowledge. Bismark aims to generate a bisulfite mapping output that can be readily explored by bench scientists. Thus, in addition to the alignment process, Bismark determines the methylation state of each cytosine position in the read (Fig. 1B).

DNA methylation in mammals is thought to occur predominantly at CpG dinucleotides; however, a certain amount of non-CpG methylation has been shown in embryonic stem cells (Lister et al., 2009). In plants, methylation is quite common in both the symmetric CpG and CHG contexts and the asymmetric CHH context (where H can be A, T or C) (Feng et al., 2010; Law and Jacobsen, 2010). To enable methylation analysis in different sequence contexts and/or model organisms, methylation calls in Bismark take the surrounding sequence context into consideration and discriminate between cytosines in CpG, CHG and CHH context.

The primary mapping output of Bismark contains one line per read and reports the mapping position, alignment strand, the bisulfite read sequence, its equivalent genomic sequence and a methylation call string (Supplementary Material). This mapping output can be subjected to post-processing (Supplementary Material) or can be used to extract the methylation information at individual cytosine positions. This secondary methylation-state output can be generated using a flexible methylation extractor component that accompanies Bismark. The methylation output discriminates between sequence contexts (CpG, CHG or CHH) and can be obtained in either a comprehensive (all alignment strands merged) or an alignment strand-specific format. The latter can be very useful for studying asymmetric methylation (hemi- or CHH methylation) in a strand-specific manner. The methylation extractor writes one entry (line) per cytosine, in which the strand field is used to encode the methylation state: ‘+’ indicates a methylated and ‘−’ a non-methylated cytosine. This output can be converted into other alignment formats such as SAM/BAM, or imported into genome browsers, such as SeqMonk, where it can be visualized and further explored by the researcher without requiring additional computational expertise.
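Context-aware methylation calling can be sketched as follows for a read aligned to the forward strand, where C-to-T substitutions are informative (the G-to-A case on the reverse strand is analogous and omitted). The function names, the tuple layout and the toy sequences are assumptions for illustration and do not reproduce Bismark's actual output format; only the use of '+' and '-' for methylated and non-methylated cytosines follows the description above.

```python
def cytosine_context(genome: str, pos: int):
    """Classify the genomic C at `pos` by its two downstream bases.

    Returns "CpG", "CHG" or "CHH", or None when the context cannot be
    determined because the sequence ends too soon.
    """
    downstream = genome[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CpG"
    if len(downstream) < 2:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def call_methylation(read: str, genome: str, start: int) -> list:
    """One (position, state, context) tuple per genomic cytosine covered.

    state is '+' for a methylated C (read still shows C) and '-' for a
    non-methylated C (read shows T). Other read bases at a genomic C are
    treated as mismatches and skipped.
    """
    calls = []
    for offset, read_base in enumerate(read):
        pos = start + offset
        if genome[pos] != "C" or read_base not in "CT":
            continue
        context = cytosine_context(genome, pos)
        if context is None:
            continue
        calls.append((pos, "+" if read_base == "C" else "-", context))
    return calls


genome = "AACCGTTAGCGATCAGGT"
# Read aligned at position 2; only the CpG cytosine at position 3 is methylated.
for pos, state, context in call_methylation("TCGTTAGT", genome, start=2):
    print(pos, state, context)
# 2 - CHG
# 3 + CpG
# 9 - CpG
```

Keeping the context alongside each call is what allows the per-cytosine output to be split into CpG, CHG and CHH files or merged across strands afterwards.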

3 CONCLUSIONS

We present Bismark, a software package to map and determine the methylation state of BS-Seq reads. Bismark is easy to use, very flexible and is the first published BS-Seq aligner to seamlessly handle single- and paired-end mapping of both directional and non-directional bisulfite libraries. The output of Bismark is easy to interpret and is intended to be analysed directly by the researcher performing the experiment.

Funding: Biotechnology and Biological Sciences Research Council (BBSRC).

Conflict of Interest: none declared.
REFERENCES

1.  Conservation and divergence of methylation patterning in plants and animals.

Authors:  Suhua Feng; Shawn J Cokus; Xiaoyu Zhang; Pao-Yang Chen; Magnolia Bostick; Mary G Goll; Jonathan Hetzel; Jayati Jain; Steven H Strauss; Marnie E Halpern; Chinweike Ukomadu; Kirsten C Sadler; Sriharsa Pradhan; Matteo Pellegrini; Steven E Jacobsen
Journal:  Proc Natl Acad Sci U S A       Date:  2010-04-15       Impact factor: 11.205

2.  DNA methylation and human disease.

Authors:  Keith D Robertson
Journal:  Nat Rev Genet       Date:  2005-08       Impact factor: 53.242

3.  Establishing, maintaining and modifying DNA methylation patterns in plants and animals.

Authors:  Julie A Law; Steven E Jacobsen
Journal:  Nat Rev Genet       Date:  2010-03       Impact factor: 53.242

4.  BS Seeker: precise mapping for bisulfite sequencing.

Authors:  Pao-Yang Chen; Shawn J Cokus; Matteo Pellegrini
Journal:  BMC Bioinformatics       Date:  2010-04-23       Impact factor: 3.169

5.  Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency.

Authors:  Christian Popp; Wendy Dean; Suhua Feng; Shawn J Cokus; Simon Andrews; Matteo Pellegrini; Steven E Jacobsen; Wolf Reik
Journal:  Nature       Date:  2010-02-25       Impact factor: 49.962

6.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors:  Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-03-04       Impact factor: 13.583

7.  Human DNA methylomes at base resolution show widespread epigenomic differences.

Authors:  Ryan Lister; Mattia Pelizzola; Robert H Dowen; R David Hawkins; Gary Hon; Julian Tonti-Filippini; Joseph R Nery; Leonard Lee; Zhen Ye; Que-Minh Ngo; Lee Edsall; Jessica Antosiewicz-Bourget; Ron Stewart; Victor Ruotti; A Harvey Millar; James A Thomson; Bing Ren; Joseph R Ecker
Journal:  Nature       Date:  2009-10-14       Impact factor: 49.962

8.  Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.

Authors:  R Alan Harris; Ting Wang; Cristian Coarfa; Raman P Nagarajan; Chibo Hong; Sara L Downey; Brett E Johnson; Shaun D Fouse; Allen Delaney; Yongjun Zhao; Adam Olshen; Tracy Ballinger; Xin Zhou; Kevin J Forsberg; Junchen Gu; Lorigail Echipare; Henriette O'Geen; Ryan Lister; Mattia Pelizzola; Yuanxin Xi; Charles B Epstein; Bradley E Bernstein; R David Hawkins; Bing Ren; Wen-Yu Chung; Hongcang Gu; Christoph Bock; Andreas Gnirke; Michael Q Zhang; David Haussler; Joseph R Ecker; Wei Li; Peggy J Farnham; Robert A Waterland; Alexander Meissner; Marco A Marra; Martin Hirst; Aleksandar Milosavljevic; Joseph F Costello
Journal:  Nat Biotechnol       Date:  2010-09-19       Impact factor: 54.908

9.  Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning.

Authors:  Shawn J Cokus; Suhua Feng; Xiaoyu Zhang; Zugen Chen; Barry Merriman; Christian D Haudenschild; Sriharsa Pradhan; Stanley F Nelson; Matteo Pellegrini; Steven E Jacobsen
Journal:  Nature       Date:  2008-02-17       Impact factor: 49.962
