Christian Otto (1), Peter F Stadler, Steve Hoffmann. (1) Interdisciplinary Center for Bioinformatics and Bioinformatics Group, Department of Computer Science, University Leipzig, 04107 Leipzig, Germany.
Abstract
MOTIVATION: Cytosine DNA methylation is one of the major epigenetic modifications and influences gene expression, developmental processes, X-chromosome inactivation, and genomic imprinting. Aberrant methylation is furthermore known to be associated with several diseases, including cancer. The gold standard for determining DNA methylation on a genome-wide scale is 'bisulfite sequencing': DNA fragments are treated with sodium bisulfite, which converts unmethylated cytosines into uracils, whereas methylated cytosines remain unchanged. The resulting sequencing reads thus exhibit asymmetric bisulfite-related mismatches and suffer from an effective reduction of the alphabet size in unmethylated regions, which makes mapping bisulfite sequencing reads computationally much more demanding. As a consequence, currently available read mapping software often fails to achieve high sensitivity and in many cases requires unrealistic computational resources to cope with large real-life datasets.
RESULTS: In this study, we present a seed-based approach that uses enhanced suffix arrays in conjunction with Myers' bit-vector algorithm to efficiently extend seeds to optimal semi-global alignments while allowing for bisulfite-related substitutions. It outperforms most current approaches in terms of sensitivity and is competitive in running time when mapping hundreds of millions of sequencing reads to vertebrate genomes.
AVAILABILITY: The software segemehl is freely available at http://www.bioinf.uni-leipzig.de/Software/segemehl.
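The asymmetry described above can be made concrete with a small sketch. It is not segemehl's implementation: it replaces the enhanced suffix array seeding and Myers' bit-vector extension with a plain quadratic dynamic program, and the function names `bs_match` and `semiglobal_bs_distance` are hypothetical. It assumes a read from the converted forward strand, where a reference C may appear as T in the read (uracil is sequenced as thymine) but a reference T can never appear as C. The complementary G-to-A case for the other strand is not handled.

```python
def bs_match(ref_base, read_base):
    """True if read_base can derive from ref_base under C->T conversion.

    The relation is asymmetric: a reference C explains a read T,
    but a reference T does not explain a read C.
    """
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def semiglobal_bs_distance(read, ref):
    """Semi-global edit distance of the whole read against any substring of ref.

    Returns (best_distance, end), where end is the number of reference
    characters consumed by the leftmost best-scoring alignment.
    """
    # Row 0 is all zeros: the alignment may start anywhere in the reference.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: the first i read characters aligned to nothing.
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if bs_match(ref[j - 1], read[i - 1]) else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # read character unaligned (insertion)
                cur[j - 1] + 1,      # reference character skipped (deletion)
            )
        prev = cur
    # The alignment may also end anywhere in the reference.
    best = min(prev)
    return best, prev.index(best)


if __name__ == "__main__":
    # Both reference cytosines were converted, so the read carries T there.
    print(semiglobal_bs_distance("TTGA", "AACCGATT"))
    # The reverse direction is penalized: read C against reference T.
    print(semiglobal_bs_distance("CCGA", "TTGA"))
```

In this sketch the first call finds a mismatch-free alignment even though the read and the reference differ at two positions, while the second call pays for each read C placed against a reference T. A standard symmetric comparison could not distinguish these two cases.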