Jacob Schreiber, Kevin Karplus. Nanopore Group, Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA.
Abstract
MOTIVATION: Nanopore-based sequencing techniques can reconstruct properties of biosequences by analyzing the sequence-dependent ionic current steps produced as biomolecules pass through a pore. Typically this involves alignment of new data to a reference, where both reference construction and alignment have been performed by hand.

RESULTS: We propose an automated method for aligning nanopore data to a reference through the use of hidden Markov models. Several features that arise from prior processing steps and from the class of enzyme used can be simply incorporated into the model. Previously, the M2MspA nanopore was shown to be sensitive enough to distinguish between cytosine, methylcytosine and hydroxymethylcytosine. We validated our automated methodology on a subset of that data by automatically calculating an error rate for the distinction between the three cytosine variants and show that the automated methodology produces a 2-3% error rate, lower than the 10% error rate from previous manual segmentation and alignment.

AVAILABILITY AND IMPLEMENTATION: The data, output, scripts and tutorials replicating the analysis are available at https://github.com/UCSCNanopore/Data/tree/master/Automation.
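To make the idea of aligning segmented current data to a reference with a hidden Markov model concrete, the following is a minimal sketch and not the authors' model. It assumes each reference position emits a segment mean from a Gaussian, and it allows three hypothetical transitions: stay (one reference level split into several segments), step (advance by one position) and skip (a reference level missed entirely). The function names, the transition set and the assumption that every read starts at reference position 0 are all illustrative choices; the Viterbi algorithm then picks the most likely reference position for each observed segment.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def viterbi_align(segment_means, reference, log_stay, log_step, log_skip):
    """Align observed segment means to a left-to-right reference profile.

    segment_means: observed mean current of each segment, in time order.
    reference: list of (mean, std) pairs, one per reference position.
    log_stay, log_step, log_skip: log probabilities of moving 0, +1 or +2
        reference positions between consecutive segments (hypothetical
        parameters chosen for illustration).
    Returns a list giving the reference index assigned to each segment.
    """
    n, m = len(segment_means), len(reference)
    neg_inf = float("-inf")
    score = [[neg_inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]

    # Assumption: the first segment corresponds to reference position 0.
    score[0][0] = gaussian_logpdf(segment_means[0], *reference[0])

    for t in range(1, n):
        for j in range(m):
            # Collect the score of every allowed predecessor of position j.
            candidates = [(score[t - 1][j] + log_stay, j)]
            if j >= 1:
                candidates.append((score[t - 1][j - 1] + log_step, j - 1))
            if j >= 2:
                candidates.append((score[t - 1][j - 2] + log_skip, j - 2))
            best, prev = max(candidates)
            if best == neg_inf:
                continue  # position j is unreachable at time t
            score[t][j] = best + gaussian_logpdf(segment_means[t], *reference[j])
            back[t][j] = prev

    # Trace back from the best-scoring final position.
    j = max(range(m), key=lambda k: score[n - 1][k])
    path = [j]
    for t in range(n - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

As a usage example, with a four-level reference `[(50, 1), (30, 1), (70, 1), (40, 1)]` and observed segments `[50.2, 49.8, 30.1, 40.3]`, the first level is split in two and the third level is missing, so the sketch should assign the segments to reference positions 0, 0, 1 and 3. A real model would learn the emission and transition parameters from data rather than fixing them by hand.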