| Literature DB >> 28868335 |
Abram Magner1, Jarosław Duda2, Wojciech Szpankowski3, Ananth Grama4.
Abstract
Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.Entities:
Year: 2016 PMID: 28868335 PMCID: PMC5575792 DOI: 10.1109/TMBMC.2016.2630056
Source DB: PubMed Journal: IEEE Trans Mol Biol Multiscale Commun ISSN: 2332-7804