Franka J. Rang, Wigard P. Kloosterman, Jeroen de Ridder.
Abstract
Nanopore sequencing is a rapidly maturing technology delivering long reads in real time on a portable instrument at low cost. Not surprisingly, the community has rapidly taken up this new way of sequencing and has used it successfully for a variety of research applications. A major limitation of nanopore sequencing is its high error rate, which, despite recent improvements to the nanopore chemistry and computational tools, still ranges between 5% and 15%. Here, we review computational approaches for dealing with the high nanopore sequencing error rate. Furthermore, we outline strategies for translating raw sequencing data into base calls, for detecting base modifications, and for obtaining consensus sequences.
Year: 2018 PMID: 30005597 PMCID: PMC6045860 DOI: 10.1186/s13059-018-1462-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1Timeline of reported MinION read accuracies and Oxford Nanopore Technologies (ONT) technological developments. Nanopore chemistry updates and advances in base-caller software are represented as colored bars. The plotted accuracies are ordered on the basis of the chemistry and base-calling software used, not according to publication date. Based on data from 1 [9]; 2 [10]; 3 [50]; 4 [51]; 5 [33]; 6 [28]; 7 [52]; 8 [53]; 9 [54]; 10 [29]; 11 [31]; 12 [48]; 13 [46]; 14 [55]; 15 [11]; 16 [5]; 17 [13]; 18 [3]. HMM Hidden Markov Model, RNN Recurrent Neural Network
Fig. 2Overview of MinION nanopore sequencing. The left panel shows sources of errors during MinION sequencing and base calling. The right panel shows computational strategies that have been used to improve accuracy. HMM Hidden Markov Model, RNN Recurrent Neural Network
Explanation of technical terms
| Term | Description |
|---|---|
| Beam search | A heuristic search algorithm that keeps only the most promising candidate solutions (the "beam") at each step. In Chiron, a beam search decoder translates the network output into a base sequence. |
| Connectionist Temporal Classification (CTC) decoder | A type of neural network output and scoring function for labeling sequence data with RNNs. It requires neither presegmented training data nor postprocessing of the outputs. |
| Convolutional Neural Network (CNN) | A type of neural network often used for image analysis. It recognizes patterns by applying different filters to an image. |
| Forward algorithm | An algorithm that computes the probability of a sequence of observations given an HMM, summed over all possible sequences of hidden states. |
| Hidden Markov Model (HMM) | A stochastic model of a sequence of unobserved events underlying a sequence of observations. HMMs assume that an event depends only on the previous event. |
| Long short-term memory (LSTM) unit | A type of RNN unit that can be used as a building block in larger networks. Its input, output, and forget gates allow it to retain or discard information passed on from a previous state. |
| Partial Order Alignment (POA) graph | A graph representation of a multiple alignment that allows each base in the alignment to have multiple predecessors. Different paths through the graph represent different alignments. |
| Recurrent Neural Network (RNN) | A type of neural network that takes into account information passed on from previous states. |
| Training data | A dataset used to optimize (i.e., train) the parameters of a model. Training is required for both HMMs and RNNs, so the training dataset determines the performance of the model. |
| Viterbi decoding | An algorithm that finds the most likely sequence of events given a certain HMM. |
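To make the HMM terms above concrete, here is a minimal sketch of the forward algorithm and Viterbi decoding on a toy HMM. The states, observation symbols, and all probabilities are invented for illustration; real HMM-based base callers such as Nanocall model k-mer states emitting continuous current levels, not two discrete symbols.

```python
# Toy HMM: two hidden states and two discrete observation symbols
# ("low"/"high" current level) -- a drastic simplification of a
# nanopore base-calling HMM. All numbers are made up.
states = ["A", "T"]

start_p = {"A": 0.6, "T": 0.4}
trans_p = {"A": {"A": 0.7, "T": 0.3},
           "T": {"A": 0.4, "T": 0.6}}
emit_p = {"A": {"low": 0.8, "high": 0.2},
          "T": {"low": 0.1, "high": 0.9}}

def forward(observations):
    """Forward algorithm: probability of the observation sequence,
    summed over all possible hidden state sequences."""
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for o in observations[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

def viterbi(observations):
    """Viterbi decoding: most likely hidden state sequence
    (and its probability) for the observations."""
    v = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for o in observations[1:]:
        v = {s: max(((v[p][0] * trans_p[p][s] * emit_p[s][o], v[p][1] + [s])
                     for p in states), key=lambda t: t[0])
             for s in states}
    prob, path = max(v.values(), key=lambda t: t[0])
    return path, prob

obs = ["low", "low", "high"]
path, prob = viterbi(obs)
print(path)          # most likely state path: ['A', 'A', 'T']
print(forward(obs))  # total observation probability
```

Note the contrast the table draws: the forward algorithm sums over all paths, while Viterbi keeps only the best path, which is why the two functions differ only in replacing `sum` with `max`.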
Fig. 3 Schematic overview of the algorithms underlying nanopore base callers. a Nanocall uses a Hidden Markov Model (HMM) for base calling. b DeepNano was the first base caller to use Recurrent Neural Networks (RNN). h1–h3 represent three hidden layers in the RNN. c BasecRAWller uses two RNNs, one to segment the raw measurements and one to infer k-mer probabilities. d Chiron makes use of a Convolutional Neural Network (CNN) to detect patterns in the data, followed by an RNN to predict k-mer probabilities, which are evaluated by a Connectionist Temporal Classification (CTC) decoder. LSTM long short-term memory
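The CTC decoding step in Fig. 3d can be illustrated with a minimal greedy decoder: the network emits, per time step, a probability distribution over the four bases plus a "blank" symbol, and decoding picks the best symbol at each step, collapses consecutive repeats, and drops blanks. The probability matrix below is invented for illustration, and Chiron itself uses beam search rather than this simple greedy rule.

```python
# Minimal sketch of greedy CTC decoding. The per-step probabilities
# are made up; a real base caller's network produces them.
BLANK = "-"
ALPHABET = ["A", "C", "G", "T", BLANK]

def ctc_greedy_decode(prob_matrix):
    """prob_matrix: one probability distribution over ALPHABET per time step.
    Take the argmax per step, collapse repeats, then remove blanks."""
    best = [max(zip(ALPHABET, p), key=lambda t: t[1])[0] for p in prob_matrix]
    decoded, prev = [], None
    for sym in best:
        if sym != prev and sym != BLANK:
            decoded.append(sym)
        prev = sym
    return "".join(decoded)

# Seven time steps; columns follow ALPHABET order A, C, G, T, blank.
probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # A
    [0.60, 0.10, 0.10, 0.10, 0.10],  # A (repeat, collapsed)
    [0.05, 0.05, 0.10, 0.10, 0.70],  # blank (separates true repeats)
    [0.80, 0.05, 0.05, 0.05, 0.05],  # A again -> second A in output
    [0.10, 0.10, 0.70, 0.05, 0.05],  # G
    [0.10, 0.60, 0.10, 0.10, 0.10],  # C
    [0.05, 0.05, 0.05, 0.80, 0.05],  # T
]
print(ctc_greedy_decode(probs))  # -> "AAGCT"
```

The blank symbol is what lets CTC emit genuine homopolymer repeats: without the blank between the first two A-steps and the third, all three would collapse into a single A.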