Vladimír Boža, Broňa Brejová, Tomáš Vinař.
Abstract
The MinION device by Oxford Nanopore produces very long reads (reads over 100 kBp have been reported); however, it suffers from a high sequencing error rate. We present an open-source DNA base caller based on deep recurrent neural networks and show that the accuracy of base calling depends strongly on the underlying software and can be improved by applying modern machine learning methods. By employing carefully crafted recurrent neural networks, our tool significantly improves base calling accuracy on data from the R7.3 version of the platform compared to the default base caller supplied by the manufacturer. On the R9 version, we achieve results comparable to the Nanonet base caller provided by Oxford Nanopore. The availability of an open-source tool with high base calling accuracy will be useful for the development of new applications of the MinION device, including infectious disease detection and custom target enrichment during sequencing.
Year: 2017 PMID: 28582401 PMCID: PMC5459436 DOI: 10.1371/journal.pone.0178751
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Raw signal from MinION and its segmentation into events.
The plot was generated from the E. coli data (http://www.ebi.ac.uk/ena/data/view/ERR1147230).
Accuracy of base callers on two R7.3 testing data sets.
The results of base calling were aligned to the reference using BWA-MEM [28]. The accuracy was computed as the number of matches in the alignment divided by the length of the alignment.
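| Read type | Base caller | First testing set | Second testing set |
|---|---|---|---|
| Template | Metrichor | 71.3% | 68.1% |
| Template | Nanocall | 68.3% | 67.5% |
| Template | DeepNano | 77.9% | 76.3% |
| Complement | Metrichor | 71.4% | 69.5% |
| Complement | Nanocall | 68.5% | 68.4% |
| Complement | DeepNano | 76.4% | 75.7% |
| 2D | Metrichor | 86.8% | 84.8% |
| 2D | DeepNano | 88.5% | 86.7% |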
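The accuracy measure described in the caption (matches in the alignment divided by the alignment length) can be sketched as follows. This is only one plausible reading, not the authors' evaluation script: it assumes the alignment is given as a SAM-style CIGAR string together with the edit distance from the `NM` tag, that clipped bases are not part of the alignment, and that the alignment length counts matched, inserted, and deleted positions. The function name `alignment_accuracy` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_accuracy(cigar: str, edit_distance: int) -> float:
    """Return matches / alignment length for a single alignment.

    Assumptions (not taken from the paper): `cigar` is a SAM CIGAR string,
    `edit_distance` is the NM tag (mismatches + inserted + deleted bases),
    and soft/hard clipped bases do not count towards the alignment length.
    """
    aligned = inserted = deleted = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned columns (match or mismatch)
            aligned += n
        elif op == "I":      # bases present only in the read
            inserted += n
        elif op == "D":      # bases present only in the reference
            deleted += n
        # S, H, N and P are ignored in this sketch

    alignment_length = aligned + inserted + deleted
    if alignment_length == 0:
        return 0.0

    mismatches = edit_distance - inserted - deleted
    matches = aligned - mismatches
    return matches / alignment_length


if __name__ == "__main__":
    # 90 aligned columns with 5 mismatches, plus 5 inserted and 5 deleted bases.
    print(alignment_accuracy("5S90M5I5D", 15))
```

Under this definition insertions and deletions lower the accuracy just as mismatches do, because they lengthen the alignment without adding matches.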
Fig 2. Schematic of a bidirectional recurrent neural network.
Sizes of experimental data sets.
The sizes differ between strands because only base calls mapping to the reference were used. Note that the counts of 2D events are based on the size of the alignment.
| # of template reads | 3,803 | 3,942 | 13,631 |
| # of template events | 26,403,434 | 26,860,314 | 70,827,021 |
| # of complement reads | 3,820 | 3,507 | 13,734 |
| # of complement events | 24,047,571 | 23,202,959 | 67,330,241 |
| # of 2D reads | 10,278 | 9,292 | 14,550 |
| # of 2D events | 84,070,837 | 75,998,235 | 93,571,823 |
Fig 3. DeepNano reduces bias in 6-mer composition.
Comparison of 6-mer content in the Klebsiella reference genome and in reads base-called by Metrichor (left) and DeepNano (right). From top to bottom: template, complement, 2D.
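A 6-mer composition comparison of this kind can be approximated with a short counting routine. The sketch below is an illustration under stated assumptions, not the procedure used to produce the figure: sequences are plain strings, windows containing characters other than A, C, G, T are skipped, strands are not merged via reverse complements, and the name `kmer_frequencies` is hypothetical.

```python
from collections import Counter

DNA = set("ACGT")


def kmer_frequencies(sequences, k=6):
    """Return the relative frequency of every k-mer seen in `sequences`.

    Windows containing anything other than A, C, G, T (for example N)
    are skipped. Reverse complements are counted separately.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= DNA:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Toy inputs; real use would pass the reference genome and the base-called reads.
    reference = kmer_frequencies(["ACGTACGTACGT"])
    reads = kmer_frequencies(["ACGTACGT", "CGTACGTA"])
    for kmer in sorted(reference):
        print(kmer, reference[kmer], reads.get(kmer, 0.0))
```

Plotting the reference frequency of each 6-mer against its frequency in the reads gives a scatter plot in which points far from the diagonal indicate over- or under-represented 6-mers.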
Fig 4. Abundances of repetitive 6-mers.
Accuracy and running time on R9 data.
The results of base calling were aligned to the reference using BWA-MEM [28]. The first column reports the percentage of reads that aligned to the reference on at least 90% of their length. The accuracy was computed as the number of matches in the alignment divided by the length of the alignment. The speed is measured in events per second.
| Base caller | Aligned reads | Accuracy | Speed |
|---|---|---|---|
| Nanonet | 83.2% | 83.2% | 2057 ev/s |
| DeepNano (100 hidden units) | 81.1% | 81.0% | 4716 ev/s |
| DeepNano (50 hidden units) | 79.3% | 78.5% | 7142 ev/s |
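The two remaining columns of the table above, the share of reads aligned over at least 90% of their length and the speed in events per second, can be sketched as below. The input format (pairs of read length and aligned length) and the function names are assumptions made for illustration; the source does not specify how these quantities were extracted from the alignments or timings.

```python
def fraction_well_aligned(reads, min_fraction=0.9):
    """Fraction of reads whose aligned part covers at least `min_fraction`
    of the read length.

    `reads` is an iterable of (read_length, aligned_length) pairs, a
    hypothetical input format chosen for this sketch.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    good = sum(
        1
        for read_length, aligned_length in reads
        if read_length > 0 and aligned_length / read_length >= min_fraction
    )
    return good / len(reads)


def events_per_second(n_events, seconds):
    """Base calling speed as processed events divided by wall-clock seconds."""
    return n_events / seconds


if __name__ == "__main__":
    toy_reads = [(100, 95), (100, 91), (100, 50), (200, 179)]
    print(fraction_well_aligned(toy_reads))
    print(events_per_second(9432, 2.0))
```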