Don Neumann, Anireddy S N Reddy, Asa Ben-Hur.
Abstract
BACKGROUND: Despite recent progress in basecalling of Oxford nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short read technologies. Furthermore, very little of the recent research was focused on basecalling of RNA data, which has different characteristics than its DNA counterpart.
Keywords: Convolutional networks; Long read sequencing; Oxford nanopore; RNA basecalling
Year: 2022 PMID: 35443610 PMCID: PMC9020074 DOI: 10.1186/s12859-022-04686-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The RODAN architecture. The normalized signal is passed through a succession of convolutional blocks which gradually incorporate surrounding information. Each block is composed of several processing steps (convolution, activation, batch normalization, etc.), which are standard building blocks in the construction of deep neural networks. The final output is passed through a fully connected layer to produce the decoded sequence of nucleotides.
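To make "gradually incorporate surrounding information" concrete, the following minimal sketch computes how the receptive field (the number of input signal samples that influence one output position) grows as convolutional layers are stacked. The function name `receptive_field` and the kernel sizes and strides in `example_layers` are hypothetical values chosen for illustration; they are not RODAN's actual configuration.

```python
def receptive_field(layers):
    """Return the receptive field after each layer.

    `layers` is a list of (kernel_size, stride) pairs for stacked
    1D convolutions without dilation.
    """
    rf = 1      # one output position initially sees one input sample
    jump = 1    # spacing, in input samples, between adjacent positions
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack of four convolutions (NOT the published architecture).
example_layers = [(3, 1), (5, 2), (9, 1), (9, 1)]
print(receptive_field(example_layers))  # [3, 7, 23, 39]
```

Each additional block widens the window of raw signal that contributes to a single output position, which is why deeper stacks can take more context around each nucleotide into account.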
Basecalling accuracy computed using percent identity and number of unaligned reads across datasets for Guppy 4.4.0, Taiyaki 5.0, and RODAN 1.0
| Dataset | Basecaller | Median accuracy (%) | Unaligned reads |
|---|---|---|---|
| Human [ | Guppy | 90.60 | N/A |
| | Taiyaki | 91.16 | 900 |
| | RODAN | | 1307 |
| Mouse [ | Guppy | 87.65 | N/A |
| | Taiyaki | 86.25 | 3079 |
| | RODAN | | 2819 |
| Arabidopsis [ | Guppy | 91.59 | N/A |
| | Taiyaki | 91.10 | 957 |
| | RODAN | | 1001 |
| Poplar [ | Guppy | 90.16 | N/A |
| | Taiyaki | 89.72 | 1598 |
| | RODAN | | 1652 |
| Yeast [ | Guppy | 91.35 | N/A |
| | Taiyaki | 90.01 | 2721 |
| | RODAN | | 3035 |
For each dataset, the best result is shown in bold
Each dataset contains 100,000 reads. Only reads alignable by Guppy were used to build each dataset, hence the N/A for unaligned reads. Additional basecalling statistics are provided in Additional file 1: Table S2
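As a rough illustration of how a median percent-identity figure like those in the table could be derived from alignments, the sketch below computes identity per read from an extended CIGAR string and then takes the median. It assumes identity is defined as matches divided by the sum of matches, mismatches, insertions, and deletions, and that the aligner emits `=` and `X` operations; this excerpt does not state the authors' exact definition, and the function name `percent_identity` and the example CIGAR strings are hypothetical.

```python
import re
from statistics import median

# Matches one CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar):
    """Percent identity of one alignment from an extended CIGAR string.

    Assumed definition: matches / (matches + mismatches + insertions
    + deletions). Soft and hard clips are ignored.
    """
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    if "M" in counts:
        # 'M' does not distinguish matches from mismatches.
        raise ValueError("extended CIGAR with '=' and 'X' is required")
    matches = counts.get("=", 0)
    aligned = (matches + counts.get("X", 0)
               + counts.get("I", 0) + counts.get("D", 0))
    if aligned == 0:
        return None  # nothing aligned
    return 100.0 * matches / aligned


# Hypothetical alignments for three reads.
cigars = ["90=5X3I2D", "5S48=2X", "40=10X"]
identities = [percent_identity(c) for c in cigars]
print(median(identities))  # 90.0
```

The median is used rather than the mean so that a small number of very poorly aligned reads does not dominate the summary.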
Fig. 2 Read statistics. For each of the five datasets we show histograms of read length in (a), and basecalling accuracy as a function of read length