Yao-Zhong Zhang, Arda Akdemir, Georg Tremmel, Seiya Imoto, Satoru Miyano, Tetsuo Shibuya, Rui Yamaguchi.
Abstract
BACKGROUND: Nanopore sequencing is a rapidly developing third-generation sequencing technology that can generate long nucleotide reads of molecules in real time on a portable device. Genotypes are determined by detecting changes in the ionic current signal as a DNA/RNA fragment passes through a nanopore. Currently, nanopore basecalling has a higher error rate than basecalling for short-read sequencing. By using deep neural networks, state-of-the-art nanopore basecallers achieve basecalling accuracy in the range of 85% to 95%.
RESULT: In this work, we proposed a novel basecalling approach from the perspective of instance segmentation. Unlike previous approaches, which treat basecalling as a typical sequence labeling problem, we formulated it as a multi-label segmentation task. We also proposed a refined U-net model, which we call UR-net, that can model sequential dependencies for a one-dimensional segmentation task. The experimental results show that the proposed basecaller, URnano, achieves competitive results on in-species data compared with recently proposed CTC-featured basecallers.
Keywords: Deep learning; Nanopore basecalling; UR-net
Year: 2020 PMID: 32321433 PMCID: PMC7178565 DOI: 10.1186/s12859-020-3459-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overall pipeline of the URnano basecaller. Block ① is the UR-net deep neural network. Block ② is the post-processing part that transforms the UR-net’s output into the final basecalls
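The post-processing step in Block ② can be illustrated with a minimal sketch. It assumes, hypothetically, that UR-net emits one label per signal sample and that an alternating suffix (`A1`, `A2`, ...) is used so that adjacent identical bases remain distinguishable; the label names and the function `masks_to_bases` are illustrative and are not taken from the URnano code.

```python
from itertools import groupby


def masks_to_bases(labels):
    """Collapse a per-sample label mask into a base sequence.

    Assumption: each label is a base letter followed by a digit, and the
    digit alternates between neighbouring segments of the same base.
    """
    # Every run of identical labels is treated as one nucleotide segment;
    # only the base letter (first character) of each run is kept.
    return "".join(label[0] for label, _run in groupby(labels))


# Hypothetical per-sample mask covering the bases A, A, C, T, T, T.
mask = ["A1"] * 3 + ["A2"] * 2 + ["C1"] * 4 + ["T1"] * 2 + ["T2"] * 3 + ["T1"] * 2
bases = masks_to_bases(mask)
print(bases)  # AACTTT
```

Without the alternating suffix, the two adjacent A segments would merge into a single run and one base would be lost, which is why a segmentation output needs some way to mark segment boundaries inside homopolymers.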
Fig. 2 a is the histogram of homopolymer repeats from 400 E. coli and λ-phage reads. b, c and d are histograms of homopolymer repeats for the real E. coli, λ-phage and Human Chr11 reference genomes. Here, a single nucleotide is also treated as a “homopolymer” for reference
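A histogram of this kind can be computed by counting run lengths in a sequence. The sketch below follows the caption's convention that a single nucleotide counts as a homopolymer of length 1; the function name and the toy sequence are illustrative only.

```python
from collections import Counter
from itertools import groupby


def homopolymer_histogram(seq):
    """Map homopolymer length -> number of runs of that length in seq."""
    # groupby yields one group per maximal run of identical characters.
    return Counter(len(list(run)) for _base, run in groupby(seq))


# Runs in this toy sequence: AA, C, TTT, G
hist = homopolymer_histogram("AACTTTG")
print(sorted(hist.items()))  # [(1, 2), (2, 1), (3, 1)]
```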
Fig. 3 An example of merging basecalls of overlapping sliding windows using soft merging
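The caption does not spell out the merging rule, so the following is only one plausible reading of "soft merging": per-sample class probabilities from overlapping windows are averaged before the label is chosen. The function `soft_merge`, the fixed `stride`, and the two-class toy data are all assumptions made for illustration and may differ from what URnano does.

```python
def soft_merge(windows, stride):
    """Average per-sample class probabilities across overlapping windows.

    windows: list of windows; each window is a list of per-sample
             probability lists (one probability per class).
    stride:  offset in samples between consecutive window starts.
    Returns the index of the highest-scoring class for every sample.
    """
    win_len = len(windows[0])
    n_classes = len(windows[0][0])
    total = stride * (len(windows) - 1) + win_len
    sums = [[0.0] * n_classes for _ in range(total)]
    counts = [0] * total
    for w, window in enumerate(windows):
        start = w * stride
        for i, probs in enumerate(window):
            counts[start + i] += 1
            for c, p in enumerate(probs):
                sums[start + i][c] += p
    merged = []
    for s, n in zip(sums, counts):
        avg = [v / n for v in s]
        merged.append(max(range(n_classes), key=avg.__getitem__))
    return merged


# Two windows of length 4 with stride 2, so samples 2 and 3 overlap.
w0 = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45]]
w1 = [[0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
merged = soft_merge([w0, w1], stride=2)
print(merged)  # [0, 0, 0, 1, 1, 1]
```

At sample 3 the first window weakly favours class 0 while the second strongly favours class 1; averaging lets the more confident prediction win, which a merge of already-decoded labels could not do.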
NED of URnano with different network architectures for the non-overlapping window in the test set
| U-net | 0.3528 | 0.2448 |
| 3GRU | 0.2808 | 0.1631 |
| U-net+3GRU | 0.1800 | 0.1296 |
| UR-net | 0.1329 |
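For readers unfamiliar with the metric, NED here is taken to mean normalized edit distance. A minimal sketch follows, assuming the Levenshtein distance between the predicted and reference sequences is divided by the reference length; the exact normalizer used for the table above is not stated in this record, so treat that choice as an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion from a
                cur[j - 1] + 1,            # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def ned(pred, ref):
    """Edit distance normalized by reference length (assumed normalizer)."""
    return edit_distance(pred, ref) / len(ref)


print(ned("ACGA", "ACGT"))  # 0.25
```

Lower values are better: 0 means the predicted sequence matches the reference exactly.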
Results of read accuracy on the test set
| basecaller | |||||||
|---|---|---|---|---|---|---|---|
| E. coli | Chiron | 0.0692 | 0.0465 | 0.0600 | 0.8709 | 7/2000 | 0.8243 |
| URnano | 0.0533 | 8/2000 | 0.8476 | ||||
| Guppy _taiyaki | 0.0585 | 0.0436 | 0.8978 | 7/2000 | |||
| Chiron | 0.0799 | 0.0467 | 0.0641 | 0.8559 | 9/2000 | 0.8093 | |
| URnano | 0.0662 | 0.0455 | 10/2000 | ||||
| Guppy _taiyaki | 0.0481 | 0.8864 | 6/2000 | 0.8467 | |||
| Human | Chiron | 0.0983 | 0.0866 | 0.8151 | 385/1000 | 0.7464 | |
| URnano | 0.0957 | 0.0788 | 0.8316 | 375/1000 | 0.7528 | ||
| Guppy _taiyaki | 0.0748 | 0.0756 | 352/1000 |
Assembly results on the test set
| basecaller | |||
|---|---|---|---|
| E. coli | Chiron | 97.1995 | 99.0589 |
| URnano | 97.4994 | ||
| Guppy _taiyaki | 99.3644 | ||
| Chiron | 90.8901 | 99.8352 | |
| URnano | |||
| Guppy _taiyaki | 99.1734 | 99.3223 | |
| Human | Chiron | 92.0374 | |
| URnano | 91.2176 | 101.553 | |
| Guppy _taiyaki | 99.7087 |
The mean values of 10-round polished results are reported
Fig. 4 A segmentation example of the URnano basecall for the sequence “TACTTACTCAACAATGCGTTAAATTTCGACTGTTTA”. A dotted vertical line indicates the start position of a nucleotide segment
A brief summary of related deep-learning-based basecallers
| Raw | Raw | Raw | |
| CNN+RNN+CTC | RGRGR+CTC | UR-net | |
| bases | bases | base masks | |
| N/A | N/A | label transform |