| Literature DB >> 31234903 |
Ryan R Wick1, Louise M Judd2, Kathryn E Holt2,3.
Abstract
BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.Entities:
Keywords: Basecalling; Long-read sequencing; Oxford Nanopore
Mesh:
Year: 2019 PMID: 31234903 PMCID: PMC6591954 DOI: 10.1186/s13059-019-1727-y
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1Read accuracy, consensus accuracy and speed performance for each basecaller version, plotted against the release date (version numbers specified in Additional file 2: Table S3). Accuracies are expressed as qscores (also known as Phred quality scores) on a logarithmic scale where Q10 = 90%, Q20 = 99%, Q30 = 99.9%, etc. Each basecaller was run using its default model, except for Guppy v2.2.3 which was also run with its included flip-flop model and our two custom-trained models
Fig. 2Read and consensus accuracy from Guppy v2.2.3 for a variety of genomes using different models: the default RGRGR model, the included flip-flop model and the two custom models we trained for this study. Both custom models used the same training set which focused primarily on K. pneumoniae, secondarily on the Enterobacteriaceae family and lastly on the Proteobacteria phylum
Fig. 3Consensus errors per basecaller for the K. pneumoniae benchmarking set, broken down by type. Dcm refers to errors occurring in the CCAGG/CCTGG Dcm motif. Homopolymer errors are changes in the length of a homopolymer three or more bases in length (in the reference). This plot is limited to basecallers/versions with less than 1.2% consensus error and excludes redundant results from similar versions
Fig. 4Consensus accuracy before (red) and after Nanopolish (blue) for the assemblies of K. pneumoniae benchmarking set