Nick Vereecke, Jade Bokma, Freddy Haesebrouck, Hans Nauwynck, Filip Boyen, Bart Pardon, Sebastiaan Theuns.
Abstract
BACKGROUND: Implementation of Third-Generation Sequencing approaches for Whole Genome Sequencing (WGS) all-in-one diagnostics in human and veterinary medicine requires the rapid and accurate generation of consensus genomes. In recent years, Oxford Nanopore Technologies (ONT) has released various new devices (e.g. the Flongle R9.4.1 flow cell) and bioinformatics tools (e.g. the Bonito basecaller, released in 2019), allowing cheap and user-friendly introduction into various NGS workflows. While single-read accuracy, overall consensus accuracy, and completeness of genome sequences have improved dramatically, further improvements are required when working with infrequently sequenced organisms such as Mycoplasma bovis. Because M. bovis is an important primary respiratory pathogen in cattle, rapid diagnostics are crucial to allow timely and targeted disease control and prevention. Current complete diagnostics (including identification, strain typing, and antimicrobial resistance (AMR) detection) require combined culture-based and molecular approaches, of which the former can take 1-2 weeks. At present, cheap and quick long-read all-in-one WGS approaches can only be implemented if higher accuracy and genome completeness can be obtained.
Keywords: Basecalling; Genome assembly; Long-read sequencing; Mycoplasma bovis; Nanopore sequencing
Year: 2020 PMID: 33176691 PMCID: PMC7661149 DOI: 10.1186/s12859-020-03856-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Implementation of custom-pg45 Bonito training in reference-based (a) and de novo genome assembly (b) bioinformatics workflows. The newly generated custom-pg45 Bonito training model was implemented in both the reference-based (a) and de novo assembly (b) bioinformatics workflows. While the UniCycler (SPAdes-based) de novo assembler was used for MiSeq short reads (purple), long reads were processed in a bioinformatics pipeline with either the Canu (orange) or Flye (blue) de novo assembler, with or without four rounds of Racon polishing. (c) MiSeq sequencing results in a highly gapped de novo assembled M. bovis PG45 genome as compared to MinION long-read assemblies. A completeness of 100% indicates that all genomic markers (n = 226 for Mycoplasma spp.) were present. A Genome Fraction of 100% indicates that the full M. bovis PG45 type strain genome was covered
Fig. 2 Validation of the specificity of taxon-specific custom-pg45 Bonito basecalling using (a) E. coli ATCC 25922 and (b) nine additional M. bovis field strains. (a) Performance of the M. bovis PG45 custom-trained Bonito model was tested in comparison with E. coli ATCC 25922, showing that its superior performance is taxon-specific. Dotted lines represent predicted gene numbers in the M. bovis PG45 and E. coli ATCC 25922 reference genomes, respectively. (b) Extrapolation of the custom-pg45 implementation to nine additional M. bovis field strains shows overall increased performance for all strains in comparison to default Guppy basecalling. To validate the use of the UniCycler consensus genomes as "true" references, the M. bovis PG45 reference (NC_014760.1) was included in the analysis and is indicated in grey. A completeness of 100% indicates that all genomic markers (n = 226 and n = 1212 for Mycoplasma spp. and Escherichia spp., respectively) were present. A Genome Fraction of 100% indicates that the full UniCycler MiSeq M. bovis field strain genome was covered
Fig. 3 Comparative analysis for the implementation of cheaper single-use Flongle flow cells, using the custom-pg45 model. (a) Overall Flongle consensus accuracy Q-scores are lower than those of the MinION sequencing platform. This is also reflected in lower genome completeness and predicted gene numbers. (b) In-depth analysis suggests that the decreased Flongle performance is related to an increased number of indels (per 100 kbp): insertion Q-scores differ little, whereas accuracy for reported deletions decreases. A completeness of 100% indicates that all genomic markers (n = 226 for Mycoplasma spp.) were present. A Genome Fraction of 100% indicates that the full UniCycler MiSeq M. bovis field strain genome was covered
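
The Q-scores and per-100-kbp indel rates reported in the figures can be illustrated with a minimal sketch. It assumes the standard Phred definition, Q = -10 log10(error rate), applied to error counts taken from a consensus-versus-reference alignment. The function names and the numbers in the usage lines are hypothetical and are not taken from the paper or its pipeline.

```python
import math


def phred_q(errors: int, aligned_length: int) -> float:
    """Phred-scaled accuracy: Q = -10 * log10(errors / aligned_length).

    Illustrative helper (not the authors' code). Returns infinity when no
    errors are observed, since the error rate is then zero.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if errors < 0 or errors > aligned_length:
        raise ValueError("errors must lie between 0 and aligned_length")
    if errors == 0:
        return math.inf
    return -10.0 * math.log10(errors / aligned_length)


def per_100kbp(count: int, aligned_length: int) -> float:
    """Normalise an event count (e.g. indels) to events per 100 kbp."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return count * 100_000 / aligned_length


# Hypothetical example: 1 error per 1,000 aligned bases corresponds to Q30.
q_example = phred_q(1, 1000)
# Hypothetical example: 5 indels over a 1 Mbp alignment.
indel_rate_example = per_100kbp(5, 1_000_000)
```

Separate insertion and deletion Q-scores, as in Fig. 3b, would follow from calling `phred_q` with the insertion count and the deletion count individually.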