Valentine Murigneux, Leah W Roberts, Brian M Forde, Minh-Duy Phan, Nguyen Thi Khanh Nhu, Adam D Irwin, Patrick N A Harris, David L Paterson, Mark A Schembri, David M Whiley, Scott A Beatson.
Abstract
BACKGROUND: Oxford Nanopore Technology (ONT) long-read sequencing has become a popular platform for microbial researchers due to the accessibility and affordability of its devices. However, easy and automated construction of high-quality bacterial genomes using nanopore reads remains challenging. Here we aimed to create a reproducible end-to-end bacterial genome assembly pipeline using ONT in combination with Illumina sequencing.
Keywords: Assembly; Bacteria; Nanopore; ONT; Pipeline; Polishing; Sequence
Year: 2021 PMID: 34172000 PMCID: PMC8235852 DOI: 10.1186/s12864-021-07767-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overall diagram of assembly stages and tool comparisons
Basecalling comparison: run-times, read accuracy and overall assembly accuracy
| | Guppy3.4.3_hac | Guppy3.4.3_fast | Guppy3.4.3_hac_modbases | Guppy3.6.1_hac | Guppy3.6.1_hac_modbases |
|---|---|---|---|---|---|
| | 49,707,952 | 176,906,144 | 57,479,661 | 57,977,178 | 46,296,565 |
| | 13.81 | 49.14 | 15.96 | 16.10 | 12.86 |
| | GPU | CPU | GPU | GPU | GPU |
| | 4 | 16 | 8 | 8 | 8 |
| | 91.0 | 88.9 | 90.6 | 93.7 | 91.0 |
| | 11.4 | 10.4 | 11.3 | 13.3 | 11.4 |
| | 240,766 | 233,802 | 238,847 | 244,830 | 240,156 |
| | 99.99 | 99.99 | 99.99 | 99.99 | 99.99 |
| | 23 | 35 | 3 | 4 | 5 |
| | 45 | 39 | 31 | 25 | 27 |
| | 48.10 | 48.08 | 50.99 | 52.27 | 51.83 |
| | 0.44 | 0.67 | 0.06 | 0.08 | 0.10 |
| | 0.88 | 0.76 | 0.63 | 0.50 | 0.53 |
Fig. 2 Assembly comparison: long horizontal bars (in greyscale and red) represent contiguous sequences generated by each assembler. The chromosome and plasmids 1 and 2 are coloured according to their overall nucleotide identity to the EC958 reference genome standard (scale on the left). Plasmid 3 was recovered only by Flye and Canu, as indicated. The “other” column refers to contigs that were generated by assemblers but were redundant to the assembly (coloured red). The additional blue horizontal bars in the Canu and Redbean assemblies indicate the extra length of the contigs produced by these assemblers. Contigs that were not reported as circular are marked with a red asterisk (*), while contigs that required manual trimming for circularisation are marked with a blue asterisk. Misassemblies are marked with a red vertical line at their approximate position. The phage tail protein inversion is marked with a blue hourglass.
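Some contigs in Fig. 2 needed manual trimming before they could be treated as circular. A common reason is that the contig of a circular replicon ends with a copy of its own start. The sketch below shows the idea under a strong assumption: the duplicated end matches the start exactly. The function name and the `min_overlap` threshold are hypothetical, and this is not the procedure used in the study; real contigs with residual errors would need an alignment-based overlap check instead.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 100) -> str:
    """Remove a duplicated end from a contig whose suffix repeats its prefix.

    Simplified sketch: only exact overlaps are detected, and the search is
    quadratic in the worst case, so it is meant for illustration rather than
    for full-length chromosomes.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest candidate overlap first; it cannot exceed half the contig.
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]  # keep a single copy of the overlapping sequence
    return contig  # no overlap of at least min_overlap bases was found
```

For example, `trim_circular_overlap("ATGCCGTAAGGCTTATGCCG", min_overlap=4)` finds the 6-base overlap `ATGCCG` and returns `"ATGCCGTAAGGCTT"`. Searching from the longest overlap downwards avoids trimming only part of a duplicated end.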
Fig. 3 Polishing results for the EC958 ONT Flye assembly: comparative analysis of (i) long-read polishing only, (ii) short-read polishing only, and (iii) sequential long-read and short-read polishing, using various tool combinations. Comparison metrics were the number of SNPs/indels relative to the EC958 reference genome standard (by DNAdiff), run time, and quality score (by pomoxis assess_assembly).
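The quality score in Fig. 3 is Phred-scaled, so each 10-point increase corresponds to a tenfold drop in error rate. As a simplified sketch, assuming the error rate is just the number of SNPs plus indels divided by the aligned length, the conversion looks like this. pomoxis `assess_assembly` derives its scores from alignments to the reference in its own way, so this hypothetical helper will not reproduce its numbers exactly.

```python
import math


def phred_quality(errors: int, aligned_length: int) -> float:
    """Phred-scaled quality: Q = -10 * log10(errors / aligned_length)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if errors < 0:
        raise ValueError("errors cannot be negative")
    if errors == 0:
        return math.inf  # no observed errors: Q is unbounded
    return -10 * math.log10(errors / aligned_length)
```

One error in 10,000 aligned bases gives Q ≈ 40, and 5 errors in 5,000,000 bases gives Q ≈ 60, which is why error-free comparisons are reported as unbounded rather than as a finite score.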
Fig. 4 Overall pipeline: stages and default tools in MicroPIPE. Stages in bold and italics are mandatory; all other pipeline steps are optional (users can start from fast5 or basecalled fastq files). Run times for each step are based on 12 multiplexed E. coli samples processed with MicroPIPE v0.8. Basecalling (Guppy) and long-read polishing (Racon and Medaka) can be run on a GPU node; the rest of the pipeline uses CPU resources. Fast = Guppy fast basecalling mode, hac = Guppy high-accuracy basecalling mode, h = hour, min = minute.
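The stage logic in Fig. 4 (start from fast5 or from basecalled fastq, send GPU-capable steps to a GPU node) can be sketched as a small data structure. This is a hypothetical model, not MicroPIPE's actual code: the class and function names are invented, demultiplexing, filtering and other optional steps are omitted, Flye is used because it is the assembler behind the results in this section, and the short-read polishing tool is left unnamed because this section does not name it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    tools: tuple[str, ...]
    gpu_capable: bool  # "can be run on a GPU node" per the Fig. 4 caption


# Hypothetical, simplified stage list in pipeline order
# (long-read polishing precedes short-read polishing).
STAGES = (
    Stage("basecalling", ("Guppy",), True),
    Stage("assembly", ("Flye",), False),
    Stage("long-read polishing", ("Racon", "Medaka"), True),
    Stage("short-read polishing", (), False),  # tool not named in this section
)


def stages_to_run(input_type: str) -> list[str]:
    """Return the names of the stages to run for a given starting input."""
    if input_type == "fast5":
        selected = STAGES  # raw signal: basecall first
    elif input_type == "fastq":
        selected = STAGES[1:]  # reads already basecalled: skip Guppy
    else:
        raise ValueError(f"unsupported input type: {input_type!r}")
    return [stage.name for stage in selected]


def gpu_stages(names: list[str]) -> list[str]:
    """Subset of the given stages that could be sent to a GPU node."""
    return [s.name for s in STAGES if s.name in names and s.gpu_capable]
```

Starting from fastq, `stages_to_run("fastq")` skips basecalling, and `gpu_stages(stages_to_run("fastq"))` leaves only long-read polishing for the GPU node.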
Fig. 5 ST131 phylogeny to assess the quality of ONT assemblies: (A) Phylogenetic tree created using assemblies generated with MicroPIPE v0.8 (Guppy v3.4.3) and other ST131 genomes for context [17]. Branches are coloured by the ST131 clade they belong to, as per [17] (red = clade A, orange = clade B, green = clade C). Dark blue: complete polished assemblies from the MicroPIPE pipeline, placed next to their Illumina assembly counterpart in the tree; light blue: assemblies with incomplete polishing (i.e. Illumina only, Nanopore only or no polishing) clustered with their Illumina counterpart; red: discrepant clustering of Nanopore assemblies. (B) Phylogenetic tree created using assemblies generated with MicroPIPE v0.9 (Guppy v3.6.1); annotations are the same as in (A). (C) Positions of alternative (alt) and reference (ref) alleles, relative to the EC958 reference standard chromosome, present on the branch leading to the discrepant ONT assemblies, as indicated by the star in (A).
MicroPIPE v0.9 results for public datasets
| Reference | Strain | Reference genome assembly method and coverage | Chromosome/plasmid | Reference genome size (bp) | Assembly size (bp) | GC content (%) | Circular? | Nucleotide identity (%) | DNAdiff SNPs | DNAdiff indels | QUAST misassemblies |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clement et al. [ | | Canu using Nanopore + Illumina 37x | Chromosome pLC0541_17 | 4,679,033 90,558 | 4,679,747 90,578 | 52.2 | Yes Yes | 99.97 | 510 390 | 758 | 0 |
| Sydenham et al. [ | | Unicycler using Nanopore + Illumina 200x | Chromosome pBFO42_1 pBFO42_2 | 5,141,257 8306 5594 | 5,141,261 8316 5629 | 43.3 | Yes Yes Yes | 99.99 | 65 9 | 14 | 0 |
| Sydenham et al. [ | | Unicycler using Nanopore + Illumina 200x | Chromosome pBF9343 | 5,205,133 36,560 | 5,205,138 36,559 | 43.1 | Yes Yes | 99.99 | 25 5 | 22 | 1 (inversion) |
| Walker et al. [ | | PacBio 105x | Chromosome | 1,878,827 | 1,878,922 | 38.5 | Yes | 99.99 | 8 6 | 96 | 0 |
| Wick et al. [ | | Unicycler using Nanopore + Illumina 133x | Chromosome | 5,111,537 | 5,111,663 | 57.6 | Yes | 99.99 | 137 72 | 172 | 0 |
| Taylor et al. [ | strain FSIS11705876 | Unicycler using Nanopore + Illumina 692x | Chromosome pO157 | 5,483,434 94,581 | 5,483,452 94,593 | 50.4 | Yes Yes | 99.99 | 52 2 | 103 | 0 |
| Taylor et al. [ | | Unicycler using Nanopore + Illumina 599x | Chromosome Plasmid | 4,724,806 81,814 | 4,724,797 81,815 | 52.2 | Yes Yes | 99.99 | 32 21 | 34 | 0 |
| | | SMRT Analysis v. 1.3.3 using PacBio RS 80x | Chromosome Plasmid | 4,730,612 78,193 | | | | 99.99 | 0 0 | 15 | 3 |
| Bessonov et al. [ | | Unicycler using Nanopore + Illumina 50x | Chromosome Plasmid Plasmid | 4,640,729 | 4,640,715 105,679 98,127 | 51.7 | Yes Yes Yes | 99.99 | 53 15 | 22 | 0 |
| Pitt et al. [ | | Unicycler using Nanopore + Illumina 40x | Chromosome | 5,592,065 | 5,592,075 | 62.8 | Yes | 99.99 | 28 5 | 9 | 0 |
| Pitt et al. [ | | Unicycler using Nanopore + Illumina 20x | Chromosome | 5,592,064 | 5,591,941 | 62.8 | Yes | 99.99 | 81 15 | 102 | 0 |
| Sieber et al. [ | | SPAdes using Nanopore + Illumina 334x | Chromosome Plasmid unnamed | 2,918,239 2473 | 2,918,243 3356 | 32.7 | Yes Yes | 99.99 | 6 6 | 5 | 0 |
| Sieber et al. [ | | SPAdes using Nanopore + Illumina 219x | Chromosome Plasmid unnamed | 2,877,083 2473 | 2,877,086 | 32.7 | Yes | 99.99 | 4 1 | 4 | 0 |
Flye was run with the `--asm-coverage 100` parameter to reduce computational run time. Only circular contigs (as identified by Flye) are reported. For further details on all public data, see Supplementary dataset 1.
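Reference and assembly chromosome sizes in the public-dataset table differ by anywhere from a few bases to several hundred. A small sketch of that comparison follows, using chromosome sizes copied from three table rows; the dictionary and the `size_difference` helper are illustrative names, not part of MicroPIPE.

```python
# Chromosome sizes in bp, copied from the public-dataset table:
# (reference genome size, MicroPIPE assembly size).
CHROMOSOME_SIZES = {
    "Clement et al.": (4_679_033, 4_679_747),
    "Walker et al.": (1_878_827, 1_878_922),
    "Pitt et al. (20x)": (5_592_064, 5_591_941),
}


def size_difference(reference_bp: int, assembly_bp: int) -> tuple[int, float]:
    """Return (assembly - reference) in bp and as a percentage of the reference."""
    diff = assembly_bp - reference_bp
    return diff, 100.0 * diff / reference_bp


for study, (ref_bp, asm_bp) in CHROMOSOME_SIZES.items():
    diff_bp, diff_pct = size_difference(ref_bp, asm_bp)
    print(f"{study}: {diff_bp:+d} bp ({diff_pct:+.4f}%)")
```

The Clement et al. chromosome assembly is 714 bp longer than its reference, while the second Pitt et al. assembly is 123 bp shorter. In relative terms, both differences are well under 0.1% of chromosome length.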