Sion C Bayliss, Vicky L Hunt, Maho Yokoyama, Harry A Thorpe, Edward J Feil.
Abstract
Background: The Oxford Nanopore Technologies MinION(TM) is a mobile DNA sequencer that can produce long read sequences with a short turn-around time. Here we report the first demonstration of single contig genome assembly using Oxford Nanopore native barcoding when applied to a multiplexed library of 12 samples and combined with existing Illumina short read data. This paves the way for the closure of multiple bacterial genomes from a single MinION(TM) sequencing run, given the availability of existing short read data. The strain we used, MHO_001, represents the important community-acquired methicillin-resistant Staphylococcus aureus lineage USA300.
Findings: Using a hybrid assembly of existing short read and barcoded long read sequences from multiplexed data, we completed a genome of the S. aureus USA300 strain MHO_001. The long read data represented only ∼5% to 10% of an average MinION(TM) run (∼7x genomic coverage), but, using standard tools, this was sufficient to complete the circular chromosome of S. aureus strain MHO_001 (2.86 Mb) and two complete plasmids (27 Kb and 3 Kb). Minor differences were noted when compared to the USA300 reference genome, USA300_FPR3757, including the translocation, loss, and gain of mobile genetic elements.
Keywords: MinION; Staphylococcus aureus; Whole genome sequencing; bacterial genomics; hybrid assembly; long read; multiplexing; native barcoding
Year: 2017 PMID: 28327913 PMCID: PMC5467021 DOI: 10.1093/gigascience/gix001
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Figure summarizing read statistics for the 2D nanopore pass (red) and fail (green) reads. (A) Read length distributions of pass and fail reads. Data were binned every 500 bp. (B) Box and whisker plot of the sequence similarity of nanopore reads to the genome of MHO_001 as determined by BLASR. Only the alignment with the highest percentage similarity was considered for each read. The lower and upper “hinges” correspond to the first and third quartiles. The upper and lower whiskers extend from the hinge to the most extreme value that is within 1.5× the interquartile range. Data beyond the end of the whiskers are outliers and plotted as points. (C) The distribution of BLASR alignment lengths of nanopore reads as a percentage of the original read length. Only the alignment with the highest percentage similarity was considered for each read. Nanopore 2D reads with a phred score >8 were classified by Metrichor as pass reads, and all other 2D reads were classified as fail reads.
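The hinge and whisker rule described for panel (B) can be sketched in a few lines of Python. This is a minimal illustration, not the plotting code used for the figure: the function name `box_stats`, the choice of the "inclusive" quartile method, and the example values are all assumptions made here.

```python
from statistics import quantiles


def box_stats(values):
    """Return hinges, whisker ends and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # First and third quartiles are the lower and upper hinges.
    # The "inclusive" interpolation method is an assumption of this sketch.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - 1.5 * iqr
    hi_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observed values inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "lower_hinge": q1,
        "upper_hinge": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


# Hypothetical per-read similarity values (%), not data from the study.
example = [40.0, 78.0, 80.0, 82.0, 84.0, 85.0, 86.0, 87.0, 88.0, 90.0]
stats = box_stats(example)
print(stats)
```

In this made-up example the value 40.0 falls below the lower fence and would be drawn as a point, while the whiskers stop at 78.0 and 90.0.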
Table summarizing the BLASR analysis of demultiplexed 2D pass and fail nanopore long reads assigned to sample MHO_001. Reads were aligned to the assembled MHO_001 reference genome using BLASR with default parameters. Only the alignment with the highest percentage similarity was considered for each read. The average alignment length was calculated from the length of the top BLASR alignment relative to the length of the input read.
| | Pass | Fail |
|---|---|---|
| # Reads | 1324 | 1499 |
| # BLASR hits (% # reads) | 1320 (99.70) | 1292 (86.19) |
| Mean alignment length (%) | 96.79 | 92.90 |
| Mean similarity (%) | 85.87 | 77.76 |
| # Hits <75% read length (%) | 11 (0.83) | 93 (7.20) |
| # Hits ≥75% read length (%) | 1309 (99.17) | 1199 (92.80) |
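The percentages in the table can be reproduced from the counts. The sketch below assumes the denominators implied by the arithmetic: BLASR hits are expressed as a percentage of reads, and the two read-length rows as a percentage of BLASR hits. The dictionary keys and the helper `pct` are names chosen here for illustration.

```python
# Counts copied from the table; key names are illustrative only.
counts = {
    "Pass": {"reads": 1324, "hits": 1320, "short_hits": 11, "long_hits": 1309},
    "Fail": {"reads": 1499, "hits": 1292, "short_hits": 93, "long_hits": 1199},
}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimal places."""
    return round(100 * part / whole, 2)


summary = {
    label: {
        # Hits as a share of all reads in the class.
        "hits_pct_of_reads": pct(c["hits"], c["reads"]),
        # Alignments covering <75% or >=75% of the read, as a share of hits.
        "short_pct_of_hits": pct(c["short_hits"], c["hits"]),
        "long_pct_of_hits": pct(c["long_hits"], c["hits"]),
    }
    for label, c in counts.items()
}

for label, row in summary.items():
    print(label, row)
```

With these denominators the computed values match the table to two decimal places, and the two read-length rows sum to the number of BLASR hits in each class.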
Figure 2. Alignment of MHO_001 chromosome (A), plasmid A (B), and plasmid B (C) to the USA300_FPR3757 genome and reference plasmids alongside long and short read coverage. The bottom panels show alignments between MHO_001 and the reference sequences. Contiguous sequences are shown by connecting red lines and inversions are depicted in blue. Coding sequences (CDS) are annotated as blue rectangles with the exception of ribosomal RNA operons, which are represented by red rectangles. Those above the line represent open reading frames on the forward strand and those under the line on the reverse strand. Notable mobile genetic elements or genomic features are annotated. A scale bar in bp is presented underneath each contig. The middle panels represent per base read coverage of short reads across the MHO_001 genome. The data were binned every 1000 bp. The y-axis, representing per bin read coverage, has been constrained to 200, 350, and 8000 reads per bin for the MHO_001 chromosome, plasmid A, and plasmid B, respectively. The top panels represent the per base read coverage of nanopore long reads across the MHO_001 genome. The data were binned every 1000 bp. The y-axis, representing per bin read coverage, has been constrained to 20 reads per bin for each contig.
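The coverage panels describe per base depth binned every 1000 bp with a constrained y-axis. A minimal sketch of that kind of binning follows. The caption does not say how depth is summarized within a bin, so taking the mean is an assumption, as are the function names `bin_depth` and `cap` and the toy depth values. Capping values at the axis limit is one way to mimic a constrained axis, not necessarily how the figure was drawn.

```python
def bin_depth(per_base_depth, bin_size=1000):
    """Summarize per-base depth in consecutive bins (mean per bin, an assumption)."""
    bins = []
    for start in range(0, len(per_base_depth), bin_size):
        window = per_base_depth[start:start + bin_size]
        # The final bin may be shorter than bin_size.
        bins.append(sum(window) / len(window))
    return bins


def cap(values, y_max):
    """Clip binned values at the axis limit to mimic a constrained y-axis."""
    return [min(v, y_max) for v in values]


# Toy depth profile for a hypothetical 2,500 bp contig, not data from the study.
depth = [5] * 1000 + [30] * 1000 + [10] * 500
binned = bin_depth(depth)
capped = cap(binned, 20)
print(binned, capped)
```

In practice the per base depth would come from an alignment tool's depth output; that step is outside the scope of this sketch.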