poRe GUIs for parallel and real-time processing of MinION sequence data.

Robert D Stewart, Mick Watson.

Abstract

MOTIVATION: Oxford Nanopore's MinION device has matured rapidly and is now capable of producing over one million reads and several gigabases of sequence data per run. The nature of the MinION output requires new tools that are easy to use by scientists with a range of computational skills and which enable quick and simple QC and data extraction from MinION runs.
RESULTS: We have developed two GUIs for the R package poRe that allow parallel and real-time processing of MinION datasets. Both GUIs can extract sequence data and metadata from large MinION datasets via a friendly point-and-click interface using commodity hardware.
AVAILABILITY AND IMPLEMENTATION: The GUIs are packaged within poRe, which is available on SourceForge: https://sourceforge.net/projects/rpore/files/
CONTACT: mick.watson@roslin.ed.ac.uk.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28334395      PMCID: PMC5870607          DOI: 10.1093/bioinformatics/btx136

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Nanopore sequencing is the only sequencing technology that measures an actual single molecule of DNA, rather than incorporation events into a template strand (Goodwin ; Loman and Watson, 2015). Early access to Oxford Nanopore’s MinION, a portable DNA sequencer approximately six inches in length, began in 2014. The MinION may be considered a mature platform, having been used to sequence bacterial genomes (Loman ; Risse ); resolve repeats in the human genome (Jain ); study cDNA structure (Hargreaves and Mulley, 2015; Bolisetty ); detect base modifications (Rand ; Karlsson ; Stoiber ); detect antibiotic resistance (Ashton ); perform real-time enrichment (‘read until’; Loose ) and provide surveillance in a human disease outbreak (Quick ). The latest chemistry release, R9.4, has seen the first high-coverage human genome data released (https://github.com/nanopore-wgs-consortium/NA12878; https://github.com/nanoporetech/ONT-HG1), with several MinION flowcells from the two projects producing over 4 gigabases (Gb) of sequence data.
The MinION has been designed to enable mobile, real-time sequencing. As soon as a sequencing library is placed onto the device, the MinION begins sequencing. Each channel/nanopore reports asynchronously, creating a single file per channel per read. These files are created in HDF5, a compressed binary hierarchical data format (https://www.hdfgroup.org/). Depending on the sequencer and chemistry version, these HDF5 files include raw or event-level signal data, recorded as a DNA molecule passed through the pore. There is a range of base-calling options, including cloud-based Metrichor, local MinKNOW base-calling and open-source alternatives (David ; Boža ), that will convert the signal data into DNA sequences. With 512 pores and a sequencing speed of several hundred bases per second, each MinION flowcell has the capacity to produce several million reads in a 48-hour run.
As each read presents as two files (one raw, one base-called), MinION runs represent a huge challenge for researchers without sufficient computational skills. Tools such as poRe (Watson ) and poretools (Loman and Quinlan, 2014) exist to assist with this, but many are command-line based, and there is a need for easy-to-use, GUI-based tools for MinION data QC and analysis.

2 Materials and methods

We have designed and built two graphical user interfaces (GUIs) for MinION data processing, organization and extraction. Both are built as Shiny apps and released as part of the R package poRe (Watson ). At present the original poRe code and the new GUI code are separate, but we envisage merging the functions over time.
The poRe real-time GUI is designed to extract data (FASTQ, FASTA and metadata) during a run, or during base-calling. A source and a destination folder are required. The software then monitors the source folder for new FAST5 files; as FAST5 files arrive in the folder, they are processed, and the extracted data are written to the destination folder. This saves researchers time, because data can be extracted while the MinION is still running. The poRe real-time GUI is accessed by running the command pore_rt().
The poRe parallel GUI (Fig. 1) is designed to extract data from runs that have already finished. Again, the software expects a source and a destination folder; in addition, the user can select which data to extract and the number of cores to use. The software then extracts FASTQ, FASTA and metadata from all files in the source folder into files in the destination folder, using the number of cores specified by the user via the parallel package. The poRe parallel GUI is accessed via the pore_parallel() command.
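The folder-monitoring behaviour of the real-time GUI can be illustrated with a minimal polling sketch. This is not poRe's code (poRe is an R package); it is a Python illustration under stated assumptions. The names poll_new_fast5, record_arrival and watch, the arrivals.log file and the polling interval are all hypothetical, and record_arrival merely stands in for the real FASTQ/FASTA/metadata extraction.

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def poll_new_fast5(source: Path, seen: Set[str]) -> List[Path]:
    """Return FAST5 files in `source` not yet recorded in `seen`, sorted by name."""
    new_files = sorted(p for p in source.glob("*.fast5") if p.name not in seen)
    seen.update(p.name for p in new_files)
    return new_files


def record_arrival(path: Path, dest: Path) -> None:
    """Hypothetical stand-in for extraction: append the file name to a log in `dest`."""
    with open(dest / "arrivals.log", "a") as log:
        log.write(path.name + "\n")


def watch(source: Path, dest: Path, handle: Callable[[Path, Path], None],
          interval: float = 5.0, max_polls: int = 3) -> int:
    """Poll `source` up to `max_polls` times and pass each new file to `handle` once."""
    dest.mkdir(parents=True, exist_ok=True)
    seen: Set[str] = set()
    processed = 0
    for poll in range(max_polls):
        for path in poll_new_fast5(source, seen):
            handle(path, dest)
            processed += 1
        if poll < max_polls - 1:
            time.sleep(interval)  # wait before checking the folder again
    return processed
```

Calling watch(Path("source"), Path("destination"), record_arrival) would process each FAST5 file exactly once as it appears. A production monitor would also need to confirm that a file has finished being written before reading it, which this sketch does not attempt.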
Fig. 1.

Screenshot of the poRe parallel GUI, which, as a Shiny app, opens in the user’s browser

3 Results

The poRe parallel GUI was able to simultaneously extract FASTQ, FASTA and metadata from 209 819 FAST5 files downloaded from the ‘cliveome’ project in just 37 min on our 16-core Linux server, at a rate of approx. 90 FAST5 files per second.
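The quoted rate can be checked directly from the file count and run time, and the general pattern of spreading per-file extraction over several cores can be sketched in a few lines. The sketch below is a Python illustration, not poRe's R implementation (which uses the parallel package); files_per_second, extract_one and extract_parallel are hypothetical names, and extract_one is a placeholder that only returns the file name instead of reading the HDF5 content.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from pathlib import Path
from typing import List


def files_per_second(n_files: int, minutes: float) -> float:
    """Average throughput for `n_files` processed in `minutes`."""
    return n_files / (minutes * 60)


def extract_one(path: Path, dest: Path) -> str:
    """Hypothetical placeholder for per-file extraction; returns the file name only."""
    return path.name


def extract_parallel(source: Path, dest: Path, cores: int) -> List[str]:
    """Apply `extract_one` to every FAST5 file in `source` using `cores` worker processes."""
    files = sorted(source.glob("*.fast5"))
    dest.mkdir(parents=True, exist_ok=True)
    if cores <= 1:
        # Serial fallback: no worker processes are started.
        return [extract_one(p, dest) for p in files]
    worker = partial(extract_one, dest=dest)
    with ProcessPoolExecutor(max_workers=cores) as pool:
        # chunksize batches files per task to reduce inter-process overhead.
        return list(pool.map(worker, files, chunksize=64))
```

Here files_per_second(209819, 37) evaluates to roughly 94.5, in line with the approximate figure of 90 files per second quoted above. Because each read is an independent file, the work divides cleanly across processes without shared state.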

Funding

This work was supported by The Biotechnology and Biological Sciences Research Council (BBSRC) including institute strategic support to The Roslin Institute (BB/M020037/1, BB/J004243/1, BB/J004235/1, BBS/E/D/20310000).
Conflict of Interest: The authors have received free flowcells and reagents from Oxford Nanopore as part of the MAP. Mick Watson has attended Oxford Nanopore events and had his travel paid for by ONT.
References (first 10 of 15)

1. Philip M Ashton; Satheesh Nair; Tim Dallman; Salvatore Rubino; Wolfgang Rabsch; Solomon Mwaigwisya; John Wain; Justin O'Grady. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol, 2014-12-08.
2. Nicholas J Loman; Mick Watson. Successful test launch for nanopore sequencing. Nat Methods, 2015-04.
3. Sara Goodwin; John D McPherson; W Richard McCombie. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet, 2016-05-17.
4. E Karlsson; A Lärkeryd; A Sjödin; M Forsman; P Stenberg. Scaffolding of a bacterial genome using MinION nanopore sequencing. Sci Rep, 2015-07-07.
5. Mick Watson; Marian Thomson; Judith Risse; Richard Talbot; Javier Santoyo-Lopez; Karim Gharbi; Mark Blaxter. poRe: an R package for the visualization and analysis of nanopore sequencing data. Bioinformatics, 2014-08-29.
6. Nicholas J Loman; Aaron R Quinlan. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics, 2014-08-20.
7. Matthew Loose; Sunir Malla; Michael Stout. Real-time selective sequencing using nanopore technology. Nat Methods, 2016-07-25.
8. Vladimír Boža; Broňa Brejová; Tomáš Vinař. DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads. PLoS One, 2017-06-05.
9. Judith Risse; Marian Thomson; Sheila Patrick; Garry Blakely; Georgios Koutsovoulos; Mark Blaxter; Mick Watson. A single chromosome assembly of Bacteroides fragilis strain BE1 from Illumina and MinION nanopore sequencing data. Gigascience, 2015-12-04.
10. Matei David; L J Dursi; Delia Yao; Paul C Boutros; Jared T Simpson. Nanocall: an open source basecaller for Oxford Nanopore sequencing data. Bioinformatics, 2016-09-10.