MinIONQC: fast and simple quality control for MinION sequencing data.

R Lanfear¹, M Schalamun¹, D Kainer¹, W Wang¹, B Schwessinger¹.

Abstract

Summary: MinIONQC provides rapid diagnostic plots and quality control data from one or more flowcells of sequencing data from Oxford Nanopore Technologies' MinION instrument. It can be used to assist with the optimisation of extraction, library preparation, and sequencing protocols, to quickly and directly compare the data from many flowcells, and to provide publication-ready figures summarising sequencing data. Availability and implementation: MinIONQC is implemented in R and released under an MIT license. It is available for all platforms from https://github.com/roblanf/minion_qc.

Year: 2019 PMID: 30052755 PMCID: PMC6361240 DOI: 10.1093/bioinformatics/bty654

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Oxford Nanopore Technologies’ (ONT) small and portable MinION instrument is revolutionising DNA sequencing. It allows users to go from sample to sequence in hours, it can sequence extremely long DNA molecules, and it provides many gigabases of data from each flowcell. Because of this, many research groups and companies are adopting the instrument for in-house and in-field sequencing. Here we present MinIONQC: a fast, lightweight, and non-interactive script to provide quality control and diagnostic analyses of sequencing data from the MinION. MinIONQC differs from related tools (De Coster et al.; Loman and Quinlan, 2014; Stewart and Watson, 2017; Watson et al.) in that it is focussed primarily on the rapid and replicable comparison of large volumes of sequencing data from multiple flowcells. MinIONQC will assist with cases where the rapid and repeated comparison of data from multiple flowcells is required, including the application of MinION sequencing in new use cases (e.g. with new tissues or in new settings), and in completing large genome projects which require the aggregation of data from many flowcells (Austin et al.; Jain et al.; Jansen et al.; Schmidt et al.; Tan et al.).

2 Software description

MinIONQC is written in R and designed to be run non-interactively from the command line. This facilitates automation of the script on all platforms, including in bioinformatics pipelines run on remote servers. MinIONQC is packaged as a single lightweight script that will work on all platforms that run R. It requires minimal installation and has just a small number of dependencies that can be installed in under a minute (Davis, 2018; Dowle and Srinivasan, 2018; Garnier, 2018; Lee and Rowe, 2016; Stephens et al.; Wickham, 2007, 2009, 2011, 2017; Wickham et al.). It has extensive documentation, a full test suite, and example input and output files available at https://github.com/roblanf/minion_qc. On a standard desktop computer with four processors, it is capable of analysing output from 24 flowcells, which produced a combined 107 GB of sequencing data, in 25 min.

3 Quality control of a single flowcell

For each flowcell, MinIONQC outputs a human- and machine-readable summary file in YAML format. This file contains information on the total number of sequenced bases and reads, as well as a number of widely used statistics of read lengths and quality scores, including the number of reads and bases from ‘ultra-long’ reads, defined as the largest set of reads with an N50 greater than 100 KB (Jain et al.). All statistics are calculated for the complete dataset, as well as for the subset of reads that pass a user-defined quality score cutoff. MinIONQC produces ten plots for each flowcell. These include standard plots such as the distributions of read lengths and quality scores, the number of reads generated per hour, and the total yield of bases over time. MinIONQC also produces plots designed to assist with optimisation of laboratory procedures for subsequent sequencing runs, such as a physical map of the flowcell including every sequenced read. This map facilitates rapid diagnosis of common issues such as bubbles introduced during library loading, and the presence of contaminants which block pores on the flowcell during sequencing (Fig. 1A).
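The summary statistics described above can be illustrated with a minimal Python sketch. This is not the MinIONQC implementation (which is written in R); the function names, the dictionary keys, and the choice of `>=` for the quality cutoff are assumptions made for illustration only. The sketch computes the N50 of a set of read lengths, the ‘ultra-long’ subset under the definition given above (the largest set of longest reads whose N50 exceeds 100 KB), and the same statistics for all reads and for reads passing a quality cutoff.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]


def ultralong_reads(lengths: Sequence[int], min_n50: int = 100_000) -> List[int]:
    """Largest set of the longest reads whose N50 is greater than min_n50.

    Reads are added from longest to shortest. The N50 of the growing set can
    only stay the same or fall, so we stop at the first set that fails.
    """
    ordered = sorted(lengths, reverse=True)
    cumulative = list(accumulate(ordered))
    best = 0
    for k in range(1, len(ordered) + 1):
        half = cumulative[k - 1] / 2
        # Index of the read at which the running total first reaches half.
        n50_index = bisect_left(cumulative, half)
        if ordered[n50_index] > min_n50:
            best = k
        else:
            break
    return ordered[:best]


def summarise(reads: Sequence[Tuple[int, float]], q_cutoff: float = 7.0) -> Dict[str, Dict[str, int]]:
    """Summarise (length, mean_qscore) pairs for all reads and for passing reads.

    The keys below are hypothetical and are not the keys of MinIONQC's YAML file.
    """
    def stats(subset: Sequence[Tuple[int, float]]) -> Dict[str, int]:
        lengths = [length for length, _ in subset]
        ultralong = ultralong_reads(lengths)
        return {
            "total_reads": len(lengths),
            "total_bases": sum(lengths),
            "n50": n50(lengths),
            "ultralong_reads": len(ultralong),
            "ultralong_bases": sum(ultralong),
        }

    # Assumption: a read passes if its mean quality score is >= the cutoff.
    passing = [read for read in reads if read[1] >= q_cutoff]
    return {"all_reads": stats(reads), "passing_reads": stats(passing)}
```

Calling `summarise(reads, q_cutoff=7.0)` on a list of `(length, mean_qscore)` pairs returns one block of statistics for the complete dataset and one for the reads that pass the cutoff, mirroring the two-level layout described for the summary file.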

Fig. 1.

Three example plots from MinIONQC. (A) A physical map of the flowcell with each of the 512 pores shown in its physical location. The sub-plot for each pore shows a single point for each read, with the length on the y-axis (log scale), the number of hours into the run on the x-axis, and the quality score of the read as the colour. This plot clearly shows the presence of a bubble causing many of the pores on the right-hand side of the plot to produce little or no data, as well as the presence of contaminants blocking the pores, leading to the production of a large number of small, low-quality signals as the run progresses. (B) Yield in bases (y-axis) against run time (x-axis) for two flowcells (each in a different colour), with the yield of all reads shown in the upper panel, and the yield of reads with a mean Q score above the user-specified threshold of 7 in the lower panel. Vertical red dashed lines indicate the timing of group changes (also known as muxes). (C) Yield in bases (y-axis) for a given minimum read length (x-axis), for two flowcells (each in a different colour). Panels are as in B. (Color version of this figure is available at Bioinformatics online.)

4 Comparing and combining data from multiple flowcells

Many projects, such as those that seek to assemble large or repeat-rich genomes, require the aggregation of data from many flowcells. MinIONQC simplifies the assessment of such data by allowing users to run the script on a single parent directory that contains multiple ‘sequencing_summary.txt’ files (produced by ONT’s Albacore and Guppy basecallers) in sub-directories. The resulting diagnostics simplify the management of larger projects by making it easy to assess the point at which sufficient data have been generated to move from sequencing to downstream analyses such as genome assembly. MinIONQC produces two kinds of plots when given multiple flowcells as input: plots of the combined data that are directly comparable to those produced for a single flowcell (see above); and plots designed to compare the flowcells to each other. The six comparison plots include distributions of read lengths and quality scores, the changes in both quantities over the course of each sequencing run, the total yield of bases over time (Fig. 1B), and the total yield of bases by minimum read length (Fig. 1C). The latter plot is particularly useful in comparing the effects of different DNA extraction, cleanup, and library-preparation methods on the final distribution of read lengths. For example, Figure 1C shows data from one flowcell (RB7_A2, in red) in which DNA was size-selected using a Blue Pippin instrument, and another (RB7_D3, in blue) in which DNA was size-selected using a bead-based protocol (Schalamun and Schwessinger, 2017). Both approaches produced similar total yields of high-quality reads (roughly 3.5 gigabases, as shown by the point at which each line in Fig. 1C crosses the y-axis), but the yield of reads longer than 20 KB was clearly higher when using the Blue Pippin, as shown by the red line in Figure 1C being higher than the blue line at a value of 20 KB on the x-axis.
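The multi-flowcell workflow described above can be sketched in Python as follows. This is an illustrative sketch rather than the MinIONQC code: the function names and the way each flowcell is labelled (by the sub-directory holding its summary file) are hypothetical, and the read-length column name `sequence_length_template` is an assumption about the basecaller's summary file that should be checked against the files in use. The sketch finds every ‘sequencing_summary.txt’ file below a parent directory and computes, for each flowcell, the total yield of bases from reads at or above a series of minimum read lengths, which is the quantity plotted in Fig. 1C.

```python
import csv
from bisect import bisect_left
from itertools import accumulate
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

SUMMARY_NAME = "sequencing_summary.txt"
# Assumption: the tab-separated summary file stores read length in this column.
LENGTH_COLUMN = "sequence_length_template"


def find_summary_files(parent_dir) -> List[Path]:
    """Recursively find every summary file below the parent directory."""
    return sorted(Path(parent_dir).rglob(SUMMARY_NAME))


def read_lengths(summary_file, column: str = LENGTH_COLUMN) -> List[int]:
    """Read one read length per row from a tab-separated summary file."""
    with open(summary_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [int(row[column]) for row in reader]


def yield_by_min_length(lengths: Sequence[int], cutoffs: Sequence[int]) -> List[Tuple[int, int]]:
    """For each cutoff, total bases in reads whose length is >= that cutoff."""
    ordered = sorted(lengths)
    prefix = [0] + list(accumulate(ordered))  # prefix[i] = bases in the i shortest reads
    total = prefix[-1]
    curve = []
    for cutoff in cutoffs:
        first = bisect_left(ordered, cutoff)  # index of first read with length >= cutoff
        curve.append((cutoff, total - prefix[first]))
    return curve


def compare_flowcells(parent_dir, cutoffs: Sequence[int]) -> Dict[str, List[Tuple[int, int]]]:
    """Yield-by-minimum-length curve for every flowcell below parent_dir.

    Each flowcell is labelled by the path of the directory containing its
    summary file, relative to parent_dir (a hypothetical labelling scheme).
    """
    parent = Path(parent_dir)
    curves = {}
    for summary in find_summary_files(parent):
        label = summary.parent.relative_to(parent).as_posix()
        curves[label] = yield_by_min_length(read_lengths(summary), cutoffs)
    return curves
```

In the resulting curves, the value at a cutoff of 0 is the total yield of a flowcell, and the value at 20 000 is the yield from reads of at least 20 KB, so two flowcells with similar totals but different size-selection protocols can be compared directly at any minimum read length.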

5 Conclusion

MinIONQC is a fast and efficient script to analyse the output from ONT’s MinION instrument. We hope that it will be useful to the community, and will facilitate further improvements and developments in the ways that the MinION is used.

Funding

This work was supported by Australian Research Council grants to R.M.L. and B.S.

Conflict of Interest: none declared.

References

1. De Novo Assembly of a New Solanum pennellii Accession Using Nanopore Sequencing.

Authors: Maximilian H-W Schmidt; Alexander Vogel; Alisandra K Denton; Benjamin Istace; Alexandra Wormit; Henri van de Geest; Marie E Bolger; Saleh Alseekh; Janina Maß; Christian Pfaff; Ulrich Schurr; Roger Chetelat; Florian Maumus; Jean-Marc Aury; Sergey Koren; Alisdair R Fernie; Dani Zamir; Anthony M Bolger; Björn Usadel
Journal: Plant Cell Date: 2017-10-12 Impact factor: 11.277

2. poRe: an R package for the visualization and analysis of nanopore sequencing data.

Authors: Mick Watson; Marian Thomson; Judith Risse; Richard Talbot; Javier Santoyo-Lopez; Karim Gharbi; Mark Blaxter
Journal: Bioinformatics Date: 2014-08-29 Impact factor: 6.937

3. Poretools: a toolkit for analyzing nanopore sequence data.

Authors: Nicholas J Loman; Aaron R Quinlan
Journal: Bioinformatics Date: 2014-08-20 Impact factor: 6.937

4. Rapid de novo assembly of the European eel genome from nanopore sequencing reads.

Authors: Hans J Jansen; Michael Liem; Susanne A Jong-Raadsen; Sylvie Dufour; Finn-Arne Weltzien; William Swinkels; Alex Koelewijn; Arjan P Palstra; Bernd Pelster; Herman P Spaink; Guido E van den Thillart; Ron P Dirks; Christiaan V Henkel
Journal: Sci Rep Date: 2017-08-03 Impact factor: 4.379

5. Nanopore sequencing and assembly of a human genome with ultra-long reads.

Authors: Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose
Journal: Nat Biotechnol Date: 2018-01-29 Impact factor: 54.908

6. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

Authors: Christopher M Austin; Mun Hua Tan; Katherine A Harrisson; Yin Peng Lee; Laurence J Croft; Paul Sunnucks; Alexandra Pavlova; Han Ming Gan
Journal: Gigascience Date: 2017-08-01 Impact factor: 6.524

7. poRe GUIs for parallel and real-time processing of MinION sequence data.

Authors: Robert D Stewart; Mick Watson
Journal: Bioinformatics Date: 2017-07-15 Impact factor: 6.937

8. Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly.

Authors: Mun Hua Tan; Christopher M Austin; Michael P Hammer; Yin Peng Lee; Laurence J Croft; Han Ming Gan
Journal: Gigascience Date: 2018-03-01 Impact factor: 6.524

9. NanoPack: visualizing and processing long-read sequencing data.

Authors: Wouter De Coster; Svenn D'Hert; Darrin T Schultz; Marc Cruts; Christine Van Broeckhoven
Journal: Bioinformatics Date: 2018-08-01 Impact factor: 6.937
