Yifei Xu, Fan Yang-Turner, Denis Volk, Derrick Crook.
Abstract
Metagenomic sequencing combined with Oxford Nanopore Technology has the potential to become a point-of-care test for infectious disease in public health and clinical settings. It could provide rapid diagnosis of infection, guide individual patient management and treatment strategies, and inform infection prevention and control practices. However, publicly available, streamlined and reproducible pipelines for analyzing Nanopore metagenomic sequencing data are still lacking. Here we introduce NanoSPC, a scalable, portable and cloud-compatible pipeline for analyzing Nanopore sequencing data. NanoSPC can identify potentially pathogenic viruses and bacteria simultaneously to provide comprehensive characterization of individual samples. The pipeline can also detect single nucleotide variants and assemble high-quality complete consensus genome sequences, permitting high-resolution inference of transmission. We implement NanoSPC using the Nextflow workflow manager, with all software dependencies packaged in Docker images, to ensure reproducibility and portability of the analysis. Moreover, we deploy NanoSPC to our scalable pathogen pipeline platform, enabling elastic computing for high-throughput Nanopore data on HPC clusters as well as on multiple cloud platforms, such as Google Cloud, Amazon Elastic Compute Cloud, Microsoft Azure and OpenStack. Users can either access our web interface (https://nanospc.mmmoxford.uk) to run cloud-based analyses, monitor progress and visualize results, or download the Docker images and run the pipeline from the command line to analyze data locally.
Year: 2020 PMID: 32442274 PMCID: PMC7319573 DOI: 10.1093/nar/gkaa413
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic overview of NanoSPC. NanoSPC is a scalable, portable and cloud-compatible pipeline for analyzing Nanopore metagenomic sequencing data. NanoSPC can identify potentially pathogenic viruses and bacteria simultaneously to provide comprehensive characterization of individual samples. The pipeline can also detect single nucleotide variants and assemble high-quality complete consensus genome sequences. NanoSPC uses the Nextflow pipeline manager and packages all software dependencies in Docker images (red). NanoSPC can be deployed to the scalable pathogen pipeline platform, enabling elastic computing for high-throughput Nanopore data on multiple cloud platforms (green). NanoSPC can be accessed via a web interface to run cloud-based analyses, or via Docker images to analyze data locally (blue).
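The consensus-assembly and SNV-detection step described in the caption can be illustrated with a minimal majority-rule sketch. This is not NanoSPC's implementation: it assumes reads have already been aligned to a reference and summarized into per-position base counts (a hypothetical `pileup` list of `Counter` objects), and the thresholds `min_depth` and `min_freq` are illustrative values rather than the pipeline's settings.

```python
from collections import Counter


def call_consensus(reference, pileup, min_depth=20, min_freq=0.8):
    """Majority-rule consensus plus an SNV list from per-position base counts.

    reference: reference sequence (str)
    pileup: list of Counter objects, one per reference position (hypothetical input format)
    min_depth, min_freq: illustrative thresholds, not NanoSPC settings
    """
    if len(reference) != len(pileup):
        raise ValueError("pileup must have one entry per reference position")
    consensus = []
    snvs = []  # (1-based position, reference base, consensus base, allele frequency)
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # too few reads to call this site: mask it
            continue
        base, n = counts.most_common(1)[0]
        freq = n / depth
        if freq < min_freq:
            consensus.append("N")  # no sufficiently dominant base: mask it
            continue
        consensus.append(base)
        if base != ref_base:
            snvs.append((pos + 1, ref_base, base, round(freq, 3)))
    return "".join(consensus), snvs


# Usage with toy counts
reference = "ACGTA"
pileup = [
    Counter({"A": 30}),
    Counter({"C": 28, "T": 2}),
    Counter({"T": 25, "G": 1}),   # dominant base differs from reference G
    Counter({"T": 5}),            # below min_depth
    Counter({"A": 12, "G": 12}),  # ambiguous 50/50 site
]
seq, snvs = call_consensus(reference, pileup)
print(seq)   # ACTNN
print(snvs)  # [(3, 'G', 'T', 0.962)]
```

Masking low-depth and ambiguous sites with N, rather than falling back to the reference base, avoids reporting reference sequence at positions where the reads provide no evidence either way.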
Figure 2. Example of cloud-based analysis of Nanopore metagenomic sequencing data with NanoSPC. (A) and (B) Web interfaces for executing data analysis and monitoring its progress in real time. (C) Statistical summary of data quality. (D) Taxonomic assignment of sequencing reads, showing the percentages of bacterial and viral reads. (E) Genome coverage obtained by mapping sequencing reads to reference sequences. (F) Execution time of each process in the pipeline.
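Panels (D) and (E) summarize two simple quantities: the share of reads assigned to each group, and how much of a reference genome the mapped reads cover. The Python sketch below shows one way to compute them. It is not NanoSPC's code; it assumes each read already carries a top-level label (for example 'Bacteria' or 'Viruses') from some taxonomic classifier, and that alignments to a single reference are given as 0-based, half-open (start, end) intervals. The function names are hypothetical.

```python
from collections import Counter


def read_percentages(read_labels):
    """Percentage of reads carrying each top-level label (e.g. 'Bacteria', 'Viruses')."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def coverage_stats(ref_length, alignments):
    """Breadth and mean depth of coverage on one reference.

    alignments: iterable of (start, end) intervals, 0-based and half-open.
    Returns (fraction of positions covered by >= 1 read, mean depth).
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    diff = [0] * (ref_length + 1)  # difference array: +1 at start, -1 at end
    for start, end in alignments:
        start, end = max(0, start), min(ref_length, end)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    covered = total_depth = depth = 0
    for i in range(ref_length):
        depth += diff[i]  # running sum gives the depth at position i
        total_depth += depth
        if depth > 0:
            covered += 1
    return covered / ref_length, total_depth / ref_length


labels = ["Bacteria"] * 6 + ["Viruses"] * 3 + ["unclassified"]
print(read_percentages(labels))  # {'Bacteria': 60.0, 'Viruses': 30.0, 'unclassified': 10.0}
print(coverage_stats(10, [(0, 4), (2, 6)]))  # (0.6, 0.8)
```

The difference-array approach keeps the coverage pass linear in the reference length plus the number of alignments, instead of incrementing every base of every alignment individually.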