Abstract
Background: Virus discovery using high-throughput next-generation sequencing has become more commonplace. However, although analysis of deep next-generation sequencing data allows us to identify potential pathogens, the entire analytical procedure requires bioinformatics expertise, including installing the appropriate software packages and preparing prerequisite databases. Simple and user-friendly bioinformatics pipelines are urgently required to obtain complete viral genome sequences from metagenomic data.
Keywords: bioinformatics; detect; metagenomics; next-generation sequencing (NGS); reconstruct
Year: 2017 PMID: 28369462 PMCID: PMC5466706 DOI: 10.1093/gigascience/gix003
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. A schematic flowchart of drVM. CreateDB.py processes viral sequences to produce SNAP and BLAST databases; drVM.py analyzes NGS reads to produce viral genomes.
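The database-building step in Figure 1 indexes the same viral sequences twice, once for SNAP and once for BLAST. The sketch below only assembles the two command lines; it assumes the standard `makeblastdb` (BLAST+) and `snap-aligner index` (SNAP) tools, and the helper `build_db_commands` and all file names are hypothetical. It is not the code of CreateDB.py, whose actual options are not given here.

```python
from pathlib import Path


def build_db_commands(viral_fasta: str, out_dir: str) -> list[list[str]]:
    """Hypothetical helper: return command lines that index one viral FASTA
    file as a BLAST nucleotide database and as a SNAP index."""
    out = Path(out_dir)
    blast_cmd = [
        "makeblastdb",
        "-in", viral_fasta,
        "-dbtype", "nucl",                # viral genomes are nucleotide sequences
        "-out", str(out / "viral_blastdb"),
    ]
    snap_cmd = [
        "snap-aligner", "index",
        viral_fasta,                      # reference FASTA to index
        str(out / "viral_snap_index"),    # SNAP writes its index into this directory
    ]
    return [blast_cmd, snap_cmd]


if __name__ == "__main__":
    for cmd in build_db_commands("viral_genomes.fasta", "db"):
        # Print instead of executing, so the sketch runs without BLAST+ or SNAP installed.
        print(" ".join(cmd))
```

With both tools installed, each list could be passed to `subprocess.run(cmd, check=True)`.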
The feasibility of drVM in various types of clinical samples and public datasets
| Study description | Sequencing method | No. of runs | Ref | drVM detection (No. of runs) | drVM reconstruction (No. of runs) |
|---|---|---|---|---|---|
| Bioinformatics pipeline for ultrarapid pathogen identification | Illumina | 13 | | 6 | 0 |
| Complete viral RNA genome sequencing of ultra-low copy samples | Illumina | 34 | | 9 | 25 |
| Faecal virome of red foxes from peri-urban areas | Illumina | 8 | | 0 | 1 |
| Full genome virus detection in fecal samples | Illumina | 20 | | 0 | 11 |
| Human papillomavirus community in healthy persons | Illumina | 20ᵃ | | 9 | 5 |
| Identification of hepatotropic viruses from plasma | Illumina | 14 | | 1 | 4 |
| Intestinal bacterial and RNA viral communities from sentinel birds | Ion Torrent | 8 | | 7 | 0 |
| Intestinal virome in healthy and diarrhoeic neonatal piglets | Ion Torrent | 29 | | 3 | 2 |
| Metagenomic analysis for severe acute respiratory infection | Illumina | 4 | | 0 | 4 |
| Metagenomic identification of viral pathogens in clinical samples | Illumina | 3 | | 1 | 1 |
| New viral sequences identified in Asian citrus psyllid | Illumina | 4 | | 1 | 3 |
| Novel adenovirus associated with baboon acute respiratory outbreak | Illumina | 4 | | 3 | 1 |
| Novel human pegivirus associated with hepatitis C virus co-infection | Illumina | 15 | | 11 | 2 |
| RNA sequencing in influenza virus-positive respiratory samples | Illumina | 49 | | 16 | 13 |
| RNA-seq of RNA viruses from faecal and blood samples | Illumina | 82 | | 6 | 76 |
| Simultaneous sequencing of multiple RNA virus genomes | Ion Torrent | 40 | | 33 | 4 |
| Viral genome-targeted assembly pipeline | Illumina | 1 | | 1 | 0 |
| Virus identification in unknown tropical febrile illness cases | Illumina | 1ᵇ | | 1 | 0 |

ᵃ Selected runs with assembled contigs >1000 bp [21].
ᵇ A selected run with more than 5000 mapped viral reads [16].
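The per-study counts in the table above can be totalled directly. The snippet below transcribes the three numeric columns (runs, detection, reconstruction) in table order and sums each one; the tuples are copied from the table, and the helper name `tally` is only illustrative. Because the source does not spell out here whether the two drVM result columns overlap, the sketch reports them separately rather than adding them together.

```python
# (No. of runs, drVM detection, drVM reconstruction) per study, in table order.
ROWS = [
    (13, 6, 0), (34, 9, 25), (8, 0, 1), (20, 0, 11), (20, 9, 5),
    (14, 1, 4), (8, 7, 0), (29, 3, 2), (4, 0, 4), (3, 1, 1),
    (4, 1, 3), (4, 3, 1), (15, 11, 2), (49, 16, 13), (82, 6, 76),
    (40, 33, 4), (1, 1, 0), (1, 1, 0),
]


def tally(rows):
    """Sum each column; return (total runs, total detection, total reconstruction)."""
    runs = sum(r[0] for r in rows)
    detected = sum(r[1] for r in rows)
    reconstructed = sum(r[2] for r in rows)
    return runs, detected, reconstructed


if __name__ == "__main__":
    runs, detected, reconstructed = tally(ROWS)
    print(f"{len(ROWS)} studies, {runs} runs, {detected} detection, {reconstructed} reconstruction")
```

Running it prints `18 studies, 349 runs, 108 detection, 152 reconstruction`.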
Figure 2. Input and output of drVM. (A) Input for the command-line interface. (B) Input for the graphical user interface. (C) Output file structure generated by drVM. (D) Coverage profiles produced by drVM.
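A coverage profile such as those in Figure 2D records, for each position of a reconstructed genome, how many reads cover it. One common way to compute this from read alignments is a difference array over half-open intervals. The sketch below assumes the alignments have already been reduced to 0-based `(start, end)` pairs on a single genome, and the function name `coverage_profile` is hypothetical. It illustrates the general technique, not drVM's own implementation.

```python
from itertools import accumulate


def coverage_profile(genome_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth from 0-based, half-open (start, end) alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        # Clip each interval to the genome so out-of-range coordinates are ignored.
        start, end = max(start, 0), min(end, genome_length)
        if start < end:
            diff[start] += 1  # depth rises where a read begins
            diff[end] -= 1    # and falls just after it ends
    # Prefix sums turn the +1/-1 markers into the depth at each position.
    return list(accumulate(diff[:genome_length]))


if __name__ == "__main__":
    print(coverage_profile(10, [(0, 4), (2, 6), (5, 10)]))
```

The example prints `[1, 1, 2, 2, 1, 2, 1, 1, 1, 1]`. Each alignment costs O(1) to record, so the profile takes time linear in the genome length plus the number of reads.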
Figure 3. drVM results for simulated read mixtures containing human enterovirus (CA16), human rhinovirus (HRV), and human respiratory syncytial virus (HRSV) reads together with reads from ERR690488. (A-D) Coverage 20X-40X-80X: 741 CA16, 1430 HRV, and 6090 HRSV reads. (E-H) Coverage 20X-20X-20X: 741 CA16, 715 HRV, and 1522 HRSV reads.
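The read counts in Figure 3 follow the usual expected-coverage relation. The symbols below are introduced only for illustration: $C$ is mean coverage, $N$ the number of reads, $L$ the read length and $G$ the genome length. Read length and genome lengths are not stated in the caption, so only ratios are checked here. For a fixed virus and read length, coverage is proportional to read count:

$$
C = \frac{N L}{G}
\quad\Longrightarrow\quad
\frac{N_2}{N_1} = \frac{C_2}{C_1} \quad \text{(fixed } L, G\text{)}
$$

The caption is consistent with this. HRV goes from 715 reads at 20X to 1430 reads at 40X (ratio 2 = 40/20). HRSV goes from 1522 reads at 20X to 6090 reads at 80X (ratio about 4.0 = 80/20). CA16 stays at 741 reads because it is at 20X in both mixtures.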
Various viral genomes reconstructed by drVM based on the corresponding NGS reads
| Virus type | Viral genome reconstruction by drVM |
|---|---|
| dsDNA | Human adenovirus (ERR233414), Human papillomavirus (ERR233428), Simian adenovirus (SRR766764) |
| ssDNA | Adeno-associated virus (SRR766763), Bovine parvovirus (SRR2010686), Fox circovirus (SRR1920168), Human bocavirus (SRR2010686), Torque teno mini virus (SRR2040553), Torque teno virus (SRR2010686) |
| dsRNA | Human picobirnavirus (ERR227885) |
| ssRNA (+) | Bovine viral diarrhea virus (SRR1170806), Chikungunya virus (SRR3180716), GB virus C (SRR544883), Hepatitis C virus (SRR2940645), Human pegivirus (SRR2940645), Human rhinovirus (SRR2010685), Norovirus (ERR233428), Pepino mosaic virus (ERR227848), Pepper mild mottle virus (ERR233431), Porcine kobuvirus (ERR1097471), Tobacco mild green mosaic virus (ERR233421), Tobacco mosaic virus (ERR233421), Tomato mosaic virus (ERR233424), West Nile virus (SRR527705) |
| ssRNA (−) | Human parainfluenza virus (SRR2010686), Human respiratory syncytial virus (ERR690491), Respiratory syncytial virus (SRR527708), Influenza A virus (ERR690510) |
| Retro-transcribing | Hepatitis B virus (ERR233420), Human immunodeficiency virus (SRR513075) |
Performance comparison between drVM, SURPI, VIP, and VirusTAP
| Target virus (run accession) | Read bases (Mbp) | Metric | drVM | SURPI (comprehensive) | VIP (sense) | VirusTAP |
|---|---|---|---|---|---|---|
| Bovine viral diarrhea virus (SRR1170797) | 12.5 | Run timeᵃ | 149 sec | 52 | 1683 sec | 71 sec |
| | | Resultᵇ | | 262 bp | 9078 bp | 353 bp |
| Human immunodeficiency virus (SRR1106548)ᶜ | 600.9 | Run time | 598 sec | 32 | 25 | 1388 sec |
| | | Result | 3055 bp | 799 bp | 4632 bp | 2896 bp |
| Human papillomavirus (SRR062073) | 5000 | Run time | 608 sec | 18 | 8603 sec | 519 sec |
| | | Result | | 3515 bp | 2164 bp | 555 bp |
| Human rotavirus A (DRR049387) | 266.1 | Run time | 464 sec | 59 | 6259 sec | 925 sec |
| | | Result | 13 contigs | 13 contigs | 41 | |
| Influenza A virus (ERR690519) | 3300 | Run time | 12 | 16 | 86 | 4504 sec |
| | | Result | | 11 contigs | 2673 reads | 34 viral contigs |
ᵃ The analyses were run on a quad-core CPU with 128 GB RAM, except for VirusTAP, which was run on a 120-core CPU with 1 TB RAM. drVM can also process these datasets with 8 GB RAM (see Supplementary information).
ᵇ Results are summarized as the length of the largest viral contig (bold: close to the genome size of the target virus), the number of reads mapped to the target virus, or the number of contigs, depending on the tool's output and the target virus (human rotavirus A: 11 segments; influenza A virus: 8 segments).
ᶜ Results were obtained by analyzing reads with barcode GCCAAT, except for SURPI, which recognizes multiple barcodes.
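Footnote b summarizes each tool's output as the length of the largest viral contig, the number of reads mapped to the target virus, or the number of contigs. The first and last of these can be read off a FASTA file of contigs. The sketch below assumes plain FASTA text as input, and the function name `summarize_contigs` is hypothetical. It is not part of drVM, SURPI, VIP, or VirusTAP.

```python
def summarize_contigs(fasta_text: str) -> tuple[int, int]:
    """Return (number of contigs, length of the largest contig) from FASTA text."""
    lengths: list[int] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            lengths.append(0)         # each header line starts a new contig
        elif lengths:
            lengths[-1] += len(line)  # sequence may be wrapped over several lines
    return len(lengths), max(lengths, default=0)


if __name__ == "__main__":
    example = ">contig_1\nACGTACGT\nACG\n>contig_2\nACGTA\n"
    print(summarize_contigs(example))
```

The example prints `(2, 11)`. For segmented genomes such as human rotavirus A (11 segments) or influenza A virus (8 segments), the contig count is usually the more telling of the two numbers, which is why footnote b reports contigs for those targets.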