Jake Lin, Lenka Kramna, Reija Autio, Heikki Hyöty, Matti Nykter, Ondrej Cinek.
Abstract
BACKGROUND: Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of processing virome sequencing data in parallel using exactly identical methods. This is especially important in studies of multifactorial diseases and in parallel comparisons of laboratory protocols.
Keywords: Assembly; Metagenomics; NGS analysis; Parallel processing; Viral dark matter; Viromes; Virus; Visualization
Year: 2017 PMID: 28506246 PMCID: PMC5430618 DOI: 10.1186/s12864-017-3721-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Comparison of existing virome pipeline tools
| Pipeline Tool | Vipie | ViromeScan | VirusTAP | Virome | Metavir | Taxonomer | MetaShot |
|---|---|---|---|---|---|---|---|
| Primary goal | Parallel analysis of multiple viral metagenomes via the web, suited to molecular epidemiology studies. | To profile viromes using databases of existing eukaryotic viruses without assembly. | Identification of viruses in a sample, after a thorough elimination of known non-viral sequences. | Classification of all putative ORFs found in a viral metagenome, characterization of viral communities. | Analysis of viromes, diversity metrics and marker gene phylogenies. | Ultra-fast metagenomics analysis focusing on detection of microorganisms, including viruses and bacteria. | Highly accurate and comprehensive workflow for host-associated microbiome classification on multiple samples. |
| Web based | Yes. | No. | Yes. | Yes (Flash required). | Yes. | Yes. | No. |
| Outputs | Interactive table, plots and raw downloads. Clustered heatmaps with dynamic group assignment re-plots. | Static population pie charts. Sample based clustered heatmaps. | Contig based hits and seamless web BLAST interface. | Rich collection of sample source virome ORF and sequence categories. | Comparative analysis of viromes and annotations including networks, nonmetric distance and tree maps. | Interactive pie charts with kingdoms in bins and also impressive sunburst flare sub classifiers. | A Krona graph and Interactive Taxonomy HTML table along with csv file. |
| Source data | Paired-end reads; | Single-end or paired-end reads; | Paired-end reads. | | Reads (>300 bases) or assembled contigs. | Paired-end reads in fastq and fasta formats. | Paired-end reads in fastq format. |
| Trimming and filtering | YES, as the first step. | YES, after selection of viral reads, at the level of a | YES, as the first step. | YES: quality based; duplicate filtering; contamination | Not specified. | Not specified. | YES, as the first step. |
| De-novo assembly | YES, a choice of assemblers. | No. | YES, a choice of assemblers; done after subtraction steps. | No. | No. | No. | No. |
| Subtraction of human ref. and bacterial ribosomal sequences | Optional, only for the output of dark matter sequences. | YES, using Human Best Match Tagger. No for ribosomal. | YES, also other host databases available (mouse etc.). | Not specified for human. Ribosome is removed using BLAST against rDNA db. | Not specified. | Not subtracted but reported as part of detection. | Yes, reports identification of human host reads and bacterial mappings. |
| Means of virus identification | (a) BLAST against a pan-viral database. | Mapping to the members of the virus database using | BLAST search against the NCBI nt database. | Protein BLASTP against two databases. Several tiers of classification of the ORFs. | Not specified. | Taxonomer Binner DB with 21 bp k-mers as unique identifiers of known viruses. | Custom similarity workflow with Hamming distance. |
| Virus database for identification | A custom database containing 20759 human, animal, plant and bacterial viruses. | Eukaryotic viruses only. Four custom databases available for download. | Specificity is maintained by the subtraction steps prior to assembly and BLAST search. | UniRef 100 peptide database, five annotated protein databases, MetaGenomes On-line. | GAAS tool ( | Binner DB needs to be built using KAnalyze [ | TANGO [ |
| Action when a read maps to different viruses | Score is split among the hit reference sequences. | Not specified. | Not specified. | Not specified. | Not specified. | Assigned as ambiguous. | Parsed for human endogenous retroviruses; otherwise classified as ambiguous and discarded. |
Most tools use BLAST [23] for initial detection of known references. Vipie uniquely allows parallel web-based analysis of multiple samples and apportions read hits among multiple viral references for comprehensive population profiling.
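The table states that Vipie splits a read's score among the reference sequences it hits. The following is a minimal sketch of that idea, not Vipie's actual implementation: the function name `split_read_scores`, the input layout, and the choice of equal shares per reference are all assumptions made for illustration.

```python
from collections import defaultdict


def split_read_scores(read_hits):
    """Give each read one unit of score, shared equally among its hit references.

    read_hits maps a read id to the list of reference names it matched.
    Equal weighting is an assumption of this sketch; the source only says
    that the score is split among the hit reference sequences.
    """
    totals = defaultdict(float)
    for read_id, refs in read_hits.items():
        unique_refs = set(refs)
        if not unique_refs:
            continue  # a read with no hits contributes nothing
        share = 1.0 / len(unique_refs)
        for ref in unique_refs:
            totals[ref] += share
    return dict(totals)


# Hypothetical hits, for illustration only.
hits = {
    "read1": ["Rotavirus A"],
    "read2": ["Rotavirus A", "Rotavirus C"],
    "read3": ["Norovirus GI", "Norovirus GII"],
    "read4": [],
}
profile = split_read_scores(hits)
```

With this scheme the total score equals the number of reads that hit at least one reference, so an ambiguous read is neither counted twice nor discarded.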
Fig. 1 Vipie web flow chart. For efficiency, sample-based paired FASTQ files are uploaded as a zipped archive with an optional mapping file. Illumina BaseSpace archive downloads can be used without changes. All pipeline parameters can be entered using the web form. The default values and a use case are listed in the user guide, available on the home page along with an example multi-sample input archive.
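Because the upload consists of paired FASTQ files, it can help to confirm that every sample has both mates before zipping. The sketch below is one way to do that, assuming Illumina-style file names containing `_R1`/`_R2`; the regular expression and the function name `pair_fastq_files` are illustrative assumptions and do not describe how Vipie itself pairs files.

```python
import re
from collections import defaultdict

# Assumed naming pattern, e.g. Sample_S1_L001_R1_001.fastq.gz or Sample_R2.fastq
PAIR_RE = re.compile(
    r"^(?P<sample>.+?)_R(?P<mate>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastq_files(filenames):
    """Group file names into (R1, R2) pairs per sample; report incomplete samples."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match:  # files that do not look like FASTQ mates are ignored
            mates[match.group("sample")][match.group("mate")] = name
    pairs = {s: (d["1"], d["2"]) for s, d in mates.items() if "1" in d and "2" in d}
    unpaired = sorted(s for s, d in mates.items() if len(d) < 2)
    return pairs, unpaired


# Hypothetical file names, for illustration only.
files = [
    "S11_S1_L001_R1_001.fastq.gz",
    "S11_S1_L001_R2_001.fastq.gz",
    "S12_R1.fastq",
    "notes.txt",
]
pairs, unpaired = pair_fastq_files(files)
```

Samples listed in `unpaired` would need their missing mate added before the archive is built.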
NGS samples used in Vipie validation, drawn from the Human Microbiome Project (HMP), an African study, and a diarrheal sample from a gastroenteritis outbreak in Japan. ViromeScan listed 20 HMP samples, but only 4 samples, of the stool type, passed QC.
| AccessionId | Source | Sample Type | Number of Reads (a) | Sample used in Vipie-ViromeScan-VirusTAP validation | Vipie Results (b) |
|---|---|---|---|---|---|
| SRS072276 | HMP | Blood | 438,879 | Yes-No-No | 1,2 |
| SRS072318 | HMP | Blood | 753,994 | Yes-No-No | 1,2 |
| SRS019033 | HMP | Retroauricular | 1,285,003 | Yes-No-No | 1 |
| SRS016944 | HMP | Retroauricular | 1,619,439 | Yes-No-No | 1 |
| SRS012902 | HMP | Stool | 2,039,473 | Yes-Yes-No | 1 |
| SRS014923 | HMP | Stool | 2,009,179 | Yes-Yes-No | 1 |
| SRS014466 | HMP | Vagina | 367,077 | Yes-No-No | 1,2 |
| SRS015072 | HMP | Vagina | 495,256 | Yes-No-No | 1,2 |
| SRS072313 | HMP | Nasal | 320,672 | Yes-No-No | 2 |
| SRS072261 | HMP | Nasal | 367,384 | Yes-No-No | 2 |
| SRS072366 | HMP | Nasal | 114,414 | Yes-No-No | 2 |
| S11 | Africa | Stool | 1,634,821 | Yes-No-No | 2 |
| S12 | Africa | Stool | 1,191,427 | Yes-No-No | 2 |
| S14 | Africa | Stool | 1,143,784 | Yes-No-No | 2 |
| DRA004165 | Japan | Diarrheal | 1,108,688 | Yes-No-Yes | 2 |
In addition to those stool samples, the Vipie test archive includes 4 other HMP sample types. Result links with performance times are also provided.
a Input archive of Results 2 samples (subsampled 20%, 225 MB) available at: https://binf.uta.fi/vipie/data/vipie_archive_ssampled.zip
b Results 1: https://binf.uta.fi/vipie/results.html?key=2HSPXukkDS (66 min)
Results 2: https://binf.uta.fi/vipie/results.html?key=eLZPuObVoU (82 min)
Fig. 2 Interactive population profile maps and diversity. Vipie results are accessed securely through the browser. a Population chart slices are clickable, and their sizes represent the relative percentage at the relevant taxonomy level. The diarrheal sample is dominated by dsRNA (orange) Rotavirus, while the African stool samples contain ssRNA (green) and dsDNA viruses. b Alpha diversity is calculated using Shannon entropy. Vipie charts are interactive and can be saved in multiple image formats.
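Panel b of Fig. 2 reports alpha diversity as Shannon entropy, H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The sketch below computes it from per-taxon read counts. The use of the natural logarithm and the function name `shannon_entropy` are assumptions; the source does not state which log base Vipie uses.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy of a list of non-negative per-taxon counts.

    Natural log is assumed here; a different base only rescales the value.
    """
    total = sum(counts)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:  # zero-count taxa contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical counts: a sample dominated by one virus versus an even community.
dominated = shannon_entropy([970, 10, 10, 10])
even = shannon_entropy([250, 250, 250, 250])
```

A sample dominated by a single virus gives a value near zero, while four equally abundant taxa give ln 4, the maximum for four taxa.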
Fig. 3 Clustered heatmap of HMP, African and Japanese diarrheal samples. Public NGS data from different consortia provide opportunities for advanced comparative virome analysis. Healthy HMP sample types clustered correctly (nasal, vaginal, blood samples), while a Japanese sample (gastroenteritis dataset from the VirusTAP report) and African samples (known to be positive for multiple viruses) showed different signatures. HMP samples can be identified using the legend at the upper right, with olive green for nasal, yellow for vagina and blue for blood. Samples from rural Africa and VirusTAP (Japan) are marked in brick and red, respectively.
Fig. 4 QC and distribution of reads including dark viral matter. a The chart shows the number of NGS reads retained per sample through QC, interlacing and de novo assembly. b Samples are arranged along the x-axis, and the aligned origins of their reads are shown as stacked bars. Unmapped viral ‘dark matter’, shown in black, is of high interest across virology studies. Blue bars represent bacterial ribosomal sequences, green bars human sequences, and red bars known viral matches.
(A) Read assignment benchmark of MetaShot and Vipie on a simulated dataset (a) consisting of 19 582 500 human (94.5%), 986 114 bacterial (4.8%) and 146 886 viral (0.7%) reads. Vipie percentages are based on random subsampling of 1 000 000 reads. Bacterial statistics are not reported because Vipie reports information on bacterial ribosomal sequences only (bacterial genomic DNA is not filtered out, as this might lead to loss of dormant phage sequences). (B) Precision, Recall and F-measure are calculated on the same data. Input reads and the assessment script are available on SourceForge (b).
| A | Assigned % (c): MetaShot | Assigned % (c): Vipie | Correctly Assigned % (d): MetaShot | Correctly Assigned % (d): Vipie |
|---|---|---|---|---|
| Human (host) | 99.18 | 99.27 | 99.99 | 99.27 |
| Viruses | | | | |
| Family | 97.74 | 99.98 | 98.53 | 93.39 |
| Genus | 97.39 | 98.99 | 99.75 | 93.33 |
| Species | 97.81 | 93.66 | 96.70 | 92.97 |

| B | Human (host): MetaShot | Human (host): Vipie | Virus: MetaShot | Virus: Vipie |
|---|---|---|---|---|
| Precision (%) | 100.00 | 100.00 | 98.30 | 96.85 |
| Recall (%) | 99.97 | 99.96 | 98.19 | 95.36 |
| F-measure (%) | 100.00 | 99.98 | 98.07 | 96.08 |
| Unclassified (%) | 1.04 | 0.73 | 3.94 | 6.73 |
a https://recascloud.ba.infn.it/index.php/s/nw4s9hqnF8QkBsK
b https://sourceforge.net/projects/vipie/files/validation/k
c The percentage refers to the total number of reads assignable to the specific taxonomic rank
d The percentage refers to the relevant assigned reads
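Part B of the table reports Precision, Recall and F-measure. As a reminder of how these are conventionally defined, the sketch below computes them from true-positive, false-positive and false-negative read counts. These are the standard textbook definitions, with the F-measure taken as the harmonic mean of precision and recall; the assessment script on SourceForge may differ in detail, and the counts used here are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from read counts.

    tp: reads assigned to the correct class
    fp: reads assigned to the class but belonging elsewhere
    fn: reads of the class that were missed or assigned elsewhere
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, not taken from the benchmark above.
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
```

Because the F-measure is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two.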