Sam Nooij, Dennis Schmitz, Harry Vennema, Annelies Kroneman, Marion P G Koopmans.
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Keywords: decision tree; pipeline; software; standardization; use case; viral metagenomics
Year: 2018 PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Classification workflows and their references.
| Workflow | Reference | Website |
| --- | --- | --- |
| CaPSID | Borozan et al., | |
| ClassyFlu | Van der Auwera et al., | |
| Clinical PathoScope | Byrd et al., | |
| DUDes | Piro et al., | |
| EnsembleAssembler | Deng et al., | |
| Exhaustive Iterative Assembly (Virus Discovery Pipeline) | Schürch et al., | – |
| FACS | Stranneheim et al., | |
| GenSeed-HMM | Alves et al., | |
| Giant Virus Finder | Kerepesi and Grolmusz, | |
| GOTTCHA | Freitas et al., | |
| IMSA | Dimon et al., | |
| IMSA+A | Cox et al., | |
| Kraken | Wood and Salzberg, | |
| LMAT | Ames et al., | |
| MEGAN 4 | Huson et al., | |
| MEGAN Community Edition | Huson et al., | |
| MePIC | Takeuchi et al., | |
| MetaShot | Fosso et al., | |
| metaViC | Modha, | |
| Metavir | Roux et al., | |
| Metavir 2 | Roux et al., | |
| MetLab | Norling et al., | |
| NBC | Rosen et al., | |
| PathSeq | Kostic et al., | |
| ProViDE | Ghosh et al., | |
| QuasQ | Poh et al., | |
| READSCAN | Naeem et al., | |
| Rega Typing Tool | Kroneman et al., | |
| RIEMS | Scheuch et al., | |
| RINS | Bhaduri et al., | |
| SLIM | Cotten et al., | “Available upon request” |
| SMART | Lee et al., | |
| SRSA | Isakov et al., | “Available upon request” |
| SURPI | Naccache et al., | |
| Taxonomer | Flygare et al., | |
| Taxy-Pro | Klingenberg et al., | |
| “Unknown pathogens from mixed clinical samples” | Gong et al., | – |
| vFam | Skewes-Cox et al., | |
| VIP | Li et al., | |
| ViralFusionSeq | Li et al., | |
| Virana | Schelhorn et al., | |
| VirFind | Ho and Tzanetakis, | |
| VIROME | Wommack et al., | |
| ViromeScan | Rampelli et al., | |
| VirSorter | Roux et al., | |
| VirusFinder | Wang et al., | |
| VirusHunter | Zhao et al., | |
| VirusSeeker | Zhao et al., | |
| VirusSeq | Chen et al., | |
| VirVarSeq | Verbist et al., | |
| VMGAP | Lorenzi et al., | – |
–, No website could be found, the workflow was unavailable.
Figure 1. Generic pipeline scheme and breakdown of tools. (A) The process of classifying raw sequencing reads in 5 generic steps. (B) The steps that workflows use (in gray). UPfMCS: “Unknown Pathogens from Mixed Clinical Samples”; MEGAN CE: MEGAN Community Edition.
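Technical details of classification workflows.
| Workflow | Search algorithm | Search database | Filter algorithm | Filter database | Assembly | Steps | Dependencies |
| --- | --- | --- | --- | --- | --- | --- | --- |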
| CaPSID | Novoalign/Bowtie2 (any) | NCBI Genomes: GenBank viral, bacteria and fungi or custom | Same as search | NCBI human GRCh37/hg19 or custom | – | S | Python 2.7, MongoDB, OpenJDK, BioPython, pysam, Novoalign, BioScope, JBrowse, Groovy-Grails |
| ClassyFlu | HMMER | NCBI Influenza Virus Resource | – | – | – | S | HMMER |
| Clinical PathoScope | Bowtie2 | NCBI Genomes: viral, bacteria | Bowtie2 | NCBI human GRCh37/hg19, GenBank human rRNA | – | SFS | Python, Bowtie2 |
| DUDes | Bowtie2 | DUDesDB | – | – | – | S-pp | Python, Bowtie2 |
| EnsembleAssembler | Megablast | NCBI RefSeq: viral, bacteria | Bowtie2 | NCBI human GRCh37/hg19 | SOAPDenovo, ABySS, MetaVelvet, CAP3 | FPAS | Python, SOAPDenovo, ABySS, MetaVelvet, CAP3, MegaBLAST, Bowtie2, VecScreen |
| Exhaustive Iterative Assembly (Virus Discovery Pipeline) | BLAST, MAFFT, PhyML | NCBI BLAST nt + nr, NCBI viral proteins (per hit), Pfam, custom | BLASTn | NCBI nt: aves, carnivora, primates, rodentia, ruminantia | Newbler, CAP3 | PAFSPh | Python, Newbler, GSMapper, CAP3, BLAST, HMMER, MEME, MAST, Bioconductor, R |
| FACS | Bloom filter | Custom | – | – | – | S | Perl; C |
| GenSeed-HMM | BLASTx | | – | – | CAP3, Velvet, Newbler, SOAPDenovo, ABySS | AS | HMMER, BLAST, EMBOSS, CAP3, Velvet, Newbler, SOAPDenovo, ABySS |
| Giant Virus Finder | BLAST | NCBI BLAST nt, custom reference genomes | – | – | – | S | Perl, Python, BLAST |
| GOTTCHA | BWA (mem) | Custom | – | – | – | PS-pp | Perl, BWA |
| IMSA | BLAST | Custom | Bowtie, BLAT, BLAST | User-defined (human genome) | – | FS | Python, Blast, Bowtie2, Blat |
| IMSA+A | Bowtie2 | “The reference genome” | – | – | Oases, Velvet, Trinity | FAS | Python, IMSA, BLAST+, BLAT, Bowtie2, Oases, Velvet, Trinity |
| Kraken | Kraken | NCBI RefSeq: bacteria, NCBI Genomes: GenBank bacteria + archaea | – | – | – | S-pp | C++, Perl |
| LMAT | Custom | NCBI Genomes: GenBank bacteria | – | – | – | S | gcc, Python, OpenMP, MPI |
| MEGAN CE | DIAMOND | NCBI BLAST nt | – | – | – | S-pp | Java, DIAMOND, InterPro2GO, SEED viewer, eggNOG viewer, KEGG |
| MEGAN4 | BLAST | NCBI BLAST nt + nr | – | – | – | S | Java |
| MePIC | Megablast | NCBI BLAST nt | BWA | NCBI human GRCh37/hg19 | – | PFS | fastq-mcf (ea-utils), BWA, Megablast |
| MetaShot | Bowtie2, TANGO | NCBI GenBank: viral, bacteria, fungi (from plantae), protista (from invertebrate), and NCBI RefSeq: viral, bacteria, fungi, protista (from invertebrate) | STAR | NCBI human GRCh37/hg19, 2009 | – | PFS | Python, Bash, FaQCs, STAR, Bowtie2, TANGO |
| metaViC | DIAMOND | NCBI RefSeq: Complete protein | RiboPicker | – | IDBA-UD, SPAdes | PFSAS | Bowtie2, DIAMOND, filter_fastq.pl, GARM, IDBA-UD, Kronatools, prinseq, QUAST, riboPicker, SPAdes, Trim Galore |
| Metavir | BLASTx, MUSCLE, PhyML, HMMER | NCBI RefSeq: viral, Pfam, NCBI BLAST nr | – | – | CAP3 | SASPh | MUSCLE, CAP3, Gblocks, PhyML, Scriptree |
| Metavir 2 | BLAST, custom, FastTree | NCBI RefSeq: viral, Pfam | – | – | – | SPh | Perl, PHP, JavaScript, CSS, R, BLAST, FastTree, MetaGeneAnnotator, HMMScan, Uclust, jackhmmer, RaphaelSVG, Cytoscape-web |
| MetLab | Kraken, HMMER3 | NCBI RefSeq: bacteria, archaea, NCBI GenBank: viral, phage, vFams | Bowtie2 | “Host genome” (human) | SPAdes | P(F)(A)SS | Python (2.7) + libraries, GNU MPFR, Prinseq-Lite, Bowtie2, SAMTOOLS, SPAdes, Krona Tools, Kraken, FragGeneScan, HMMsearch, vFamParse |
| NBC | Custom | “Unique N-mer frequency profiles of 635 microbial genomes” | – | – | – | S | Perl, C++ |
| PathSeq | BLAST | NCBI BLAST nt: viruses, fungi, NCBI Genomes: bacteria, NCBI BLAST nr | MAQ, Megablast, BLASTN | 1000 Genomes Project: female reference, Ensembl: Homo sapiens cDNA, NCBI BLAST: human genome, transcriptome, NCBI human hs_alt_Celera, hs_alt_HuRef, hs_ref_GRC37, NCBI Genomes: Homo sapiens RNA, Ensembl Homo sapiens DNA | Velvet | PFFAS | Python, Java, C++, C, Hadoop, MAQ, Megablast, Blast, Velvet, RepeatMasker |
| ProViDE | BLASTx | NCBI BLAST nr | – | – | – | S | Perl, Python |
| QuasQ | Bowtie2 | “The reference genome” | – | – | – | PS-pp | Perl, Bowtie2 |
| READSCAN | SMALT | ? | SMALT | ? | – | S | Perl, SMALT, Makeflow |
| Rega Typing Tool v3 | BLAST, TreePuzzle | Custom | – | – | – | SPh | PHP, Java, R, TreePuzzle |
| RIEMS | GS Mapper, BLAST | NCBI BLAST nt, nr | – | – | Newbler | PASAS(SS) | Bash, 454 genome sequencer software suite, Newbler, GS mapper, sff/fna tools, BLAST, Emboss |
| RINS | BLAT, BLAST | Custom | Bowtie | Human genome | Trinity | SPFAS | Blat, Bowtie, Trinity, BLAST |
| SLIM | BLAST | NCBI GenBank entries of 2,000–500,000 bp long | – | – | SPAdes | PAS | Python, BLAST, QUASR, MEGAN, BWA, SPAdes, MUMmer |
| SMART | Custom | NCBI GenBank: release v209 | – | – | – | S | C++, Ruby, Flash, Sickle, Google SparseHash, GNU parallel |
| SRSA | Megablast | NCBI GenBank: CDS translations, PDB, SwissProt, PIR, PRF | BWA | NCBI human GRCh37/hg19 | Velvet | PFAS | BWA, fastx_toolkit, BLAST, MegaBlast, Velvet |
| SURPI | RAPSearch2, SNAP | “fast”: NCBI RefSeq: bacteria genomic, NCBI BLAST nt + nr - viridae; “comprehensive”: NCBI BLAST nt, nr | SNAP | NCBI human GRCh37/hg19, NCBI RefSeq rRNA, mRNA, mtRNA (March 2012) | Minimo, ABySS | PFS(AS) | Bash, Python, Perl, fastQValidator, Minimo, ABySS, RAPSearch2, seqtk, SNAP, gt-sequniq, fastq, cutadapt, prinseq-lite, dropcache |
| Taxonomer | KAnalyze + custom k-mer matching | UniRef90 viruses | KAnalyze | Greengenes, UNITE, UniRef50, Ensembl | – | FFSS-pp | Cython, KAnalyze |
| Taxy-Pro | CoMet | Pfam + metagenomes | – | – | – | S | MATLAB, CoMet webserver |
| “Unknown pathogens from mixed clinical samples” | BLAST | NCBI BLAST nt | – | – | CLC Genomics | SAPh-pp | CLC Genomics Workbench (7.5), Blast, Bowtie2, Clustal Omega, MEGA6, SimPlot |
| vFam | HMMER | NCBI RefSeq: viral protein | – | – | – | S | HMMER(, CD-HIT, MCL, MUSCLE, BLAST) |
| VIP | Bowtie2, RAPSearch2 (depending on mode), MAFFT, ETE | “fast”: ViPR/IRD nucleotide DB; “sense”: NCBI RefSeq: viral genomic, viral protein, NCBI GenBank viral neighbor genomes | Bowtie2 | NCBI human GRCh38/hg38, NCBI RefSeq rRNA, RNA, mtDNA (July 2015), GOTTCHA bacterial DB | Velvet-Oases | PFSAPh | Shell, Python, Perl, PICARD, Bowtie2, MAFFT, Velvet-Oases, RAPSearch2, ETE |
| ViralFusionSeq | BWA, BLAST | “Viral sequences and human decoy sequences” | – | – | – | PS | Perl, BWA, BLAST |
| Virana | STAR(, BLAST, LASTZ) | NCBI RefSeq: “viruses,” Repbase human endogenous retroviruses | STAR | NCBI GRCh37/hg19, Ensembl human cDNA | Trinity, Oases | PS(A)(S)Ph | Python, STAR, BWA-mem, LASTZ, RazerS3, Jalview, Trinity, Oases |
| VirFind | BLAST | NCBI BLAST nt, NCBI RefSeq viral protein | Bowtie2 | Custom | Velvet, CAP3 | PFAS(S) | fastx-toolkit, seq_crumbs, Bowtie2, Velvet, Blast, CAP3, Python |
| VIROME | BLAST | UniRef100, SEED, ACLAME, COG, GO, KEGG, MGOL, CAMERA, UniVec | BLASTn, tRNAscan-SE | “A rRNA subject database” | – | PFS | Adobe Flex, MySQL, Blast, tRNAscan-SE, MetaGene Annotator, CD-Hit 454 |
| ViromeScan | Bowtie2 | NCBI Genomes: GenBank viral, in-house built reference databases | BMTagger | NCBI human GRCh37/hg19 | – | SPFFS | Bash, R, Perl, Java, Bowtie2, BMTagger, Picard |
| VirSorter | BLASTp, HMMER3 | Pfam, custom | – | – | – | PS | Perl, HMMER3, MCL, MetaGeneAnnotator, MUSCLE, BLAST |
| VirusFinder | BLAST, BLAT, Bowtie2, BWA | RINS virus DB, or GIB-V | Bowtie2 | NCBI human GRCh37/36-hg19/hg18 | Trinity | FS(AS/S-pp) | Perl, BLAST+, BLAT, Bowtie2, BWA, iCORN, CREST, GATK, SAMtools, SVDetect, Trinity |
| VirusHunter | BLAST | NCBI BLAST nt, nr | BLASTn | Host genome | – | PFS(S) | Perl, MySQL, Blast, CD-HIT, RepeatMasker |
| VirusSeeker | BLAST | NCBI BLAST nt, nr viruses (custom) | BWA-MEM, MegaBlast, BLASTn, BLASTx | NCBI RefSeq: bacteria genomic, NCBI BLAST nt, nr | – | PSF | Perl, SLURM, BLAST, MegaBLAST, BWA-MEM, cutadapt, ea-utils, PRINSEQ, CD-HIT, Tantan, RepeatMasker, Newbler, Phrap |
| VirusSeq | MOSAIK | GIB-V, hg19 Virus (NCBI human GRCh37/hg19 + TCGA cancer-associated viruses) | MOSAIK | NCBI human GRCh37/hg19 | – | FS | Perl, MOSAIK |
| VirVarSeq | BWA | Custom | – | – | – | SS-pp | BWA, Q-cpileup, R, Fortran, Perl |
| VMGAP | BLAST, HMMER | NCBI BLAST nt, env_nt, env_nr, NCBI GenBank CDDDB, UniProtDB, OMNIOMEDB, Pfam, TIGRFAM, ACLAME, pfam2gomappingsDB | – | – | – | SSSSS-pp | HMMER, BLAST (NCBI-toolkit), SignalP, TMHMM, PRIAM |
A, assembly; F, filter; P, pre-process; Ph, phylogeny; pp, post-process; S, search. –: not used/specified.
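The step codes in the table can be expanded mechanically with the legend above. The following is a minimal sketch, not part of the reviewed workflows: the function name `parse_steps` is hypothetical, and it assumes that parentheses mark optional steps and that `-` and `/` only separate codes, which the legend does not state explicitly.

```python
import re

# Legend copied from the table footnote.
STEP_NAMES = {
    "A": "assembly",
    "F": "filter",
    "P": "pre-process",
    "Ph": "phylogeny",
    "pp": "post-process",
    "S": "search",
}

# Two-letter codes are listed first so "Ph" is not read as "P" + "h".
_TOKEN = re.compile(r"Ph|pp|[AFPS]")


def parse_steps(code):
    """Expand a step string such as 'PFS(AS)' into (step name, optional) pairs.

    Assumption: steps inside parentheses are optional; '-' and '/' are
    treated as plain separators.
    """
    steps = []
    optional = False
    i = 0
    while i < len(code):
        ch = code[i]
        if ch == "(":
            optional = True
            i += 1
        elif ch == ")":
            optional = False
            i += 1
        elif ch in "-/":
            i += 1
        else:
            match = _TOKEN.match(code, i)
            if match is None:
                raise ValueError(f"unknown step code at position {i}: {code!r}")
            steps.append((STEP_NAMES[match.group()], optional))
            i = match.end()
    return steps


if __name__ == "__main__":
    for example in ("S-pp", "PFS(AS)", "SAPh-pp"):
        print(example, "->", parse_steps(example))
```

For instance, `parse_steps("S-pp")` yields a search step followed by a post-processing step, both marked as non-optional.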
Usability features of classification workflows.
| Workflow | Operating system | Graphical user interface | Availability | User manual | Runtime |
| --- | --- | --- | --- | --- | --- |
| Taxonomer | Any (webservice), or Linux, Mac OS | Yes/no (webservice/local installation) | Yes | | “Real-time, interactive” - <10 min |
| Rega Typing Tool v3 | Any (webservice) | Yes (webservice) | Yes | | 500 seqs in 5 h |
| NBC | Any (webservice) | Yes (webservice) | Yes | | ±21 h |
| MePIC | Any (webservice) | Yes (webservice) | Use yes, download upon request | | 10 h on 1 CPU, 6 min on 100 CPUs (Megablast only) |
| Metavir 2 | Any (webservice) | Yes (webservice) | Use yes, download no | | Hours–days |
| VirSorter | Any (webservice) | Yes (webservice) | Yes | | Unknown |
| ClassyFlu | Any (webservice); or Linux, Mac OS | Yes (webservice) | Yes | Supplied with download | Unknown |
| VIROME | Any (webservice) | Yes (webservice) | Use yes, download no | | Unknown |
| VirFind | Any (webservice) | Yes (webservice) | Use yes, download no | – | ±70 h |
| CaPSID | Linux, Mac OS | Yes | Yes | | ±20 min |
| MetLab | Any | Yes | Yes | | <40 min |
| MEGAN Community Edition | Any | Yes | Yes | | ±5.5 h |
| MEGAN4 | Any | Yes | For academic use | | Unknown |
| Kraken | Linux | No (Illumina BaseSpace integration?) | Yes | | ±1 h |
| FACS | Linux | No | Yes | | “±20 times faster than BLAT/SSAHA2” |
| EnsembleAssembler | Linux | No | Yes | | <5 min (on 8 CPU server) |
| ViromeScan | Linux, Mac OS | No | Yes | | 140 sequences/s/CPU |
| DUDes | Linux | No | Yes | | 15–30 min |
| MetaShot | Linux | No | Yes | | 2-3x slower than Kraken-MetaPhlAn2 |
| Clinical PathoScope | Any | No | Yes | | <1 h |
| READSCAN | Linux | No | Yes | | <27 min on 16 CPU-HPC - 4 h |
| Virana | Linux, Mac OS | No | Yes | | ±30 min/CPU |
| SURPI | Linux | No | Yes | | ±1 h (fast), ±5 h (comprehensive) |
| RINS | Linux | No | Yes | Supplied with download(?) | ±3 h (2CPU), ±15 min (16CPU) |
| IMSA | Linux, Mac OS | No | Yes | | Hours |
| GOTTCHA | Linux, Mac OS | No | Yes | | ±4 h (“2-5x slower than Kraken”) |
| Giant Virus Finder | Linux, Mac OS | No | Yes | | ±30 CPU hours |
| VIP | Linux (Ubuntu, Biolinux) | No | Yes | | <2 d |
| VirusFinder | Linux | No | Yes | | 3 d |
| ViralFusionSeq | Linux | No | Yes | Supplied with download | >1 week |
| QuasQ | Linux, Mac OS | No | Yes | | Unknown |
| IMSA+A | Linux | No | Yes | | Unknown |
| GenSeed-HMM | Linux | No | Yes | | Unknown |
| VirVarSeq | Linux, Mac OS | No | Yes | | Unknown |
| VirusSeeker | Linux | No | Yes | | Unknown |
| vFam | Linux, Mac OS | No | Yes | – | Unknown |
| metaViC | Linux, Mac OS | No | Yes | – | Unknown |
| PathSeq | Linux, cloud (Amazon EC2, Apache Hadoop) | No | Yes | – | Unknown |
| Taxy-Pro | Any (webservice or MATLAB) | No | Yes | – | “About three orders of magnitude faster than speed-optimized BLAST” |
| VirusSeq | Linux, Mac OS | No | Yes | – | >1 week |
| LMAT | Linux | No | Yes | – | 1.3 Mbp/s |
| RIEMS | Linux | No | Yes | – | 10 h (24 CPU-HPC) |
| SMART | Linux | No | For academic use | | <10 min or ±2 M reads/min (on HPC - 192 CPUs) |
| SLIM | Linux, Mac OS | No | Upon request | – | Hours of searching, hours for assembly per sample (almost 10x faster than BLAST) |
| ProViDE | Linux (Ubuntu/Fedora) | No | Academic, non-profit | – | Hours (1 h/100,000 reads - slower than MEGAN) |
| SRSA | Linux | No | Upon request | – | Unknown |
| VirusHunter | Linux | No | Non-profit | – | Unknown |
| Metavir | Any (webservice) | Yes (webservice) | Superseded by newer version | | Interactive |
| VMGAP | ? | No | No (only at JCVI) | – | Unknown |
| “Unknown pathogens from mixed clinical samples” | Windows? | No | No | – | Interactive |
| Exhaustive Iterative Assembly (Virus Discovery Pipeline) | Linux | No | No | – | Interactive |
Workflows are sorted by: availability of a graphical user-interface (yes-no), runtime (fast-slow), and availability (yes-limited-no). ?: No operating system specified; –: no user manual found.
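Validation features of classification workflows.
| Workflow | Tested by | Validation methods | Sensitivity | Specificity | Precision | Updates | Citations |
| --- | --- | --- | --- | --- | --- | --- | --- |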
| Kraken | MetaShot, IMSA+A, Taxonomer, GOTTCHA, RIEMS, MetLab | – | 67 (21) | 92 (6) | 97 (2) | Yes (3-3-2016) | 334 |
| RINS | CaPSID, Virana, READSCAN, developers | PCR + Sanger sequencing | 49 (16) | 100 (4) | 100 (4) | Yes (10-1-2012) | 51 |
| CaPSID | Virana, developers | | 66 (8) | 100 (4) | 100 (4) | Yes (2-6-2012) | 26 |
| MEGAN 4 | MetLab, Bazinet and Cummings, 2012 | – | x | x | x | Yes (new version) | 752 |
| VirSorter | Developers | Manual curation of prophages | 62 (6) | – | 90 (6) | Yes (15-2-2017) | 34 |
| Virana | Developers | FISH, Southern blot | 67 (4) | – | 78 (4) | Yes (1-6-2014) | 9 |
| vFam | Developers | Compared to previous studies | 33 (3) | 99 (3) | 34 (3) | Yes (9-2-2014) | 19 |
| MEGAN Community Edition | IMSA+A | – | x | x | x | Yes (12-7-2017) | 22 |
| NBC | MetLab | – | 100 (1) | 33 (5) | 49 (1) | Yes (28-7-2010) | 125 |
| SURPI | Taxonomer | – | 61 (3) | – | – | Yes (5-6-2015) | 128 |
| PathSeq | READSCAN, developers | – | 51 (10) | – | – | Yes (23-11-2016) | 158 |
| Metavir 2 | ViromeScan | – | 82 (1) | – | – | Yes (26-7-2016) | 63 |
| Clinical PathoScope | RIEMS | – | 18 (13) | – | – | Yes (21-6-2016) | 21 |
| ProViDE | MetLab | – | 53 (1) | 37 (5) | 73 (1) | No | 19 |
| VirusSeq | – | Serology, colorimetric | – | – | – | Yes (9-8-2013) | 50 |
| ViralFusionSeq | – | Sanger sequencing | – | – | – | Yes (19-2-2017) | 31 |
| VIP | – | “Independent confirmatory testing results” | – | – | – | Yes (21-2-2017) | 5 |
| VirusHunter | – | EM, serology (hemagglutination inhibition) | – | – | – | Unknown | 46 |
| SLIM | – | RT-PCR | – | – | – | Yes | 27 |
| “Unknown pathogens from mixed clinical samples” | – | PCR, ELISA | – | – | – | Unknown | 1 |
| RIEMS | Developers | – | 91 (13) | 100 (13) | 100 (13) | Yes (10-3-2015) | 11 |
| LMAT | Developers | – | 50 (6) | – | 93 (6) | Yes (17-11-2016) | 64 |
| GOTTCHA | Developers | – | 71 (1) | – | – | Yes (26-6-2017) | 31 |
| IMSA | Developers | – | 92 (4) | – | – | Yes (17-4-2014) | 10 |
| READSCAN | Developers | – | 62 (15) | – | – | No (16-9-2012) | 30 |
| FACS | Developers | – | 99 (2) | 100 (2) | – | Yes (17-12-2015) | 39 |
| Taxonomer | Developers | – | 95 (4) | 91 (1) | – | Yes (3-7-2017) | 16 |
| QuasQ | Developers | – | 96 (9) | – | 99 (9) | Yes (10-7-2014) | 5 |
| ViromeScan | Developers | – | 100 (1) | – | 100 (1) | Yes (29-5-2017) | 4 |
| GenSeed-HMM | Developers | – | 62 (4) | – | 82 (4) | Yes (13-10-2016) | 0 |
| IMSA+A | Developers | – | 97 (8) | – | 81 (8) | Yes (18-7-2017) | 0 |
| MetaShot | Developers | – | 98 (1) | – | 98 (1) | Yes (22-6-2017) | 0 |
| SMART | – | – | – | – | – | Yes (19-5-2016) | 4 |
| MetLab | – | – | – | – | – | Yes (28-2-2017) | 0 |
| EnsembleAssembler | – | – | – | – | – | No (30-11-2014) | 41 |
| DUDes | – | – | – | – | – | Yes (22-11-2016) | 3 |
| VirusFinder | – | – | – | – | – | Yes (19-6-2014) | 49 |
| VirusSeeker | – | – | – | – | – | Yes (21-11-2016) | 1 |
| VirVarSeq | – | – | – | – | – | Yes (28-4-2015) | 13 |
| Taxy-Pro | – | – | – | – | – | Yes (16-1-2013) | 14 |
| VirFind | – | – | – | – | – | Yes (30-6-2017) | 31 |
| Metavir | – | – | – | – | – | Yes (new version) | 88 |
| metaViC | – | – | – | – | – | Yes (20-6-2017) | NA |
| MePIC | – | – | – | – | – | Yes | 15 |
| ClassyFlu | – | – | – | – | – | Unknown | 0 |
| Rega Typing Tool v3 | – | – | – | – | – | Unknown | 79 + 298 |
| VIROME | – | – | – | – | – | Unknown | 59 |
| Giant Virus Finder | – | – | – | – | – | No (7-6-2015) | 3 |
| SRSA | – | – | – | – | – | Unknown | 40 |
| VMGAP | – | – | – | – | – | Unknown | 25 |
| Exhaustive Iterative Assembly (Virus Discovery Pipeline) | – | – | – | – | – | Unknown | 11 |
Workflows were ordered as follows: tested by multiple other groups; benchmarked by developers and validated by other experiments; tested by one other group; validated by other experiments; benchmarked by developers; no sign of benchmark tests but with updates; no validation and no updates. Tested by: the groups that have tested the workflow. Validation methods: the experiments conducted by the developers to validate the computational results. Sensitivity, specificity and precision: average performance scores of a number (between brackets) of different benchmark tests. Updates: whether or not a pipeline has received updates after publication. Citations: numbers of citations in Google Scholar as of 28 March 2017.
x: MEGAN visualizes the output of BLAST or DIAMOND and calculates lowest common ancestors. See Figure 2 for different scores.
From personal communication with the developer, we know SLIM has been updated. –: absent/no information available.
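As a reminder of what the three score columns measure, the sketch below computes sensitivity, specificity and precision from confusion-matrix counts using their standard definitions, and formats an average in the "score (number of tests)" style used in the table. The counts and benchmark scores in the example are hypothetical and do not come from any of the reviewed benchmarks; the function names are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of true viral reads that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-target reads correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Fraction of reported hits that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def summarize(scores):
    """Average percentage scores and report them as 'mean (n tests)'."""
    mean = sum(scores) / len(scores)
    return f"{round(mean)} ({len(scores)})"


if __name__ == "__main__":
    # Hypothetical counts for a single benchmark.
    tp, fn, tn, fp = 90, 10, 95, 5
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")
    print(f"precision   = {precision(tp, fp):.2f}")

    # Hypothetical sensitivity percentages from three benchmarks.
    print(summarize([60.0, 70.0, 71.0]))
```

Averaging over benchmarks with very different designs hides the spread between them, which is why the number of tests is reported alongside each mean.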
Figure 2. Different benchmark scores of virus classification workflows. Twenty-seven different workflows (Left) have been subjected to benchmarks, by the developers (Top) or by independent groups (Bottom), measuring sensitivity (Left column), specificity (Middle column) and precision (Right column) in different numbers of tests. Numbers between brackets (n = a, b, c) indicate number of sensitivity, specificity, and precision tests, respectively.
Figure 3. Correlations between performance scores and analysis steps. Sensitivity, specificity and precision scores (in columns) for workflows that incorporated different analysis steps (in rows). Numbers at the bottom indicate number of benchmarks performed.
Figure 4. Correlation between performance and search algorithm and runtime. Sensitivity, specificity and precision scores (in columns) for workflows that incorporated different search algorithms, using either nucleotide sequences, amino acid sequences or both, and workflows with different runtimes (rows). Numbers at the bottom indicate number of benchmarks performed.
Correlation between runtime and method.
| Method | Minutes | Hours |
| --- | --- | --- |
| Pre-process | 1 | 6 |
| No pre-process | 7 | 3 |
| Filter | 2 | 5 |
| No filter | 6 | 4 |
| Assembly | 2 | 3 |
| No assembly | 6 | 6 |
| Nt sequences | 6 | 6 |
| Aa sequences | 1 | 1 |
| Nt + aa sequences | 1 | 2 |
| Alignment | 2 | 8 |
| Alignment + phylogeny | 2 | 0 |
| Exact k-mer matching | 3 | 0 |
| k-mer matching | 1 | 0 |
| Composition search | 0 | 1 |
Seventeen workflows, for which runtimes had been reported, were compared to find correlations between runtime and methods. Numbers indicate the number of workflows that process samples in a timeframe of either minutes or hours that use the method listed in the left column. Grayscales are proportional to the total number of scores per group, i.e., like a heatmap lower numbers are lighter and high numbers dark.
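The counts in this table can be checked for internal consistency. The sketch below is only an illustration: it assumes that the first numeric column counts workflows that finish in minutes and the second those that take hours, and that the rows fall into five groups of mutually exclusive options (the group names are my own labels). Under those assumptions each group should add up to the seventeen workflows mentioned in the note.

```python
# Counts copied from the table as (first column, second column).
# Assumption: first column = runtime in minutes, second = runtime in hours.
RUNTIME_BY_METHOD = {
    "pre-processing": {"Pre-process": (1, 6), "No pre-process": (7, 3)},
    "filtering": {"Filter": (2, 5), "No filter": (6, 4)},
    "assembly": {"Assembly": (2, 3), "No assembly": (6, 6)},
    "sequence type": {
        "Nt sequences": (6, 6),
        "Aa sequences": (1, 1),
        "Nt + aa sequences": (1, 2),
    },
    "search method": {
        "Alignment": (2, 8),
        "Alignment + phylogeny": (2, 0),
        "Exact k-mer matching": (3, 0),
        "k-mer matching": (1, 0),
        "Composition search": (0, 1),
    },
}


def group_totals(table):
    """Sum the minutes and hours counts within each group of rows."""
    totals = {}
    for group, rows in table.items():
        minutes = sum(m for m, _ in rows.values())
        hours = sum(h for _, h in rows.values())
        totals[group] = (minutes, hours)
    return totals


totals = group_totals(RUNTIME_BY_METHOD)

if __name__ == "__main__":
    for group, (minutes, hours) in totals.items():
        print(f"{group}: {minutes} (minutes) + {hours} (hours) = {minutes + hours}")
```

Every group sums to 8 workflows in the first column and 9 in the second, 17 in total, which supports reading the rows as five partitions of the same seventeen workflows.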
Figure 5. Decision tree for selecting a virus metagenomics classification workflow for medical applications. Workflows are suitable for medical purposes when they can detect pathogenic viruses by classifying sequences to a genus level or further (e.g., species, genotype), or when they detect integration sites. Forty workflows matched these criteria. Workflows can be applied to surveillance or outbreak tracing studies when very specific classifications are made, i.e., genotypes, strains or lineages. A 1-day analysis corresponds to being able to analyse a sample within 5 h. Detection of novel variants is made possible by sensitive search methods, amino acid alignment or composition search, and a broad reference database of potential hits. Numbers indicate the number of workflows available on the corresponding branch of the tree.
Figure 6. Decision tree for selecting a virus metagenomics classification workflow for biodiversity studies. Workflows for the characterisation of biodiversity of viruses have to classify a range of different viruses, i.e., have multiple reference taxa in the database. Forty-three workflows fitted this requirement. Novel variants can potentially be detected by using more sensitive search methods, amino acid alignment and composition search, and using diverse reference sequences. Finally, workflows are grouped by the taxonomic groups they can classify. Numbers indicate the number of workflows available on the corresponding branch of the tree.