
Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data.

Guillaume Carissimo, Marius van den Beek, Kenneth D Vernick, Christophe Antoniewski.

Abstract

Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions.

Year: 2017; PMID: 28045932; PMCID: PMC5207757; DOI: 10.1371/journal.pone.0168397

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240


Introduction

Viruses infect cells and manipulate the host machinery for their replication and transmission. Genomes of viruses show high diversity and can consist of single- or double-stranded RNA or DNA. Many types of viral replication cycles exist, which may involve various cellular compartments, various DNA or RNA replication intermediates, and diverse strategies for viral RNA transcription and viral protein translation. Thus, deep sequencing has become a powerful approach for virologists in their quest to detect and identify viruses in biological samples, even when they are present at low levels. Plants and invertebrates use RNA interference (RNAi) as an antiviral mechanism [1,2]. Antiviral RNAi activity results in the accumulation of viral interfering small RNAs (viRNAs), the extent of which depends on several factors such as the ability of a virus to replicate in the host and to evade the host RNAi machinery. Moreover, viRNAs derived from a variety of viruses can be detected in host organisms, regardless of whether these viruses have positive single-stranded, negative single-stranded or double-stranded RNA genomes, or DNA genomes [2]. Together, these features make small RNA deep sequencing a potent approach to detect viruses regardless of their genomic specificities, and different bioinformatic tools have been developed for detection or de novo assembly of viral genomes. Accordingly, viRNAs produced by the insect model Drosophila melanogaster in response to viral infections were sufficient to reconstruct and improve the genomic consensus sequence of the Nora virus [3] using the Paparazzi software [4], which is based on the SSAKE assembler [5]. In that study, Paparazzi improved the consensus sequence and the coverage of the Nora virus genome by ~20%, as compared to the previous Nora virus reference genome. SearchSmallRNA, a standalone tool with a graphical interface, used a similar approach to reconstruct viral genomes [6].
Importantly, both programs require known, closely related viral references for proper guidance of genome reconstructions from viRNAs, precluding the identification of more distant viral species or discovery of novel or unexpected viruses. To circumvent the need for viral reference sequences, Velvet [7] de novo assembled contigs from plant [8], fruit fly and mosquito [9] have been aligned to NCBI sequence databases, allowing the identification of partial or complete viral genomes. Several studies improved this strategy by combining two de novo assemblers [10-13], or scaffolding virus-aligned contigs using an additional translation-guided assembly step [14]. Collectively, these studies allowed important progress in virus assembly and identification from deep sequencing data. However, the existing computational workflows require specialist skills for installation, execution and adaptation to specific research, making them poorly accessible to a broad user base of biologists. In some cases, the tools lack documentation or their source codes are not available. In this context, we developed Metavisitor as an open source set of tools and preset workflows [15,16] which allow effective implementation of the computational strategies in the Galaxy framework, with short read as well as long read sequence datasets. In addition, Metavisitor workflows can be easily adapted to suit specific needs, by adding analysis steps or replacing/modifying existing ones with the numerous tools available in the Galaxy tool sheds. Here, we report a series of use cases of Metavisitor and we show that it provides biologists and medical practitioners with an easy-to-use and adaptable software for the detection or identification of viruses from high-throughput sequence datasets.

Experimental Procedures

Metavisitor consists of a set of Galaxy tools (Fig 1) that can be combined to (i) retrieve up-to-date nucleotide as well as protein sequences of viral genomes deposited in Genbank [17] and index these sequences for subsequent alignments; (ii) extract sequencing reads that do not align to the host genomes, known symbionts or parasites; (iii) perform de novo assembly of these reads using assembly tools available in Galaxy, align the de novo contigs against the viral nucleotide or protein blast databases using blastn or blastx, respectively, and generate reports from blast outputs to help in the diagnosis of known viruses or in the discovery of candidate viruses; (iv) use CAP3 (optional, see Use Case 3–3), blast and viral scaffolds for selected viruses to generate guided final viral sequence assemblies of blast sequence hits. Below, we group analysis steps into functional tasks i to iv and provide details on the Metavisitor tools. These tasks are linked together to build full workflows adapted to the analysis of the use cases described in the Results section.
Fig 1

Global view of the Metavisitor workflow.

The workflow is organised in sub workflows (dashed line) corresponding to functional tasks as described in the manuscript. All Galaxy Tools (square boxes) are available in the main Galaxy tool shed ().

(i) Get reference viral sequences

The “Get reference viral sequences” task is performed using the “Retrieve FASTA from NCBI” tool, which sends a query string to the Genbank database [17] and retrieves the corresponding nucleotide or protein sequences. For the viral nucleotide and protein sequences referred to as “vir1”, we used the tool to query Genbank (October 2015) and retrieve viral sequences, filtering out sequences from cellular organisms and bacteriophages (see S1 Fig). However, users can change the tool settings by entering query strings that fit their specific needs. As retrieving vir1 from NCBI takes several hours, users can skip this step by directly accessing the nucleotide or protein vir1 datasets on the Mississippi server (http://mississippi.fr) or by downloading them from figshare (). For convenience, nucleotide and protein blast indexes of vir1 are also available in the public library of the Mississippi server, but can also be generated using the “NCBI BLAST+ makeblastdb” Galaxy tool [18]. Bowtie [19] as well as bowtie2 [20] indexes of the vir1 nucleotide sequences have been generated in the Mississippi Galaxy instance using the corresponding “data manager” Galaxy tools. Finally, users can upload their own viral nucleotide and protein sequences using ftp and transfer them to a Galaxy history (Fig 1), where they can use the Galaxy data manager tools to produce the blast and bowtie indexes necessary for Metavisitor.
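To make the idea of a query string concrete, the following minimal Python sketch builds a search URL for NCBI's public E-utilities esearch endpoint. It is only an illustration: whether the Galaxy tool issues its request in exactly this form is an assumption, the function name build_esearch_url is hypothetical, and the example query (viruses, taxid 10239, excluding cellular organisms, taxid 131567) is not the query used to build vir1, which is shown in S1 Fig.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, db="nuccore", retmax=0):
    """Return an esearch URL for `query` against the NCBI database `db`.

    Hypothetical helper for illustration; it only builds the URL and does
    not contact NCBI.
    """
    params = {"db": db, "term": query, "retmax": retmax, "usehistory": "y"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Illustrative query only (assumption): viral sequences that are not
# annotated as cellular organisms. The real vir1 query is given in S1 Fig.
EXAMPLE_QUERY = "txid10239[Organism] NOT txid131567[Organism]"
url = build_esearch_url(EXAMPLE_QUERY)
print(url)
```

Switching db to "protein" would target protein records instead of nucleotide records.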

(ii) Prepare data

The “Prepare data” task (Fig 1) processes Illumina sequencing datasets in order to optimize the subsequent de novo assembly of viral sequencing reads. Library adapters are first clipped from the sequence reads in fastq files, and the reads are converted to fasta format using our “Clip adapter” tool (S1 Table). The clipped reads may be further converted to a fasta file of unique sequences, each headed by a character string that contains a unique identifier and the number of times that the sequence was found in the dataset, thus reducing the size of the dataset without loss of information. This optional treatment removes sequence duplicates and drastically reduces the workload of the next steps as well as the coverage variations after de novo assembly (see Use Cases 1–1 to 1–3). Clipped reads are then depleted of non-viral sequences by sequential alignments to the host genome, to other genomes from known or potential symbionts and parasites, as well as to PhiX174 genome sequences, which are commonly used as internal controls in Illumina sequencing and may contaminate the datasets (Fig 1). The sequence reads that do not match the reference genomes are retained and returned as a fasta file that can be used subsequently by a de novo assembly tool. Note that these subtraction steps can be skipped when the host genome is not known or if the aim of the user is to discover endogenous viral elements [21].
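The optional collapsing of clipped reads to unique sequences can be sketched as follows. This is a minimal Python illustration, not the code of the Metavisitor tool: the function name collapse_reads and the header layout ">identifier_count" are assumptions chosen for the example.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse reads to unique sequences and return them as FASTA text.

    Each header is ">{identifier}_{count}" (assumed layout), where the
    identifier is a running number and the count is the number of times
    the sequence occurred in `reads`.
    """
    counts = Counter(reads)
    records = []
    # most_common() sorts by decreasing count; ties keep first-seen order.
    for identifier, (sequence, count) in enumerate(counts.most_common(), start=1):
        records.append(f">{identifier}_{count}\n{sequence}\n")
    return "".join(records)


example_reads = ["ACGTACGT", "TTTTGGGG", "ACGTACGT"]
print(collapse_reads(example_reads))
```

Because the read count is kept in the header, downstream tools can still recover read abundances from the collapsed file.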

(iii) Assembly, Blast and Parsing

De novo assembly

In the task “Assemble, Blast and Parse” (iii), retained RNA sequences are subjected to de novo assembly. For short reads (<50 nt), we tested several rounds of de novo assembly by Velvet [7] using the Oases software package [22] (Fig 1) and k-mer lengths ranging from 15 to 35 (S1 Table). For reads between 50 nt and 100 nt, we also used Oases, with k-mer lengths ranging from 13 to 69. Finally, in Use Case 3–3, we used the Trinity assembly software, which is available as a Galaxy tool and was reported to perform well with long reads [23]. The Trinity and SPAdes [24] assembly software packages were also tested as alternative options to Oases in Use Case 2–2 (S1 Table), giving similar outputs. It is noteworthy that users can adapt a Metavisitor workflow using any assembly software available in the Galaxy tool shed.

Blast

Next, de novo assembled contigs are aligned to both nucleotide and protein vir1 BLAST databases built from the viral reference sequences (Fig 1) using the blastn or blastx Galaxy tools [18]. These tools search nucleotide or protein databases using nucleotide or translated nucleotide queries, respectively [25]. The default parameters are adjusted in order to report only the 5 best alignments per contig (Maximum hits option is set to 5) and to generate a tabular blast output that includes the 12 standard columns plus a column containing the length of the aligned subject sequences (extended columns option, “slen” checked). Note that this additional column in the blast output is required for subsequent parsing of the blast output by the “Parse blast output and compile hits” tool.
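Outside Galaxy, the settings described above correspond roughly to the following BLAST+ command line and 13-column tabular record. This is a sketch under assumptions: the file and database names are placeholders, the helper names are hypothetical, and the mapping of the Galaxy “Maximum hits” option to -max_target_seqs is assumed rather than taken from the source.

```python
# The 12 standard columns of BLAST+ tabular output ("std"), plus "slen".
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "slen",
]
INT_COLUMNS = ["length", "mismatch", "gapopen", "qstart", "qend",
               "sstart", "send", "slen"]
FLOAT_COLUMNS = ["pident", "evalue", "bitscore"]


def blast_command(program, query_fasta, database, max_hits=5):
    """Build an argument list for blastn or blastx (placeholder paths)."""
    return [program, "-query", query_fasta, "-db", database,
            "-max_target_seqs", str(max_hits), "-outfmt", "6 std slen"]


def parse_blast_line(line):
    """Parse one tab-separated line of '6 std slen' output into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BLAST_COLUMNS):
        raise ValueError("expected 13 columns (12 standard columns + slen)")
    record = dict(zip(BLAST_COLUMNS, fields))
    for key in INT_COLUMNS:
        record[key] = int(record[key])
    for key in FLOAT_COLUMNS:
        record[key] = float(record[key])
    return record


print(" ".join(blast_command("blastn", "contigs.fa", "vir1_nt")))
```

The parser rejects lines without the extra slen column, which mirrors the requirement that this column be present for the downstream parsing step.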

Parsing

Tabular outputs generated by blastn and blastx alignments are processed by the “Parse blast output and compile hits” tool (S1 Table), which returns 4 files, namely “blast analysis, by subjects”, “hits”, “Blast aligned sequences” and “Blast unaligned sequences”. In the “blast analysis, by subjects” file (S2 Fig), the subject sequences in the viral nucleotide or protein blast databases that produced significant blast alignments (hits) with de novo assembled contigs are listed, together with those contigs and hit information (% Identity, Alignment Length, start and end coordinates of hits relative to the subject sequence, percentage of the contig length covered by the hit, E-value and Bit Score of the hit). In addition, for each subject sequence in the list, the tool computes and reports the length in nucleotides or amino acids of the subject sequence (Subject Length), the summed coverage of the subject by all contig hits (Total Subject Coverage), the fraction of the subject length that this coverage represents (Relative Subject Coverage), and the best (Best Bit Score) and mean (Mean Bit Score) bit scores produced by contig hits. A simplified output without contig and blast information can be generated by using the “compact” option for the reporting mode of the “Parse blast output and compile hits” tool. Note that the total and relative subject coverages indicate how much of the virus sequence is covered by the reconstructed contigs, whereas the bit scores allow estimation of the distances between the reconstructed contigs and the subject sequence. The “hits” file contains the sequences of contig portions that produced significant alignments in the BLAST step (i.e. query hit sequences), flanked by additional contig nucleotides 5’ and 3’ to the hit (the size of these margins is set to 5 by default and can be modified by the user). These margins allow the inclusion of sequences that might not have significant homology but could still be of viral origin.
Finally, the “Blast aligned sequences” file contains contigs that produced significant blast hits, whereas the “Blast unaligned sequences” file contains those that did not.
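The per-subject statistics described for the “blast analysis, by subjects” file can be illustrated with a short Python sketch. It is not the code of the “Parse blast output and compile hits” tool: the function name summarize_by_subject is hypothetical, and the sketch assumes that Total Subject Coverage counts each covered subject position once, even when several hits overlap.

```python
from collections import defaultdict


def summarize_by_subject(hits):
    """Compute per-subject coverage and bit score statistics.

    `hits` is an iterable of dicts with the keys sseqid, sstart, send,
    slen and bitscore (as in tabular blast output with the slen column).
    Assumption: overlapping hits are counted once per subject position.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["sseqid"]].append(hit)

    report = {}
    for subject, subject_hits in grouped.items():
        covered = set()
        for hit in subject_hits:
            # sstart > send for hits on the reverse strand of the subject.
            start, end = sorted((hit["sstart"], hit["send"]))
            covered.update(range(start, end + 1))
        subject_length = subject_hits[0]["slen"]
        scores = [hit["bitscore"] for hit in subject_hits]
        report[subject] = {
            "subject_length": subject_length,
            "total_subject_coverage": len(covered),
            "relative_subject_coverage": len(covered) / subject_length,
            "best_bit_score": max(scores),
            "mean_bit_score": sum(scores) / len(scores),
        }
    return report


example_hits = [
    {"sseqid": "virusA", "sstart": 1, "send": 100, "slen": 300, "bitscore": 200.0},
    {"sseqid": "virusA", "sstart": 150, "send": 51, "slen": 300, "bitscore": 100.0},
]
print(summarize_by_subject(example_hits)["virusA"])
```

For blastx alignments the same arithmetic applies, with coordinates and lengths expressed in amino acids rather than nucleotides.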

(iv) Blast-Guided Scaffolding

This last task allows the integration of hit sequences matching a candidate virus into a virus scaffold (Fig 1). First, blastn or blastx hits are retrieved from the “hits” file using the tool “Pick Fasta sequences” (S1 Table) and the appropriate query string (for instance, “Dengue” will retrieve hit sequences that significantly blast aligned with Dengue virus sequences). Next, these hit sequences can be further clustered into longer contigs using the “cap3 Sequence Assembly” Galaxy tool (S1 Table) adapted from CAP3 [26]. Finally, if there are still multiple unlinked contigs at this stage, they can be integrated (uppercase characters) in the matched viral sequence taken as a scaffold (lowercase characters). This scaffolding is achieved by (a) retrieving the viral sequence from the NCBI nucleotide database to be used as the backbone of the scaffold, generating a blast index from this sequence and aligning the contigs to this index with the blastn or tblastx tools, and (b) running the “blast_to_scaffold” tool (S1 Table), taking as inputs the contigs, the viral guide sequence and the blastn or tblastx output (Fig 1, bottom).
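The uppercase/lowercase convention of the final scaffold can be illustrated with a deliberately simplified Python sketch. It is not the “blast_to_scaffold” implementation: the function name integrate_contigs is hypothetical, and the sketch assumes ungapped, plus-strand nucleotide alignments whose 1-based start positions on the guide are already known.

```python
def integrate_contigs(guide, placements):
    """Overlay contigs (uppercase) on a guide sequence (lowercase).

    `placements` is a list of (start, contig) tuples, where `start` is the
    1-based position on the guide at which the contig begins.
    Simplifying assumptions: no gaps, plus strand only, and later
    placements overwrite earlier ones where they overlap.
    """
    scaffold = list(guide.lower())
    for start, contig in placements:
        for offset, base in enumerate(contig.upper()):
            position = start - 1 + offset
            if 0 <= position < len(scaffold):
                scaffold[position] = base
    return "".join(scaffold)


guide_sequence = "aaaaaaaaaa"
print(integrate_contigs(guide_sequence, [(3, "cgt"), (8, "ttt")]))
```

In the resulting string, lowercase stretches mark regions recovered from the guide genome only, which is how such regions are reported in the use cases.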

Availability of Metavisitor

All Metavisitor tools, workflows and use cases are available on the Galaxy server. Readers can import the published Metavisitor use case histories and their corresponding workflows into their personal account to re-run the described analyses or adapt them to their studies. We made all tools and workflows that compose Metavisitor available from the main Galaxy tool shed (), in the form of a tool suite (suite_metavisitor_1_2), which can thus be installed and used on any Galaxy server instance. The Metavisitor workflows are also available from the myexperiment repository (). They can be freely modified or complemented with additional analysis steps within the Galaxy environment. The Metavisitor tool codes are accessible in our public GitHub repository (). We also provide a Docker image artbio/metavisitor:1.2 as well as an ansible playbook, both of which allow deployment of a Galaxy server instance with preinstalled Metavisitor tools and workflows in local infrastructures. Extensive documentation on how to install and use Metavisitor is available at https://artbio.github.io/Metavisitor-manual/.

Results

The strategy implemented by Metavisitor (Fig 1) is to perform de novo assembly of sequencing reads and to detect contigs of viral origin through blast alignments to a nucleotide or protein sequence database of known viruses (vir1). These contig alignments can be further clustered to reconstruct a viral genome. Below, we report use cases to demonstrate the use of Metavisitor in specific situations. For each use case, we briefly present the purpose of the original study from which the datasets originate and we describe an adapted Metavisitor workflow as well as its main outputs. Readers can further examine the workflows (https://mississippi.snv.jussieu.fr/workflow/list_published) and use case analyses (https://mississippi.snv.jussieu.fr/history/list_published) in every detail at http://mississippi.fr. Indicative execution times of the workflows are given in S2 Table.

1. Detection of known viruses

Use Cases 1–1, 1–2 and 1–3: detection and reconstruction of the Nora virus genome in small RNA sequencing datasets

Using small RNA sequencing libraries SRP013822 (EBI ENA) and the Paparazzi software [4], we were previously able to propose a novel reference genome (NCBI JX220408) for the Nora virus strain infecting Drosophila melanogaster stocks in laboratories [3]. This so-called rNora genome differed by 3.2% nucleotides from the Nora virus reference NC_007919.3 and improved the alignment rate of viral siRNAs by ~121%. Thus, we first tested Metavisitor on the small RNA sequencing datasets SRP013822 using the Oases de novo assembly tool, which is well suited to the assembly of short reads [9]. Three Metavisitor workflows were run on the merged SRP013822 small RNA sequence reads, with the NC_007919.3 genome as a guide for final scaffolding. The workflow for Use Case 1–1 (S3 Fig) used raw reads collapsed to unique sequences (see Experimental Procedures) to reconstruct a Nora virus genome referred to as Nora_MV (S1 File). In a second workflow for Use Case 1–2 (S4 Fig), we did not collapse the SRP013822 reads to unique sequences, which allowed the reconstruction of a Nora_raw_reads genome (S2 File). Finally, the workflow for Use Case 1–3 (S5 Fig) normalized the abundances of SRP013822 sequence reads using the Galaxy tool “Normalize by median” [27] and reconstructed a Nora_Median-Norm-reads genome (S3 File). All three reconstructed genomes as well as the Paparazzi-reconstructed JX220408 genome had a high sequence similarity (>96.6% nucleotide identity) with the NC_007919.3 guide genome (S4 File). The final de novo (capital letters) assemblies of both the Nora_raw_reads and Nora_Median-Norm-reads genomes entirely covered the JX220408 and NC_007919.3 genomes (both 12333 nt), whereas the de novo assembled part of the Nora_MV genome was marginally shorter (12298 nt; the first 31 5’ nucleotides are in lowercase to indicate that they were not de novo assembled but instead recovered from the guide genome).
To evaluate the quality of the assemblies, we remapped the SRP013822 reads to the 3 reconstructed Nora virus genomes as well as to the published JX220408 and NC_007919.3 genomes using the “workflow for remapping in Use Cases 1–1,2,3” (S6 Fig). As can be seen in Fig 2, SRP013822 reads matched the genomes with almost identical profiles and had characteristic size distributions of viral siRNAs with a major peak at 21 nucleotides. Importantly, the numbers of reads re-matched to the Nora virus genomes were 1,578,704 (Nora_MV) > 1,578,135 (Paparazzi, JX220408) > 1,566,909 (Nora_raw_reads) > 1,558,000 (Nora_Median-Norm-reads) > 872,128 (NC_007919.3 reference genome guide).
Fig 2

Realignments of small RNA sequence reads to reconstructed (Nora_MV, Nora_raw_reads and Nora_Median−Norm−reads) or published (JX220408.1 and NC_007919.3) Nora virus genomes.

Plots (left) show the abundance of 18–30-nucleotide (nt) small RNA sequence reads matching the genome sequences and histograms (middle) show length distributions of these reads. Positive and negative values correspond to sense and antisense reads, respectively. Total read counts are indicated to the right hand side.

Thus, Metavisitor reconstructed a Nora virus genome, Nora_MV, whose sequence maximizes the number of vsiRNA read alignments, which suggests that it is the most accurate genome for the Nora virus present in the datasets. Of note, the Nora_MV genome differs from the JX220408 rNora genome generated by Paparazzi by only two mismatches at positions 367 and 10707, and four 2-nt deletions at positions 223, 365, 9059 and 12217 (see S4 File). These variations did not change the amino acid sequence of the 4 ORFs of the Nora virus. We conclude that Metavisitor performs slightly better than Paparazzi for a known virus, using de novo assembly of small RNA reads followed by blast-guided scaffolding. We did not observe any benefit of using raw reads or normalized-by-median reads for de novo assembly with Oases, but rather a decrease in the accuracy of the reconstructed genome as measured by the number of reads re-mapped to the final genomes (Fig 2).

Use Case 1–4: detection of multiple viruses in small RNA sequencing datasets

In order to show the ability of Metavisitor to detect multiple known viruses in small RNA sequencing datasets, we built another workflow (Use Case 1–4) that performs blastn alignments of Oases contigs against the vir1 reference and reports all significant alignments without filtering (S7 Fig). Applying this workflow to the SRP013822 sequence datasets produced a list of alignments which contains, as expected, the Nora virus. In addition, contigs were found to align with high significance (Mean Bit Score > 200) to the Drosophila A virus and to the Drosophila C virus (S5 File and Table 1), strongly suggesting that the fly stocks analyzed in our previous work were also subject to persistent infection by these viruses [3].
Table 1

Report table generated by the “Parse blast output and compile hits” tool in Use Case 1–4 showing the presence of Drosophila A virus and Drosophila C virus in addition to the Nora virus in the small RNA sequencing of laboratory Drosophila.

See Method section for a description of the columns.

Subject | Subject Length | Total Subject Coverage (nt) | Relative Subject Coverage | Best Bit Score | Mean Bit Score
gi|157325505|gb|DQ321720.2|_Nora_virus,_complete_genome | 11908 | 10211 | 0.857 | 11840 | 4041
gi|822478532|gb|KP970099.1|_Nora_virus_isolate_RAKMEL13_gp1_(gp1)_gene,_partial_cds;_and_replicatio | 11416 | 8736 | 0.765 | 11441 | 3673
gi|822478537|gb|KP970100.1|_Nora_virus_isolate_GEO58_gp1_(gp1)_gene,_partial_cds;_and_replication_p | 11416 | 2463 | 0.216 | 4028 | 3607
gi|346421290|ref|NC_007919.3|_Nora_virus,_complete_genome | 12333 | 10530 | 0.854 | 11809 | 2653
gi|284022350|gb|GQ257737.1|_Nora_virus_isolate_Umea_2007,_complete_genome | 12333 | 10530 | 0.854 | 11809 | 2573
gi|822478527|gb|KP970098.1|_Nora_virus_isolate_AM04_gp1_(gp1)_gene,_partial_cds;_and_replication_po | 11413 | 7654 | 0.671 | 5745 | 2489
gi|822478512|gb|KP970095.1|_Nora_virus_isolate_RAK11_gp1_(gp1)_and_replication_polyprotein_(gp2)_ge | 11416 | 6174 | 0.541 | 5368 | 2419
gi|822478141|gb|KP969947.1|_Drosophila_A_virus_isolate_ywiP_DrosophilaA_RNA-dependent_RNA_polymeras | 4516 | 4157 | 0.921 | 6980 | 2361
gi|402295620|gb|JX220408.1|_Nora_virus_isolate_FR1,_complete_genome | 12333 | 12302 | 0.997 | 12720 | 2324
gi|822478147|gb|KP969949.1|_Drosophila_A_virus_isolate_delta11_DrosophilaA_RNA-dependent_RNA_polyme | 4481 | 4442 | 0.991 | 7081 | 2264
gi|822478417|gb|KP970078.1|_Nora_virus_isolate_D167_gp1_(gp1),_replication_polyprotein_(gp2),_gp3_( | 11895 | 7315 | 0.615 | 5503 | 2192
gi|822478517|gb|KP970096.1|_Nora_virus_isolate_K09_gp1_(gp1)_gene,_partial_cds;_and_replication_pol | 11419 | 2027 | 0.178 | 3490 | 2050
gi|822478144|gb|KP969948.1|_Drosophila_A_virus_isolate_XIB_DrosophilaA_RNA-dependent_RNA_polymerase | 4516 | 4507 | 0.998 | 7092 | 2009
gi|822478150|gb|KP969950.1|_Drosophila_A_virus_isolate_Qdelta_DrosophilaA_RNA-dependent_RNA_polymer | 4476 | 4446 | 0.993 | 7092 | 1847
gi|822478440|gb|KP970082.1|_Nora_virus_isolate_RAKMEL12_gp1_(gp1),_replication_polyprotein_(gp2),_g | 11968 | 7214 | 0.603 | 6396 | 1815
gi|822478497|gb|KP970092.1|_Nora_virus_isolate_delta11_gp1_(gp1)_gene,_partial_cds;_replication_pol | 11157 | 2347 | 0.210 | 3314 | 1695
gi|822478522|gb|KP970097.1|_Nora_virus_isolate_JJ17_gp1_(gp1)_gene,_partial_cds;_and_replication_po | 11420 | 993 | 0.087 | 1674 | 1659
gi|822478482|gb|KP970089.1|_Nora_virus_isolate_IM13_gp1_(gp1)_gene,_partial_cds;_replication_polypr | 11103 | 1828 | 0.165 | 1977 | 1565
gi|225356593|gb|FJ150422.1|_Drosophila_A_virus_isolate_HD,_complete_genome | 4806 | 4753 | 0.989 | 6902 | 1419
gi|822478430|gb|KP970080.1|_Nora_virus_isolate_MONSIM03_gp1_(gp1),_replication_polyprotein_(gp2),_g | 11968 | 2086 | 0.174 | 2993 | 1416
gi|822478445|gb|KP970083.1|_Nora_virus_isolate_SAF04_gp1_(gp1),_replication_polyprotein_(gp2),_gp3_ | 11142 | 649 | 0.058 | 1121 | 1121
gi|822478135|gb|KP969945.1|_Drosophila_A_virus_isolate_XID_DrosophilaA_RNA-dependent_RNA_polymerase | 4516 | 4074 | 0.902 | 3103 | 1112
gi|822478403|gb|KP970076.1|_Nora_virus_isolate_ATH56_gp1_(gp1)_and_replication_polyprotein_(gp2)_ge | 11965 | 2344 | 0.196 | 2812 | 1025
gi|822478412|gb|KP970077.1|_Nora_virus_isolate_IM09_gp1_(gp1),_replication_polyprotein_(gp2),_gp3_( | 11965 | 3689 | 0.308 | 2989 | 1023
gi|822478435|gb|KP970081.1|_Nora_virus_isolate_MON28_gp1_(gp1)_and_replication_polyprotein_(gp2)_ge | 11967 | 5859 | 0.490 | 5445 | 1004
gi|822478507|gb|KP970094.1|_Nora_virus_isolate_K02_gp1_(gp1)_gene,_partial_cds;_replication_polypro | 11160 | 1289 | 0.116 | 957 | 778
gi|822478132|gb|KP969944.1|_Drosophila_A_virus_isolate_wipe_DrosophilaA_RNA-dependent_RNA_polymeras | 4516 | 3070 | 0.680 | 3097 | 742
gi|822478477|gb|KP970088.1|_Nora_virus_isolate_IM12_gp1_(gp1)_gene,_partial_cds;_replication_polypr | 11413 | 952 | 0.083 | 1153 | 732
gi|253761971|ref|NC_012958.1|_Drosophila_A_virus,_complete_genome | 4806 | 607 | 0.126 | 1045 | 674
gi|822478542|gb|KP970101.1|_Nora_virus_isolate_SAFSIM01_gp1_(gp1)_gene,_partial_cds;_and_replicatio | 11413 | 384 | 0.034 | 661 | 661
gi|822478138|gb|KP969946.1|_Drosophila_A_virus_isolate_LJ35_DrosophilaA_RNA-dependent_RNA_polymeras | 4468 | 959 | 0.215 | 848 | 502
gi|9629650|ref|NC_001834.1|_Drosophila_C_virus,_complete_genome | 9264 | 6345 | 0.685 | 1276 | 445
gi|2388672|gb|AF014388.1|_Drosophila_C_virus_strain_EB,_complete_genome | 9264 | 6587 | 0.711 | 1276 | 431
gi|300871949|gb|GU983882.2|_Drosophila_C_virus_isolate_ZW141_polyprotein_gene,_partial_cds | 500 | 272 | 0.544 | 482 | 395
gi|300871965|gb|GU983892.2|_Drosophila_C_virus_isolate_psjmg_polyprotein_gene,_partial_cds | 500 | 310 | 0.620 | 491 | 353
gi|300871979|gb|GU983900.2|_Drosophila_C_virus_isolate_AL7_polyprotein_gene,_partial_cds | 500 | 271 | 0.542 | 489 | 342
gi|300871957|gb|GU983888.2|_Drosophila_C_virus_isolate_Bam73_H_polyprotein_gene,_partial_cds | 500 | 453 | 0.906 | 491 | 323
gi|300871941|gb|GU983878.2|_Drosophila_C_virus_isolate_mel15_H_polyprotein_gene,_partial_cds | 500 | 453 | 0.906 | 489 | 321
gi|300871955|gb|GU983885.2|_Drosophila_C_virus_isolate_16a9_polyprotein_gene,_partial_cds | 490 | 151 | 0.308 | 273 | 262
gi|300871953|gb|GU983884.2|_Drosophila_C_virus_isolate_Tam15_polyprotein_gene,_partial_cds | 500 | 151 | 0.302 | 273 | 262

2. Discovery of novel viruses

Use Case 2–1: identification of new viruses in small RNA sequencing datasets

Using Metavisitor, we recently discovered two novel viruses infecting a laboratory colony of Anopheles coluzzii mosquitoes [28]. In this case, a workflow (S8 Fig) was used to process small RNA datasets from these mosquitoes (EBI SRA ERP012577) and to assemble a number of Oases contigs that show significant blastx hits with Dicistroviridae proteins, including Drosophila C virus (DCV) and Cricket paralysis virus (CrPV) proteins (S6 File). The viral family Dicistroviridae was named after the dicistronic organisation of its genomes, with a 5’ open reading frame encoding a non-structural polyprotein and a second, non-overlapping 3’ open reading frame encoding the structural polyprotein. In order to construct a potential new A. coluzzii dicistrovirus genome, the “Pick Fasta Sequences” tool (S8 Fig) collected blastx hits showing significant alignment with both Drosophila C virus and Cricket paralysis virus polyproteins (S7 File), which were further clustered with the “cap3 Sequence Assembly” tool into 4 contigs of 1952, 341, 4688 and 320 nt, respectively (S8 File). These 4 contigs were further aligned to the DCV genome NC_001834.1 sequence with tblastx and integrated into this scaffold sequence with the “blast_to_scaffold” tool to produce a final assembly (S9 File). Re-mapping of the ERP012577 small RNA reads using the “Workflow for remapping in Use Cases 1–1,2,3” (S6 Fig) showed that they mostly align to de novo assembled regions (uppercase nucleotides) of this chimeric genome and have a typical size distribution of virus-derived siRNAs (S9 Fig), suggesting that the NC_001834.1 DCV sequences of the scaffold (lowercase nucleotides) are only loosely related to the actual sequence of the novel A. coluzzii dicistrovirus. Nevertheless, the composite assembly allowed the design of primers in the de novo assembled regions to PCR amplify and sequence the regions of the viral genome that could not be de novo assembled [28].
Several teams have used the siRNA signature (a peak at 21 nt in the size distribution of re-aligned small RNA sequences) as an alternative to sequence similarity for identifying contigs of potential viral origin [12,13]. In order to further illustrate the flexibility of Metavisitor for implementing this strategy, we built a workflow (S10 Fig) to realign ERP012577 small RNA sequences to Oases contigs longer than 300 nt and to generate in batch read maps and read size distributions for these contigs using the “Generate readmap and histograms from alignment files” tool (S1 Table). We manually inspected these read maps and size distributions (S10 File) and collected all contigs with a clear siRNA signature (a peak at 21 nt for both forward and reverse strands of contig sequences), 2 sets of contigs with a modest excess of 21 nt reads from the forward strand only, and 3 sets of contigs with no siRNA signature as negative controls (S3 Table). With the notable exceptions of the loci 3 and 46 contigs, all contigs with a clear siRNA signature aligned by blastx to vir1 viral sequences (S3 Table; see the public Galaxy history for details). The loci 3 and 46 contigs did not align to the non-redundant protein database of the NCBI either and may therefore be of viral origin (S3 Table). All 5 negative control contigs with an unclear or no siRNA signature only aligned significantly to non-viral proteins (S3 Table). Together, these results illustrate the use of Metavisitor to implement a sequence-independent strategy based on siRNAs for virus identification [12,13].
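In the analysis above the read maps and size distributions were inspected manually. As an illustration only, the criterion “a peak at 21 nt for both forward and reverse strands” could be approximated by the following Python sketch, in which the function names and the strict-peak rule are assumptions and not part of Metavisitor.

```python
from collections import Counter


def size_distribution(alignments, min_len=18, max_len=30):
    """Count aligned reads per length and strand.

    `alignments` is an iterable of (read_length, strand) tuples, with
    strand given as "+" (forward) or "-" (reverse).
    """
    distribution = {"+": Counter(), "-": Counter()}
    for length, strand in alignments:
        if min_len <= length <= max_len:
            distribution[strand][length] += 1
    return distribution


def has_sirna_signature(distribution, peak=21):
    """Return True if `peak` is the strict mode of read lengths on both strands.

    Assumed rule for illustration; the source relied on manual inspection.
    """
    for strand in ("+", "-"):
        counts = distribution[strand]
        if counts[peak] == 0:
            return False
        if any(counts[length] >= counts[peak] for length in counts if length != peak):
            return False
    return True


example = [(21, "+")] * 5 + [(22, "+")] * 2 + [(21, "-")] * 4 + [(20, "-")]
print(has_sirna_signature(size_distribution(example)))
```

A contig with 21 nt reads on the forward strand only fails this test, which corresponds to the “modest excess from the forward strand only” category that was kept separate in S3 Table.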

Use Case 2–2: identification of new viruses in mRNA sequencing datasets

In our study [28], we also used RNAseq libraries from the same A. coluzzii colony (EBI-SRA, ERS977505), demonstrating the use of a Metavisitor workflow for long RNA sequencing read datasets (S11 Fig). Thus, 100 nt reads were aligned without adapter clipping to the Anopheles gambiae genome using bowtie2, and unmatched reads were subjected to Oases assembly (k-mer range 25 to 69, to take into account longer reads). Oases contigs were then filtered for a size > 5000 nt and aligned to the protein viral reference using blastx. The “blast analysis, by subjects” report generated from the blastx alignments repeatedly pointed to an 8919 nt long Oases contig that matched structural and non-structural polyproteins of DCV and CrPV (S11 File). This 8919 nt contig (S12 File) completely includes the contigs generated with the small RNA datasets (S8 File) and shows a dicistronic organization, which is typical of Dicistroviridae; it is referred to as a novel Anopheles C Virus (AnCV) [28]. The sequence of this Anopheles C Virus is deposited in the NCBI nucleotide database under accession number KU169878. As expected, when realigned to this genome (S12 Fig), the ERP012577 small RNA reads now show a typical alignment profile all along the AnCV genome sequence, with no gap and a size distribution peaking at 21 nt, the length of virus-derived siRNAs (Fig 3). Of note, we tested in Use Case 2–2 two alternative workflows substituting the Oases assembly tool with Trinity (S13 Fig) and SPAdes (S14 Fig), respectively. Both of these workflows were equally able to assemble the genome KU169878 of the Anopheles C Virus (S13 File).
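The contig size filter mentioned above (size > 5000 nt) is simple enough to sketch in a few lines of Python. This is an illustrative stand-in for the Galaxy filtering step, with hypothetical function names, and it assumes plain FASTA text as input.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_length=5000):
    """Keep contigs strictly longer than `min_length` nucleotides."""
    return [(header, seq) for header, seq in records if len(seq) > min_length]


example_fasta = ">contig_long\n" + "A" * 5001 + "\n>contig_short\n" + "C" * 5000 + "\n"
kept = filter_by_length(read_fasta(example_fasta))
print([header for header, _ in kept])
```

Restricting blastx to the longest contigs keeps the report short when the aim is to recover a near-complete genome rather than every viral fragment.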
Fig 3

Alignments of small RNA sequence reads to the Anopheles C virus genome reconstructed in Use Case 2–2.

Plot shows the abundance of 18–30-nucleotide (nt) small RNA sequence reads matching the genome sequence and histogram shows the length distribution of these reads. Positive and negative values correspond to sense and antisense reads, respectively.

Taken together, the Metavisitor Use Cases 2–1 and 2–2 illustrate that when short read datasets do not provide enough sequencing information, an adapted Metavisitor workflow (S11 Fig) is able to exploit long reads of RNA sequencing datasets, if available, to assemble a complete viral genome [28].

3. Virus detection in human RNAseq libraries

Having illustrated that Metavisitor is able to generate robust genome assemblies from known and novel viruses in Drosophila and Anopheles sequencing datasets, we tested whether it can be used to diagnose viruses in RNA sequencing datasets of human patients from three different studies [29-31].

Use Case 3–1

Innate lymphoid cells (ILCs) play a central role in response to viral infection by secreting cytokines crucial for immune regulation, tissue homeostasis, and repair. Therefore, the pathogenic effect of HIV on these cells was recently analyzed in infected or uninfected patients using various approaches, including transcriptome profiling [30]. ILCs are unlikely to be infected in vivo by HIV as they lack expression of the CD4 co-receptor of HIV and they are refractory in vitro to HIV infection. However, we reasoned that ILC samples could still be contaminated by infected cells. This might allow Metavisitor to detect and assemble HIV genomes from patients’ ILC sequencing data (EBI SRP068722). We imported 40 ILC sequence datasets from the EBI SRP068722 archive and merged the datasets belonging to the same patients. As the data contained short 32 nt reads that in addition had to be 3’ trimmed to 27 nt to retain acceptable sequence quality, we designed a workflow for Use Case 3–1 (S15 Fig) that is similar to the workflows used in Use Cases 1–1 and 2–1 for small RNA sequencing data. Thus, the sequencing datasets were depleted of reads aligning to the human genome (hg19) and viral reads were selected by alignment to the NCBI viral sequences using the sRbowtie tool (S1 Table). These reads were then submitted to Oases assembly (kmers 11 to 27, to take into account short reads) and the resulting contigs were aligned to the Nucleotide Viral Blast Database using blastn. Alignments were parsed using the “Parse blast output and compile hits” tool, removing alignments to NCBI sequences related to patents to simplify the report (“Patent” term in the filter option of the “Parse blast output and compile hits” tool). A final report was generated by concatenating the reports produced by this tool for each patient (S14 File and Table 2). 
In summary, we were able to detect HIV RNAs in samples from 3 out of 4 infected patients whereas all samples from control uninfected patients remained negative for HIV. This Metavisitor workflow was able to accurately detect HIV RNA, even in samples where the number of sequence reads was expected to be low.
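
The patient-level summary above follows from the per-sample calls reported in Table 2: a patient is counted as positive when HIV is detected in at least one of that patient's datasets. The short Python sketch below makes this aggregation explicit. The function name and data layout are assumptions for illustration; the "+"/"-" strings are transcribed from the "Metavisitor HIV detection by sample" column of Table 2 for the four infected patients.

```python
from collections import defaultdict


def detection_by_patient(samples):
    """Collapse (patient_id, detected) pairs into one boolean per patient.

    A patient is positive as soon as one of its samples is positive.
    """
    status = defaultdict(bool)
    for patient, detected in samples:
        status[patient] = status[patient] or detected
    return dict(status)


# Per-sample calls for the four HIV+ patients, in the row order of Table 2
per_sample_calls = {
    "0450-318": "+-++--",
    "0387-272": "------",
    "0629-453": "+--+++",
    "0444-312": "++--",
}
samples = [
    (patient, call == "+")
    for patient, calls in per_sample_calls.items()
    for call in calls
]
by_patient = detection_by_patient(samples)
positives = sum(by_patient.values())
print(f"{positives} of {len(by_patient)} infected patients positive")  # 3 of 4
```
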
Table 2

HIV detection in ILC patient samples of Use Case 3–1.

The table summarizes the report generated by Metavisitor from a batch of 40 sequence datasets (S14 File). Metadata associated with each indicated sequence dataset as well as the ability of Metavisitor to detect HIV in datasets and patients are indicated.

# GSM ID | ID-1 | Patient | Treatment (SRR annotations) | HIV status | Days post HIV infection | Treat. status | SRR ID | Metavisitor HIV detection by sample | number of raw reads | number of raw reads by patient | History for Use Case 3–1
GSM2043730 | 110314 | 0450–318 | ILC2 | HIV+ | 1 | untreated | SRR3111582 | + | 7 013 962 | 34 252 732 | gi|45357423|gb|AY535449.1, gi|45357419|gb|AY535447.1
GSM2043731 | 110314 | 0450–318 | ILC3 | HIV+ | 1 | untreated | SRR3111583 | - | 3 246 980 | |
GSM2043732 | 180314 | 0450–318 | ILC2 | HIV+ | 7 | untreated | SRR3111584 | + | 2 833 634 | |
GSM2043733 | 180314 | 0450–318 | ILC3 | HIV+ | 7 | untreated | SRR3111585 | + | 2 989 628 | |
GSM2043734 | 170414 | 0450–318 | ILC2 | HIV+ | 38 | untreated | SRR3111586 | - | 16 248 912 | |
GSM2043735 | 170414 | 0450–318 | ILC3 | HIV+ | 38 | untreated | SRR3111587 | - | 1 919 616 | |
GSM2043736 | 110614 | 0387–272 | ILC2 | HIV+ | 1 | untreated | SRR3111588 | - | 60 342 796 | 227 307 414 | -
GSM2043737 | 110614 | 0387–272 | ILC3 | HIV+ | 1 | untreated | SRR3111589 | - | 34 189 278 | |
GSM2043738 | 170614 | 0387–272 | ILC2 | HIV+ | 7 | untreated | SRR3111590 | - | 38 030 394 | |
GSM2043739 | 170614 | 0387–272 | ILC3 | HIV+ | 7 | untreated | SRR3111591 | - | 29 100 534 | |
GSM2043740 | 290714 | 0387–272 | ILC2 | HIV+ | 49 | untreated | SRR3111592 | - | 43 022 506 | |
GSM2043741 | 290714 | 0387–272 | ILC3 | HIV+ | 49 | untreated | SRR3111593 | - | 22 621 906 | |
GSM2043742 | 41214 | 0629–453 | Acute ART+ ILC2 | HIV+ | 1 | ART | SRR3111594 | + | 5 061 920 | 54 052 098 | gi|296033826|gb|GU474419.1, gi|269294806|dbj|DM461231.1, gi|269294805|dbj|DM461230.1, gi|296556482|gb|AF324493.2, gi|296556485|gb|M19921.2, gi|45357419|gb|AY535447.1, gi|45357423|gb|AY535449.1
GSM2043743 | 41214 | 0629–453 | Acute ART+ ILC3 | HIV+ | 1 | ART | SRR3111595 | - | 8 455 026 | |
GSM2043744 | 101214 | 0629–453 | Acute ART+ ILC2 | HIV+ | 6 | ART | SRR3111596 | - | 12 451 684 | |
GSM2043745 | 101214 | 0629–453 | Acute ART+ ILC3 | HIV+ | 6 | ART | SRR3111597 | + | 6 419 868 | |
GSM2043746 | 130115 | 0629–453 | Acute ILC2 | HIV+ | 40 | ART | SRR3111598 | + | 6 837 584 | |
GSM2043747 | 130115 | 0629–453 | Acute ILC3 | HIV+ | 40 | ART | SRR3111599 | + | 14 826 016 | |
GSM2043748 | 150714 | 0444–312 | 3dR10 ILC2 | HIV+ | 2 | ART | SRR3111600 | + | 15 618 282 | 39 610 902 | gi|45357423|gb|AY535449.1, gi|45357419|gb|AY535447.1
GSM2043749 | 150714 | 0444–312 | 3dR10 ILC3 | HIV+ | 2 | ART | SRR3111601 | + | 13 491 804 | |
GSM2043750 | 220814 | 0444–312 | ILC2 | HIV+ | 41 | ART | SRR3111602 | - | 5 259 104 | |
GSM2043751 | 220814 | 0444–312 | ILC3 | HIV+ | 41 | ART | SRR3111603 | - | 5 241 712 | |
GSM2043752 | 10814 | 0500-355neg | ILC2 | HIV- | uninfected | none | SRR3111604 | - | 802 632 | 11 691 304 | -
GSM2043753 | 10814 | 0500-355neg | ILC3 | HIV- | uninfected | none | SRR3111605 | - | 10 888 672 | |
GSM2043754 | 80814 | 0292-xxxneg | ILC2 | HIV- | uninfected | none | SRR3111606 | - | 5 418 958 | 19 222 152 | -
GSM2043755 | 80814 | 0292-xxxneg | ILC3 | HIV- | uninfected | none | SRR3111607 | - | 13 803 194 | |
GSM2043756 | 90714 | 0394–274 | ILC2 | HIV- | uninfected | none | SRR3111608 | - | 13 779 570 | 15 991 428 | -
GSM2043757 | 90714 | 0394–274 | ILC3 | HIV- | uninfected | none | SRR3111609 | - | 2 211 858 | |
GSM2043758 | 170714 | 0218-162neg | ILC2 | HIV- | uninfected | none | SRR3111610 | - | 9 838 776 | 18 939 560 | -
GSM2043759 | 170714 | 0218-162neg | ILC3 | HIV- | uninfected | none | SRR3111611 | - | 9 100 784 | |
GSM2043760 | 180314 | 0311-217HIVneg | ILC2 | HIV- | uninfected | none | SRR3111612 | - | 2 281 560 | 7 490 832 | -
GSM2043761 | 180314 | 0311-217HIVneg | ILC3 | HIV- | uninfected | none | SRR3111613 | - | 5 209 272 | |
GSM2043762 | 230514 | 0440-307neg | ILC2 | HIV- | uninfected | none | SRR3111614 | - | 11 816 186 | 21 714 164 | -
GSM2043763 | 230514 | 0440-307neg | ILC3 | HIV- | uninfected | none | SRR3111616 | - | 9 897 978 | |
GSM2043764 | 240614 | 0518-370neg | ILC2 | HIV- | uninfected | none | SRR3111617 | - | 16 135 602 | 16 671 200 | -
GSM2043765 | 240614 | 0518-370neg | ILC3 | HIV- | uninfected | none | SRR3111618 | - | 535 598 | |
GSM2043766 | 290714 | 0560-420neg | ILC2 | HIV- | uninfected | none | SRR3111619 | - | 1 235 766 | 12 912 002 | -
GSM2043767 | 290714 | 0560-420neg | ILC3 | HIV- | uninfected | none | SRR3111620 | - | 11 676 236 | |
GSM2043768 | 290714 | 0575-419neg | ILC2 | HIV- | uninfected | none | SRR3111621 | - | 8 713 816 | 11 833 416 | -
GSM2043769 | 290714 | 0575-419neg | ILC3 | HIV- | uninfected | none | SRR3111622 | - | 3 119 600 | |

Use Case 3–2

Yozwiak et al. searched for the presence of viruses in RNA Illumina sequencing data from sera of children suffering from fevers of unknown origin [29]. In this study, paired-end sequencing datasets were depleted of reads aligning to the human genome and the human transcriptome using BLAT and BLASTn, respectively, and the remaining reads were aligned to the NCBI nucleotide database using BLASTn. A virus was considered identified when 10 reads or more aligned to a viral genome which was not tagged as a known lab contaminant. For a significant number of Patient IDs reported in Table 1 of the article [29], we were not able to find the corresponding sequencing files in the deposited EBI SRP011425 archive. In addition, we did not find the same read counts for these datasets as those indicated by the authors. With these limitations in mind, we downloaded 86 sequencing datasets that could be further concatenated and assigned to 36 patients in Yozwiak et al. [29]. As sequence reads in SRP011425 datasets are 97 nt long, we adapted a workflow for this Use Case 3–2 (S16 Fig) from the one used in Use Case 3–1 with the following modifications: (i) sequence reads were depleted of human sequences and viral reads were selected by alignment to the NCBI viral sequences using the Galaxy bowtie2 tool (S1 Table) instead of the sRbowtie tool; (ii) viral reads were submitted to Oases assembly using kmer values ranging from 13 to 69 to take into account long reads; (iii) the SAM file with read alignments to the vir1 bowtie2 index was parsed using the “join” and “sort” Galaxy tools in order to detect putative false negative datasets with viral reads that fail to produce significant Oases viral contigs. This workflow generated a report file (S15 File) summarized in Table 3. The results show that Metavisitor detected the same viruses as those reported by Yozwiak et al. in 17 patients. 
Although viral reads were detected in 16 other patients, they did not cover sufficient portions of viral genomes to produce significant viral assemblies. Finally, in the three remaining patients (patients 263, 330 and 345 in Table 3), we detected viruses (Dengue virus 2, Stealth virus 1 and Dengue virus 4, respectively) other than those identified by Yozwiak et al. These discrepancies are most likely due to misannotation of some of the deposited datasets, which precludes further detailed comparisons.
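
Step (iii) above counts reads per viral reference before assembly, so that datasets containing viral reads but no significant contigs remain visible. In the workflow this is done with the Galaxy “join” and “sort” tools; the Python sketch below is a stand-in for that step, not the workflow itself. It counts primary alignments per reference sequence from SAM lines and then applies the "10 reads or more, not a known contaminant" rule quoted from Yozwiak et al. Function names and the contaminant list are hypothetical.

```python
from collections import Counter

# SAM FLAG bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary
SKIP_FLAGS = 0x4 | 0x100 | 0x800


def count_reads_per_reference(sam_lines):
    """Count primary alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS or rname == "*":
            continue
        counts[rname] += 1
    return counts


def identified_viruses(counts, min_reads=10, contaminants=frozenset()):
    """References with at least `min_reads` reads that are not contaminants."""
    return [
        name
        for name, n in counts.most_common()
        if n >= min_reads and name not in contaminants
    ]


def sam_line(read, flag, ref):
    """Build a minimal SAM record for the toy example below."""
    return f"{read}\t{flag}\t{ref}\t1\t42\t4M\t*\t0\t0\tACGT\tIIII"


toy_sam = ["@HD\tVN:1.0"]
toy_sam += [sam_line(f"d{i}", 0, "Dengue_virus_2") for i in range(12)]
toy_sam += [sam_line(f"t{i}", 16, "Torque_teno_virus") for i in range(3)]
toy_sam += [sam_line("u1", 4, "*")]

counts = count_reads_per_reference(toy_sam)
print(counts.most_common())
print(identified_viruses(counts))
```

References that fall below the threshold still appear in `counts`, which is the information needed to flag a dataset as a putative false negative.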
Table 3

Summary of virus detection in 36 traceable patients of the Use Case 3–2.

The data of this table were extracted from the Metavisitor report file available as S15 File. Values of the column “Coverage of complete viral genome (%)” correspond to the fractions (in %) of the complete viral genomes that are covered by blast hits of viral contigs to these genomes, and values of the column “Mean blast bit score” correspond to the mean values of the bit scores observed for these blast hits. Note that blast alignments to incomplete viral genomes were not taken into account. For detection of false negatives, reads were aligned to the bowtie2 vir1 index before de novo assembly and counts of these reads are reported in the column “Read mapping to vir1 using bowtie2”.

Extracted from Yozwiak et al. Table 1Metavisitor
Patient ID# virus reads# initial readsFraction virus readsYozwiak et al. Virus detection# reads in NGS datasetsMetavisitor Virus detectionCoverage of complete viral genome (%)Mean blast bit scoreRead mapping to vir1 using bowtie2ENA-RUN
5662061.90E+061.08E-04Torque teno mini virus 41.07E+06none--No Significant alignmentsSRR453487
438724.40E+061.64E-05Human herpesvirus 63.20E+06Human herpesvirus 60.36239.349 Human_herpesvirus_6SRR453437
40121641.80E+051.20E-02Hepatitis A virus9.94E+05Hepatitis A virus69.74735.26154 reads Hepatitis_A_virusSRR453443,SRR453458
382449.60E+054.58E-05Human herpesvirus 42.33E+06none--38 Human_herpesvirus_4SRR453430
377816.60E+061.23E-05Cyclovirus PK50344.91E+06Circovirus-like_NI/2007-320.45212.058 reads Circovirus-like_NI/2007-376 Torque teno virusSRR453491
375533.80E+061.39E-05Porcine circovirus 12.65E+06Circovirus-like_NI/2007-320.63214.066 reads Circovirus-like_NI/2007-3SRR453499
350483.00E+061.60E-05Human herpesvirus 6, Torque teno mini virus 41.93E+06none--29 Human_herpesvirus_6128 Torque teno mini virus 4SRR453484
349471.60E+062.94E-05Torque teno midi virus1.03E+06none--No Significant alignmentsSRR453464
345622.20E+062.82E-05Beak and feather disease virus1.29E+06Dengue_virus_40.81156.0159 Dengue_virus_465 Circovirus-like_NI/2007SRR453506
3443031.30E+062.33E-04Human herpesvirus 67.48E+05Human herpesvirus 60.17304.0184 Human_herpesvirus_6SRR453417
335471.70E+062.76E-05Torque teno virus9.77E+05none--24 Torque teno virusSRR453490
3311131.40E+068.07E-05Torque teno midi virus 28.99E+05UNVERIFIED:_Torque_teno_virus_isolate_S55,_complete_genome20.93353.4772 Torque teno virusSRR453478
330771.80E+064.28E-05TTV-like mini virus1.85E+06AF191073_Stealth_virus_1_clone_3B432.68143.014 reads Stealth_virus_1_clone_C16130_T324 reads Dengue virus 2SRR453465,SRR453480
329142.20E+066.36E-06Gull circovirus2.93E+06Circovirus-like_NI/2007-353.64339.55 reads Circovirus-like_NI/2007-312 reads Dengue virus 2SRR453489,SRR453505
3222063.80E+065.42E-05Cyclovirus PK52222.63E+06Circovirus-like_NI/2007-325.35257.0208 reads Circovirus-like_NI/2007-3SRR453498
321304.60E+066.52E-06Porcine circovirus 13.54E+06none--10 reads Circovirus-like_NI/2007-317 reads Dengue virus 2SRR453446
315421.90E+062.21E-05African swine fever virus1.59E+06none--22 reads Dengue virus 2SRR453427,SRR453440
2826991.60E+064.20E-04Dengue virus 29.66E+05Dengue virus 23.63495.0651 reads Dengue virus 2SRR453438
27515111.60E+069.70E-04Dengue virus 21.11E+06Dengue virus 210.23539.21436 reads Dengue virus 2SRR453450
274271.20E+062.30E-05Dengue virus 16.24E+05none--28 reads Dengue virus 1/2SRR453460
270281.20E+062.33E-05Human herpesvirus 66.76E+05none--20 Human_herpesvirus_6SRR453485
2661357494.80E+062.80E-02Dengue virus 23.36E+06Dengue virus 298.651852.3121347 reads Dengue virus 2SRR453448
263NDNDNDTTV (virochip)3142332Dengue virus 239.19302.075 densovirus92 Dengue virusSRR453424,SRR453457
193561.60E+063.50E-05Torque teno mini virus 29.13E+05none--6 reads Torque_teno_virus_isolate_TTV-S34SRR453510
18742801.10E+063.90E-03Dengue virus 25.55E+05Dengue virus 239.30496.83970 reads Dengue virus 2SRR453456
18617012.00E+068.51E-04Torque teno virus 151.32E+06Torque teno virus (SEN virus)55.85389.4541 SEN virus AY449524.127 reads Torque_teno_virus_15SRR453425,SRR453469
183663.00E+062.20E-05Human herpesvirus 61.97E+06none--57 Human_herpesvirus_6BSRR453481
180428.00E+055.25E-05GB virus C4.91E+06GB virus C1.26185.041 reads GB virusSRR453531
179171.80E+069.44E-06Torque teno mini virus 11.31E+06none--No Significant alignmentsSRR453474
171181.20E+061.50E-05Torque teno mini virus 27.81E+05none--No Significant alignmentsSRR453509
168NDNDNDTTV (virochip)135412none--No Significant alignmentsSRR453451
161143.00E+064.67E-06Human parvovirus B192.75E+06Human parvovirus B192.89242.079 densovirus12 reads Human parvovirus B19SRR453495,SRR453504
1591432.60E+065.50E-05Torque teno mini virus 51.73E+06Uncultured virus DNA88.14339.0No Significant alignmentsSRR453500
1562132.30E+069.26E-05Torque teno midi virus 11.54E+06Torque_teno_virus31.01314.7550 reads Torque_teno_virus_isolate_S54SRR453493
131241.20E+062.00E-05Human herpesvirus 65.46E+05Human herpesvirus 60.18263.016 reads Human_herpesvirus_6SRR453444
781131.20E+069.42E-05Human herpesvirus 65.91E+05none--68 reads Human_herpesvirus_6BSRR453426
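
The “Coverage of complete viral genome (%)” column of Table 3 is defined as the fraction of a viral genome covered by blast hits. One way to compute such a value is to merge overlapping hit intervals on the subject genome and divide the merged length by the genome length, as in the Python sketch below. This is an illustration under stated assumptions (1-based inclusive subject coordinates, hits in either orientation, hypothetical function name); it is not the code of the “Parse blast output and compile hits” tool.

```python
def covered_percent(hits, genome_length):
    """Percentage of a genome covered by the union of blast hit intervals.

    `hits` is an iterable of (start, end) subject coordinates, 1-based and
    inclusive; minus-strand hits may be given with start > end.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent hit: extend the current interval
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


# Two overlapping hits (1-150 after merging) and one minus-strand hit (201-300)
toy_hits = [(1, 100), (51, 150), (300, 201)]
print(covered_percent(toy_hits, genome_length=1000))  # 25.0
```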

Use case 3–3

Matranga et al. recently improved library preparation methods for deep sequencing of Lassa and Ebola viral RNAs in clinical and biological samples [31]. Accordingly, they were able to generate sequence datasets of 150 nt reads providing high coverage of the viral genomes. We used these datasets, relevant in the context of Lassa and Ebola outbreak and epidemic response, to demonstrate the versatility of Metavisitor as well as its ability to generate high throughput reconstruction of viral genomes. In order to take into account longer reads and higher viral sequencing depths in the available datasets [31], we adapted a Metavisitor workflow for Use Case 3–3 (S17 Fig) as follows: (i) sequencing reads were directly aligned to vir1 sequences using bowtie2, without prior depletion by alignment to the human or rodent hosts; (ii) the Trinity de novo assembler [23] that performs well with longer reads was used instead of Oases (S1 Table); (iii) reconstruction of Lassa and Ebola genomes from the sequences of the blast hits with the nucleotide viral blast database was directly performed with the “blast to scaffold” tool without CAP3 assembly since the Trinity contigs were already covering a significant part of the viral genomes; (iv) the reports generated by our “Parse blast output and compile hits” tool as well as the reconstructed genome generated for each sample were merged in single datasets for easier browsing and subsequent phylogenetic or variant analyses; (v) for adaptability of this workflow to any type of virus, we allowed users to specify two input variables at runtime: the name of the virus to be searched for in the analysis and the identifier of the sequence to be used as guide in genome reconstruction steps. 
We imported 63 sequence datasets available in the EBI SRA PRJNA254017 and PRJNA257197 archives [31] and grouped these datasets in Lassa virus (55 fastq files) and Ebola virus (8 fastq files) dataset collections (see Table 4 for description of the sequence datasets). On the one hand, we executed the workflow (S17 Fig) taking the Lassa virus dataset collection as input sequences, “Lassa” as a filter term for the “Parse blast output and compile hits” tool and the NCBI sequence NC_004297.1 as a guide for reconstruction of the Lassa virus segment L. On the other hand, we executed the workflow taking the Ebola virus dataset collection as input sequences, “Ebola” as a filter term for the “Parse blast output and compile hits” tool and the NCBI sequence NC_002549.1 as a guide for reconstruction of the Ebola virus genome.
Table 4

Summary of detection of Ebola and Lassa viruses in Use Case 3–3.

The table summarizes the Metavisitor report files available as S16 and S17 Files.

Virus | BioProject | BioSample id | SRX number | SRR number | Sample ID | BAM file name | Source | Data Type, Selection | Figure, Table from Matranga et al. | Metavisitor detection (Trinity)
EBOV | PRJNA257197 | SAMN03099684 | SRX733660 | SRR1613381 | G3676-2 | G3676-2_S6_L001_001.bam | Human | RNase H | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099684 | SRX733656 | SRR1613377 | G3676-2 | G3676-2-std_S13_L001_001.bam | Human | RNA seq | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099685 | SRX733661 | SRR1613382 | G3677-1 | G3677-1_S3_L001_001.bam | Human | RNase H | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099685 | SRX733657 | SRR1613378 | G3677-1 | G3677-1-std_S10_L001_001.bam | Human | RNA seq | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099686 | SRX733662 | SRR1613383 | G3677-2 | G3677-2_S2_L001_001.bam | Human | RNase H | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099686 | SRX733658 | SRR1613379 | G3677-2 | G3677-2-std_S9_L001_001.bam | Human | RNA seq | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099687 | SRX733663 | SRR1613384 | G3682-1 | G3682-1_S4_L001_001.bam | Human | RNase H | Figure 5 | +
EBOV | PRJNA257197 | SAMN03099687 | SRX733659 | SRR1613380 | G3682-1 | G3682-1-std_S11_L001_001.bam | Human | RNA seq | Figure 5 | +
LASV | PRJNA254017 | SAMN02927412 | SRX719120 | SRR1595772 | G2431 | LASV678_ERCC117 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927412 | SRX719079 | SRR1595696 | G2431 | LASV678_ERCC12 | Human | RNA seq | Figure 2 | +
LASV | PRJNA254017 | SAMN02927488 | SRX719056 | SRR1595665 | ISTH1003 | LASV347_ERCC126 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927488 | SRX718926 | SRR1595500 | ISTH1003 | LASV347_ERCC17 | Human | RNA seq | Figure 2 | +
LASV | PRJNA254017 | SAMN02927485 | SRX718761 | SRR1594619 | ISTH0531 | LASV334_ERCC136 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927485 | SRX719205 | SRR1595943 | ISTH0531 | LASV334_ERCC31 | Human | RNA seq | Figure 2 | +
LASV | PRJNA254017 | SAMN02927498 | SRX719063 | SRR1595673 | ISTH1121 | LASV363_ERCC69 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927498 | SRX719134 | SRR1595797 | ISTH1121 | LASV363_ERCC43 | Human | RNA seq | Figure 2 | +
LASV | PRJNA254017 | SAMN02927489 | SRX719117 | SRR1595763 | ISTH1038 | LASV349_ERCC62 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927489 | SRX718979 | SRR1595558 | ISTH1038 | LASV349_ERCC42 | Human | RNA seq | Figure 2 | +
LASV | PRJNA254017 | SAMN02927510 | SRX718802 | SRR1594664 | ISTH2050 | LASV386_ERCC84 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927503 | SRX719192 | SRR1595909 | ISTH2020 | LASV368_ERCC112 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927484 | SRX718789 | SRR1594651 | ISTH0230 | LASV435_ERCC96 | Human | RNase H | Figure 2 | +
LASV | PRJNA254017 | SAMN02927592 | SRX719159 | SRR1595835 | LM032.dep | LM032_Depleted | Mastomys | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN02927592 | SRX718836 | SRR1594698 | LM032.std | LM032_Standard | Mastomys | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099734 | SRX733666 | SRR1613388 | NHP_DK9W-AG.dep | 728_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099735 | SRX733667 | SRR1613389 | NHP_DK9W-AG.std | 728_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099736 | SRX733668 | SRR1613390 | NHP_DK9W-AL.dep | 729_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099737 | SRX733669 | SRR1613391 | NHP_DK9W-AL.std | 729_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099738 | SRX733670 | SRR1613392 | NHP_DK9W-B.dep | 734_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099739 | SRX733671 | SRR1613393 | NHP_DK9W-B.std | 734_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099740 | SRX733672 | SRR1613394 | NHP_DK9W-K.dep | 733_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099741 | SRX733673 | SRR1613395 | NHP_DK9W-K.std | 733_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099742 | SRX733674 | SRR1613396 | NHP_DK9W-L.dep | 731_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099743 | SRX733675 | SRR1613397 | NHP_DK9W-L.std | 731_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN03099744 | SRX733676 | SRR1613398 | NHP_DK9W-S.dep | 732_Depleted | Macaque | RNase H | Figure 3 | +
LASV | PRJNA254017 | SAMN03099745 | SRX733677 | SRR1613399 | NHP_DK9W-S.std | 732_Standard | Macaque | RNA seq | Figure 3 | +
LASV | PRJNA254017 | SAMN02927592 | SRX719168 | SRR1595853 | LM032 | LASV68_BLC | Mastomys | RNA seq | Figure 4, Table 1 | +
LASV | PRJNA254017 | SAMN02927476 | SRX727329 | SRR1606288 | G733 | LASV_90 | Human | RNA seq | Figure 4, Table 1 | +
LASV | PRJNA254017 | SAMN02927592 | SRX733690 | SRR1613412 | LM032 | LM032_HS | Mastomys | Hybrid Selection | Figure 4, Table 1 | +
LASV | PRJNA254017 | SAMN02927476 | SRX733681 | SRR1613403 | G733 | G733_HS | Human | Hybrid Selection | Figure 4, Table 1 | +
LASV | PRJNA254017 | SAMN02927593 | SRX727318 | SRR1606277 | LM222 | LASV_74 | Mastomys | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN03099732 | SRX733664 | SRR1613386 | Z002 | LASV_77 | Mastomys | RNA seq | Table 1 | -
LASV | PRJNA254017 | SAMN03099733 | SRX733665 | SRR1613387 | G090 | LASV_79 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927477 | SRX727310 | SRR1606267 | G771 | LASV94 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927399 | SRX734464 | SRR1614275 | G2230 | Solexa-100929.tagged_332 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927483 | SRX731079 | SRR1610580 | ISTH0073 | Solexa-106870.tagged_851 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927500 | SRX719163 | SRR1595846 | ISTH1137 | LASV353_BLC | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927503 | SRX718749 | SRR1594606 | ISTH2020 | LASV368_ERCC03 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927504 | SRX727274 | SRR1606236 | ISTH2025 | LASV374_ERCC58 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927510 | SRX718860 | SRR1594723 | ISTH2050 | LASV386_ERCC48 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927484 | SRX718809 | SRR1594671 | ISTH0230 | LASV435_ERCC53 | Human | RNA seq | Table 1 | +
LASV | PRJNA254017 | SAMN02927593 | SRX733692 | SRR1613414 | LM222 | LM222_HS | Mastomys | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN03099732 | SRX733678 | SRR1613400 | Z002 | Z002_HS | Mastomys | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN03099733 | SRX733679 | SRR1613401 | G090 | G090_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927477 | SRX733682 | SRR1613404 | G771 | G771_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927399 | SRX733680 | SRR1613402 | G2230 | G2230_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927483 | SRX733683 | SRR1613405 | ISTH0073 | ISTH0073_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927500 | SRX733685 | SRR1613407 | ISTH1137 | ISTH1137_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927503 | SRX733686 | SRR1613408 | ISTH2020 | ISTH2020_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927504 | SRX733687 | SRR1613409 | ISTH2025 | ISTH2025_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927510 | SRX733688 | SRR1613410 | ISTH2050 | ISTH2050_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927484 | SRX733684 | SRR1613406 | ISTH0230 | ISTH0230_HS | Human | Hybrid Selection | Table 1 | +
LASV | PRJNA254017 | SAMN02927592 | SRX733689 | SRR1613411 | LM032 | LM032_Depleted | Mastomys | cDNA | ND, manually added to the original sup file 3 | +
LASV | PRJNA254017 | SAMN02927592 | SRX733691 | SRR1613413 | LM032 | LM032_Standard | Mastomys | cDNA | ND, manually added to the original sup file 3 | -

The results of both analyses are summarized in Table 4. Metavisitor was able to detect Ebola virus in all corresponding sequence datasets (S16 File) as well as Lassa virus in 53 out of the 55 sequence datasets generated from Lassa virus samples (S17 File). Consistently, Matranga et al. did not report reconstructed Lassa genomic segments from the two remaining datasets, which likely reflects high read duplication levels in the corresponding libraries [31]. The reconstructed Lassa virus L segments and Ebola virus genomes are compiled in S18 and S19 Files, respectively. In these sequences, de novo assembled segments (uppercase) are integrated in the reference guide sequence (lowercase) used for the reconstruction. Note that for viruses with segmented genomes such as Lassa virus, the workflow has to be run separately for each segment to be reconstructed, with the appropriate guide sequence. As an example, we used this workflow with the filter term “Lassa” for the “Parse blast output and compile hits” tool and the Lassa S segment NC_004296.1 for guiding the reconstruction (S20 File). At this stage, users can use the genomic fasta sequences for further analyses. For instance, multiple sequence alignments can be performed for phylogenetics or variant analyses, or reads in the original datasets can be realigned to the viral genomes to visualize their coverage, as has been done in Use Cases 1 and 2.
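
The uppercase/lowercase convention used in the reconstructed sequences can be illustrated with a small Python sketch. It overlays assembled segments (uppercase) onto a guide sequence (lowercase) at given positions. This is a deliberately simplified model that assumes gap-free, same-length placements with 0-based start coordinates; the function name and inputs are hypothetical, and the actual “blast to scaffold” tool may handle alignments differently.

```python
def scaffold(guide, placements):
    """Overlay assembled segments onto a guide sequence.

    `guide` is the reference sequence; `placements` is an iterable of
    (start, segment) pairs with 0-based starts. Each segment replaces the
    same number of guide bases (no indels, a simplifying assumption).
    Guide bases are returned in lowercase, assembled bases in uppercase.
    """
    out = list(guide.lower())
    for start, segment in placements:
        end = start + len(segment)
        if start < 0 or end > len(out):
            raise ValueError("segment does not fit on the guide sequence")
        out[start:end] = segment.upper()
    return "".join(out)


# Toy example: a 3 nt assembled segment placed at position 2 of a 10 nt guide
print(scaffold("ACGTACGTAC", [(2, "ttt")]))  # acTTTcgtac
```

With this convention, stretches that remain in lowercase mark regions of the guide that were not supported by any assembled contig.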

Discussion

Metavisitor performs de novo assembly of sequencing reads and detects contigs of viral origin through blast alignments, which then can be clustered to reconstruct a viral genome. On the one hand, this strategy reduces the rate of false positives since the ability to form contigs that align to known viral sequences is strong evidence of the presence of a full viral genome in the analyzed datasets. In addition, we advise Metavisitor users to remove sequence reads that align to genomes of hosts, symbionts or parasites, if these are known and available (see Fig 1). Although this treatment can be skipped (as in Use Case 3–3), it avoids chimeric assemblies of viral and nonviral sequences, while speeding up the assembly of contigs of potential viral origin. It also ensures that sequences of the host genome that have been annotated as Endogenous Viral Elements (EVEs) are not retained for viral contig assembly. Users should keep in mind that EVEs that have not yet been identified as such may be retained by Metavisitor as potential viral contigs. Should this happen, Webster et al. [13] have demonstrated that, when host genome sequencing reads are available, mapping them to these contigs makes it possible to discriminate between EVE and virus sequences. As illustrated in Use Case 2–1, when a host is known to have antiviral RNAi pathways, re-mapping small RNA reads and plotting their length distribution can also add support to the infectious origin of candidate viral contigs (21 nt read peak, sense and antisense reads aligning along the contig). On the other hand, Metavisitor workflows may fail to assemble viral contigs when the abundance of viral reads is too low in sequenced samples, or when these reads align to short, scattered regions of viral genomes. However, as illustrated in Use Case 3–2, it is possible to keep track of these false negatives by implementing a workflow that annotates and counts viral reads before the de novo contig assembly steps. 
We developed Metavisitor in Galaxy in order to benefit from a well supported framework allowing execution of computational tools and workflows through a user-friendly web interface. In addition, the advanced Galaxy functionalities support rigorous computational analyses through recording of the produced data and metadata and of the parameters used, as well as the ability to share, publish and reproduce these analyses (see Metavisitor availability section in Experimental Procedures). Another major benefit is that, as with any Galaxy workflow, the Metavisitor workflows may be modified or extended by users. If new tools are already available in a Galaxy tool shed, their integration in a workflow is straightforward, thanks to the Galaxy workflow editor. Although it requires coding skills, any other freely available software can be adapted to the Galaxy framework and used in a Metavisitor workflow. Through use cases, we have shown that Metavisitor is adaptable: short or longer reads from small RNAseq, RNAseq or DNAseq can be used as input data with or without adapter clipping; read datasets can be used as is, or compressed using reads-to-sequences or normalization by median procedures [27]; a variety of alignment and de novo assembly tools can be used, provided that they have been adapted for execution in the Galaxy framework; finally, although we provide the vir1 nucleotide and protein references to identify sequences of viral origin, users are free to upload and work with their own viral references. Thus, Metavisitor provides biologists and clinicians with an accessible framework for detection, reconstruction or discovery of viruses. 
Viral sequences reconstructed by Metavisitor can be used in a large range of subsequent analyses, including phylogenetic or genetic drift analyses in contexts of epidemics or virus surveillance in field insect vectors, animal or human populations, or systematic identification of viruses for evaluation of their morbidity. In Use Cases 3–1 to 3–3, we have shown that Metavisitor allows analysis of numerous datasets in batch with consistent tracking of individual samples. Thus, we are confident that Metavisitor is scalable to large epidemiological studies or to clinical diagnosis in hospital environments. For instance, it could be used to analyse RNAseq data from Zika infected patients [32,33]. Finally, we wish to stress that Metavisitor has the potential for integrating detection or diagnosis of non-viral, microbial components in biological samples. Eukaryotic parasites or symbionts and bacteria are mostly detectable in sequencing datasets from their abundant ribosomal RNAs, whose sequences are strongly conserved in the main kingdoms. This raises specific issues for their accurate identification and their taxonomic resolution that are not currently addressed by Metavisitor. However, many tools and databases [34] addressing these metagenomics challenges can be adapted, when not already, to the Galaxy framework. For instance, Qiime [35] and the SILVA database of ribosomal RNAs [36] can be used within Galaxy and could thus be integrated in future Metavisitor workflows aiming at detection and discovery of non-viral organisms in deep sequence datasets.

Screenshot of the “Retrieve FASTA from NCBI” tool form to retrieve viral nucleotide (A) or protein (B) vir1 sequences.

The query string “txid10239[orgn] NOT txid131567[orgn] NOT phage” retrieves virus sequences (txid10239) while filtering out cellular organism sequences (txid131567) and phage sequences. (PDF)

Screenshot of an output produced by the “Parse blast output and compile hits” Metavisitor tool.

(PDF)

Screenshot of Metavisitor workflow for Use Case 1–1.

(PDF)

Screenshot of Metavisitor workflow for Use Case 1–2.

(PDF)

Screenshot of Metavisitor workflow for Use Case 1–3.

(PDF)

Screenshot of Metavisitor workflow for remapping in Use Cases 1–1, 1–2 and 1–3.

(PDF)

Screenshot of Metavisitor workflow for Use Case 1–4.

(PDF)

Screenshot of Metavisitor workflow for Use Case 2–1.

(PDF)

Alignments of small RNA sequence reads to the partially reconstructed Anopheles C virus genome (Use Case 2–1).

The plot shows the abundance of 18–30-nucleotide (nt) small RNA sequence reads matching the genome sequences, and the histogram shows the length distribution of these reads. Positive and negative values correspond to sense and antisense reads, respectively. (PDF)

Screenshot of Metavisitor workflow for small RNA profiling of contigs.

(PDF)

Screenshot of Metavisitor workflow for Use Case 2–2.

(PDF)

Screenshot of Metavisitor workflow for remapping in Use Cases 2–1 and 2–2.

(PDF)

Screenshot of Metavisitor workflow for Trinity test in Use Case 2–2.

(PDF)

Screenshot of Metavisitor workflow for SPAdes test in Use Case 2–2.

(PDF)

Screenshot of Metavisitor workflow for Use Case 3–1.

(PDF)

Screenshot of Metavisitor workflow for Use Case 3–2.

(PDF)

Screenshot of Metavisitor workflow for Use Case 3–3.

(PDF)

Nora_MV sequence of Nora virus reconstructed by Metavisitor using reads collapsed to unique sequences (Use Case 1–1).

(TXT)

Nora_raw_reads sequence of Nora virus reconstructed by Metavisitor using raw reads (Use Case 1–2).

(TXT)

Nora_Median-Norm-reads sequence of Nora virus reconstructed by Metavisitor using normalisation of read abundance by the median procedure (Use Case 1–3).

(TXT)

MAFFT (http://www.ebi.ac.uk/Tools/msa/mafft/) multiple alignment of the published Nora virus genome sequences (JX220408.1 and NC_007919.3) and those generated in Use Cases 1–1 to 1–3 (Nora_MV, Nora_raw_reads and Nora_Median-Norm-reads).

A view of the alignments was produced by MView. The HTML file can be visualized by opening it locally with a web browser. (HTML)

Output of the “parse blast output and compile hits” tool in Use Case 1–4.

(TXT)

Output of the “parse blast output and compile hits” tool in Use Case 2–1.

(TXT)

Output of the “Pick Fasta Sequences” tool in Use Case 2–1.

(TXT)

Sequences of the 4 contigs generated by the “CAP3 sequence assembly” tool in Use Case 2–1.

(TXT)

Integration of the 4 assembled contigs (S8 File) in the DCV genome scaffold NC_001834.1 by the “blast_to_scaffold” tool.

Lowercase letters correspond to NC_001834.1 sequences, while uppercase letters correspond to contig sequences. (TXT)
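The case convention described above makes it easy to measure how much of a scaffold was replaced by assembled contigs. The following Python sketch is only an illustration of that idea and is not part of the “blast_to_scaffold” tool: the helper names `contig_blocks` and `contig_coverage` and the toy sequence are hypothetical.

```python
from itertools import groupby


def contig_blocks(scaffold: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of uppercase runs."""
    blocks = []
    pos = 0
    for is_upper, run in groupby(scaffold, key=str.isupper):
        length = len(list(run))
        if is_upper:
            blocks.append((pos, pos + length))
        pos += length
    return blocks


def contig_coverage(scaffold: str) -> float:
    """Fraction of scaffold positions that come from contigs (uppercase)."""
    if not scaffold:
        return 0.0
    covered = sum(end - start for start, end in contig_blocks(scaffold))
    return covered / len(scaffold)


# Toy scaffold: lowercase = reference bases, uppercase = contig bases.
toy_scaffold = "acgtACGTACgtacGGGa"
print(contig_blocks(toy_scaffold))    # [(4, 10), (14, 17)]
print(contig_coverage(toy_scaffold))  # 0.5
```

Applied to the sequence lines of the scaffold file (with the FASTA header removed and lines joined), the same two functions would list the regions supported by contigs and the remaining reference-only gaps.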

siRNA profiling of de novo assembled contigs in Use Case 2–1.

Small RNA sequence reads were aligned to the contigs, and size distributions and read maps were generated using the “Generate readmap and histograms from alignment files” tool. Plots show the map and abundance of 18–30 nt small RNA reads for the indicated contigs, and histograms show the length distributions of these reads. Positive and negative values correspond to sense and antisense reads, respectively. (PDF)
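The sign convention used in these histograms can be sketched in a few lines of Python. This is an illustrative reimplementation under simple assumptions, not the code of the Galaxy tool: reads are represented as hypothetical `(length, strand)` tuples that would, in practice, be extracted from an alignment file, and the function name `size_distribution` is invented for the example.

```python
from collections import Counter


def size_distribution(reads, min_len=18, max_len=30):
    """Count aligned reads per length, keeping sense and antisense separate.

    `reads` is an iterable of (length, strand) tuples with strand "+" or "-".
    Returns {length: (sense_count, -antisense_count)} so that antisense
    reads appear as negative values, as in the plots described above.
    """
    sense = Counter()
    antisense = Counter()
    for length, strand in reads:
        if not min_len <= length <= max_len:
            continue  # ignore reads outside the 18-30 nt window
        if strand == "+":
            sense[length] += 1
        elif strand == "-":
            antisense[length] += 1
    return {n: (sense[n], -antisense[n]) for n in range(min_len, max_len + 1)}


# Toy input: three 21 nt reads, one 22 nt read and one read that is too long.
toy_reads = [(21, "+"), (21, "+"), (21, "-"), (22, "-"), (35, "+")]
distribution = size_distribution(toy_reads)
print(distribution[21])  # (2, -1)
print(distribution[22])  # (0, -1)
```

A contig with a clear siRNA signature would be expected to show a peak at a single size class on both strands, which is the pattern such a table makes easy to spot.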

Parsing of blastx alignments with the “blast analysis, by subjects” tool in Use Case 2–2.

(TXT)

Sequence of the 8919 nt contig in Use Case 2–2.

(TXT)

MAFFT (http://www.ebi.ac.uk/Tools/msa/mafft/) Multiple Alignment in Clustal format of 3 AnCV genomes reconstructed with Oases, Trinity and SPAdes assembly programs.

(TXT)

Merge of all reports generated by the “Parse blast output and compile hits” tool in Use Case 3–1.

(TXT)

Merge of all reports generated by the “Parse blast output and compile hits” tool in Use Case 3–2.

These reports are summarized in Table 3. (TXT)

Merge of all reports for Ebola virus generated by the “Parse blast output and compile hits” tool in Use Case 3–3.

These reports are summarized in Table 4. (TXT)

Merge of all reports for Lassa virus generated by the “Parse blast output and compile hits” tool in Use Case 3–3.

These reports are summarized in Table 4. (TXT)

Lassa virus segment L reconstructed sequences in NC_004297.1 scaffold in Use Case 3–3.

(TXT)

Ebola virus reconstructed sequences in NC_002549.1 scaffold in Use Case 3–3.

(TXT)

Lassa virus segment S reconstructed sequences in NC_004296.1 scaffold in Use Case 3–3.

(TXT)

Galaxy tools used in Metavisitor.

(PDF)

Duration of execution of the Metavisitor workflows.

The times given correspond to execution of the workflows on a 16-core (2 GHz) machine with 96 Mo RAM, running Galaxy release 16.04. (PDF)

Sequence-independent strategy to identify candidate viral contigs (Use Case 2–1).

Sets of contigs (loci) with a clear (+), unclear (?) or no siRNA signature were manually selected from S10 Fig and tested for significant blastx alignment against the vir1 index and the NCBI non-redundant protein database (October 201). (PDF)