Abstract
Computer-assisted analysis of the genomic structure, biological function, and evolution of viruses remains a largely neglected area of research. The attention bioinformaticians pay to this challenging field is currently unsatisfactory given its medical and biological importance. The power of new genome sequencing technologies, combined with new tools to handle "big data", provides unprecedented opportunities to address fundamental questions in virology. Here, we present an overview of the current technologies, challenges, and advantages of Next-Generation Sequencing (NGS) in relation to the field of virology. We show how viral sequences can be detected de novo in current short-read NGS data. Furthermore, we discuss the challenges and applications of viral quasispecies and how secondary structures, commonly formed by RNA viruses, can be computationally predicted. Phylogenetic analysis, another ubiquitous field in virology, is an essential element of describing viral epidemics and challenges current algorithms. Recently, the first specialized virus-bioinformatic organizations have been established. We need to bring together virologists and bioinformaticians and provide a platform for the implementation of interdisciplinary collaborative projects at local and international scales. Above all, there is an urgent need for dedicated software tools to tackle various challenges in virology.
Keywords: Bioinformatics; Software; Virology; Virus sequence analysis
Year: 2017 PMID: 29029728 PMCID: PMC7172532 DOI: 10.1016/bs.aivir.2017.08.004
Source DB: PubMed Journal: Adv Virus Res ISSN: 0065-3527 Impact factor: 9.937
Virus-Specific Databases Besides the General NCBI Database
| Tool | Description | Ref. |
|---|---|---|
| ViPR | The ViPR database integrates genomes and various other types of data for multiple virus families: Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Flaviviridae, Filoviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae, and Togaviridae. | |
| EpiFlu™ | GISAID EpiFlu™ is the world's most complete collection of genetic sequence data of influenza viruses and related clinical and epidemiological data. EpiFlu™ is tailored to the needs of influenza researchers from both the human and the veterinary fields. The data is publicly accessible but not public domain (GISAID does not remove nor waive any preexisting rights). | |
| HIV | The HIV database contains data on HIV genetic sequences and immunological epitopes. The website also provides access to several tools that can be used for analysis and visualization. | |
| HCV | HCV is a comprehensive database of the hepatitis C virus (HCV). | |
| ViralZone | ViralZone is a web resource from the Swiss Institute of Bioinformatics for all viral genera and families, providing general molecular and epidemiological information, along with virion and genome figures. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries. | |
| VVR | The virus variation resource (VVR) is a selection of web retrieval interfaces, analysis, and visualization tools for virus sequence datasets. | |
Commonly Used Next-Generation Sequencing (NGS) Technologies and Their Major Specifications
| Platform | Length (bp) | Throughput | Number of Reads | Error | Cost Per Gb |
|---|---|---|---|---|---|
| 454 Pyrosequencing | 400–1000 | 35–700 Mb | 0.1–1 M | 1%, indel | $10–40,000 |
| Ion Torrent | 200–400 | 100 Mb–15 Gb | 2–80 M | 1%, indel | $500–2000 |
| Illumina Solexa | 25–300 | 2–900 Gb | 10 M–4 B | 0.1%, subst. | $7–1000 |
| Qiagen GeneReader | 100 | NA | 10 M–4 B | 0.1%, subst. | NA |
| SOLiD | 60–100 | 10–320 Gb | 700 M–1.4 B | 0.1%, AT bias | $100 |
| Pacific BioSciences | up to 40 Kb | 0.5–7 Gb | ∼55 k | 13% (single), 1% (circular) | $1000 |
| Oxford Nanopore (MinION) | up to 200 Kb | up to 1.5 Gb | >100 k | 12%, indel | $750 |
Generally, NGS technologies can be divided into short-read and long-read approaches, depending on the length of the produced reads. SNA, single-nucleotide addition; CRT, cyclic reversible termination; SMRT, single-molecule real-time sequencing; indel, nucleotide insertion–deletion; subst., nucleotide substitution.
This table is mainly based on recent reviews (Goodwin et al., 2016; Mardis, 2017).
De Novo Assembly Tools Suitable for the Assembly of Viral Genomes
| Tool | Description | Ref. |
|---|---|---|
| AV454 | AV454 is a de novo consensus assembler designed for small and nonrepetitive genomes sequenced at high depth. | |
| RIEMS | RIEMS is a software tool for the sensitive and reliable analysis of metagenomic datasets. | |
| V-FAT | V-FAT is a tool to perform automated computational finishing and annotation of de novo viral assemblies. | |
| VICUNA | VICUNA is a de novo assembly tool targeting populations with high mutation rates. | |
| VrAP | The VrAP (Viral Assembly Pipeline) is based on the genome assembler SPAdes ( |
Fig. 1 Comparison of eight assembly tools based on a sequenced C6/36 cell, infected with a Piura virus strain from Mexico. The figure depicts an alignment of de novo assembled contigs (rectangles) to the reference genome of Piura virus (KM249340.1). SPAdes assembles the full viral genome without any difficulties. All other assemblers fail to build a continuous single contig. Green: contigs that align correctly. Red: misassemblies. The different color shades are only for a better visualization of adjacent contigs. The alignment plot was created with Quast (Gurevich et al., 2013).
Fig. 2 Workflow of the viral de novo assembly pipeline VrAP. The pipeline requires (preprocessed) reads as input. The output consists of final contigs and an annotation list. The pipeline combines multiple read corrections with SPAdes, a super-contig construction, and a contig classification. VrAP comes as an easy-to-use command-line tool (http://www.rna.uni-jena.de/en/vrap/). All steps in square brackets are optional. FACS, fluorescence-activated cell sorting.
A Selection of Tools for the Detection of Secondary Structures in RNA Viruses
| Tool | Description | Alignment | Ref. |
|---|---|---|---|
| RNAfold | RNAfold is a tool to predict secondary structures of single stranded RNA or DNA sequences. | No | |
| mfold | mfold is a web server that provides easy access to RNA and DNA folding and hybridization software. | No | |
| RNAalifold | RNAalifold is a tool for calculating secondary structures for a set of aligned RNAs. It is part of the Vienna RNA Package. | Yes | |
| LocARNA | LocARNA is a multiple alignment tool based on the calculation of sequence and structure simultaneously. | Yes | |
| LRIscan | LRIscan is a tool for the prediction of long-range interactions in full viral genomes based on a multiple genome alignment. LRIscan is able to find interactions spanning thousands of nucleotides. | Yes | |
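
To make the idea of single-sequence secondary structure prediction concrete, the following is a minimal sketch of the classic Nussinov base-pair maximization algorithm. It is a deliberately simplified stand-in: RNAfold and mfold minimize a thermodynamic free energy model rather than counting base pairs, so this code does not reproduce their output. The function name `nussinov`, the `min_loop` parameter, and the example sequence are illustrative assumptions.

```python
# Simplified illustration only: maximizes the NUMBER of base pairs
# (Nussinov algorithm). RNAfold/mfold use thermodynamic energy models.

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (dot-bracket structure, number of pairs) for an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed default of 3, a common hairpin-loop constraint).
    """
    n = len(seq)
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure), (dp[0][n - 1] if n else 0)


if __name__ == "__main__":
    structure, n_pairs = nussinov("GGGAAAUCCC")
    print(structure, n_pairs)
```

Several structures can share the same maximal pair count, so the traceback returns only one of them; energy-based tools resolve such ties through stacking and loop energies, which this sketch ignores.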
Fig. 3 Alignment-based secondary structure prediction of 5′ genome regions of alphacoronaviruses. The viruses included in this analysis represent all currently recognized species in the genus Alphacoronavirus. The alignment (not shown) was calculated by LocARNA (Will et al., 2007) and the structure by RNAalifold (Hofacker, 2007). The consensus sequence is represented using the IUPAC code. Colors are used to indicate conserved base pairs: from red (conservation of only one base pair type) to purple (all six base pair types are found); from dark (all sequences contain this base pair) to light colors (one or two sequences are unable to form this base pair). To refine the alignment, an anchor at the highly conserved core TRS-L was used.
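
As an illustration of how a consensus sequence can be written in the IUPAC code, the sketch below collapses each alignment column into the IUPAC symbol for the set of nucleotides observed in it. The function name `iupac_consensus`, the handling of gaps (ignored unless the whole column is gaps), and the toy alignment are assumptions made for this example; the consensus rule actually applied by RNAalifold may differ.

```python
# Minimal sketch: per-column IUPAC consensus of an RNA alignment.
# Gap handling is an assumption, not the rule used by any specific tool.

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y",
    frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}


def iupac_consensus(aligned):
    """Return the IUPAC consensus of equal-length aligned RNA/DNA sequences."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        # Normalize to upper-case RNA; drop gaps and unknown characters.
        bases = frozenset(
            b for b in (c.upper().replace("T", "U") for c in column)
            if b in "ACGU"
        )
        # A column without any nucleotide (all gaps) is reported as a gap.
        consensus.append(IUPAC.get(bases, "-"))
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["GAUC", "GGUC", "GAU-"]  # hypothetical sequences
    print(iupac_consensus(toy_alignment))  # prints "GRUC"
```

In the toy alignment, the second column contains both A and G and is therefore reported as R (purine), while fully conserved columns keep their nucleotide.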
Fig. 4 Long-range interactions in 5′-UTR, CRE, VR, and X-tail of HCV (Fricke et al., 2015). (A) Overview and possible interactions for all tested HCV sequences. Gray lines: known interactions derived from literature, validated by this analysis for all examined isolates. Green lines: novel interactions (based on new calculations). The detailed interactions are shown on the right side next to each corresponding interaction line. The leftmost interaction can be extended for a possible circularization of HCV. (B) Possible circularization of HCV. Interacting loops of SLII and DLS of the HCV plus-strand can be extended to at least 62 bp in all available 19 isolates.
Fig. 5 Host–virus RNA-Seq methods pipeline for the detection of differentially expressed genes (see text for details).