
A decade of RNA virus metagenomics is (not) enough.

Abstract

It is hard to overemphasize the role that metagenomics has had on our recent understanding of RNA virus diversity. Metagenomics in the 21st century has brought with it an explosion in the number of RNA virus species, genera, and families far exceeding that following the discovery of the microscope in the 18th century for eukaryotic life or culture media in the 19th century for bacteriology or the 20th century for virology. When the definition of success in organism discovery is measured by sequence diversity and evolutionary distance, RNA viruses win. This review explores the history of RNA virus metagenomics, reasons for the successes so far in RNA virus metagenomics, and methodological concerns. In addition, the review briefly covers clinical metagenomics and environmental metagenomics and highlights some of the critical accomplishments that have defined the fast pace of RNA virus discoveries in recent years. Slightly more than a decade in, the field is exhausted from its discoveries but knows that there is yet even more out there to be found.


Keywords: Conjoined circles; Metagenomics; RNA virus; Viral discovery


Year: 2017 PMID: 29055712 PMCID: PMC7114529 DOI: 10.1016/j.virusres.2017.10.014

Source DB: PubMed Journal: Virus Res ISSN: 0168-1702 Impact factor: 3.303

Introduction

In 1986, using direct cloning of 16S rDNA genes, Norman Pace presciently noted that the method “seems to have no upper limit as to the complexity of the populations” (Pace et al., 1986). Thirty years later, Pace’s maxim only seems ever more true. Deeper and deeper sequencing technologies have enabled the discovery of thousands of viral species. Metagenomic viral discovery can be applied to almost any sample type and novel viruses emerge. Growth in virome and metagenomics papers is outpaced only by that of the viruses discovered (Fig. 1). This review is meant to cover the field of modern-day RNA virus metagenomics, where RNA is perhaps best defined as “not DNA”, mostly due to the use of DNA-specific nucleases in library preparations. Many excellent reviews on viral metagenomics already exist, but no specific review focuses on RNA virus metagenomics (Bexfield and Kellam, 2011, Delwart, 2007, Edwards and Rohwer, 2005, Mokili et al., 2012, Rosario and Breitbart, 2011, Tang and Chiu, 2010). Since there is absolutely no way to cover all the samples sequenced or genome organizations found even if one were restricted to a single order or family, the author apologizes in advance if he failed to cite your contribution to the field.

Fig. 1

Growth everywhere. Publications by year for “RNA virus metagenomics” (A) and “virome” (B) over the past decade using the “Results by Year” graph from PubMed. Number of viral species (C), genera (D), and families (E) assigned by the ICTV over the past four decades (ICTV, 2017a, ICTV, 2017b). Changepoint analysis with the “at most one change” statistical test was significant at year 1996 for genera and family data and 2015 for species data (“changepoint,” n.d.).

TMRCA of RNA virus metagenomics

The root

Where do we begin with RNA virus metagenomics? Construction and sequencing of cDNA libraries share most of the features of RNA viral metagenomic protocols and have existed since the late 1970s (Sim et al., 1979, Efstratiadis et al., 1977). One could argue about the meta- or the −genomic nature of the cDNA sequencing example, but the method is nearly unchanged since its introduction. A single organism contains multitudes and now one of the more common ways of discovering new viruses is from searching said cDNA sequencing data (Feng et al., 2008, Shi et al., 2016, Krishnamurthy et al., 2016). Clearly, the depth of sampling has changed in recent years but sequencing even two clones could be considered metagenomics. RNA virus metagenomics began with a DNA virus and a contaminant. In a prescient 2001 manuscript, Allander et al. added DNase to spin-filtered serum prior to extraction to enrich for viral particles (Allander et al., 2001). Nucleic acids were extracted and separately processed for DNA and RNA libraries and then randomly amplified, enabling the discovery of two bovine parvoviruses from the commercial bovine serum that was used to dilute their viral spike-in controls. Surprisingly, the novel DNA viruses were present in the RNA half of the experiment – the side that was specifically reverse transcribed and amplified in order to test the protocol’s ability to detect GB virus C (Allander et al., 2001). This experiment is worth focusing on as it foreshadowed several themes in the world of RNA virus metagenomics. The protocol used in the paper is surprisingly similar to that used today (Conceição-Neto et al., 2015). Host or environmental background DNA dwarfs most nucleic acids and sequencing of RNA species circumvents the problem so well that it may be the best way to find high concentration DNA viruses that are being actively transcribed. 
Even today concerns over whether metagenomics is sufficiently sensitive bedevil its use as a gold standard (Schlaberg et al., 2017, Wylie et al., 2012). Finally, in the world of metagenomics, you do not know what is in your sample until you sequence, and the sensitivity of your assay is perhaps best measured by your ability to recover ubiquitous reagent contaminants (Bhatt et al., 2013, Naccache et al., 2013).

The early branches

Environmental RNA virus metagenomics arose somewhat later. As recently as 2003, degenerate PCR targeting the RNA-dependent RNA polymerase from picorna-like viruses had shown the potential for viral discovery in marine virus communities with the detection of four new RNA virus families (Culley et al., 2003). Large scale, high-profile efforts such as sailboat-based shotgun sequencing of the Sargasso Sea came one year later and used 0.1–3.0 μm filters that predominantly enriched for DNA from microbial communities (Venter et al., 2004). A year after that, Robert Edwards and Forest Rohwer were still calling for the development of methods that would allow for sequencing of RNA viruses (Edwards and Rohwer, 2005). Allander’s original method that detected the bovine parvovirus contaminants gave rise to clinical metagenomics in 2005 by discovering a novel human parvovirus (human bocavirus) and detecting a novel human coronavirus (coronavirus HKU1) by screening pooled human nasopharyngeal aspirates (Allander et al., 2005, Woo et al., 2005). Reads to influenza A virus, human orthopneumovirus (RSV), metapneumovirus, human adenovirus, and TTV-like virus were also recovered from the 864 reads. That same year, 36,789 colony sequences from DNase-treated RNA extracted from viral concentrates of human feces revealed a preponderance of pepper mild mottle virus and other plant viruses, but no new human viruses (Zhang et al., 2005). The first non-human environmental-focused RNA viral metagenomic survey used DNase-treated extracted nucleic acids from nuclease-treated viral concentrates from coastal communities in British Columbia (Culley et al., 2006). The resultant pure RNA was reverse transcribed and double-stranded cDNA was synthesized, blunt-end ligated, and cloned to reveal nothing, for there was not enough RNA present even in 40–60 L of seawater, highlighting the importance of amplification in most any RNA virus metagenomic preparation.
After amplification, two novel picorna-like viruses and one novel +ssRNA virus that remains taxonomically unassigned were discovered at high concentrations among the 355 reads. It is incredible to note that only one year later, researchers were reporting RNA viral metagenomic sorcery of honeybees with hundreds of thousands of reads (Cox-Foster et al., 2007, Koonin, 2007). Next-generation sequencing and rapid bioinformatics gains allowed researchers unbelievable opportunities (Fig. 1) (Delwart, 2007). Due to the short read length of Solexa sequencing at the time, initial viral metagenomic efforts required 454 sequencing, which allowed for hundreds of thousands of reads a few hundred bases long (Rothberg and Leamon, 2008). The result was an array of human viral discoveries, including Merkel cell polyomavirus, human bocavirus 2, salivirus, Lujo arenavirus, Dandenong arenavirus, SFTS bunyavirus, and a variety of new human astroviruses including MLB1 and VA1 through VA5 (Briese et al., 2009, Feng et al., 2008, Finkbeiner et al., 2008a, Finkbeiner et al., 2008b, Greninger et al., 2009, Holtz et al., 2009, Kapoor et al., 2009a, Kapoor et al., 2009b, Koonin, 2007, Li et al., 2009, Meyer et al., 2015, Palacios et al., 2008, Yu et al., 2011). Groups often found the same viruses within days of each other and discrepant names for the same virus were common (Greninger et al., 2009, Holtz et al., 2008, Holtz et al., 2009, Kapoor et al., 2008, Li et al., 2009). Pyrosequencing yielded impressive gains in environmental RNA virus sequencing with profiling of a freshwater lake yielding tens of novel ssRNA and dsRNA viruses (Djikeng et al., 2009). The RNA virus metagenomic age was officially underway.

Why has RNA virus metagenomics been so successful?

Diversity

While early studies of metagenomics lamented the lack of RNA viruses discovered compared to DNA viruses, RNA virus metagenomics appeals to the viral discoverer for a number of reasons. The first is that success in viral metagenomics is often measured by the genetic distance of a newly discovered virus from other known viruses. While finding a new virus in the environment is not necessarily a problem in either DNA or RNA, the low fidelity of RNA polymerases and the sequence space they are capable of sampling, along with the possibility of recombination, lend themselves to new species and genera that are the trophies of metagenomic viral discovery. These properties also favor the metagenomic library preparation process. Sufficient homology to land PCR primers for amplicon-tiling library preparation approaches may not exist, and the number of oligos needed to cover the sequence space of a given RNA virus such as Hepatitis C virus can make hybrid capture approaches cost-prohibitive. De novo assembly methods that allow for the rapid reconstruction of near complete genomes have also dramatically improved over the past decade and can handle widely disparate coverage for each contig (Grabherr et al., 2011, Bankevich et al., 2012). All of these properties often make metagenomics the first method of choice when trying to recover sequence from even a known RNA virus (Greninger et al., 2017a, Ogimi et al., 2017).

It is not DNA

Another reason for the success of RNA virus metagenomics is the ease and increased depth and sensitivity associated with removing background DNA. Indeed, one of the difficulties of RNA virus metagenomics is that RNA virus genome sizes are generally measured in the kilobases, but the size and quantity of host DNA background are measured in the gigabases. If sequence is not partitioned selectively across RNA and DNA, a single contaminating host cell can be the equivalent of half a million virions. Removal of host DNA is often required for one to recover complete RNA virus genomes. The library preparation process simply requires a 30-min treatment with DNase I or Benzonase to remove all background DNA, leaving the RNA viruses behind. In addition, RNA viral capsids are measured in nanometers and can be selectively enriched using 0.22 μm or 0.45 μm filtration. Even if one is interested in DNA viruses, RNA metagenomics allows the recovery of mRNAs that are associated with actively replicating DNA viruses, which may be helpful information when so many DNA viruses can be latent or integrated into host genomes (Perlejewski et al., 2015).
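The “half a million virions” figure can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the diploid human genome size (about 6.4 gigabases) and the example viral genome lengths are assumed round numbers, not values taken from this review, and the function name is hypothetical.

```python
def virion_equivalents(host_genome_bp: float, viral_genome_bp: float) -> float:
    """Number of viral genomes whose combined length equals one host genome."""
    if viral_genome_bp <= 0:
        raise ValueError("viral genome length must be positive")
    return host_genome_bp / viral_genome_bp


# Assumption: roughly 6.4e9 bp of DNA in one diploid human cell.
DIPLOID_HUMAN_BP = 6.4e9

# Assumed example RNA virus genome lengths, in kilobases.
for viral_kb in (7.5, 10, 13, 30):
    n = virion_equivalents(DIPLOID_HUMAN_BP, viral_kb * 1_000)
    print(f"{viral_kb:>5} kb genome: one host cell ~ {n:,.0f} virion equivalents")
```

Under these assumptions, genomes in the 10 to 13 kb range give roughly half a million virion equivalents per contaminating host cell, which is why DNase treatment pays off so quickly in sequencing depth.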

Sitting right there

Recovery of these RNA viruses is also made easier by the fact that, once host DNA is removed, they can be present at extraordinarily high concentrations in total RNA or constitute a surprisingly large proportion of transcripts in an mRNA library, up to almost 90% of non-ribosomal RNAs in some invertebrates (Li et al., n.d.; Shi et al., 2016). Based on total masses of nucleic acid in virus-like particles, it has been estimated that half of marine viruses are RNA viruses due to the larger burst size of eukaryotic RNA viruses (Steward et al., 2013). A particularly instructive example was the discovery of the Lake Sinai virus in honeybees (Runckel et al., 2011). It was originally discovered and assembled metagenomically at such high titer that when researchers went to perform Northern blots, they could see the viral RNA staring back at them after RNA electrophoresis. The author has also previously taken part in studies where the method of screening for known segmented RNA viruses such as rotavirus was to run gel electrophoresis on extracted stool RNA without amplification and look for banding (Greninger et al., 2010, Yu et al., 2012).

No wet lab necessary

Mining of publicly available transcriptome data has contributed greatly to the discovery of novel RNA viruses (Basler et al., 2005, Schomacker et al., 2004). Indeed, genome-transcriptome studies of eukaryotes sometimes miss the viruses present in their sequence when they map their RNA-Seq reads to their assembled DNA-Seq genome (Babb et al., 2017, Clarke et al., 2015). RNA viruses are lurking in the unaligned portion of the reads and often can be readily assembled and aligned to NCBI to find many new viruses in a given host (Bekal et al., 2011). Most recently, six novel RNA viruses were discovered in publicly available orb weaver spider transcriptome data (Debat, 2017). Such sequences can be downloaded from the Short Read Archive, assembled, and deposited into NCBI’s Third Party Annotation database (“NCBI Third Party Annotation,” n.d.). Research parasites of the world can now identify with their obligate intracellular parasitic brethren (Longo and Drazen, 2016).
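The first in silico step described above, pulling out the reads that fail to align to the host genome, can be sketched in a few lines. This is a minimal illustration rather than any cited group's pipeline: it assumes the alignments are available as SAM-format text, relies only on the SAM convention that flag bit 0x4 marks an unmapped read, and uses hypothetical function names and toy records.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text for a downstream assembler."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy input: one read mapped to a hypothetical host contig, one unmapped read.
EXAMPLE_SAM = [
    "@SQ\tSN:host_chr1\tLN:1000\n",
    "read1\t0\thost_chr1\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII\n",
]

print(to_fasta(unmapped_reads(EXAMPLE_SAM)), end="")
```

The unmapped reads written out this way would then go to a de novo assembler and a homology search; those steps are outside the scope of this sketch.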

A metastable approach

Rapid increases in sequencing depth achieved with next-generation sequencing have been matched by the increasing ease of library preparation. Original viral sequencing protocols involved the preparation of multiple micrograms of nucleic acid and about a week of work (Thurber et al., 2009, Willerth et al., 2010). Protocols were slowly refined to reduce library preparation to a matter of days (Greninger et al., 2010). The development of transposon-based library creation and amplification has reduced RNA metagenomic library preparation such that most labs perform the process in 1–2 days, and when coupled to nanopore sequencing the entire sample-to-sequence process can be as little as 6 h (Greninger et al., 2015b, Greninger et al., 2017b, Greninger et al., 2017c). A typical RNA virus metagenomic protocol entails filtration, nuclease treatment, and double-stranded cDNA synthesis, followed by Nextera XT tagmentation and PCR amplification (Greninger et al., 2015c, Hall et al., 2014). The benefits of the protocol are the minimal hands-on time, low cost given the dilutability of the transposase, ease of dual-indexing, and direct compatibility with Illumina sequencers. The protocol can be performed in as little as 4–6 h and is amenable to automation (Greninger et al., 2017b, Greninger et al., 2017c). Variants include the use of additional PCR cycles or amplification prior to tagmentation. However, in the author’s experience these are not necessary given the amplification step present in the tagmentation library preparation protocol and the possibility of extra PCR cycles creating coverage bias (Conceição-Neto et al., 2015). Steps to increase the amount of nucleic acid used as input are helpful for the retrieval of longer reads and reducing PCR jackpotting, although often practitioners move forward with the RNA present.
While no doubt the best way to enrich for virus-like particles, and perhaps to help define a sequence as likely viral in origin, the use of ultracentrifugation is not necessarily required for viral metagenomics. Rather, ultracentrifugation is reserved for defined “virome” work, especially where there is a high bacterial background such as feces (Kohl et al., 2015, Temmam et al., 2015, Thurber et al., 2009). Indeed, a direct head-to-head comparison of ultracentrifugation and filtration for viral particle enrichment found the main benefit of ultracentrifugation to be the removal of host DNA background, which is less of a concern for RNA viral metagenomics if extracted nucleic acids are DNase treated (Kleiner et al., 2015).

Metagenomics for the people

As evidenced from the early history, discovery of new etiological agents of human disease drove the adoption of viral metagenomics (Lysholm et al., 2012, Tang and Chiu, 2010). Clinical metagenomics, or direct shotgun sequencing from clinical samples with minimal sample processing, for diagnostic testing has largely been driven by the diversity of RNA viruses that preclude the routine use of single-plex or even multiplex PCR for broad clinical detection. Indeed, current rapid diagnostics followed by 16S or ITS PCR and Sanger sequencing cover most of the actionable and more exotic organisms present in clinical specimens. While metagenomics delivered on the promise of finding novel human viruses, viral discovery in humans has increasingly become a tragic story of patients interacting with the wrong squirrel or tick on the wrong day and most samples sequenced are frankly negative (Hoffmann et al., 2015, McMullan et al., 2012). In part because of the declining number of viral discoveries in humans, clinical metagenomics has taken on the mantle of detecting all known pathogens instead (Naccache et al., 2014, Wood and Salzberg, 2014). Metagenomics has been especially successful at finding known RNA viruses in unexpected sample types (Doan et al., 2016, Naccache et al., 2015). Provided there is sufficient coverage, it can also provide single nucleotide resolution to define strain relatedness for outbreak epidemiology and, where available, antiviral resistance (Barzon et al., 2011, Capobianchi et al., 2013, Greninger et al., 2017b, Greninger et al., 2017c). Furthermore, the vast majority of clinical samples sent for clinical metagenomic sequencing are entirely negative, yielding a greater potential role for “ruling out” an infectious cause through metagenomic testing prior to treating for other diseases such as autoimmune disorders with immunosuppressant medication. 
The excitement over using clinical metagenomics as one definitive test for clinical microbiology is tempered by questions around its sensitivity, the actionability of the sequence recovered (including cross-contamination), and cost. Since clinical labs are highly unlikely to perform ultracentrifugation, metagenomics likely has the best sensitivity in acellular environments such as cerebrospinal fluid rather than respiratory or stool specimens (Schlaberg et al., 2017). Human cells become an interfering substance, and do so at concentrations much lower than those that interfere with other clinical chemistry measurements (Ranjitkar et al., 2017). Viruses present at low concentration, such as Zika virus, make shotgun metagenomics problematic for a rule-out result in diagnostic virology (Bingham et al., 2016, Landry and St George, 2017, Naccache et al., 2016). As with other culture-independent testing, culture will likely be required to provide antimicrobial sensitivity. While early detection can reduce further diagnostic testing, most RNA viruses in humans have no treatment and are essentially unactionable.

Bat-guano crazy about sampling

The ease and ubiquity of RNA virus metagenomics are perhaps best illustrated by the increasingly diverse samples that are sequenced. When sequence divergence is the metric of success, metagenomicists boldly go where no human has gone before. For the purposes of raw discovery, the use of extraordinarily mixed samples allows for one-stop shopping and diversifies discovery risk, albeit with the drawback that host species cannot be confidently defined. And the continuing need to fund and sell discovery from the basis of security has focused metagenomic efforts on species with high zoonotic pandemic potential (Racaniello, 2016, Temmam et al., 2014). Here, bat guano routinely ranks high on the RNA virus metagenomic list as it checks all three requirements listed above. Bats aggregate a wide array of arthropods, plants, and other animals, produce copious amounts of guano, and have been associated with multiple zoonoses. Cameroonian fruit bats, Chinese bats, Myanmar bats, Hungarian bats, French bats, American bats, big brown bats, tricolored bats, and little brown bats have all been profiled, showing a vast array of RNA viruses from eukaryotic hosts (Dacheux et al., 2014, Donaldson et al., 2010, Ge et al., 2012, He et al., 2013, He et al., 2017, Kemenesi et al., 2014, Li et al., 2010, Yinda et al., 2017). Bats have been used to put an upper limit (in the hundreds of thousands) on the total number of eukaryotic viruses that remain to be discovered (Anthony et al., 2013). Other insect and arthropod aggregators such as spiders and birds have also yielded many new RNA virus species (Debat, 2017, Ducatez and Guérin, 2015, Shean et al., 2017, Zhou et al., 2015). The greatest paradigm shifter in recent viral metagenomics work has been the sheer number and diversity of novel RNA viruses present in arthropods and invertebrates.
Yong-Zhen Zhang and team at the Chinese CDC have been systematically profiling the transcriptomes of invertebrates to upend our understanding of animal virology. A haul of 112 novel negative-stranded RNA viruses from 70 arthropods surpassed the known diversity of Mononegavirales (Li et al., n.d.). These included the first circular negative stranded RNA viruses ever found, which were part of a larger family of viruses (Chuviridae) that may yet become an order given that it is phylogenetically basal to existing segmented and unsegmented viruses and its genomes were arranged in a number of topologies. And yet those were just the negative-stranded viruses. When the team increased the number of hosts profiled to 220, they found 1445 novel RNA viruses (Shi et al., 2016). The genomes found illustrated widespread recombination among RNA viruses, viruses with multiple copies of structural genes, loss of structural genes, as well as transfer of genes between virus and host. Indeed, these discoveries so dwarf all previous RNA virus discoveries, especially in the picorna-like superfamily, that the paper itself would be worth reprinting here in full. Three additional novel families of positive-stranded viruses and two novel families of negative stranded viruses were described. Rather than detail what was found, it is easier to note the surprises: the families in which novel RNA dependent RNA polymerase (RdRp) sequences were not found, such as the Arenaviridae, Picornaviridae, Hepeviridae, Paramyxoviridae, Arteriviridae, and Secoviridae. Of course, whole undiscovered taxa basal to these taxa were also recovered. Perhaps most impressively, Zhang’s team finished complete genomes with rapid amplification of cDNA ends (RACE) for each of these viral genomes, a feat that not many metagenomic viral discoverers undertake when finding more than a few novel viruses.
No doubt evolutionary RNA virologists will have just begun to comb through this data trove before the next ten thousand whole genomes of novel RNA viruses are deposited from peat moss. Most importantly for veterinarians, RNA virus metagenomics has revealed a number of candidate etiological agents for a variety of animal ailments including avian proventricular dilatation disease, mink shaking syndrome, snake inclusion body disease, python respiratory illness, dairy cow disease, and tilapia die-offs (Kistler et al., 2008, Gancz et al., 2009, Blomström et al., 2010, Stenglein et al., 2014, Stenglein et al., 2012, Uccellini et al., 2014, Hoffmann et al., 2012, Bacharach et al., 2016). Animal feces has proven a fruitful hunting ground for metagenomicists, with canine, feline, porcine, equine, goose, duck, and shrew fecal RNA viromes turning up a plethora of mostly novel picorna-like viruses, albeit with an unclear relationship to disease and perhaps some more closely linked to the invertebrates indicated above (Fawaz et al., 2016, Greninger and Jerome, 2016, Li et al., 2011, Moreno et al., 2017, Nagai et al., 2015, Naoi et al., 2016, Ng et al., 2014b, Phan et al., 2011, Reuter et al., 2012, Sano et al., 2016, Sasaki et al., 2015, Zhang et al., 2014). The recovery of Ancient Northwest Territories cripavirus from 700-year-old frozen caribou feces illustrated an incredible stability of encapsidated RNA (Ng et al., 2014a). Screens for fish viruses are still in their infancy and yet the phylogenetic diversity of fish and perhaps their intense exposure to marine RNA viruses may prove them to be the chordate arthropods of viral discovery. Novel members of the Orthomyxoviridae, Picornaviridae, and Reoviridae have recently been described in fish (Bacharach et al., 2016, Reuter et al., 2013, Reuter et al., 2015). Our understanding of reptile and amphibian viral diversity is also still in its early days.
The discovery of a non-Old World, non-New World arenavirus in boid snakes with an envelope protein most closely related to filoviruses proved a tantalizing hint toward new variations on old taxonomies (Stenglein et al., 2012). For the basic scientists, a number of novel RNA viruses have been discovered in model organisms such as C. elegans or D. melanogaster (Félix et al., 2011, Webster et al., 2015). Thousands of new viruses have been found in plants through metagenomics, with most awaiting further characterization or even finished assembly (Roossinck et al., 2010). Plant virus metagenomics is probably one of the more fertile areas of growth for viral discovery in the coming years given their incredible biodiversity (Roossinck, 2012). RNA viral metagenomics has made a number of contributions to our understanding of viral diversity at the far reaches of life. Sequencing of a hot, acidic lake and wastewater revealed a novel ssDNA virus with an RNA virus-like capsid protein, suggesting past recombination between DNA and RNA viruses (Diemer and Stedman, 2012; “RDHV-like virus SF1 replication-like protein and capsid protein genes, complete cds,” 2015). While there are still no definitive RNA viruses of Archaea, viral RNA metagenomics isolated a 5.6 kb contig that contained a capsid-like protein and an RNA-dependent RNA polymerase that differed from the RdRps of viruses from eukaryotes and bacteria (Bolduc et al., 2012). The same can be said of the ciliates, which were previously noted among the eukaryota for having no known RNA viruses (Koonin et al., 2008). Metagenomic sequencing of wastewater after a rainstorm revealed two RNA contigs that were highly suggestive of ciliate RNA viruses in a metagenomic background of Tetrahymena, and several viruses best translated in the ciliate genetic code were found in the invertebrate surveys detailed above (Greninger and DeRisi, 2015a, Shi et al., 2016).
Given the incredible growth in bacteriophage diversity from metagenomics, RNA phages were a surprisingly late arrival to the metagenomic discovery party. As recently as 2016, genome sequences from RNA phages were surprisingly few and far between with only 11 ssRNA and 5 dsRNA genomes compared to over 1000 DNA bacteriophage genomes from 494 species (Greninger and DeRisi, 2015b, Krishnamurthy et al., 2016). Mining of metagenomic datasets yielded an additional 122 partial genomes from novel RNA phages covering >100 novel species including the first known RNA phage of a Gram-positive bacterium (Krishnamurthy et al., 2016). These sequences included multiple unique genomic arrangements as well as ORFs that had no similarity to known proteins. Invertebrate virus sequencing efforts detailed above also found a similar number of levi-like viruses (Shi et al., 2016). Given the many uses of the MS2 phage coat protein for understanding RNA-protein interactions and as a tool for bioengineering RNA affinity purification, the expansion of the RNA phageome will likely create a number of interesting tools for molecular biology that can build on MS2.

Pity the ICTV

The ridiculous number of virus discoveries in the past several years has put an incredible strain on the whole system of viral taxonomy (Fig. 1C–E). The inclusion of uncharacterized metagenomic viral sequence in taxonomy has long been of some debate at the International Committee on Taxonomy of Viruses (ICTV), the adjudicator of novel families, genera, and species of the viral world. Only this past year did the ICTV codify its existing policy through issue of a consensus statement on the inclusion of metagenomic data in its consideration of taxonomical placement of viruses (Simmonds et al., 2017). The torrid pace of discoveries has forced the hand of viral discoverers to come up with euphonious names for viruses for which there is almost no biological understanding (ICTV, 2017a, ICTV, 2017b). Convention previously dictated something like location, host, and number, even if the World Health Organization now recommends against ruining the tourist economy along the Ebola River or in Coxsackie, New York (Fukuda et al., 2015). Since sequence similarities govern our understanding of genomic function, there is a temptation to name based on homology and let past discoveries anchor the novel (e.g., picobirnavirus or dicipivirus/cadicivirus/picodicistrovirus) (Woo et al., 2012). Default nomenclature is approaching that of drug manufacturers or therapeutic antibody naming, with alternating consonant-vowel pairs that might have some basis in Latin or relation to the location but obscures the actual place (e.g. avisivirus, aquamavirus, hunnivirus, harkavirus) (Boros et al., 2015, Reuter et al., 2012). Recently, five ancient Chinese states formed the basis for naming novel family lineages (Shi et al., 2016). Detection of ancient recombination in RNA virus sequences is revealing genomic abominations that only a liger could love. What do you get when you cross a picornavirus and a calicivirus? 
Probably a non-functional, non-replicating piece of RNA, but Picalivirus A–D are still a thing (Greninger and DeRisi, 2015c; Ng et al., 2012). Tombusviridae and Nodaviridae? Tombunodavirus (Grasse and Spring, 2017, Greninger and DeRisi, 2015d). The inventiveness of nomenclature for novel virus discovery in a space of anarchy combats a steady march of rules and reason by the ICTV (Simmonds et al., 2017). Not even clinically relevant viruses such as human parainfluenza viruses or respiratory syncytial viruses – now human respirovirus, rubulavirus, orthopneumovirus – can resist binomial nomenclature and taxonomical reassignment (Adams et al., 2017). However, if discoverers get far enough out ahead of the rationalization, their original names can take advantage of the movement to have an outsized influence on viral nomenclature (Boros et al., 2015, Runckel et al., 2011, Woo et al., 2012).

Remaining frontiers

In a field with seemingly no end to frontiers, it is curious to define even more. No doubt the ‘virtuous cycle’ of metagenomic sequencing will only increase in breadth and depth in the coming years with a concomitant growth of new bioinformatic algorithms leading to discovery of even more new organisms and further decrease in so-called unalignable viral “dark matter” (Fig. 2) (Greninger et al., 2015a, Krishnamurthy and Wang, 2017). Turning the crank on RNA virus discovery metagenomics is now the province of undergraduate theses (Makhsous et al., 2017, Shean et al., 2017, Zaaijer et al., 2016). Other than sampling more broadly and maintaining the exponential growth in Genbank, what are orthogonal challenges for future RNA virus metagenomic studies?

Fig. 2

The conjoined circles of metagenomic success.

The increased output of modern-day sequencers has led to increased metagenomic sequencing of samples, both environmental and clinical. This in turn has led to an explosion in the NCBI Genbank and WGS databases. New discoveries beget the discovery of more divergent new viruses and organisms as they are now alignable to new references in the database. Rather than searching through gapped alignments, the increased coverage of the Genbank reference database allows for more exact k-mer searching, which allows for faster, more sensitive alignments of reads. This in turn makes metagenomic sequencing more useful, especially in the clinic, which in turn begets more sequencing.
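The exact k-mer searching mentioned in the caption can be illustrated with a toy index. This is a minimal sketch under stated assumptions, not the algorithm of any particular classifier: the reference names and sequences are invented, the choice of k = 5 is arbitrary, and real tools use much longer k-mers, compact data structures, and taxonomic logic that are omitted here.

```python
from collections import defaultdict
from typing import Dict, Set


def build_kmer_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer in the references to the set of references containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> Dict[str, int]:
    """Count, per reference, how many of the read's k-mers match exactly."""
    hits: Dict[str, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return dict(hits)


# Hypothetical reference database of two invented sequences.
refs = {"virus_A": "ACGTACGTGGCCTTAA", "virus_B": "TTTTGGGGCCCCAAAA"}
index = build_kmer_index(refs, k=5)

print(classify_read("CGTGGCCTT", index, k=5))  # {'virus_A': 5}
```

Because each lookup is an exact dictionary hit rather than a gapped alignment, the search is fast, but a read is only classified if a sufficiently close relative is already in the database, which is the dependence on database growth that the figure describes.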

Easier, faster, better, broader protocols

To start in the wet lab, better methods of host depletion and recovery of full viral genomes are absolutely required. Too many analogies to “finding a needle in a haystack” in the literature necessitate the use of cellulases (Allen et al., 2009, Kowalchuk et al., 2007, Lax and Gilbert, 2015, Lecuit and Eloit, 2014, Naccache et al., 2014, Soueidan et al., 2015). Even if RNA viral transcripts constitute >50% of an arthropod, they rarely do so in mammalian tissue (Feng et al., 2008, Shi et al., 2016). In both clinical metagenomics and complex mixtures there are always low concentration viral sequences, either due to time of sampling, viral biology, or the long tail of ecological abundance, which deeper sequencing alone may not solve. Synthetic biology and programmable nucleases may allow new options for depletion beyond the ribosome although current efficiencies must be improved (Gu et al., 2016, Matranga et al., 2014). Even though sample preparation methods have become considerably easier in the past decade, they still take several hours of hands-on time and have not routinely been ported to automated liquid handlers. With plunging sequencing costs, library preparation costs for RNA virus metagenomics have taken a front seat. The time and cost of library preparation hamper adoption of metagenomics in the clinical virology lab. Similar to the expanding suite of Cas programmable sequence-guided nucleases, an RNA-directed transposase for adapter tagging followed by one-step, dual-indexed amplification would speed up library preparation as the bulk of time in current preparations is spent on double-stranded cDNA synthesis (Gertz et al., 2012). Such an enzyme might be discovered through the approaches highlighted here, or perhaps through directed evolution of existing DNA transposases (Adey et al., 2010). Another solution might be a one-pot mix of enzymes that performed library preparation in a similar fashion as Gibson cloning. 
Even in a host-free environment, current amplification methods do not recover full viral genomes from end-to-end. Instead, the common paradigm is random priming with amplification followed by a follow-up step of RACE to recover ends (Li et al., n.d.; Shi et al., 2016). This second step is highly laborious, especially for segmented RNA viruses, and difficult to execute on viral sequences at low concentration or in complex mixtures. Furthermore, current transposase-based library methods can produce inversions or sequencing artifacts at the end of genome segments or in areas of RNA secondary structure. As “unbiased” or “agnostic” as metagenomics can be, we still rely on sequencing by synthesis and a four base read-out. The ultimate immunoevasion strategy may be virally-encoded nucleotides (Bryson et al., 2015, Murphy et al., 2013, Weigele et al., 2017). Host and virally-encoded RNA editing already bedevils calling of accurate whole genome sequences in both positive- and negative-strand RNA viruses (Park et al., 2015, Pelet et al., 1991, Piontkivska et al., 2017, Vidal et al., 1990). Adenosine-based modifications of viral genomes have shown effects on viral and host biology, but detecting them requires additional modifications to most sequencing protocols (Gokhale and Horner, 2017, Gonzales-van Horn and Sarnow, 2017, Kennedy et al., 2016, Kennedy et al., 2017). Our understanding of RNA base modifications in viral genomes is still in its infancy. Here, nanopore-based approaches to sequencing may hold the key to new discoveries as they can directly detect modified nucleotides, although current approaches are as dependent on the biology of pore proteins as we are on polymerases (Ayub et al., 2013, Carlsen et al., 2014). The ultimate answer may be direct mass spectrometry-based detection of viral nucleotides (Gooskens et al., 2014, Cobo, 2013). 
Improvements in sensitivity of nanopore sequencing in terms of sequencing depth and the host depletion strategies highlighted above are required before nanopore-based metagenomics becomes routine or meaningful (Greninger et al., 2015b). Other than the retroviruses, RNA viruses do not readily provide host linkage information in metagenomics like DNA viruses and phages often do. Linking RNA virus and host has been imputed based on transcript levels or dinucleotide usage, although the latter has recently been shown to be more correlated with viral family than host species (Kapoor et al., 2010, Giallonardo et al., 2017, Shi et al., 2016). Even when sequencing discrete organisms such as a singular honeybee, the co-existence of eukaryotic parasites means that imputation of viral host cannot be exact (Runckel et al., 2011). Host promiscuity seems to be a common theme among newly-described RNA viruses (Nunes et al., 2017). Highly-indexed libraries, chemical linkage of RNA species, and physical partitioning through single cell transcriptomics currently provide the best potential solution to the host imputation problem in complex mixtures (Burton et al., 2014, Chow et al., 2015, Turner et al., 2009).
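
To make the dinucleotide usage mentioned above concrete, here is a minimal sketch of one common way to summarize it: the observed-over-expected odds ratio for each of the 16 dinucleotides, where the expectation is the product of the two single-base frequencies. This is an assumption about the general form of the statistic, not a reproduction of the analyses in the cited studies; the function name `dinucleotide_odds_ratios` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_odds_ratios(seq):
    """Return observed/expected frequency ratios for all 16 dinucleotides.

    RNA input is accepted (U is treated as T). Ambiguous bases are skipped,
    so only adjacent pairs of unambiguous bases are counted.
    """
    seq = seq.upper().replace("U", "T")
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(a + b for a, b in zip(seq, seq[1:]) if a in BASES and b in BASES)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least two adjacent unambiguous bases")
    ratios = {}
    for a, b in product(BASES, repeat=2):
        expected = (mono[a] / n_mono) * (mono[b] / n_mono)
        observed = di[a + b] / n_di
        # A base absent from the sequence gives an undefined ratio.
        ratios[a + b] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    # Hypothetical toy genome fragment; a ratio well below 1 for "CG"
    # would indicate CpG depletion.
    fragment = "AUGGCUAAUCCAGGAUUACAUGCAUUGGAACCUUAGCAUAA"
    for dinuc, ratio in sorted(dinucleotide_odds_ratios(fragment).items()):
        print(dinuc, round(ratio, 2))
```

Comparing such 16-value profiles between a virus and candidate hosts is the kind of imputation described above; as the text notes, the profile may track viral family more closely than host species, so it is best read as weak evidence.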

Experimenting with function

To date, most metagenomics has adopted the 19th century British naturalist approach of cataloging inventory and diversity. This focus of the field has been somewhat punishing for those involved, with burnout of both scientists and reviewers from not fully understanding the why of it all (Canuti and van der Hoek, 2014). This author does not necessarily have a better plan, but perhaps phenotype might be a reasonable start. Mass spectrometry was available to previous generations of biochemists, but they did not necessarily stop to catalog the contents of every fraction, focusing instead on function. This is not to understate the critical impact of the technical training that comes from viral metagenomics, nor to minimize the power of the method. But once we know that viruses are diverse, some concern for function is most likely in order. Although they are now no doubt contributing, viral metagenomicists have previously borrowed liberally from the biochemical functions assigned to strings of sequence without confirmation. Genomics has been the wires that allowed biochemistry to scale (167, 168). A number of options for functional characterization of these viruses are available, but they unfortunately require work. Additional culture models are needed to handle all the new discoveries, and genetic engineering methods for more easily establishing cell lines from exotic species are worth pursuing (Ettayebi et al., 2016, Finkbeiner et al., 2012, Stenglein et al., 2012, Bell-Sakyi and Attoui, 2016, Janowski et al., 2017). Focusing discovery on genetically tractable organisms has allowed for relatively rapid functional studies of the newly discovered C. elegans Orsay virus (Chen et al., 2017, Jiang et al., 2017). A number of culture-independent methods exist now to characterize the novel viruses and their genes. 
Biochemical assays for RdRp, proteases, helicases, capsids, and internal ribosome entry sites all exist and synthetic biology allows for easy cloning from a database to test these new proteins found through metagenomics (Ladd Effio et al., 2016, O’Donoghue et al., 2012, Peersen, 2017). Profiling known functions across the new viral species, such as RNA binding strength and sequence specificity of novel RNA phage proteins related to MS2 phage or viral protease specificity and kinetics, might be a place to start (O’Donoghue et al., 2012). Affinity purification-mass spectrometry is a powerful method that allows discovery of viral-host protein–protein interactions in the absence of culture (Jäger et al., 2011, Greninger et al., 2012, Medina et al., 2017, Greene et al., 2016). For vertebrate viruses, serological assays have long been used to confirm roles in disease (O’Sullivan et al., 1997, Bao et al., 2011, Coller et al., 2016). The ubiquity of these viruses also leads to questions of how we publish in the absence of experiments and what a sequence is worth. The number of novel viruses discovered needed for a high-profile paper has increased by logarithms (Krishnamurthy et al., 2016, Shi et al., 2016). In the absence of particular phenotypic data or wet-lab viral characterization, many authors have turned to Genome Announcements, biorxiv, or simply uploading to Genbank with extra metadata, figuring that sequencing is the most likely future method of both detection and discovery (Debat, 2017, Greninger and DeRisi, 2015e, Greninger and DeRisi, 2015f, Karamendin et al., 2016, Sharman et al., 2016, Sparks et al., 2013). Given the glut of new viruses, expansion of sequencing, and the time it takes to publish, scientists may be more likely to align to your novel virus rather than read about it in a journal and decide to screen for it. 
This method has allowed for rapid sharing of data and de facto global screens of novel RNA virus prevalence, tropism, and evolution that then lead to something resembling a story.

New roots

Perhaps the most difficult challenge to spring from sequence gazing at all the novel RNA viruses is our understanding of the origins of RNA viruses. At what point in host evolution does an RNA viral pathogen arrive? And what exactly did the host look like then? RNA virus metagenomics still does not provide the answer. Results from initial RNA virus genomics work suggested picorna-like viruses predated the radiation of the five supergroups of eukaryotic organisms (Koonin et al., 2008). Since then, hints of genetic exchange between hosts and viruses, prominent in DNA virus and bacteriophage genomics, are showing up in RNA viruses (Hughes and Stanway, 2000, Sasaki and Taniguchi, 2008, Shi et al., 2016, Staring et al., 2017). Accounting for each of the domains commonly present in RNA viruses is a challenge for any origin story, though greater knowledge of domain organization and swapping between different RNA viruses may help firm up these dates (Koonin et al., 2015). Biochemical characterization of the incredibly diverse RNA viruses discovered through ongoing metagenomic screens will also test theories of the origin of RNA viruses and better date the origin of different RNA virus clades relative to different epochs of eukaryotic evolution. Recent genetic and proteomic screens of eukaryotic RNA viruses have indicated key reliance on host lipid modifying enzymes that regulate vesicular transport (Greninger et al., 2013, Sasaki et al., 2012, Greninger, 2015, Carette et al., 2011, Marceau et al., 2016, Borawski et al., 2009, Berger et al., 2011, Arita et al., 2013, Nagy and Pogany, 2012, Xu and Nagy, 2016, Salloum et al., 2013). Membrane repurposing is a critical determinant of RNA virus replication which should be added to the “hallmark genes” of RNA viruses, though they currently escape sequence alignment profiling (Koonin and Dolja, 2014). 
These results have led to a hypothesis in which part of the origin of RNA viruses involves selfish vesicular transport that likely expanded with the rise of intracellular eukaryotic transport (Greninger, 2015, Koonin et al., 2006, Kuehn and Kesty, 2005). The list of RNA viruses that have acquired a cellular envelope continues to grow (Chen et al., 2015, Feng et al., 2013, McKnight et al., 2017). New concepts in bacterial vesicular transport might be consistent with the close evolutionary link between the eukaryotic RNA virus polymerases and bacterial retroelements (Jan, 2017, Kuehn and Kesty, 2005).

Make it stop

To put it scientifically, it is nuts that all this and much more have happened in only approximately a decade. If they were not already, RNA viruses have become the star child of evolutionary biologists, whether for day-by-day or eon-by-eon examples of evolution (Koonin, 2007, Koonin et al., 2008, Koonin et al., 2015, Shi et al., 2016, Xue et al., 2017). With our world expanded and our place in it shrunk, there is still much to do regarding viral sequencing for our own quotidian human RNA viruses (Tang et al., 2017). There is but one way to ensure that we find no more viruses. Go sequence something.

213 in total

1. Metagenomic analysis of the viromes of three North American bat species: viral diversity among different bat species that share a common habitat.

Authors: Eric F Donaldson; Aimee N Haskew; J Edward Gates; Jeremy Huynh; Clea J Moore; Matthew B Frieman
Journal: J Virol Date: 2010-10-06 Impact factor: 5.103

2. Virus discovery: are we scientists or genome collectors?

Authors: Marta Canuti; Lia van der Hoek
Journal: Trends Microbiol Date: 2014-05 Impact factor: 17.079

3. Identification of further diversity among posaviruses.

Authors: Kaori Sano; Yuki Naoi; Mai Kishimoto; Tsuneyuki Masuda; Hitomi Tanabe; Mika Ito; Kazutaka Niira; Kei Haga; Keigo Asano; Shinobu Tsuchiaka; Tsutomu Omatsu; Tetsuya Furuya; Yukie Katayama; Mami Oba; Yoshinao Ouchi; Hiroshi Yamasato; Motohiko Ishida; Junsuke Shirai; Kazuhiko Katayama; Tetsuya Mizutani; Makoto Nagai
Journal: Arch Virol Date: 2016-09-12 Impact factor: 2.574

4. Metagenomic analysis of viruses from bat fecal samples reveals many novel viruses in insectivorous bats in China.

Authors: Xingyi Ge; Yan Li; Xinglou Yang; Huajun Zhang; Peng Zhou; Yunzhi Zhang; Zhengli Shi
Journal: J Virol Date: 2012-02-15 Impact factor: 5.103

5. Protein composition of the hepatitis A virus quasi-envelope.

Authors: Kevin L McKnight; Ling Xie; Olga González-López; Efraín E Rivera-Serrano; Xian Chen; Stanley M Lemon
Journal: Proc Natl Acad Sci U S A Date: 2017-05-10 Impact factor: 11.205

6. Draft Genome Sequence of Laverivirus UC1, a Dicistrovirus-Like RNA Virus Featuring an Unusual Genome Organization.

Authors: Alexander L Greninger; Joseph L DeRisi
Journal: Genome Announc Date: 2015-07-02

7. Spider Transcriptomes Identify Ancient Large-Scale Gene Duplication Event Potentially Important in Silk Gland Evolution.

Authors: Thomas H Clarke; Jessica E Garb; Cheryl Y Hayashi; Peter Arensburger; Nadia A Ayoub
Journal: Genome Biol Evol Date: 2015-06-08 Impact factor: 3.416

8. A Herpesviral induction of RAE-1 NKG2D ligand expression occurs through release of HDAC mediated repression.

Authors: Trever T Greene; Maria Tokuyama; Giselle M Knudsen; Michele Kunz; James Lin; Alexander L Greninger; Victor R DeFilippis; Joseph L DeRisi; David H Raulet; Laurent Coscoy
Journal: Elife Date: 2016-11-22 Impact factor: 8.140

Review 9. Making the Mark: The Role of Adenosine Modifications in the Life Cycle of RNA Viruses.

Authors: Sarah R Gonzales-van Horn; Peter Sarnow
Journal: Cell Host Microbe Date: 2017-06-14 Impact factor: 21.023

10. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

Authors: W Gu; E D Crawford; B D O'Donovan; M R Wilson; E D Chow; H Retallack; J L DeRisi
Journal: Genome Biol Date: 2016-03-04 Impact factor: 13.583
