Metagenomics and the molecular identification of novel viruses.

Abstract

There have been rapid recent developments in establishing methods for identifying and characterising viruses associated with animal and human diseases. These methodologies, commonly based on hybridisation or PCR techniques, are combined with advanced sequencing techniques termed 'next generation sequencing'. Allied advances in data analysis, including the use of computational transcriptome subtraction, have also impacted the field of viral pathogen discovery. This review details these molecular detection techniques, discusses their application in viral discovery, and provides an overview of some of the novel viruses discovered. The problems encountered in attributing disease causality to a newly identified virus are also considered.

Substances: DNA, Viral

Year: 2010 PMID: 21111643 PMCID: PMC7110547 DOI: 10.1016/j.tvjl.2010.10.014

Source DB: PubMed Journal: Vet J ISSN: 1090-0233 Impact factor: 2.688

Introduction

Given that animal pathogens (in particular viruses) are considered a significant source of emerging human infections (Cleaveland et al., 2001), the identification and optimal characterisation of novel organisms affecting both domestic and wild animal populations is central to protecting both human and animal health. Recent outbreaks of human infection caused by influenza H7N7 virus transmitted from poultry (Koopmans et al., 2004) and H1N1 virus transmitted from pigs (Dawood et al., 2009, Itoh et al., 2009) are cases in point, highlighting the need for ongoing, vigilant epidemiological surveillance of such pathogens in animal populations. Moreover, epidemiological studies strongly suggest that novel infectious agents remain to be discovered (Woolhouse et al., 2008) and may be contributing to a host of cancers, autoimmune disorders, and degenerative diseases in humans (Relman, 1999, Dalton-Griffin and Kellam, 2009). Similar, yet-to-be-identified viruses may be contributing to the pathogenesis of similar diseases in animals. Viruses can be identified by a wide range of techniques. Traditional methods include electron microscopy, cell culture, inoculation studies and serology (Storch, 2007). Whereas many of the viruses known today were first identified by these techniques, the methods have limitations. For instance, many viruses cannot be cultivated in the laboratory and can only be characterised by molecular methods (Amann et al., 1995), and in recent years we have seen the increasing use of these techniques in pathogen discovery (Fig. 1).

Fig. 1

A schematic overview of the molecular methods currently available for viral discovery. Hybridisation methods include microarray and subtractive hybridisation techniques such as representational difference analysis. PCR-based methods include degenerate PCR, degenerate oligonucleotide primed PCR (DOP-PCR), sequence-independent single primer amplification (SISPA), random PCR and rolling circle amplification (RCA).

One such approach uses sequence information from known pathogens to identify related but undiscovered agents through cross-hybridisation. Examples include microarray (Wang et al., 2002) and subtractive (Lisitsyn et al., 1993) hybridisation-based methods. Another advance has involved PCR amplification of the pathogen genome, where there is complete knowledge of the pathogen to be amplified (conventional PCR), or where this information is limited (degenerate PCR). Other PCR methods such as sequence-independent single primer amplification, degenerate oligonucleotide primed PCR, random PCR and rolling circle amplification, also have the capacity to detect completely novel pathogens. Hybridisation and PCR-based methods are more effective if the sample to be analysed is first enriched for virus, a process achieved by removing host and other contaminating nucleic acids. The end result of most hybridisation and PCR methods is an amplified product that requires definitive identification by sequencing. Advances in sequencing that have facilitated virus discovery include the arrival of ‘next or second generation sequencing’, which can generate very large amounts of sequence data. Technological advances have also resulted in the development of metagenomics, the culture-independent study of the collective set of microbial populations (microbiome) in a sample by analysing the sample’s nucleotide sequence content (Petrosino et al., 2009). The different microorganisms constituting a microbiome can include bacteria, fungi (mostly yeasts) and viruses. 
Examples of microbiomes in mammalian biology include the microbial populations inhabiting the human intestine or mucosal surfaces both in health and disease. To date, the study of the viral microbiome (virome) has been applied to a range of biological and environmental samples including human (Breitbart et al., 2003, Zhang et al., 2006, Finkbeiner et al., 2008) and equine (Cann et al., 2005) intestinal contents, bat guano (Li et al., 2010), sea water (Breitbart et al., 2002, Angly et al., 2006, Williamson et al., 2008), marine sediment (Breitbart et al., 2004), fresh water (Breitbart et al., 2009, Djikeng et al., 2009), hot springs (Schoenfeld et al., 2008), soil (Fierer et al., 2007) and plants (Coetzee et al., 2010). Early results from a large initiative to describe the human microbiome associated with health and disease have recently been published (Nelson et al., 2010), and such findings, together with those of other studies, are likely to lead to the discovery of a wealth of previously unknown viruses. This review describes the current molecular techniques available for the detection of viruses infecting animals and humans. We begin by discussing hybridisation and PCR-based methods and describe advances that have facilitated the detection of completely novel viruses. Advances in sequencing methodology and data analysis, such as transcriptome subtraction, are also appraised. The review concludes with an assessment of the problems encountered when attempting to attribute disease causality to a newly discovered virus.

Hybridisation-based methods

Microarray techniques

Microarrays consist of high-density oligonucleotide probes (or segments of DNA) immobilised on a solid surface. Any complementary sequences (labelled with fluorescent nucleotides) in a test sample hybridise to the probe on the microarray. The results of hybridisation are detected and quantified by fluorescence-based detection and thus the relative abundance of nucleic acid sequences in a sample can be determined (Clewley, 2004). Two types of microarray techniques are commonly used for virus identification. The first uses short oligonucleotide probes (sensitive to single-base mismatches) to detect or identify known, or sub-types of known, viruses. Such a technique has been used to discriminate between human herpes viruses (for example Foldes-Papp et al., 2004). The second type of microarray method employs long oligonucleotide probes (60 or 70 bp) that allow for sequence mismatches (Wang et al., 2002). Microarray applications have been used in the discovery of novel animal viruses such as a coronavirus in a Beluga whale (Mihindukulasuriya et al., 2008), the bornavirus that causes proventricular dilation disease in wild psittacine birds (Kistler et al., 2008), and an enterovirus associated with tongue erosions in bottle-nose dolphins (Nollens et al., 2009). In human medicine they have been used to characterise SARS-CoV (Wang et al., 2003), and to identify novel coronaviruses and rhinoviruses in asthmatics (Kistler et al., 2007), gammaretrovirus in prostate tissue (Urisman et al., 2006) and cardioviruses in the gastro-intestinal tract (Chiu et al., 2008). Microarray technology is a powerful tool as it screens for a large number of potential pathogens simultaneously (Wang et al., 2002, Palacios et al., 2007, Xiao-Ping et al., 2009). 
The method does have limitations however, as the process of interpreting hybridisation signals is not a trivial one, often involving the empirical characterisation of signals produced by known viruses and the development of specialised software (Urisman et al., 2005). Furthermore, microarray techniques utilise probes with a finite specificity for a particular pathogen or small group of pathogens so that novel or highly divergent strains or viruses can be difficult to detect. Non-specific binding of test material to hybridisation probes can also result in loss of test sensitivity. Despite these limitations, microarrays have proven extremely effective in novel pathogen discovery.

Subtractive hybridisation

This form of hybridisation identifies sequence differences between two related samples and is based on the principle of removing common nucleic acid sequences from two samples while leaving differing sequences intact. Such a process can be applied to any pair of nucleic acid sources such as ‘treated’ vs. ‘untreated’ or ‘diseased’ vs. ‘undiseased’ tissue, or to samples obtained prior to and after experimental infection (Muerhoff et al., 1997). Subtractive hybridisation uses two nucleic acid sources termed ‘tester’ and ‘driver’ with only the tester containing pathogen sequences (Ambrose and Clewley, 2006). DNA in both the tester and driver is digested by restriction enzymes and adaptors are ligated to the DNA fragments from the tester sample only. The two DNA populations are mixed, denatured and annealed to form three types of molecule: (1) tester/tester; (2) hybrids of tester/driver, and (3) driver/driver. The tester/tester molecules should now be enriched for pathogen, which are preferentially and exponentially amplified by primers specific for the adaptors present on both DNA strands. The tester/driver molecules, which only contain an adaptor on one DNA strand, undergo linear amplification but are then removed by enzymatic digestion. The driver/driver molecules have no adaptors and are not amplified. Sufficiently enriched in this way, the tester sample is sequenced and the pathogen identified. An example of a subtractive hybridisation method is representational difference analysis (RDA) (Lisitsyn et al., 1993). Despite its impressive performance in model systems, RDA has had limited success in the discovery of novel viruses, largely due to the requirement for two highly matched nucleic acid sources. Restriction enzyme digestion also leads to an increased DNA complexity and the risk of inefficient subtractive hybridisation, a particular problem with samples containing large amounts of host DNA, such as serum or plasma. 
Despite these limitations, RDA has been used to identify the agent causing Kaposi’s sarcoma (human herpesvirus-8) (Chang et al., 1994), torque teno or transfusion-transmitted virus (TTV) (Nishizawa et al., 1997) and the hepatitis GBV-A and GBV-B viruses (Simons et al., 1995b).
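The difference in amplification between the three duplex types can be illustrated with a toy calculation. The sketch below assumes idealised PCR (perfect doubling per cycle for duplexes carrying adaptors on both strands, one extra copy per cycle for one-adaptor hybrids, none for driver/driver); the function name and the choice of 20 cycles are illustrative assumptions and are not taken from any published RDA protocol.

```python
def copies_after_pcr(duplex_type: str, cycles: int) -> int:
    """Idealised copy number derived from ONE starting duplex after `cycles` cycles."""
    if duplex_type == "tester/tester":
        # Adaptors on both strands: both strands are primed, so copies double.
        return 2 ** cycles
    if duplex_type == "tester/driver":
        # Adaptor on one strand only: one new copy per cycle (linear growth).
        return 1 + cycles
    if duplex_type == "driver/driver":
        # No adaptors: nothing for the primer to anneal to.
        return 1
    raise ValueError(f"unknown duplex type: {duplex_type!r}")


# Hypothetical 20-cycle run, for illustration only.
for kind in ("tester/tester", "tester/driver", "driver/driver"):
    print(kind, copies_after_pcr(kind, 20))
```

Even in this simplified picture, the tester/tester population outgrows the hybrids by several orders of magnitude, which is why enzymatic removal of the linearly amplified products leaves a sample dominated by tester-specific sequence.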

PCR-based methods

Degenerate PCR

Conventional PCR is frequently used to identify or exclude the presence of a virus in samples. Given that the method relies on the annealing of specific primers complementary to the pathogen’s genomic sequence of interest, it is unsuitable for the detection of novel viruses where there are marked sequence differences from the primers. Prior knowledge of the viral sequence is therefore a pre-requisite. An alternative PCR method, degenerate PCR, uses primers designed to anneal to highly-conserved sequence regions shared by related viruses. Because these regions are almost never completely conserved, primers generally include some degeneracy that permits binding to all or the most common known variants on the conserved sequence (Rose et al., 1998). The overall aim is to achieve a balance between covering all possible viral variants within a family (i.e. primers with high degeneracy) and creating an unwieldy number of different primers. At high levels of degeneracy, only a small proportion of primers are able to prime DNA synthesis, whereas a large proportion of the remaining primers will be able to anneal but, because of sequence mismatches, will be refractory to PCR extension. The maximum level of degeneracy is usually fixed at approximately 256, and degeneracy can be reduced by using codon usage tables (Wada et al., 1992) and inter-codon dinucleotide frequencies (Smith et al., 1983). Degenerate primers are used to detect viruses, including novel viruses, from existing sufficiently homologous virus families. Such primers have been used in the identification of pig endogenous retrovirus (PERV) (Patience et al., 1997), numerous macaque gammaherpesviruses (Van Devanter et al., 1996, Rose et al., 1997), a novel alphaherpesvirus associated with death in rabbits (Jin et al., 2008), and a novel chimpanzee polyomavirus (Johne et al., 2005). 
Novel viruses infecting humans detected using this technique include hepatitis G virus (Simons et al., 1995a), a hantavirus (sin nombre virus) (Nichol et al., 1993), coronaviruses (Sampath et al., 2005), and parainfluenza viruses 1–3 (Corne et al., 1999).
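The degeneracy of a primer is the number of distinct sequences it represents, and it can be computed directly from the standard IUPAC ambiguity codes. The following is a minimal sketch; the example primer is hypothetical and was chosen only to show the arithmetic, and the ceiling of 256 is the approximate figure quoted above.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
MAX_DEGENERACY = 256  # approximate ceiling quoted in the text


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list:
    """List every concrete sequence in the degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


primer = "ATGGAYTGGNCARTN"  # hypothetical primer: Y(2) x N(4) x R(2) x N(4)
print(degeneracy(primer), degeneracy(primer) <= MAX_DEGENERACY)
```

Each additional fully degenerate position (N) multiplies the pool size by four, which is why the number of such positions has to be kept small.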

Sequence-independent single primer amplification (SISPA)

Sequence-independent amplification of viral nucleic acid avoids the potential limitations of other methods, particularly the lack of microarray hybridisation due to genetic divergence from known viruses, the absence of a matched sample for subtractive hybridisation and where PCR amplification using conventional or degenerate primers fails. The advantages of these methods are their ability to detect novel viruses highly divergent from those already known, their relative speed and simplicity of use and their lack of bias in identifying particular groups of viruses (Delwart, 2007). A sequence-independent amplification technique termed sequence-independent single primer amplification (SISPA) was introduced almost two decades ago to identify viral nucleic acid of unknown sequence present in low amounts (Reyes and Kim, 1991). SISPA was used first to sequence the norovirus genome from human faeces (Matsui et al., 1991), in addition to an astrovirus (Matsui et al., 1993) and a rotavirus (Lambden et al., 1992) infecting humans. Originally the SISPA method involved endonuclease digestion of DNA, followed by directional ligation of an asymmetric adaptor or primer on to both ends of the DNA molecule (Reyes and Kim, 1991). Common end sequences of the adaptor allowed the DNA to be amplified in a subsequent PCR reaction using a complementary single primer. Due to the low complexity of a viral genome, enzymatic digestion produces a large amount of a limited number of fragments. After amplification these are visible as discrete bands on an agarose gel and can be sequenced and identified (Allander et al., 2001). Since animal and bacterial genomes are larger and more complex, restriction digestion generates many different-sized fragments, the amplification of which can result in ‘smears’ on agarose gel. One of the disadvantages of sequence-independent amplification techniques is the contemporaneous amplification of ‘contaminating’ host and bacterial nucleic acid. 
Enriching methods that reduce such ‘background’ genomic material include filtration, ultra-centrifugation, density-gradient ultra-centrifugation and enzymatic digestion of non-viral nucleic acids using DNAse and RNAse (Delwart, 2007). These techniques take advantage of the differential protection afforded to the virus genome by nucleocapsids and capsids. However, as viral nucleic acid not protected by such capsids is removed by the purification process and not amplified, some potential assay sensitivity is lost. Furthermore, the random nature of the amplification reaction means that great care must be taken to maintain PCR integrity and prevent cross-contamination. The original SISPA method has now been modified to include steps to detect both RNA and DNA viruses, to enrich for virus, and to remove host genomic and contaminating nucleic acid (Allander et al., 2001, Djikeng et al., 2008). Novel human and animal viruses detected in clinical samples using these modified methods include parvoviruses (Allander et al., 2001, Jones et al., 2005), a coronavirus (van der Hoek et al., 2004), an adenovirus (Jones et al., 2007a), an orthoreovirus (Victoria et al., 2008), a picornavirus (Jones et al., 2007b), and a porcine pestivirus (Kirkland et al., 2007).

Degenerate oligonucleotide primed PCR (DOP-PCR)

This sequence-independent amplification technique, termed degenerate oligonucleotide primed PCR (DOP-PCR), was initially developed for genome mapping studies (Telenius et al., 1992), but has more recently been modified to detect viral genomic material (Nanda et al., 2008). DOP-PCR uses primers with a short 3′-anchor sequence of 4–6 nucleotides (a given 4- or 6-nucleotide sequence typically occurs every 256 or 4096 bp, respectively), preceded by a non-specific degenerate sequence of 6–8 nucleotides for random priming. Immediately upstream of the non-specific degenerate sequence, each primer also contains a defined 5′-sequence of 10 nucleotides. Because of the degenerate sequence, each reaction includes a mixture of several thousand different primers. At low stringency during the first few DOP-PCR amplification cycles, at least 12 consecutive nucleotides from the 3′ end of the primer anneal to DNA sequences on the PCR template. In subsequent cycles at higher stringency, these initial PCR products are amplified further using the same primer population. DOP-PCR, when followed by sequencing of the product, has the advantage of facilitating the detection of both RNA and DNA viruses without a priori knowledge of the infectious agent (Nanda et al., 2008).
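The figures of 256 and 4096 bp follow from simple counting, as does the size of the primer mixture. The sketch below assumes a random template with equal base frequencies and assumes that every degenerate position is fully degenerate (any of the four bases); real primer designs may restrict some positions, so these numbers are upper-bound illustrations. The function names are hypothetical.

```python
def expected_spacing_bp(anchor_length: int) -> int:
    """Average distance between occurrences of one fixed anchor sequence
    in random DNA with equal base frequencies (4 ** length)."""
    return 4 ** anchor_length


def primer_pool_size(degenerate_length: int) -> int:
    """Number of distinct primers if every degenerate position is any base."""
    return 4 ** degenerate_length


for anchor in (4, 6):
    print(f"{anchor}-nt anchor: one site per ~{expected_spacing_bp(anchor)} bp")
for degenerate in (6, 8):
    print(f"{degenerate} degenerate positions: {primer_pool_size(degenerate)} primers")
```

A shorter anchor therefore primes more often along the template, while a longer degenerate stretch enlarges the primer mixture.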

Random PCR

This further, alternative sequence-independent amplification technique is known as ‘random’ PCR (Froussard, 1992). The method is commonly used to amplify and label probes with fluorescent dyes for microarray analysis but has also been used in the identification of novel viruses. Unlike SISPA, random PCR has no requirement for an adaptor ligation step. Whereas ‘conventional’ PCR utilises a pair of complementary ‘forward’ and ‘reverse’ primers to amplify DNA in both directions, random PCR utilises two different primers and two separate PCR reactions. The single primer used in the first PCR reaction has a defined sequence at its 5′ end, followed by a degenerate hexamer or heptamer sequence at the 3′ end. A second PCR reaction is then performed with a specific primer complementary to the 5′ defined region of the first primer, thus enabling amplification of products formed in the first reaction. Random PCR has been used extensively for the detection of both DNA and RNA viruses and is currently the molecular method most commonly used to identify unknown viruses. Viruses infecting animals identified using this technique include a dicistrovirus associated with ‘honey-bee colony collapse disorder’ (Cox-Foster et al., 2007), a seal picornavirus (Kapoor et al., 2008), and circular DNA viruses in the faeces of wild-living chimpanzees (Blinkova et al., 2010). Random PCR has also proved successful in detecting novel viruses infecting humans, including a parvovirus (Allander et al., 2005), a coronavirus (Fouchier et al., 2004), and a polyomavirus in patients with respiratory tract disease (Allander et al., 2007); a parechovirus (Li et al., 2009c), a picornavirus (Li et al., 2009b), and a bocavirus in patients with diarrhoea (Kapoor et al., 2009); a human gammapapillomavirus in a patient with encephalitis (Li et al., 2009a); and several viruses in children with acute flaccid paralysis (Blinkova et al., 2009, Victoria et al., 2009).

Rolling circle amplification (RCA)

A ‘rolling circle’ sequence-independent amplification technique makes use of the property of circular DNA molecules such as plasmids or viral genomes replicating through a rolling circle mechanism. RCA mimics this natural process without requiring prior knowledge of the viral sequence, utilising random hexamer primers that bind at multiple locations on a circular DNA template, and a polymerase enzyme, such as bacteriophage ϕ29 DNA polymerase. The polymerase enzyme has a strong strand-displacing capability, high processivity (approximately 70 000 bases/binding event), and proof-reading activity (Esteban et al., 1993). When the polymerase enzyme comes ‘full circle’ on a circular viral genome it displaces its 5′ end and continues to extend the new strand multiple times around the DNA circle. Random primers can then anneal to the displaced strand and convert it to double-stranded DNA (Dean et al., 2001). By using multiply-primed RCA, unknown circular DNA templates can be exponentially amplified. The long, double-stranded DNA products can then be cut with a restriction enzyme to release linear fragments the length of the circle, which are then sequenced and identified. Although technically more demanding than other methods of sequence-independent amplification, the RCA approach has facilitated the identification of a novel variant of bovine papillomavirus type-1 (Rector et al., 2004b) and of novel papillomaviruses in a Florida manatee (Rector et al., 2004a). This method has also yielded the full genomic sequences of polyomaviruses (Johne et al., 2006b), an anellovirus (Niel et al., 2005), circoviruses (Johne et al., 2006a) and wasp polydnavirus (Espagne et al., 2004). Through the use of a combination of RCA and SISPA, nine anelloviruses found in human plasma and cat saliva have been detected and characterised (Biagini et al., 2007).
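Why restriction digestion of the RCA product yields circle-length fragments can be seen in a small string-based sketch. It assumes a single-stranded, unbranched concatemer and an enzyme whose site occurs exactly once per circle; the 20-nucleotide ‘genome’ is hypothetical, and GAATTC cut after the first base (as EcoRI cuts the top strand) is used only as a familiar example. Real RCA products are branched and double-stranded.

```python
def rolling_circle_product(circle: str, rounds: int, start: int = 0) -> str:
    """Concatemer formed by copying `circle` `rounds` times from position `start`."""
    rotated = circle[start:] + circle[:start]
    return rotated * rounds


def digest(dna: str, site: str, cut_offset: int) -> list:
    """Cut linear `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments, previous = [], 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous:])  # trailing end fragment
    return fragments


circle = "ATGCGAATTCTTAGGCCTAA"  # hypothetical 20-nt circle with one GAATTC site
concatemer = rolling_circle_product(circle, rounds=4)
fragments = digest(concatemer, site="GAATTC", cut_offset=1)

# The first and last pieces are partial ends; every interior piece is one full circle.
print([len(fragment) for fragment in fragments])
```

Each interior fragment is the circular genome linearised at the restriction site, so sequencing any one of them recovers the complete genome.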

Sequencing methods

Most hybridisation and PCR methods generate products that require definitive identification by sequencing. One method of achieving this is the commonly used ‘chain termination method’, often referred to as ‘Sanger’ or ‘dideoxy sequencing’. This method is based on the DNA polymerase-dependent synthesis of a complementary DNA strand in the presence of natural 2′-deoxynucleotides (dNTPs) and 2′,3′-dideoxynucleotides (ddNTPs) that serve as non-reversible synthesis terminators (Sanger et al., 1977). A limitation of this technique in terms of virus identification can be the requirement to clone viral sequences into bacteria prior to sequencing, although direct sequencing of PCR products can also be employed. When cloning is performed using this method, host-related bias can occur (Hall, 2007), and, as only a relatively limited number of clones can be sequenced, methods to enrich for virus prior to amplification are required. Recently, use of the Sanger method has been partially superseded by ‘next generation sequencing’ technologies that circumvent the need for cloning by using highly efficient in vitro DNA amplification (Morozova and Marra, 2008). Next generation sequencing technology includes the 454 pyrosequencing-based instrument (Roche Applied Sciences), genome analysers (Illumina) and the SOLiD system (Applied Biosystems). This approach dramatically increases cost-effective sequence throughput, albeit at the expense of sequence read-length. Compared to read-lengths in the region of up to 900 bp produced by modern automated Sanger instruments, read-lengths of approximately 76–106 bp are generated by Illumina and of 250–400 bp by 454 technology. The comparatively short read-length of next generation sequencing technologies is however compensated for by the large number of ‘reads’ generated. 
Typically 100 kilobases of sequence data are produced from a modern Sanger instrument with 454 sequencing capable of generating up to 400 megabases of data, and Illumina sequencing technology can produce up to 20 gigabases of sequence data/run (Metzker, 2010).
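A rough sense of how many reads each platform produces can be obtained by dividing the quoted output per run by the quoted read-length. The sketch below uses the upper-end figures given above; it is an order-of-magnitude illustration under that assumption and not a specification of any instrument.

```python
# (total bases per run, read-length in bp), upper-end values quoted in the text
PLATFORMS = {
    "Sanger": (100_000, 900),
    "454": (400_000_000, 400),
    "Illumina": (20_000_000_000, 106),
}


def approx_reads(total_bases: int, read_length: int) -> int:
    """Approximate number of reads per run (whole reads only)."""
    return total_bases // read_length


for name, (total_bases, read_length) in PLATFORMS.items():
    print(f"{name}: ~{approx_reads(total_bases, read_length):,} reads")
```

On these figures the read count rises by several orders of magnitude from Sanger to next generation platforms, even though the individual reads are shorter.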

Bioinformatics

Several different approaches have been used to analyse data produced by sequencing methods. To date, the majority of novel viruses have been discovered using Basic Local Alignment Search Tool (BLAST) programmes that compare detected nucleotide sequences to those in a database, and rely on the fact that novel viruses have some homology to known viruses. Detecting distant viral relatives or completely new viruses can however be problematic. For instance, a proportion of sequences (5–30%) derived from animal samples by sequence-independent amplification methods, and an even greater fraction of sequences derived from environmental samples, do not have nucleotide or amino acid sequences similar to those of viruses listed in existing databases (Delwart, 2007). However, using these methods, viruses have been identified that are distantly related to known viruses. Several approaches can be used to increase the likelihood of identifying virus, including ‘querying’ translated DNA sequences against a translated DNA database, as evolutionary relationships remain detectable for longer at the amino acid than at the nucleotide level. The computational generation of theoretical ancestral sequences, and their subsequent use in sequence similarity searches, may also improve identification of highly divergent viral sequences (Delwart, 2007). Computational biologists have also developed ingenious new algorithms and techniques to analyse data produced by next generation sequencing to aid the identification of novel viruses (Wooley et al., 2010). Before viruses are identified, the hybridisation and PCR methods previously described generally require both an initial step, to enrich for virus, and an amplification step (Fig. 2A). Enrichment can result in loss of viral nucleic acid thus reducing test sensitivity, and amplification can generate bias towards a dominant (potentially host-derived) sequence. 
A method known as transcriptome subtraction has been developed for viral discovery (Weber et al., 2002) with the advantage that it can be performed without the need for enrichment or amplification (Fig. 2B). Transcriptome subtraction is based on the principle that genes are transcribed (expressed) to produce mRNA, which can be converted in vitro to a single-stranded DNA product, complementary DNA (cDNA). The sequencing of this cDNA, rather than genomic DNA, therefore allows the transcribed portion of the genome to be analysed. In view of the large number of transcripts present, sequencing is usually performed using next generation technologies.
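The subtraction step that gives the method its name can be pictured as discarding every sequenced read that matches host reference sequence and keeping the remainder for further study. The sketch below is a deliberately simplified stand-in: it uses exact shared k-mers instead of the alignment tools used in practice, and the sequences, the function names and the choice of k = 8 are all hypothetical.

```python
def kmers(sequence: str, k: int) -> set:
    """All overlapping substrings of length k in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def subtract_host(reads: list, host_references: list, k: int = 8) -> list:
    """Keep only reads that share no k-mer with any host reference sequence."""
    host_index = set()
    for reference in host_references:
        host_index |= kmers(reference, k)
    return [read for read in reads if not (kmers(read, k) & host_index)]


# Toy data: one 'host transcript' and two reads (all sequences are made up).
host_references = ["ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"]
reads = [
    "GCCATTGTAATGGGCC",    # derived from the host transcript, so subtracted
    "TTTTACGTACGTTTAGCA",  # no host match, so retained as a candidate
]
print(subtract_host(reads, host_references))
```

The retained reads are not necessarily viral; they are simply unexplained by the host reference and still have to be compared against viral databases.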

Fig. 2

Sequence of events in the molecular detection of viruses: (A) Samples processed by hybridisation or PCR require steps to enrich for virus before amplified products are sequenced and identified. Enrichment may result in decreased assay sensitivity, and amplification can generate bias towards a dominant sequence; (B) Transcriptome subtraction methods can be performed without enrichment or amplification with direct sequencing of nucleic acids extracted from a sample of interest. Subsequent subtraction of resulting sequences against databases facilitates virus identification.

The technique works on the assumption that a sample infected with a virus would contain host and viral transcripts. Host transcript sequences are identified by alignment against public databases and subtracted; in the case of a human sample, these databases include reference sequences such as the human RefSeq RNA, mitochondrial or assembled chromosome sequences in the National Center for Biotechnology Information (NCBI) databases. After human sequences have been aligned against the databases and subtracted, non-matched virus-enriched sequences will remain and can be further studied. With the completion of the sequencing of several animal genomes, transcriptome subtraction techniques are applicable to a variety of other species, and the possibility exists to use both public databases and subtraction against un-infected control material. A transcriptome subtraction method has been used to identify a previously unknown polyomavirus in human Merkel cell carcinoma (Feng et al., 2008) and to identify an uncharacterised arenavirus associated with three transplant-related deaths (Palacios et al., 2008). This technique has the advantage of being able to identify very small amounts of virus: in the case of the polyomavirus detailed above, only 10 viral transcripts/cell were present. Given that each cell contains approximately 1 million host transcripts, only a small proportion of the cellular RNA is virus-derived. 
Providing every cell is infected, even at very low levels, 10 million sequence ‘reads’ gives a >99.99% probability of detecting at least one viral sequence (Fig. 3). Such a large number of reads is readily obtainable using next generation technology such as the Illumina platform. However, the technique does have limitations: if only 1/10 cells is infected, or a sequencing methodology is used which produces only 50 000 sequence reads, the probabilities of detecting viral sequence decrease to approximately 60% and 5%, respectively.
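The dependence of detection on read depth can be explored with a simple probability model. The sketch below assumes that each read is an independent draw and is viral with probability equal to the viral fraction of transcripts (here 10 viral transcripts per approximately 1 million host transcripts, as quoted above). This is an illustrative assumption; the percentages given in the text are taken from Fig. 3 and are not recomputed here, so values from this model need not coincide with them.

```python
import math


def detection_probability(viral_fraction: float, reads: int) -> float:
    """P(at least one viral read) = 1 - (1 - f) ** N for independent reads.

    Computed with log1p/expm1 so that very small fractions stay accurate.
    """
    return -math.expm1(reads * math.log1p(-viral_fraction))


viral_fraction = 10 / 1_000_000  # 10 viral per ~1 million host transcripts
for reads in (50_000, 1_000_000, 10_000_000):
    print(reads, detection_probability(viral_fraction, reads))
```

Under this model the probability of missing the virus falls exponentially with the number of reads, which is why the depth offered by next generation platforms matters so much for low-abundance agents.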

Fig. 3

Graphic representation of the probability of detecting viral sequences based on the viral genome-transcript sequence frequency and the number of sequence ‘reads’ generated (lines with symbols).

Identification of viral sequences and proof of causation

While many newly identified viruses infecting animals and humans were initially found in patients with particular clinical signs or symptoms, most have not been causally associated with particular diseases. The detection of viruses in such contexts may merely reflect the presence of a virus in a sample or the ability of a virus to replicate within a particular disease environment, rather than the virus directly causing the disease. For example, although several infectious agents have been found in samples from human patients with multiple sclerosis (Johnson et al., 1984, Challoner et al., 1995, Perron et al., 1997, Thacker et al., 2006), causal roles in pathogenesis have never been attributed (Munz et al., 2009). Similarly, herpes simplex virus type-2 (HSV-2) was strongly implicated as the cause of cervical cancer in humans for many years until human papilloma virus DNA was identified in biopsies (Durst et al., 1983). Henle–Koch postulates are a well known set of criteria that must be fulfilled by a microorganism for it to be proven as the cause of disease. The ability to culture viruses in vitro and the detection of antibodies against viruses led to new proposals for the demonstration of causality (Rivers, 1937). Advances in technology have resulted in new challenges to the assigning of causation and sequence-based approaches to virus identification have led to the formulation of guidelines defining the relationship between the presence of viral sequences and disease (Fredericks and Relman, 1996). Such guidelines have been used to link hepatitis C virus (HCV) with non-A, non-B hepatitis (Kuo et al., 1989), and human herpesvirus-8 with Kaposi’s sarcoma (Moore and Chang, 1995, Noel et al., 1996), but are often ignored in the race to assign significance to virus discovery. 
In infectious disease research a balance must be struck between the prompt identification of highly significant new human pathogens such as pandemic swine H1N1 influenza (Dawood et al., 2009), and clearly defining the more tenuous connection between xenotropic murine leukaemia virus-related virus (XMRV) and chronic fatigue syndrome (Lombardi et al., 2009). Epidemiological, immunological and sequence-based criteria should support any proposed link between an infectious organism and the disease under study. Establishing causality must also involve an appreciation of the full range of genetic diversity of the viral species, as it is well established that distinct viral genotypes or even minor genetic variations can result in large changes in viral pathogenicity.

Conclusions

Viral identification is an ever-evolving discipline where new technologies are likely to have significant impact over the coming decades. The further development of hybridisation and PCR-based methods, the increased availability of next generation sequencing, improvements in transcriptome subtraction methods, continued expansion of viral and animal genome databases, and improved bioinformatic tools will all facilitate the acceleration of this identification process.

Conflict of interest statement

Neither of the authors of this paper has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper.

112 in total

1. Genomic analysis of uncultured marine viral communities.

Authors: Mya Breitbart; Peter Salamon; Bjarne Andresen; Joseph M Mahaffy; Anca M Segall; David Mead; Farooq Azam; Forest Rohwer
Journal: Proc Natl Acad Sci U S A Date: 2002-10-16 Impact factor: 11.205

2. Diversity and population structure of a near-shore marine-sediment viral community.

Authors: Mya Breitbart; Ben Felts; Scott Kelley; Joseph M Mahaffy; James Nulton; Peter Salamon; Forest Rohwer
Journal: Proc Biol Sci Date: 2004-03-22 Impact factor: 5.349

3. Cloning of a human parvovirus by molecular screening of respiratory tract samples.

Authors: Tobias Allander; Martti T Tammi; Margareta Eriksson; Annelie Bjerkner; Annika Tiveljung-Lindell; Björn Andersson
Journal: Proc Natl Acad Sci U S A Date: 2005-08-23 Impact factor: 11.205

4. Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences.

Authors: T M Rose; E R Schultz; J G Henikoff; S Pietrokovski; C M McCallum; S Henikoff
Journal: Nucleic Acids Res Date: 1998-04-01 Impact factor: 16.971

5. Detection and analysis of diverse herpesviral species by consensus primer PCR.

Authors: D R VanDevanter; P Warrener; L Bennett; E R Schultz; S Coulter; R L Garber; T M Rose
Journal: J Clin Microbiol Date: 1996-07 Impact factor: 5.948

6. Plaque-associated expression of human herpesvirus 6 in multiple sclerosis.

Authors: P B Challoner; K T Smith; J D Parker; D L MacLeod; S N Coulter; T M Rose; E R Schultz; J L Bennett; R L Garber; M Chang
Journal: Proc Natl Acad Sci U S A Date: 1995-08-01 Impact factor: 11.205

7. Transmission of H7N7 avian influenza A virus to human beings during a large outbreak in commercial poultry farms in the Netherlands.

Authors: Marion Koopmans; Berry Wilbrink; Marina Conyn; Gerard Natrop; Hans van der Nat; Harry Vennema; Adam Meijer; Jim van Steenbergen; Ron Fouchier; Albert Osterhaus; Arnold Bosman
Journal: Lancet Date: 2004-02-21 Impact factor: 79.321

8. RNA viral community in human feces: prevalence of plant pathogenic viruses.

Authors: Tao Zhang; Mya Breitbart; Wah Heng Lee; Jin-Quan Run; Chia Lin Wei; Shirlena Wee Ling Soh; Martin L Hibberd; Edison T Liu; Forest Rohwer; Yijun Ruan
Journal: PLoS Biol Date: 2006-01 Impact factor: 8.029

9. Rapid identification of emerging pathogens: coronavirus.

Authors: Rangarajan Sampath; Steven A Hofstadler; Lawrence B Blyn; Mark W Eshoo; Thomas A Hall; Christian Massire; Harold M Levene; James C Hannis; Patina M Harrell; Benjamin Neuman; Michael J Buchmeier; Yun Jiang; Raymond Ranken; Jared J Drader; Vivek Samant; Richard H Griffey; John A McNeil; Stanley T Crooke; David J Ecker
Journal: Emerg Infect Dis Date: 2005-03 Impact factor: 6.883

10. Pan-viral screening of respiratory tract infections in adults with and without asthma reveals unexpected human coronavirus and human rhinovirus diversity.

Authors: Amy Kistler; Pedro C Avila; Silvi Rouskin; David Wang; Theresa Ward; Shigeo Yagi; David Schnurr; Don Ganem; Joseph L DeRisi; Homer A Boushey
Journal: J Infect Dis Date: 2007-08-06 Impact factor: 5.226
