
Viral pathogen discovery.

Abstract

Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease.

Year: 2013 PMID: 23725672 PMCID: PMC5964995 DOI: 10.1016/j.mib.2013.05.001

Source DB: PubMed Journal: Curr Opin Microbiol ISSN: 1369-5274 Impact factor: 7.934

Current Opinion in Microbiology 2013, 16:468–478. This review comes from a themed issue on Host–microbe interactions: viruses, edited by Carlos F Arias. Available online 29th May 2013. 1369-5274/$ – see front matter, © 2013 The Author. Published by Elsevier Ltd. All rights reserved.

Introduction

The identification of novel pathogens has a tremendous impact on infectious diseases, microbiology, and human health. Nearly all of the outbreaks of clinical and public health importance over the past two decades have been caused by novel emerging viruses, including Severe Acute Respiratory Syndrome (SARS) coronavirus [1], Sin Nombre hantavirus [2], 2009 pandemic influenza H1N1 [3, 4], and the recently described coronavirus EMC [5, 6, 7] and H7N9 avian influenza viruses [8], with most originating from animal reservoirs. Changes in the environment, globalization, growth of wet (live animal) markets, and the rapid expansion of the human population into wildlife habitats all promote the rapid spread of previously unidentified pathogens that are capable of causing widespread and devastating epidemics of human illness [9]. Arthropods such as mosquitoes and ticks are vectors for emerging pathogens including West Nile virus [10, 11], and the Severe Fever and Thrombocytopenia Syndrome (SFTS) [12, 13] and Heartland bunyaviruses [14]. Moreover, the link between new viruses and disease is not only restricted to acute illnesses, but also can be seen in chronic disease states, as demonstrated by the strong association between infection by the novel Merkel cell polyomavirus (MCPyV) and a rare, highly aggressive skin tumor in elderly patients [15]. Currently available diagnostic tests for pathogens are generally narrow in scope and fail to detect an agent in a significant fraction of cases. Traditional methods such as culture, serology, or targeted nucleic acid-based testing, such as specific polymerase chain reaction (PCR), have limited utility in investigations where there is no a priori knowledge of the identity of potential infectious agents. Notably, in certain infectious diseases such as encephalitis, conventional testing fails to identify a pathogen in up to 70% of cases [16, 17, 18]. 
In contrast, state-of-the-art genomic technologies such as pan-microbial microarrays or unbiased next-generation sequencing (NGS) can be attractive tools for broad-based pathogen discovery. Nearly all infectious agents, with the sole exception of prions [19], contain either RNA or DNA, and are thus amenable to nucleic acid-based detection. In principle, these technologies are capable of comprehensively identifying all potential pathogens in clinical samples from humans and animals. This review will describe the genomic approaches for pathogen discovery currently being employed in the field, and highlight recent examples of their use in the discovery and characterization of novel viral pathogens (Table 1).

Table 1

Some recent examples of viral pathogen discovery [d]

Name [a]	Detection platform	NGS bioinformatics approach	Disease association	Strength of association [c]
Coronavirus-EMC	Culture and 454 NGS [7]	De novo genome assembly	Severe pneumonia (humans)	++
SFTS (severe fever with thrombocytopenia syndrome) bunyavirus [12, 13]	Illumina NGS [12]	Subtraction and BLAST search	Severe fever with thrombocytopenia	++
Heartland bunyavirus	Culture and 454 NGS [14]	BLAST search and de novo gene assembly	Severe febrile illness	++
MCPyV (Merkel cell polyomavirus)	454 NGS [15]	Subtraction and BLAST search	Merkel cell carcinoma (MCC)	++
Bat paramyxoviruses	Consensus PCR [20, 21, 22]	N/A	–	–
Raccoon polyomavirus	Consensus PCR and RCA [38]	N/A	Brain tumors (raccoons)	++
HPyV6 and HPyV7 (human polyomaviruses 6 and 7)	RCA [39]	N/A	N/A	–
TSPyV (trichodysplasia spinulosa-associated polyomavirus)	RCA [40]	N/A	Trichodysplasia spinulosa	++

2009 pandemic influenza A(H1N1) [b]	Microarray and Illumina NGS [51]	Subtraction and BLAST search	Febrile illness	++
	454 NGS [110, 111]	BLAST search	Febrile illness	++
	Illumina NGS [112]	BLAST search	Febrile illness	++

TMAdV (titi monkey adenovirus)	Microarray and Illumina NGS [52]	BLAST search	Pneumonia (titi monkeys)	++
BASV (Bas-Congo virus), a rhabdovirus	Illumina NGS [58]	Subtraction, BLAST search, and de novo genome assembly	Acute hemorrhagic fever	++
Novel circoviruses and cycloviruses in humans and monkeys	454 NGS [59]	Subtraction and BLAST search	Diarrhea	–

Human klassevirus/salivirus	Illumina NGS [61]	Subtraction and BLAST search	Diarrhea	–
Human klassevirus/salivirus	454 NGS [60, 62]	Subtraction and BLAST search	Diarrhea	–

MWPyV/HPyV10/MXPyV (MW polyomavirus)	454 NGS [63]	Subtraction and BLAST search	Diarrhea	–
	Illumina NGS [64]	Subtraction and BLAST search	Diarrhea	–
	Illumina NGS [65]	BLAST search	WHIM syndrome	–

HPyV9 (human polyomavirus 9)	Illumina NGS [66]	Subtraction and BLAST search	–	–
HPyV9 (human polyomavirus 9)	Consensus PCR [24]	N/A	–	–

Human bufavirus	454 NGS [67]	Subtraction and BLAST search	Diarrhea	–
HAstV-PS (human astrovirus Puget Sound)	454 NGS [68]	Subtraction and BLAST search	Encephalitis	++
Human enterovirus 109	Consensus PCR and Illumina NGS [69]	Subtraction and BLAST search	Acute respiratory illness	+
Dandenong arenavirus	454 NGS [70]	Subtraction and BLAST search	Fatal febrile illness in transplant patients	++
Lujo arenavirus	454 NGS [71]	Subtraction and BLAST search	Acute hemorrhagic fever	++
TDAV (Theiler's disease-associated virus), a novel pegivirus	Illumina NGS [73]	BLAST search and de novo genome assembly	Hepatitis (horses)	++
Bat, canine, horse, and rodent hepaciviruses and pegiviruses	454 NGS [74, 75, 76, 77]	BLAST search	Respiratory infection (dogs)	–
Canine bocavirus 3	Illumina NGS [78]	BLAST search	Hemorrhagic diarrhea and vasculitis (dog)	–
Snake arenaviruses	Illumina NGS [79]	Subtraction and BLAST search	Inclusion body disease (snakes)	++

SAdV-C (simian adenovirus C)	Culture and Illumina NGS [105]	BLAST search and de novo genome assembly	Pneumonia (baboons)	++
SAdV-C (simian adenovirus C)	Culture and Illumina NGS [105]	BLAST search and de novo genome assembly	Acute respiratory illness (humans)	+

[a] Viruses were detected in clinical samples from humans unless otherwise specified.

[b] Discovered previously in 2009 by conventional testing [3, 4].

[c] –, unknown or no association observed with the given disease; +, moderate association; ++, strong association.

[d] Viruses are listed in the order in which they are first mentioned. Note that the table is not comprehensive and only lists viruses that are specifically highlighted in the text. Abbreviations: NGS, next-generation sequencing; 454, Roche 454 pyrosequencing platform; PCR, polymerase chain reaction; RCA, rolling circle amplification; subtraction, computational ‘digital’ subtraction of host background sequences from NGS data; BLAST, basic local alignment search tool; WHIM syndrome, Warts, Hypogammaglobulinemia, Infections, and Myelokathexis syndrome; N/A, not applicable.


Genomic approaches for pathogen discovery

Pathogen discovery entails the use of genomic-based methods to identify novel microbes, followed by further investigation to determine potential associations with disease (Figure 1). As a pathogen discovery tool, consensus PCR uses degenerate primers to detect conserved sequences that are broadly shared between members of a group. This approach was recently used to identify novel paramyxoviruses in samples from large-scale surveys of bats and rodents [20, 21, 22] and emerging viruses such as coronavirus EMC, the cause of a new severe and occasionally fatal respiratory disease in the Middle East and Europe [6], although many other examples of this strategy for viral discovery exist [23, 24]. However, the identity of an infectious agent is often not known a priori, and a random, unbiased, and sequence-independent method for ‘universal’ amplification becomes necessary for pathogen discovery [25]. In the past, such universal amplification methods have been used in combination with conventional shotgun Sanger sequencing to detect novel human viruses such as human metapneumovirus in respiratory secretions [26], PARV4, a novel parvovirus in blood from patients with acute viral infection syndrome [27], and novel astroviruses, parvoviruses, picornaviruses (cardioviruses and cosaviruses), and polyomaviruses in diarrheal stool [25, 28, 29, 30, 31, 32, 33, 34]. One caveat with this approach may be the relatively low detection sensitivity of ∼10^6 genome equivalents per milliliter [25]. A related strategy is the use of rolling circle amplification (RCA) [35, 36], which has been successful in the unbiased detection and/or characterization of DNA viruses with circular genomes, such as novel papillomaviruses, circoviruses, and polyomaviruses [37, 38, 39, 40].
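As a rough illustration of how a degenerate primer can match conserved sequence across related viruses, the following Python sketch expands the standard IUPAC ambiguity codes into a regular expression and scans candidate templates for annealing sites. The primer and the template sequences are invented for this example and are not taken from any of the cited studies; only the forward strand is searched, and mismatches tolerated by real PCR are ignored.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular-expression string."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def degeneracy(primer):
    """Number of distinct non-degenerate oligos represented by the primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def find_sites(primer, sequence):
    """Return 0-based start positions of exact matches on the forward strand."""
    # A lookahead reports overlapping sites as well.
    lookahead = re.compile("(?=" + primer_to_regex(primer) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Hypothetical primer and templates, invented for illustration only.
PRIMER = "GAYTGGRTNAC"
TEMPLATES = {
    "virus_A": "TTGACTGGATCACGG",
    "virus_B": "CCGATTGGGTGACAA",
    "unrelated": "TTGACTGCATCACGG",
}

if __name__ == "__main__":
    print("degeneracy:", degeneracy(PRIMER))
    for name, seq in TEMPLATES.items():
        print(name, find_sites(PRIMER, seq))
```

In this toy case the single primer matches both hypothetical related templates even though they differ at three positions, while the unrelated template, which differs at a non-degenerate position, is not matched. Higher degeneracy broadens coverage at the cost of a more complex primer pool.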

Figure 1

Genomic approaches to pathogen discovery. Clinical samples are subjected to pathogen enrichment and host depletion methods, followed by genomic analysis using consensus PCR, pan-microbial microarrays, and/or NGS. After a novel agent is identified, downstream studies are needed to establish a causal association between the candidate pathogen and disease.

DNA microarrays have been used for multiplexed detection of a defined set of known pathogens using conserved primers [41], or for broad pan-microbial detection by universal amplification [42, 43, 44]. Microarrays are miniaturized detection platforms consisting of short (25-mer to 70-mer) single-stranded oligonucleotide probes deposited onto a solid substrate. These probes are typically designed to target conserved sequences at different levels of the taxonomy (family, genus, and species), which allows detection of novel pathogens that share homology with known, previously characterized viruses. Fluorescently labeled clinical samples are hybridized to the microarray, and hybridization patterns are analyzed to identify the specific pathogens that are present (Figure 2a) [43, 44, 45, 46, 47].

Figure 2

Microarray and NGS analyses of pandemic 2009 influenza A(H1N1) infection in humans. (a) Heat map of ViroChip microarray hybridization patterns obtained from nasal swab samples from patients with influenza-like illness and asymptomatic negative controls (‘neg’). The samples (x-axis) and microarray probes (y-axis) are clustered using a hierarchical clustering algorithm [45]. High-intensity probes derived from swine influenza A(H1N1) and human influenza A(H1N1) sequences are observed in samples from patients infected by pandemic 2009 influenza A(H1N1), with higher relative signal intensity in the swine influenza A(H1N1) probes. In contrast, the ViroChip signature in nasal swabs from patients infected with seasonal H3N2 influenza consists primarily of influenza A(H3N2) probes. No microarray cross-hybridization is observed in patients infected with other respiratory viruses or negative controls. Note that the influenza probes on the ViroChip microarray shown here were designed before onset of the pandemic 2009 influenza A(H1N1) outbreak. (b) Computational pipeline for analysis of NGS data. Preprocessing and computational subtraction of host (human) sequences are then followed by alignment to pathogen reference databases. The percentages show the remaining proportion of reads after each step, beginning with 100% of the preprocessed reads. Abbreviations: DBs, databases; rRNA, ribosomal RNA; mRNA, messenger RNA.

Pan-microbial DNA microarrays currently in use include the ViroChip (University of California, San Francisco) [42, 48], GreeneChip (Columbia University) [43], and the Lawrence Livermore Microbial Detection Array, or LLMDA (Lawrence Livermore National Laboratory) [44]. The ViroChip is a pan-viral DNA microarray and was originally employed to characterize the coronavirus responsible for the 2003 outbreak of SARS [1]. 
Since then, studies have employed the ViroChip to discover a number of novel viruses including a previously undescribed rhinovirus clade [49], human cardioviruses [50], and 2009 pandemic influenza H1N1 (Figure 2a) [51]. In 2011, the ViroChip was also used to identify a novel adenovirus that caused a fulminant pneumonia outbreak in a New World titi monkey colony, with serologic evidence of concurrent cross-species infection of a human researcher [52]. The GreeneChip is a pan-microbial array that includes ∼30k 60-mer probes and is designed to broadly detect all viruses, as well as pathogenic bacteria, fungi, and protozoa on the basis of conserved 16S/18S sequences [43]. The LLMDA is yet another comprehensive pan-microbial detection array that targets all potential pathogens, with probes derived from their full genome sequences [44]. The GreeneChip and LLMDA have been used to detect Plasmodium falciparum in a patient with an unknown febrile illness [43] and porcine circovirus as a contaminant in a rotavirus vaccine [53], respectively. Although useful for the detection of a wide spectrum of pathogens, and for the detection of novel strains, microarrays are still limited by the genome sequence information available at the time of design.

NGS, otherwise known as massively parallel or deep sequencing, has emerged as one of the most promising strategies for the detection of novel infectious agents in clinical specimens [54, 55]. This ‘needle-in-a-haystack’ approach involves analysis of millions of sequences derived from nucleic acid present in clinical specimens to detect sequences corresponding to candidate pathogens. Given low amounts of input nucleic acid in clinical samples, an unbiased, random method employing universal amplification is typically performed during NGS library generation [25, 56], similar to that used in pan-microbial microarray assays [42]. 
Because of its unbiased nature, NGS can identify both known but unexpected agents and highly divergent novel agents. NGS is thus particularly attractive for the identification of novel emerging viruses, which can exhibit high inherent sequence diversity and rapid rates of mutation, recombination, or reassortment [57]. For example, NGS was recently used to identify and recover the genome of a novel, highly divergent rhabdovirus, Bas-Congo virus (BASV), associated with a 2009 hemorrhagic fever outbreak in the Congo, Africa (Figure 3a) [58]. In this study, the genome of BASV was de novo assembled from 140 million deep sequencing reads corresponding to an acute serum sample from an affected patient (Figure 3b). The discovery of BASV underscores the potential of NGS in facilitating early identification of pathogens causing unknown outbreaks in remote areas of the world before they gain a foothold in human populations.

Figure 3

Discovery of Bas-Congo virus (BASV), a novel rhabdovirus associated with acute hemorrhagic fever in humans. (a) Map of Africa showing viral hemorrhagic fever outbreak regions. Hemorrhagic fever due to flaviviruses, such as dengue and yellow fever, is widespread throughout the continent. The location of the BASV hemorrhagic fever outbreak is designated by a red star. (b) Deep sequencing and de novo genome assembly of BASV. The BASV genome is highly divergent, sharing only 25% amino acid identity with rabies and <42% amino acid identity with any other rhabdovirus.

In addition to the identification of BASV, the use of NGS technology has led to the discovery of many novel human viruses over the past decade, including, among others, the aforementioned MCPyV [15]; novel circoviruses/cycloviruses [59], kobuviruses (klassevirus/salivirus) [60, 61, 62]; polyomaviruses such as the HPyV9 and MWPyV/HPyV10/MXPyV [63, 64, 65, 66]; a novel parvovirus named bufavirus [67]; a novel astrovirus associated with encephalitis [68]; a novel enterovirus species in tropical febrile illness [69]; as well as novel arenaviruses in a fatal outbreak of transplant recipients [70] and a hemorrhagic fever outbreak from South Africa [71]. In 2011, an unknown outbreak of fever and thrombocytopenia involving hundreds of patients occurred in rural China [12, 13]. Unbiased NGS of pooled patient serum samples was used by one research group to identify the causal agent as a novel, highly divergent bunyavirus in the Phlebovirus genus referred to as Severe Fever and Thrombocytopenia Syndrome (SFTS) virus [12]. Furthermore, NGS has been used to enable whole-genome sequencing and assembly of highly divergent viruses identified from unknown cultures exhibiting cytopathic effect. 
Heartland virus, a presumed novel tick-borne bunyavirus in the Phlebovirus genus associated with two cases of severe febrile illness in hospitalized patients in Missouri [14], and Lone Star virus, another phlebovirus infecting the Amblyomma americanum tick [72], were both successfully sequenced from virally infected cell culture supernatants using NGS. NGS approaches have also been successful in the identification of novel animal viruses, including the discovery of bats, dogs, horses, and rodents as reservoirs for novel flaviviruses (pegiviruses and hepaciviruses distantly related to human hepatitis C) [73, 74, 75, 76, 77], a novel bocavirus in canine liver [78], and novel arenaviruses associated with inclusion body disease in snakes [79]. Recently, a novel flavivirus in the Pegivirus genus, named Theiler's disease-associated virus (TDAV), was found by NGS to be the likely cause of a mysterious acute hepatitis in horses associated with the administration of equine blood products, a diagnosis that had eluded microbiologists for nearly a century [73]. Finally, infection by non-viral agents, such as Fusobacterium nucleatum bacteria in the setting of colon cancer, has also been detected by NGS [80].

Sample preparation methods

Both unbiased NGS, and, to a lesser extent, pan-microbial microarrays are affected by the level of host background, limiting sensitivity for detection of pathogen-derived sequences. In a study using NGS to investigate occult bacterial infection in tissues, microbial sequences were only detected in 0.00067% of NGS reads, corresponding to fewer than 10 per million [80]. Pathogen enrichment or host depletion before microarray and deep sequencing analyses hence becomes critical to maximize sensitivity for identification of novel agents in clinical samples (Figure 1). For viruses, capsid purification procedures involving repeated freeze/thaw cycles, filtration, ultracentrifugation, and prenuclease digestion have been developed to enrich host tissues or body fluids for infectious particles [78, 81]. Strategies to deplete the sample of background host DNA can also be implemented, including the use of methylation-specific DNase to selectively degrade host genomes [82], removal of host ribosomal RNA [83], and/or removal of the most abundant host sequences by duplex-specific nuclease (DSN) normalization [84]. Another complementary approach is to perform target enrichment using biotinylated probes to enrich NGS libraries for sequences corresponding to pathogens, akin to now well-established techniques that have been developed in the cancer field [85]. This strategy can also potentially harness prior experience with microarrays for pathogen discovery by the use of previously validated microarray probes to enrich NGS libraries for microbial sequences.

The choice of NGS platforms on the market today for pathogen discovery is driven by two main parameters: read length and read depth. NGS reads must be long enough (typically at least 100–300 nt) to unambiguously identify the presence of a novel pathogen, and to discriminate reads from host or background flora. 
There must also be sufficient read depth, or number of sequence reads generated per run, to detect novel agents with a high degree of sensitivity. For pathogen discovery, the Roche 454 GS-FLX+ pyrosequencing™ platform has been widely applied given the long reads (currently up to 1 million single or paired-end reads with average read lengths of 400–500 nt with the GS-FLX+ Titanium™ platform) and high accuracy. More recently, Illumina sequencing platforms (GAIIx™, HiSeq™, and MiSeq™) have been used for pathogen discovery given the ∼10–1000× improved read depth relative to 454, resulting in much greater sensitivity for the detection of viruses [86], and gradually improving read lengths (currently up to 150 nt paired-end reads for the HiSeq and 250 nt paired-end reads for the MiSeq). In fact, previous studies suggest that the limits of detection of viruses in clinical samples by NGS with Illumina sequencing are comparable to specific PCR [51, 86]. The use of paired-end sequencing, or sequencing from each end of the DNA fragment in NGS libraries, can be particularly useful for pathogen discovery given that the forward and reverse reads can facilitate the design of PCR primers to confirm potential sequence ‘hits’ to novel microbes and de novo genome assembly [87]. Other NGS technologies, such as platforms by Ion Torrent (very fast run times of under three hours) and Pacific Biosciences (very long reads of up to 7 kb; average read lengths 3–4 kb) [88], have yet to be used widely for pathogen discovery, although one application may be rapid genome sequencing of emerging pathogens such as Escherichia coli O104:H4, associated with a recent foodborne outbreak of hemolytic-uremic syndrome in Germany [89, 90]. One particular concern for all unbiased NGS technologies is the high potential for reagent and laboratory contamination, especially with the use of universal amplification methods [51, 86, 91].
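The relationship between read depth and detection sensitivity can be made concrete with a small calculation. The Python sketch below assumes, purely for illustration, that pathogen reads are sampled independently so that their count follows a Poisson distribution; real libraries also carry amplification bias and contamination, which this ignores. The run sizes and the three-read detection threshold are hypothetical choices, while the 0.00067% fraction is the figure quoted above from [80].

```python
import math


def expected_pathogen_reads(total_reads, pathogen_fraction):
    """Expected number of pathogen-derived reads in a run."""
    return total_reads * pathogen_fraction


def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    fraction = 0.00067 / 100  # 0.00067% of reads, as quoted from [80]
    min_reads = 3             # hypothetical threshold for calling a detection
    # Hypothetical run sizes: about 1 million reads, then 10x and 100x deeper.
    for total_reads in (1_000_000, 10_000_000, 100_000_000):
        mean = expected_pathogen_reads(total_reads, fraction)
        p = prob_at_least(min_reads, mean)
        print(f"{total_reads:>11,d} reads: expected {mean:8.1f}, P(>= {min_reads}) = {p:.4f}")
```

Under these assumptions a run of one million reads is expected to contain roughly 6.7 pathogen reads at that abundance, so a tenfold rarer agent would usually be missed at the same depth but would be readily sampled at a 100-fold greater depth.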

Bioinformatics analysis challenges

Whereas for microarrays, specialized bioinformatics algorithms for pathogen detection are in routine use [43, 44, 45, 46, 47], analysis of NGS data for pathogen discovery poses enormous computational challenges. The most widely used strategy is computational subtraction, in which reads are first sequentially aligned to reference databases to filter out sequences corresponding to host background [92]. Sequences derived from microbes are then typically identified by nucleotide or translated amino acid alignments using BLAST [93]. This approach was previously used, for example, to detect pandemic 2009 influenza A(H1N1) in nasal swabs from affected patients with respiratory illness (Figure 2b) [51]. For highly divergent viruses, successful identification can sometimes only be made by searching for remote homologs of protein sequences using methods such as HMMER [94, 95]. Dedicated bioinformatics analysis pipelines, such as PathSeq, used to detect Fusobacterium bacteria in colon cancer tissues [80], RINS, CaPSID, and READSCAN are now available for automated pathogen identification from NGS data [96, 97, 98, 99], although their performance has yet to be rigorously tested on a large number of clinical samples. Ongoing limitations of available bioinformatics software for pathogen discovery include the data-intensive computing workloads that are not amenable to real-time analysis in the absence of ultra-rapid processing algorithms, the lack of a graphical user interface, the requirement for a minimum level of computer hardware and bioinformatics expertise, and the lack of a validated scoring system to permit confident identification of microbes from NGS data. In addition, existing reference sequence databases, such as NIH GenBank, can be heavily biased and fraught with annotation errors. Notably, over 40% of the GenBank viral database consists of overrepresented HIV or influenza sequences. 
Comprehensive, well-annotated reference databases for pathogens are thus needed in support of NGS-based pathogen discovery efforts.
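The two-stage logic of computational subtraction described above can be sketched in a few lines of Python. This is a toy stand-in, not any of the cited pipelines: it replaces alignment (for example BLAST) with exact k-mer sharing, and the host sequence, reference sequence, reads, k-mer size, and cutoffs are all invented for illustration.

```python
def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(read, index, k):
    """Fraction of the read's k-mers that also occur in a k-mer index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def subtract_and_classify(reads, host, references, k=8, host_cutoff=0.5, hit_cutoff=0.5):
    """Discard host-like reads, then assign the rest to the best-matching reference."""
    host_index = kmers(host, k)
    ref_indexes = {name: kmers(seq, k) for name, seq in references.items()}
    counts = {"host": 0, "unassigned": 0}
    for name in references:
        counts[name] = 0
    for read in reads:
        # Step 1: subtraction of reads that look like host background.
        if shared_fraction(read, host_index, k) >= host_cutoff:
            counts["host"] += 1
            continue
        # Step 2: compare the remaining reads against pathogen references.
        best_name, best_score = None, 0.0
        for name, index in ref_indexes.items():
            score = shared_fraction(read, index, k)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= hit_cutoff:
            counts[best_name] += 1
        else:
            counts["unassigned"] += 1
    return counts


# Hypothetical sequences, invented for this example only.
HOST = "ACGTACGTTAGCCGATAGCTAGGCTTAACGGATCCGATTACAGGCATCGA"
REFERENCES = {"virus_X": "TTGACCATGGGCAAATCTCGTACCCTGAAGTTTGCACCGGATAGTCCATA"}
READS = [
    HOST[0:20],                       # host-derived read
    HOST[15:35],                      # host-derived read
    REFERENCES["virus_X"][5:25],      # read from the hypothetical virus
    "GGGGGGGGGGCCCCCCCCCC",           # matches nothing in either database
]

if __name__ == "__main__":
    print(subtract_and_classify(READS, HOST, REFERENCES))
```

Exact k-mer matching would miss the highly divergent viruses discussed above, which is why translated amino acid searches and profile-based methods are used in practice; the sketch is meant only to show the order of operations and why unassigned reads remain after both steps.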

Linking a novel pathogen to disease

The mere discovery of a candidate pathogen is only the first step in determining whether or not it is associated with disease. Clinical samples are colonized with a variety of commensal organisms (the ‘microbiome’) [100], and it is often difficult, if not impossible, to unambiguously identify a single causal infectious agent. Highly divergent, novel agents such as torque teno virus (TTV) [101, 102] may be nonpathogenic and part of the normal microbial flora. Follow-up studies are thus needed to establish a causal link between a candidate infectious agent and disease (Figure 1). To assign causality, attempts should be made to address Koch's postulates, which require that the agent be isolated in culture, or Rivers' modifications, which recognize the added significance of the generation of specific antibodies in response to infection [103]. For novel viruses, this begins with assembly of the entire genome, either de novo directly from NGS data [58, 72, 87] or by standard methods such as primer walking, probe enrichment [104], and/or specific PCR to fill in gaps [52]. Full or partial genomic sequence permits a detailed phylogenetic analysis of the novel agent, which can provide clues as to its potential host range and pathogenicity [58]. The availability of sequence information also facilitates the development of specific PCR-based or serological assays for detection. Epidemiological screening of the distribution of the candidate pathogen in diseased patients and asymptomatic controls by PCR, as well as assessment of the geographic and temporal distribution of infections, can help in establishing a link to disease. Serology can also play a critical role in determining pathogenicity, as increases in titer support the association of a given pathogen with infection. 
For example, serologic analyses of a novel adenovirus species named ‘simian adenovirus C (SAdV-C)’ associated with a pneumonia outbreak in a baboon colony (Figure 4a) were recently used to establish that staff personnel at the facility had also been exposed to this newly discovered virus (Figure 4b) [105]. Finally, development of a culture system and animal model for infection can directly confirm that a candidate novel agent plays a causal role in disease.
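One simple way to quantify the case versus control screening described above is a one-sided Fisher's exact test on PCR positivity counts. The review does not prescribe a particular statistical test, so the following Python sketch is only one reasonable choice, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_one_sided(pos_cases, neg_cases, pos_controls, neg_controls):
    """One-sided Fisher's exact P value for enrichment of positives among cases.

    Returns the hypergeometric probability of observing at least `pos_cases`
    positive cases, given the row and column totals of the 2x2 table.
    """
    n_cases = pos_cases + neg_cases
    n_controls = pos_controls + neg_controls
    n_pos = pos_cases + pos_controls
    total = n_cases + n_controls
    denominator = comb(total, n_pos)
    upper = min(n_cases, n_pos)
    numerator = 0
    for x in range(pos_cases, upper + 1):
        # comb() returns 0 when more positives are requested than controls exist.
        numerator += comb(n_cases, x) * comb(n_controls, n_pos - x)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical screen: 8 of 10 patients and 1 of 20 controls are PCR-positive.
    p_value = fisher_exact_one_sided(8, 2, 1, 19)
    print(f"one-sided P = {p_value:.2e}")
```

A small P value in such a screen supports an association but does not by itself establish causality, which is why the serological, culture, and animal model evidence discussed in this section is still needed.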

Figure 4

Baboon and human infections from a novel adenovirus species. (a) A 1997 acute respiratory outbreak in a baboon colony. A novel adenovirus, named simian adenovirus C (SAdV-C), was discovered in association with an outbreak at a primate research facility that sickened 4 of 9 baboons and resulted in two cases of fatal pneumonia. (b) Serological testing of staff personnel at the facility and controls (five epidemiologically unrelated young children) for exposure to simian adenoviruses SAdV-B and SAdV-C. Neutralizing antibodies to SAdV-C are absent before the outbreak but detected in 6 of 6 staff personnel after the outbreak, indicating recent or prior exposure to the virus. Abbreviations: BAdV, baboon adenovirus; SAdV, simian adenovirus; Pre, pre-outbreak; Post, post-outbreak; N/A, not applicable.

One advantage of using microarrays and NGS for pathogen discovery is that these same technologies can also be applied to evaluate the potential pathogenicity of newly identified novel agents. Host transcriptome analysis using gene expression microarrays [106] or RNA-Seq [107] can enable the characterization of associated host biomarkers in response to infection. Detailed NGS-based quasispecies analysis of novel pathogens that exhibit high mutation rates, such as RNA viruses [108, 109], can also provide insights into how these agents infect and invade the host.

Conclusions

Although sometimes derided as a ‘fishing expedition’, pathogen discovery is, in actuality, a highly worthwhile scientific endeavor. Without a cause identified for many presumed infectious diseases, it is not possible to conduct downstream investigations in pathogenesis and host–microbial interactions, nor is it possible to design effective vaccines or antimicrobial drugs to combat the associated illness. Potential applications of pathogen discovery range from outbreak investigation of emerging pathogens, to screening of blood products, vaccines, and other biologics for viral contaminants, to clinical diagnosis of unknown acute or chronic infectious diseases. The current availability of state-of-the-art genomic technologies such as pan-microbial microarrays and NGS provides an unprecedented opportunity to ‘cast a wide net’ and survey the full breadth of as-yet undiscovered pathogens in nature that pose significant threats to human health.

Competing interests statement

The author's research on viral pathogen discovery is partially supported by an award by Abbott Laboratories, Inc. The author has also filed provisional patent applications related to Lone Star virus, a novel bunyavirus in the Amblyomma americanum tick, and the novel baboon SAdV-C adenoviruses referred to in this article.
