Ruth R Miller, Vincent Montoya, Jennifer L Gardy, David M Patrick, Patrick Tang.
Abstract
Traditional pathogen detection methods in public health infectious disease surveillance rely upon the identification of agents that are already known to be associated with a particular clinical syndrome. The emerging field of metagenomics has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample, without a priori knowledge of their identities, through the use of next-generation DNA sequencing. A single metagenomics analysis has the potential to detect rare and novel pathogens, and to uncover the role of dysbiotic microbiomes in infectious and chronic human disease. Making use of advances in sequencing platforms and bioinformatics tools, recent studies have shown that metagenomics can even determine the whole-genome sequences of pathogens, allowing inferences to be made about antibiotic resistance, virulence, evolution and transmission. We are entering an era in which more novel infectious diseases will be identified through metagenomics-based methods than through traditional laboratory methods. The onus is now on public health laboratories to integrate metagenomics techniques into their diagnostic arsenals.
Year: 2013 PMID: 24050114 PMCID: PMC3978900 DOI: 10.1186/gm485
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Metagenomic approaches for pathogen detection and their findings and applications
| Target or approach | Application | Findings | Advantages | Limitations |
| • rRNA | • Prokaryotic and eukaryotic identification* | • Characterization of the healthy human gut microbiome (HMP) | • Potentially higher sensitivity | • Targeted gene may not be truly universal |
| | • Determination of taxonomic relationships | • Ancient gut microbiomes found to be more similar to modern rural than modern cosmopolitan microbiomes | • Less expensive as fewer reads are required for taxonomic classification | • Primer bias may alter population structure |
| • rpoB | • Archaeal and bacterial identification* | • Used to divide the species | • rpoB and cpn-60 offer enhanced taxonomic resolution compared to rRNA | • Possibility of variable gene copy numbers amongst targeted species |
| • cpn-60 | • Determination of taxonomic relationships | | | |
| • Viral RNA polymerase (RdRP) | • Novel virus discovery | • Identified novel families of picornaviruses off the coast of British Columbia | | |
| • Shotgun sequencing | • Functional and taxonomic characterization | • Detection of African swine fever virus-like sequences representing new members of the family Asfarviridae | • Recovery of sequences from all microorganisms | • Broad specificity might decrease sensitivity |
| | | • Detection of unexpected microbes from stool samples | • No | • Library preparation is relatively labor intensive |
| • Subtraction | • Functional and taxonomic characterization | • Identified divergent regions in non-coding RNAs in | • Random primers reduce potential for bias | • Bioinformatics analysis is more challenging |
| | | • Association of | | • Relatively expensive as more reads are required than for deep amplicon sequencing (DAS) |
| • Virus concentration | • Novel virus discovery | • Detection of the novel H1N1 influenza from nasopharyngeal swabs | | • Approximately 50% of sequences generally have no significant homology to known proteins in databases (dark matter) |
| | | • Detection of a novel rhabdovirus from serum | | |
| • Hybridization capture | • Investigation of sequences with very low copy number | • Metagenomic analysis of tuberculosis from a mummy | | • Increased granularity in population structure determination |
| • Investigation of |
*Specific primers need to be made to discriminate between each group. RdRP, RNA-dependent RNA polymerase.
Figure 1. Workflow outlining a pipeline of laboratory and bioinformatics methods required for metagenomic pathogen detection. The left side (pale blue) lists each step in the metagenomics workflow and the right side lists the tools used for each stage. Boxes on the right are color-coded to indicate the type of tool used: dark blue, laboratory method; gray, data format; green, computer software; maroon, database. BWA, Burrows-Wheeler Aligner; BLAST, Basic Local Alignment Search Tool; IMG, Integrated Microbial Genomes; MG-RAST, Metagenomic Rapid Annotations using Subsystems Technology.
High-throughput sequencing platforms and their potential metagenomic applications in public health
| Company | Machine | Read length (bp) | Output per run | Run time | Reads per run | Advantages | Disadvantages | Potential metagenomic applications in public health |
| Second generation sequencers | | | | | | | | |
| Illumina | HiSeq 2500 | 36-150 | 600 Gb | 11 days | 2.4 × 10^9 | Very high depth | Long run time | High sensitivity to detect pathogens that are present at very low concentrations in metagenomic samples |
| | | | | | | Low error rate | Short read lengths | |
| | | | | | | Lower cost per bp | Errors in regions following GGC motifs | |
| | | | | | | Paired-end reads | Decreasing read quality toward ends | |
| | MiSeq | 36-250 | 8.5 Gb | 39 hours | 34 × 10^6 | Desktop machine | Short read lengths | Able to detect pathogens at low levels rapidly |
| | | | | | | Lowest error rate of desktop sequencers | Errors in regions following GGC motifs | Can be deployed locally |
| | | | | | | Lower cost per bp | Decreasing read quality toward ends | Useful for diseases of unknown etiology |
| | | | | | | Paired-end reads | | |
| Roche | Genome Sequencer (GS) FLX Titanium | 1,000 | 1 Gb | 23 hours | 1 × 10^6 | Long read lengths | Errors in homopolymeric regions | Able to |
| | GS Junior System | 500 | 35 Mb | 10 hours | 1 × 10^5 | Desktop machine | Errors in homopolymeric regions | Able to sequence novel genomes rapidly |
| | | | | | | Longest read length of desktop sequencers | Lower depth compared to GS FLX | Can be deployed locally |
| | | | | | | | | Useful for outbreak investigations |
| Life Technologies | Ion Torrent with Personal Genome Machine (PGM) 318 Chip | 400 | 1-2 Gb | 7 hours | 3-5 × 10^6 | Desktop machine | Errors in homopolymeric regions | Fastest output is helpful for urgent public health issues |
| | | | | | | Fastest run time of desktop sequencers | Biased coverage in AT-rich regions | Can be deployed locally |
| | Proton | 200 | 10 Gb | 2-4 hours | 6-8 × 10^7 | Desktop machine | Short read length | Able to detect pathogens at low levels rapidly |
| | | | | | | Very fast run time | Errors in homopolymeric regions | Can be deployed locally |
| | | | | | | | | Useful for diseases of unknown etiology |
| Third generation sequencers | | | | | | | | |
| Pacific Biosciences | PacBio RS | 2,000-15,000 | 100 Mb | 2 hours | 50,000 | Very fast run time | High error rate | Able to assemble genomes for novel pathogens rapidly |
| | | | | | | Very long read lengths | Sub-reads often shorter than quoted read lengths | Complementary to other methods |
| | | | | | | | Requires higher DNA input | |
| Oxford Nanopore | MinION | 48,000 | 10s of Gb per 24 hours | Run until complete | Not applicable | Very fast run time | Not yet available | |
| | | | | | | No sample preparation required | | |
Statistics for Illumina, Roche and maximum values for each category in each system are shown as of 2012. bp, base pair; Mb, megabases; Gb, gigabases.
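To make the link between sequencing depth and sensitivity concrete, the following is a minimal sketch that estimates how many reads per run would be expected from a pathogen at a given relative abundance. The reads-per-run figures are taken from the platform table above (lower bound where a range is given). The function name, the 0.1% abundance, and the assumption that reads are sampled strictly in proportion to relative nucleic acid abundance are illustrative simplifications, not part of the original article; real runs are affected by host background, library bias and multiplexing.

```python
# Reads per run from the platform table (maximum values as of 2012;
# lower bound used where the table gives a range).
READS_PER_RUN = {
    "HiSeq 2500": 2.4e9,
    "MiSeq": 34e6,
    "GS FLX Titanium": 1e6,
    "GS Junior": 1e5,
    "Ion Torrent PGM 318": 3e6,
    "Proton": 6e7,
    "PacBio RS": 5e4,
}


def expected_pathogen_reads(reads_per_run, relative_abundance):
    """Expected read count for one organism in a single run.

    Assumes (as an idealization) that reads are drawn in proportion
    to the organism's relative nucleic acid abundance in the sample.
    """
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be between 0 and 1")
    return reads_per_run * relative_abundance


if __name__ == "__main__":
    abundance = 0.001  # hypothetical 0.1% relative abundance
    for platform, reads in READS_PER_RUN.items():
        estimate = expected_pathogen_reads(reads, abundance)
        print(f"{platform:>20}: ~{estimate:,.0f} pathogen reads")
```

Under these assumptions the estimate scales linearly with reads per run, which is why the very high depth platforms are listed as the most sensitive for low-abundance pathogens.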
Figure 2. Strategy for novel pathogen detection in public health. Currently, specimens are sent for conventional laboratory tests. If one of these tests is positive (dashed arrows), then an actionable result is generated. If these are all negative, then investigational methods such as metagenomics can be employed afterwards (white arrows). With advances in metagenomics, these methods may be performed earlier in the diagnostic algorithm in the future (black arrows) instead of following multiple traditional laboratory tests.
Challenges for traditional pathogen detection in public health
| Public health requirement | Limitations of traditional methods | Advantages of metagenomics |
| It is important to identify pathogens as quickly as possible to identify appropriate measures for treatment and prevention of spread | Techniques that require culture can lead to delays, particularly for slow-growing pathogens such as | Metagenomic pathogen discovery is increasing in speed and single genomes can now be sequenced in a few hours |
| | Performing multiple tests can delay diagnosis | Metagenomics comprises a single test |
| For a technique to be viable in a public health laboratory, it must be economically justifiable | Performing multiple tests can be very expensive | Metagenomic approaches are decreasing in cost |
| | | A single metagenomics experiment can now be performed for less than $200 |
| Disease can be caused by pathogens that are present at very low levels. Samples taken may only harbor small numbers of a pathogen | May not detect pathogens that are present at very low levels | It is now possible to perform metagenomic studies from a single cell |
| | Biases in culturing and other methods may point to the wrong pathogen | Genomes have been assembled from organisms with relative nucleic acid abundances as low as 0.1% |
| Early identification of novel pathogens is vital to prevent potential outbreaks | May not identify pathogens that are unknown or too divergent from known organisms | |
| Identification of transmission guides public health practices for containing outbreaks | Traditional pathogen fingerprinting methods may not have the resolution to detect transmission events | Whole-genome sequences provide the ultimate resolution required to detect transmission events |
| Complex diseases are often caused by a combination of multiple pathogens, host genetics and environmental factors | Targeted detection of pathogens does not allow identification of multiple pathogens, unless each is specifically investigated | Can detect multiple pathogens in one test, allowing for inference of interactions |
Challenges for the integration of metagenomics into public health
| Challenge | Implications | Potential solutions |
| Next-generation sequencing can be performed on multiple platforms, each with different characteristics and each constantly under improvement | Difficulty comparing results from different platforms and with those from older techniques | Pipelines must be constantly updated to account for new techniques |
| | Universal approach not yet possible | Different platforms should be utilized depending on the question asked |
| | Continuously evolving technology requires skilled workforce rather than established pipelines | |
| Our ability to generate DNA sequence data has rapidly surpassed our computational abilities to analyze the data | Significant requirements for storage of DNA sequence | Perform analysis using a staged approach |
| | Assembling and identifying short reads from next-generation sequencing is computationally intensive | Cloud computing |
| Multiple reference databases are available, which may generate different results depending on the database used | Certain features of a metagenomic sample might be missed if the wrong database is used | HMP aims to sequence multiple reference genomes associated with the human body |
| | Limited by the diversity represented in each database | HMP currently has a total of 6,500 reference sequences generated |
| Read lengths depend on sequencing platform used | Makes | Read lengths are continually increasing |
| | More difficult to identify large-scale genomic variations and repetitive regions | Third-generation sequencing platforms promise much longer read lengths |
| Finding a pathogen in a disease sample does not imply causation | Important to determine causation before changing public health management | Follow-up studies are required, for example using animal models, or serological or epidemiological methods |
| | False association can lead to costly, useless or even potentially harmful therapies | Results must be independently validated |
| Metagenomics can detect contaminants from cell cultures, reagents and laboratory equipment | Contaminants may be incorrectly associated with the disease of interest | Negative controls must be used |
| | | Researchers must consider the plausibility of the findings |
| | | Results must be independently validated |
| Host nucleic acids are almost always sequenced in metagenomics studies | Host genetic sequences are confidential | Host DNA to be available only to researchers in HMP |
| | Human subjects might be traceable from their DNA sequences | Only microbiome data are released to the public |
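The recommendation to use negative controls can be illustrated with a short sketch. It flags taxa whose relative abundance in a clinical sample does not clearly exceed their relative abundance in a negative control. The function name, the fold-change threshold and the example counts are hypothetical and chosen only for illustration; this is one simple heuristic, not the procedure used in any study cited by the article, and flagged taxa would still need the plausibility review and independent validation listed in the table.

```python
def flag_likely_contaminants(sample_counts, control_counts, min_fold=10.0):
    """Return taxa that may be contaminants rather than true sample members.

    A taxon is flagged when it appears in the negative control and its
    relative abundance in the sample is less than `min_fold` times its
    relative abundance in the control. `min_fold` is an arbitrary,
    illustrative threshold.
    """
    sample_total = sum(sample_counts.values())
    control_total = sum(control_counts.values())
    if sample_total == 0 or control_total == 0:
        return []  # nothing to compare

    flagged = []
    for taxon, count in sample_counts.items():
        control_count = control_counts.get(taxon, 0)
        if control_count == 0:
            continue  # never observed in the negative control
        sample_fraction = count / sample_total
        control_fraction = control_count / control_total
        if sample_fraction < min_fold * control_fraction:
            flagged.append(taxon)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical read counts per taxon.
    sample = {"taxon_A": 900, "taxon_B": 100}
    negative_control = {"taxon_B": 50, "taxon_C": 50}
    print(flag_likely_contaminants(sample, negative_control))
```

Comparing relative rather than absolute counts keeps the check independent of sequencing depth, which typically differs between a clinical sample and its control.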