Brigid M O'Flaherty1,2, Yan Li1, Ying Tao1, Clinton R Paden1,2, Krista Queen1,2, Jing Zhang1,3, Darrell L Dinwiddie4, Stephen M Gross5, Gary P Schroth5, Suxiang Tong1.
Abstract
Next generation sequencing (NGS) technologies have revolutionized the genomics field and are becoming more commonplace for identification of human infectious diseases. However, due to the low abundance of viral nucleic acids (NAs) in relation to host, viral identification using direct NGS technologies often lacks sufficient sensitivity. Here, we describe an approach based on two complementary enrichment strategies that significantly improves the sensitivity of NGS-based virus identification. To start, we developed two sets of DNA probes to enrich virus NAs associated with respiratory diseases. The first set of probes spans the genomes, allowing for identification of known viruses and full genome sequencing, while the second set targets regions conserved among viral families or genera, providing the ability to detect both known and potentially novel members of those virus groups. Efficiency of enrichment was assessed by NGS testing reference virus and clinical samples with known infection. We show significant improvement in viral identification using enriched NGS compared to unenriched NGS. Without enrichment, we observed an average of 0.3% targeted viral reads per sample. However, after enrichment, 50%-99% of the reads per sample were the targeted viral reads for both the reference isolates and clinical specimens using both probe sets. Importantly, dramatic improvements on genome coverage were also observed following virus-specific probe enrichment. The methods described here provide improved sensitivity for virus identification by NGS, allowing for a more comprehensive analysis of disease etiology.
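To put the abstract's percentages in perspective, the short sketch below converts them into an approximate fold enrichment. It is illustrative only: the function name `fold_enrichment` is hypothetical, and dividing a per-sample range (50%-99%) by a cross-sample average (0.3%) gives only a rough order of magnitude, not a figure reported by the authors.

```python
def fold_enrichment(pct_before: float, pct_after: float) -> float:
    """Ratio of the targeted viral read percentage after vs. before enrichment."""
    if pct_before <= 0:
        raise ValueError("pct_before must be positive")
    return pct_after / pct_before


# Values quoted in the abstract: 0.3% on average without enrichment,
# 50%-99% of reads per sample after enrichment.
low = fold_enrichment(0.3, 50.0)
high = fold_enrichment(0.3, 99.0)

print(f"approximate fold enrichment: {low:.0f}x to {high:.0f}x")
```

Under these assumptions the increase in the targeted viral read fraction is on the order of a few hundred-fold.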
Year: 2018 PMID: 29703817 PMCID: PMC5991510 DOI: 10.1101/gr.226316.117
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Distribution of sequence reads for reference samples enriched with virus-specific probes. The frequency of reads identified by Kraken for each sample with and without enrichment (H: hybridized; NH: nonhybridized) is shown in bar graphs for each viral family/subfamily tested: (A) Coronavirinae; (B) Adenoviridae; (C) Parvovirinae; (D) Picornaviridae; (E) Paramyxoviridae; (F) Pneumoviridae; (G) Orthomyxoviridae. (*) Frequency of reads obtained from BWA-MEM read mapping. Abbreviations of virus names are listed in Supplemental Table S1.
Figure 2. Distribution of sequence reads for reference samples enriched with conserved viral group probes. The frequency of reads identified by Kraken for each sample with and without enrichment (H: hybridized; NH: nonhybridized) is shown in bar graphs for each viral family/subfamily tested: (A) Coronavirinae; (B) Adenoviridae; (C) Polyomaviridae; (D) Parvovirinae; (E) Reoviridae; (F) Picornaviridae; (G) Paramyxoviridae; (H) Pneumoviridae; (I) Orthomyxoviridae. (*) Frequency of reads obtained from BWA-MEM read mapping. Abbreviations of virus names are listed in Supplemental Table S1.
Figure 3. Sensitivity of enrichment in hybridization. Samples were prepared as 10-fold serial dilutions of reference viral nucleic acids spiked into a constant amount of human RNA prior to library preparation. The frequency of reads identified by Kraken is shown in bar graphs for each sample with and without enrichment (H: hybridized; NH: nonhybridized) for (A) virus-specific probe enrichment and (B) conserved viral group probe enrichment. From the same sequencing run, the linear genome coverage is shown (C) for samples with enrichment (diagonal stripes) or without enrichment (white) with virus-specific probes. Viral Ct and (average) depth of coverage are shown below the bar graphs. Abbreviations of virus names are listed in Supplemental Table S1.
Figure 4. Distribution of sequence reads for clinical samples enriched with virus-specific probes. The frequency of reads identified by Kraken for each sample with and without enrichment (H: hybridized; NH: nonhybridized) is shown in bar graphs for each viral family/subfamily tested: (A) Coronavirinae; (B) Adenoviridae; (C) Parvovirinae; (D) Picornaviridae; (E) Paramyxoviridae; (F) Pneumoviridae; (G) Orthomyxoviridae. Viral Ct is shown below the bar graphs. (ND) Ct not available. Abbreviations of virus names are listed in Supplemental Table S1.
Figure 5. Distribution of sequence reads for clinical samples enriched with conserved viral group probes. The frequency of reads identified by Kraken for each sample with and without enrichment (H: hybridized; NH: nonhybridized) is shown in bar graphs for each viral family/subfamily tested: (A) Coronavirinae; (B) Adenoviridae; (C) Parvovirinae; (D) Polyomaviridae; (E) Reoviridae; (F) Picornaviridae; (G) Paramyxoviridae; (H) Pneumoviridae; and (I) Orthomyxoviridae. Viral Ct is shown below the bar graphs. (ND) Ct not available. (*) Frequency of reads obtained from BWA-MEM read mapping. Abbreviations of virus names are listed in Supplemental Table S1.
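The Figure 3 caption reports "linear genome coverage" and "(average) depth of coverage" without defining them in this excerpt. The sketch below shows one common convention, assumed here rather than taken from the paper: coverage as the percentage of genome positions with at least one mapped read, and average depth as the mean per-position read depth. The function name `coverage_stats` and the toy depth values are hypothetical.

```python
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (percent of positions covered, mean depth) for per-position depths.

    Assumed definitions, not confirmed by the source:
      - a position counts as covered if its depth >= min_depth
      - mean depth is averaged over all positions, covered or not
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    percent_covered = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return percent_covered, mean_depth


# Toy 10-position "genome"; real input would be per-base depths from a read mapper.
toy_depths = [0, 0, 3, 5, 5, 2, 0, 1, 4, 0]
percent_covered, mean_depth = coverage_stats(toy_depths)

print(f"coverage: {percent_covered:.1f}%  mean depth: {mean_depth:.1f}x")
```

Computing both numbers for the same sample with and without enrichment would give the kind of paired comparison that panel C describes.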
Virus-specific probe hybridization for mixed clinical samples with known viral infection