Christophe Lambert, Cassandra Braxton, Robert L Charlebois, Avisek Deyati, Paul Duncan, Fabio La Neve, Heather D Malicki, Sebastien Ribrioux, Daniel K Rozelle, Brandye Michaels, Wenping Sun, Zhihui Yang, Arifa S Khan.
Abstract
High-throughput sequencing (HTS) has demonstrated broad virus detection capabilities, discovering both known and novel viruses in a variety of samples, including clinical, environmental, and biological samples. An important goal for HTS applications in biologics is to establish parameter settings that afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense, and/or efficiency) at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase the likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on: the availability of a complete and curated viral database; the selection and configuration of sequence alignment programs that retain specificity for broad virus detection while reducing false-positive signals; the removal of host sequences without loss of endogenous viral sequences of interest; and the use of a meaningful reporting format that retains the critical information from the analysis, so that the data are readily interpretable and the results actionable. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and to help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper will aid in optimizing data analysis pipelines for virus detection by HTS.
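The abstract names trimming/cleaning as a step whose parameter settings trade sensitivity against computational cost. As a rough illustration only (not the authors' pipeline), the sketch below trims low-quality bases from the 3' end of each read and discards reads that become too short. The function names, the Phred+33 quality encoding, and the default thresholds (Q20, 50 bp) are assumptions chosen for the example.

```python
"""Minimal 3'-end quality trimming sketch (hypothetical helper names).

Assumes Phred+33-encoded FASTQ quality strings; thresholds are illustrative.
"""
from typing import Iterable, List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Drop trailing bases scoring below min_q; discard the read if too short."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # read discarded entirely
    return seq[:end], qual[:end]


def retained(reads: Iterable[Tuple[str, str]], min_q: int, min_len: int) -> int:
    """Count reads that survive trimming at the given thresholds."""
    return sum(trim_3prime(s, q, min_q, min_len) is not None for s, q in reads)


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    toy_reads = [("ACGTACGTAC", "IIIIIIII##"), ("ACGTACGTAC", "IIII######")]
    for q in (2, 20):
        print(f"min_q={q}: {retained(toy_reads, q, 6)} of {len(toy_reads)} reads kept")
```

Raising `min_q` or `min_len` shrinks the data volume passed to later steps, but it can also discard short or noisy reads that carry viral signal. That is the sensitivity-versus-cost trade-off the abstract describes.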
Keywords: adventitious virus detection; bioinformatics pipeline; high-throughput sequencing
Year: 2018 PMID: 30262776 PMCID: PMC6213042 DOI: 10.3390/v10100528
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Predicted sensitivity for porcine circovirus (PCV) at a given sequencing depth, with and without steps to reduce cellular nucleic acid background. Calculated deep sequencing results for a sample composed of 1 × 10⁷ CHO cells/mL (2.4 × 10⁹ bp genome) spiked with PCV type 2 (PCV-2) (1.7 × 10³ bp genome). For each sequencing depth, the graph shows the viral concentration at which a single viral read is expected. For example, to ensure sensitivity for PCV-2 above 1 × 10⁶ copies/mL, we would need to generate at least 1 × 10⁷ total reads. The effect of reducing background CHO cell DNA by 1000-fold is shown by closed squares. Under these conditions, we can expect a similar sensitivity with only 3 × 10⁵ total reads.
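The kind of calculation behind Figure 1 can be approximated with a simple proportional-sampling model. This model is an assumption made here, not taken from the paper: reads are drawn in proportion to each component's base-pair content. With R total reads, V viral copies/mL of genome length L_v, and host DNA H = (cells/mL) × (host genome length), the expected viral read count is R·V·L_v / (V·L_v + H). Setting this equal to 1 and solving gives V = H / (L_v·(R - 1)). The function names below are hypothetical. The figure may account for factors this sketch ignores, so the sketch is not expected to reproduce the plotted values exactly.

```python
"""Rough single-read detection limit under a proportional-sampling model.

Assumption (not from the source): reads are sampled in proportion to
base-pair content, so the expected viral read fraction is
viral_bp / (viral_bp + host_bp).
"""


def expected_viral_reads(total_reads: float, virus_copies_per_ml: float,
                         cells_per_ml: float, host_genome_bp: float,
                         virus_genome_bp: float,
                         background_reduction: float = 1.0) -> float:
    """Expected number of viral reads among total_reads."""
    host_bp = cells_per_ml * host_genome_bp / background_reduction
    viral_bp = virus_copies_per_ml * virus_genome_bp
    return total_reads * viral_bp / (viral_bp + host_bp)


def single_read_limit(total_reads: float, cells_per_ml: float,
                      host_genome_bp: float, virus_genome_bp: float,
                      background_reduction: float = 1.0) -> float:
    """Viral copies/mL at which exactly one viral read is expected.

    Solving R * V*Lv / (V*Lv + H) = 1 for V gives V = H / (Lv * (R - 1)).
    """
    if total_reads <= 1:
        raise ValueError("total_reads must be greater than 1")
    host_bp = cells_per_ml * host_genome_bp / background_reduction
    return host_bp / (virus_genome_bp * (total_reads - 1))


if __name__ == "__main__":
    # Sample parameters taken from the Figure 1 caption.
    CHO_CELLS_PER_ML = 1e7
    CHO_GENOME_BP = 2.4e9
    PCV2_GENOME_BP = 1.7e3
    for reads in (1e5, 3e5, 1e6, 1e7, 1e8):
        plain = single_read_limit(reads, CHO_CELLS_PER_ML, CHO_GENOME_BP,
                                  PCV2_GENOME_BP)
        reduced = single_read_limit(reads, CHO_CELLS_PER_ML, CHO_GENOME_BP,
                                    PCV2_GENOME_BP, background_reduction=1000)
        print(f"{reads:>8.0e} reads: {plain:.2e} copies/mL "
              f"(1000-fold background reduction: {reduced:.2e})")
```

With the caption's CHO and PCV-2 parameters, `single_read_limit(1e7, 1e7, 2.4e9, 1.7e3)` returns about 1.4 × 10⁶ copies/mL, which is in the same range as the caption's example. In this simplified model, a 1000-fold background reduction lowers the single-read limit exactly 1000-fold at any fixed depth.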
Figure 2. Potential pipelines for HTS data analysis for virus detection. Any given pipeline might use one or a combination of these paths, or others. See text for details. (A) Pre-processing, (B) Unmapped sequences, (C) Reference subtraction and counter-screen, and (D) Sequences of unidentified origin.
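To make paths (C) and (D) more concrete, the toy sketch below sorts reads into viral, host, review, and unmapped bins. It is a hypothetical illustration, not a pipeline from the paper. Exact k-mer matching stands in for a real aligner, and the reference names, `k`, and `min_shared` are illustrative assumptions. Reads that match both the host and viral references are flagged for review instead of being subtracted as host. This reflects the abstract's concern about removing host sequences without losing endogenous viral sequences.

```python
"""Toy read triage loosely following Figure 2 paths (hypothetical).

Exact k-mer sharing stands in for alignment; k, min_shared, and the
reference names are illustrative assumptions.
"""
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each reference name to its k-mer set."""
    return {name: kmers(seq, k) for name, seq in refs.items()}


def best_hit(read: str, index: Dict[str, Set[str]], k: int,
             min_shared: int) -> Optional[str]:
    """Reference sharing the most k-mers with read, if at least min_shared."""
    read_kmers = kmers(read, k)
    best_name, best_n = None, 0
    for name, ref_kmers in index.items():
        shared = len(read_kmers & ref_kmers)
        if shared > best_n:
            best_name, best_n = name, shared
    return best_name if best_n >= min_shared else None


def triage(reads: List[str], host_index: Dict[str, Set[str]],
           viral_index: Dict[str, Set[str]], k: int = 4,
           min_shared: int = 3) -> Dict[str, List[str]]:
    """Sort reads into viral, host, review, and unmapped bins."""
    bins: Dict[str, List[str]] = {"viral": [], "host": [], "review": [], "unmapped": []}
    for read in reads:
        viral = best_hit(read, viral_index, k, min_shared)
        host = best_hit(read, host_index, k, min_shared)
        if viral and host:
            # Counter-screen: a read matching both may be an endogenous viral
            # sequence or a false positive, so keep it for manual review.
            bins["review"].append(read)
        elif viral:
            bins["viral"].append(read)
        elif host:
            bins["host"].append(read)  # reference subtraction
        else:
            bins["unmapped"].append(read)  # unidentified origin; evaluate further
    return bins


if __name__ == "__main__":
    K = 4
    host_idx = build_index({"host_chr1": "ACGTACGGTCAGTCCATGAC"}, K)
    viral_idx = build_index({"virus_x": "TTAGGCTAAGCTTGGAATCC"}, K)
    reads = ["ACGTACGGTCAG", "TTAGGCTAAGCT", "GGGGGGGGGGGG", "ACGTACGGTTAGGCTA"]
    for label, members in triage(reads, host_idx, viral_idx, k=K).items():
        print(label, members)
```

A real pipeline would use an aligner or classifier against curated databases. The binning logic is the point here: unmapped reads are kept for further evaluation rather than discarded.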