Izabela Fabiańska, Stefan Borutzki, Benjamin Richter, Hon Q Tran, Andreas Neubert, Dietmar Mayer.
Abstract
High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS a well-suited technology for determining whether biological products, such as vaccines, are free from adventitious agents, and it could support or replace extensive testing with various in vitro and in vivo assays. Because of the bioinformatic complexity involved, there is a need for standardized and reliable methods to manage HTS-generated data in this field. We therefore developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct classification of reads, based on a comparison of characteristic profiles between the reads and the sequences deposited in the database and supported by alignment of the reads to the best-matching reference sequence, and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines, and clinical samples.
Keywords: adventitious virus testing; bioinformatics workflow; high-throughput sequencing; virus classification
Year: 2021 PMID: 34960810 PMCID: PMC8704571 DOI: 10.3390/v13122541
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Workflow of LABRADOR. The two approaches to viral sequence classification are highlighted in blue and grey.
Figure 2. Flowchart of viral database generation. The database includes nucleotide sequences of viruses that are listed in the guidelines for biological safety and are known to infect vertebrates (Table S1).
Figure 3. Accuracy of virus classification with the LABRADOR pipeline, assessed on the MetaShot dataset containing the simulated microbiome reads published by Fosso et al. (2017) [27]. (a) Number of viral species and genera classified by LABRADOR and found in the taxonomic standard profile generated for the MetaShot dataset [32]. (b) Number of reads classified to viral species by LABRADOR or generated in silico for the MetaShot dataset. Reads mapped to a reference sequence in the first approach of the LABRADOR workflow (classification based on the reads' characteristic profile) were considered (Table S3).
Precision and recall values for the MetaShot dataset. The precision and recall values for Centrifuge, Kraken2, and Lazypipe were published by Plyusnin et al. (2020) [32].
| Taxonomic Level | Tool | Precision [%] | Recall [%] |
|---|---|---|---|
| Species | LABRADOR | 95 | 90.5 |
| Species | Lazypipe-nt | 97.2 | 82.1 |
| Species | Lazypipe | 90.0 | 85.7 |
| Species | Centrifuge | 63.0 | 95.2 |
| Species | MetaPhlAn2 | 84.4 | 45.2 |
| Species | Kraken2 | 94.1 | 19.0 |
| Genus | LABRADOR | 95.5 | 93.3 |
| Genus | Lazypipe-nt | 95.3 | 91.1 |
| Genus | Lazypipe | 95.3 | 91.1 |
| Genus | Centrifuge | 84.9 | 100 |
| Genus | MetaPhlAn2 | 88.9 | 71.1 |
| Genus | Kraken2 | 95.5 | 46.7 |
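
The record does not spell out how the precision and recall figures were computed. As a minimal sketch, assuming the usual set-based definitions at a given taxonomic level (precision = correctly reported taxa / all reported taxa; recall = correctly reported taxa / all expected taxa), the calculation could look as follows. The function name `precision_recall` and the taxon labels are hypothetical and chosen only for illustration; they are not taken from LABRADOR or from the MetaShot benchmark.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) in percent for two collections of taxon names.

    Assumes the standard set-based definitions; this is an illustration,
    not the evaluation code used for the table above.
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)  # taxa both reported and expected
    precision = 100.0 * true_positives / len(predicted) if predicted else 0.0
    recall = 100.0 * true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up toy data: five expected taxa, four reported, three of them correct.
expected = {"virus_A", "virus_B", "virus_C", "virus_D", "virus_E"}
predicted = {"virus_A", "virus_B", "virus_C", "virus_X"}

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.1f}% recall={recall:.1f}%")
# prints: precision=75.0% recall=60.0%
```

Under these definitions, a tool that reports few taxa but gets them right scores high precision and low recall, which is the pattern the table shows for Kraken2 at the species level.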