Ellen C Carbo, Igor A Sidorov, Anneloes L van Rijn-Klink, Nikos Pappas, Sander van Boheemen, Hailiang Mei, Pieter S Hiemstra, Tomas M Eagan, Eric C J Claas, Aloys C M Kroes, Jutte J C de Vries.
Abstract
Viral metagenomics is increasingly applied in clinical diagnostic settings for detection of pathogenic viruses. While several benchmarking studies have been published on the use of metagenomic classifiers for abundance and diversity profiling of bacterial populations, studies on the comparative performance of the classifiers for virus pathogen detection are scarce. In this study, metagenomic data sets (n = 88) from a clinical cohort of patients with respiratory complaints were used to compare the performance of five taxonomic classifiers: Centrifuge, Clark, Kaiju, Kraken2, and Genome Detective. A total of 1144 positive and negative PCR results for 13 respiratory viruses were used as the gold standard. Sensitivity and specificity of these classifiers ranged from 83 to 100% and 90 to 99%, respectively, and were dependent on the classification level and data pre-processing. Exclusion of human reads generally resulted in increased specificity. Normalization of read counts for genome length had a minor effect on overall performance; however, it negatively affected the detection of targets with read counts around the detection level. Correlation of sequence read counts with PCR Ct-values varied per classifier, per data pre-processing (R2 range 15.1-63.4%), and per virus, with outliers up to 3 log10 reads magnitude beyond the predicted read count for viruses with high sequence diversity. In this benchmarking study, sensitivity and specificity were within the ranges of use for diagnostic practice when the cut-off for defining a positive result was considered per classifier.
Keywords: bioinformatics; next-generation sequencing; pathogen detection; viral metagenomics
Year: 2022 PMID: 35335664 PMCID: PMC8953373 DOI: 10.3390/pathogens11030340
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
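The abstract scores each classifier against PCR by calling a target positive when its read count reaches a cut-off, and it also mentions normalizing read counts for genome length. The sketch below illustrates both steps in plain Python. The function names, the toy data, and the reads-per-kilobase scaling are assumptions made for illustration; the record does not state the exact normalization formula or cut-offs used in the study.

```python
def sensitivity_specificity(read_counts, pcr_positive, cutoff):
    """Score classifier calls against PCR (gold standard).

    A target is called positive when its read count is >= cutoff.
    Returns (sensitivity, specificity).
    """
    tp = fp = tn = fn = 0
    for reads, is_positive in zip(read_counts, pcr_positive):
        called = reads >= cutoff
        if called and is_positive:
            tp += 1
        elif called and not is_positive:
            fp += 1
        elif not called and is_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def reads_per_kb(read_count, genome_length_bp):
    """Assumed normalization: reads per 1000 bases of reference genome."""
    return read_count * 1000 / genome_length_bp


# Hypothetical toy data: one read count and one PCR result per sample/target pair.
toy_reads = [120, 0, 3, 45, 0, 1]
toy_pcr = [True, False, False, True, True, False]

sens, spec = sensitivity_specificity(toy_reads, toy_pcr, cutoff=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(reads_per_kb(30, 15000))
```

Because the cut-off is a parameter, the same function can be evaluated per classifier, which matches the abstract's point that a suitable cut-off has to be considered for each classifier separately.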
Overview of respiratory PCR panel targets and their test results.
| PCR target viruses | Family | Genus | Species | Alternative naming | # PCR-positive samples | # PCR-negative samples | PCR Ct-values (range) |
|---|---|---|---|---|---|---|---|
| HRV | | | | | 14 | 74 | 19–38 |
| PIV1, PIV3 | | | | | - | 88 | - |
| | | | | | 2 | 86 | 26–36 |
| PIV2, PIV4 | | | | | - | 88 | - |
| | | | | | 1 | 87 | 24 |
| INF | | | | | 3 | 85 | 29–36 |
| | | | | | - | 88 | - |
| ACoV | | | | | 2 | 86 | 32 |
| | | | | | - | 88 | - |
| BCoV | | | | | 2 | 86 | 27 |
| HMPV | | | | | - | 88 | - |
| RSV | | | | | - | 88 | - |
| Total | Total PCR results: 1144 (13 targets tested in 88 samples) | | | | 24 | 1120 | 19–38 |
Overview of characteristics of the classifiers evaluated.
| | Centrifuge | Clark | Kaiju | Kraken 2 | Genome Detective |
|---|---|---|---|---|---|
| License | Open source | Open source | Open source | Open source | Commercial/free to use web application |
| Version | 1.0.4 | 1.2.6.1 | 1.7.3 | 2.0.8-beta | 1.126 |
| Sequencing technology compatibility | Short/long reads | Short/long reads | Short/long reads | Short/long reads | Short reads (long reads experimentally) |
| Pre-processing | No | No | No | No | Yes |
| Type of alignment | NT | NT | AA | NT | NT/AA |
| Algorithm characteristics | Exact matches of 22 bp with target, with default five labels per sequence; LCA optional | Exact matches of 31 bp; assigned to the target with the highest number of hits | Maximum exact matches (MEM) of AA, up to five mismatches optional *. LCA in case of multiple hits | Exact matches of 35 bp. LCA in case of multiple hits | Combined results of NT and AA hits based on scoring. LCA in case of multiple hits |
| Database (compression) | Compressed index NT | Compressed index NT database of only unique sequences | No compression, AA database | Compressed index NT database | No compression, viral subset of Swiss-Prot UniRef90 protein database |
NT, nucleotide; AA, amino acid; LCA, lowest common ancestor. * Greedy-5 mode was used in the current study.
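Several classifiers in the table fall back to the lowest common ancestor (LCA) when a read matches more than one taxon. The following is a minimal sketch of that idea over a toy taxonomy stored as a child-to-parent mapping. It is not the implementation of any of the listed tools; the function names and the placeholder taxon labels (`species_A`, `genus_X`, and so on) are hypothetical.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    taxa = list(taxa)
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        other = set(lineage(taxon, parent))
        # Keep only ancestors that are shared, preserving deepest-first order.
        common = [node for node in common if node in other]
    return common[0] if common else None


# Hypothetical toy taxonomy (child -> parent); labels are placeholders.
toy_parent = {
    "species_A": "genus_X",
    "species_B": "genus_X",
    "species_C": "genus_Y",
    "genus_X": "family_F",
    "genus_Y": "family_F",
    "family_F": "root",
}

# A read hitting two species of the same genus is assigned at genus level.
print(lowest_common_ancestor(["species_A", "species_B"], toy_parent))
# A read hitting species from different genera is assigned at family level.
print(lowest_common_ancestor(["species_A", "species_C"], toy_parent))
```

This also shows why results are reported at species, genus, and family level: ambiguous reads move up the tree, so counts at higher ranks include reads that could not be resolved further down.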
Figure 1. ROC curves calculated based on reads of taxonomic assignment at three taxonomic levels (species, genus, and family) by the five classifiers, based on PCR targets: (a) without extraction of human reads, (b) after extraction of human reads, and (c) after extraction of human reads and normalization of reads by corresponding genome lengths (resolution of 1000 steps from one read to the maximum number of sequence reads for each PCR target per sample).
Figure 2. Sensitivity, selectivity, AUC, and ROC distance calculated based on assignment at three taxonomic levels (species, genus, and family) by the five classifiers for three types of pre-processing of the NGS datasets: (a) without extraction of human reads, (b) after extraction of human reads, and (c) after extraction of human reads and normalization of reads by corresponding genome lengths.
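The ROC-derived metrics in Figures 1 and 2 can be illustrated with a small threshold sweep. The sketch below steps the read-count cut-off from one read to the maximum read count in a fixed number of steps, as the Figure 1 caption describes, and then computes a trapezoidal AUC. The record does not define "ROC distance"; the code assumes it is the Euclidean distance from a ROC point to the ideal corner (false-positive rate 0, true-positive rate 1). Function names and toy data are hypothetical.

```python
import math


def roc_points(read_counts, pcr_positive, steps=1000):
    """Sweep the cut-off from 1 read to the maximum read count in `steps` steps.

    Assumes steps >= 2, at least one PCR-positive and one PCR-negative result,
    and a maximum read count of at least 1.
    Returns a list of (false_positive_rate, true_positive_rate) tuples.
    """
    max_reads = max(read_counts)
    n_pos = sum(pcr_positive)
    n_neg = len(pcr_positive) - n_pos
    points = []
    for i in range(steps):
        cutoff = 1 + (max_reads - 1) * i / (steps - 1)
        tp = sum(1 for r, p in zip(read_counts, pcr_positive) if p and r >= cutoff)
        fp = sum(1 for r, p in zip(read_counts, pcr_positive) if not p and r >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def roc_distance(point):
    """Assumed definition: distance from a ROC point to the ideal corner (0, 1)."""
    fpr, tpr = point
    return math.hypot(fpr, 1 - tpr)


# Hypothetical toy data.
toy_reads = [120, 0, 3, 45, 0, 1]
toy_pcr = [True, False, False, True, True, False]

toy_points = roc_points(toy_reads, toy_pcr)
best = min(toy_points, key=roc_distance)
print(f"AUC={auc(toy_points):.3f} best point={best} distance={roc_distance(best):.3f}")
```

Choosing the cut-off that minimizes this distance is one common way to pick an operating point per classifier; whether the study selected cut-offs this way is not stated in the record.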
Figure 3. Correlation between the number of sequence reads assigned (species level) and Ct-values of virus-specific PCRs for the five taxonomic classifiers evaluated: (a) without extraction of human reads, (b) after extraction of human reads, and (c) after normalization of reads by corresponding genome lengths.
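The correlation reported in Figure 3 and the R2 values in the abstract can be illustrated with an ordinary least-squares fit of log10 read counts against Ct-values. This is a sketch under assumptions: the record does not say which regression model was used, and the function names and toy values below are hypothetical. The residual helper expresses how far an observation lies from the fitted line in log10 units, which is the scale on which the abstract describes outliers.

```python
import math


def fit_log_reads_vs_ct(ct_values, read_counts):
    """Least-squares fit of log10(reads) = slope * Ct + intercept.

    Read counts must be positive and the inputs must not be constant.
    Returns (slope, intercept, r_squared).
    """
    xs = list(ct_values)
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def log10_residual(ct, reads, slope, intercept):
    """Observed minus predicted log10 read count for one sample."""
    return math.log10(reads) - (slope * ct + intercept)


# Hypothetical toy data: higher Ct (lower viral load) pairs with fewer reads.
toy_ct = [20, 24, 28, 32, 36]
toy_reads = [50000, 9000, 700, 150, 8]

slope, intercept, r2 = fit_log_reads_vs_ct(toy_ct, toy_reads)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
print(f"residual={log10_residual(28, 700000, slope, intercept):.2f} log10 reads")
```

A negative slope is expected because each PCR cycle roughly corresponds to a halving of starting template, so lower Ct-values should pair with higher read counts.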