| Literature DB >> 31528358 |
Maha Maabar¹, Andrew J Davison¹, Matej Vučak¹, Fiona Thorburn², Pablo R Murcia¹, Rory Gunson³, Massimo Palmarini¹, Joseph Hughes¹.
Abstract
High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected in a single analysis, which opens new opportunities for diagnosis, surveillance, and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample k-mers (22-nucleotide sequences) to k-mers from taxonomically labeled viral genomes. DisCVR was validated using published HTS data for eighty-nine clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses both metagenomically and by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those of a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses in HTS data on computers with limited RAM and processing power, and it includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from http://bioinformatics.cvr.ac.uk/discvr.php.
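The k-mer matching idea described in the abstract can be illustrated with a short Python sketch. It is not DisCVR's implementation (DisCVR is written in Java), and several details are assumptions: k-mers are made strand-independent by taking the canonical form (the smaller of a k-mer and its reverse complement), k-mers occurring in more than one taxon are discarded, and each taxon is scored by the number of distinct sample k-mers assigned to it. The function names `canonical`, `kmers`, `build_db`, and `classify` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

K = 22  # k-mer length given in the abstract
_VALID = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Assumption: strand-independent matching via the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int = K) -> Iterator[str]:
    """Yield canonical k-mers, skipping any window that contains N or another ambiguity code."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= _VALID:
            yield canonical(window)


def build_db(references: dict[str, list[str]], k: int = K) -> dict[str, str]:
    """Map each k-mer to the single taxon it occurs in; k-mers shared between taxa are dropped."""
    owner: dict[str, str] = {}
    shared: set[str] = set()
    for taxon, genomes in references.items():
        for genome in genomes:
            for km in kmers(genome, k):
                if km in shared:
                    continue
                prev = owner.get(km)
                if prev is None:
                    owner[km] = taxon
                elif prev != taxon:
                    # Seen in two different taxa: not informative, so remove it.
                    shared.add(km)
                    del owner[km]
    return owner


def classify(reads: Iterable[str], db: dict[str, str], k: int = K) -> Counter:
    """Count the distinct sample k-mers assigned to each taxon."""
    sample: set[str] = set()
    for read in reads:
        sample.update(kmers(read, k))
    return Counter(db[km] for km in sample if km in db)
```

Calling `classify(reads, db).most_common(2)` returns the two best-scoring taxa, which corresponds to the "top hit" and "second hit" columns reported for the respiratory samples. Using a set of sample k-mers keeps memory proportional to the number of distinct k-mers rather than the number of reads, in the spirit of a tool aimed at machines with limited RAM.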
Keywords: diagnosis; high-throughput sequencing; k-mer; virus
Year: 2019 PMID: 31528358 PMCID: PMC6735924 DOI: 10.1093/ve/vez033
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. DisCVR framework. Each colored box represents a component of the tool. Dashed rectangles indicate processes, and solid rectangles show input and output.
Figure 2. DisCVR GUI. The top screenshot shows the scoring panel with the top three virus hits, and the bottom screenshot shows the full analysis.
Figure 3. DisCVR validation. Coverage and depth of matched k-mers (top) and of mapped reads (bottom) along a reference genome.
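Figure 3 plots coverage and depth along a reference genome. Assuming that matched k-mers or mapped reads have been reduced to 0-based (start, length) intervals on the reference, per-base depth and percentage genome coverage (the quantity reported in the "Genome coverage (%)" column for the false-positive samples) could be computed as in this Python sketch. The helper names are hypothetical, and the exact-match, forward-strand k-mer lookup is a simplification, not a description of DisCVR's validation module.

```python
def depth_profile(genome_len: int, intervals: list[tuple[int, int]]) -> list[int]:
    """Per-base depth from 0-based (start, length) intervals, clipped to the genome."""
    diff = [0] * (genome_len + 1)  # difference array: +1 at start, -1 one past the end
    for start, length in intervals:
        s = max(start, 0)
        e = min(start + length, genome_len)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth: list[int] = []
    running = 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_percent(depth: list[int]) -> float:
    """Percentage of reference positions covered by at least one k-mer or read."""
    if not depth:
        return 0.0
    return 100.0 * sum(1 for d in depth if d > 0) / len(depth)


def kmer_hits_on_reference(reference: str, matched: set[str], k: int) -> list[tuple[int, int]]:
    """Hypothetical helper: place each matched k-mer on the forward strand by exact match."""
    return [(i, k) for i in range(len(reference) - k + 1) if reference[i:i + k] in matched]
```

The difference-array approach touches each interval once and then makes a single pass over the genome, so its cost is linear in the number of intervals plus the genome length.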
Figure 4. Receiver operating characteristic (ROC) curves showing the accuracy of DisCVR, CLARK, and Kraken. The shaded areas show the confidence intervals of sensitivity for all three methods. The optimal thresholds of 850 k-mers for DisCVR and 150 reads for CLARK and Kraken are marked, with bars representing the confidence interval of each threshold and the corresponding specificity and sensitivity given in brackets. The curve for KrakenHLL was identical to that for Kraken. The diamond indicates the sensitivity and specificity obtained for DisCVR, CLARK, and Kraken when the false positives with ≥850 k-mers and the second hits with ≥850 k-mers are counted among the true positives.
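Figure 4 evaluates each tool by sweeping a detection threshold (matched k-mers for DisCVR, classified reads for CLARK and Kraken) against the diagnostic result. The Python sketch below computes sensitivity and specificity at one threshold, traces ROC points, and picks a cut-off with Youden's J (sensitivity + specificity - 1). Youden's J is a common criterion but an assumption here: the caption does not say how the optimal threshold was chosen. The `Call` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Call:
    score: int   # e.g. matched k-mers (DisCVR) or classified reads (CLARK/Kraken)
    truth: bool  # diagnostic result treated as the reference standard


def sens_spec(calls: list[Call], threshold: int) -> tuple[float, float]:
    """Sensitivity and specificity when score >= threshold counts as a positive call."""
    tp = sum(1 for c in calls if c.truth and c.score >= threshold)
    fn = sum(1 for c in calls if c.truth and c.score < threshold)
    tn = sum(1 for c in calls if not c.truth and c.score < threshold)
    fp = sum(1 for c in calls if not c.truth and c.score >= threshold)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def roc_points(calls: list[Call]) -> list[tuple[int, float, float]]:
    """(threshold, sensitivity, specificity) for every distinct observed score, high to low."""
    thresholds = sorted({c.score for c in calls}, reverse=True)
    return [(t, *sens_spec(calls, t)) for t in thresholds]


def youden_best(calls: list[Call]) -> tuple[int, float, float]:
    """Assumed criterion: the ROC point maximizing sensitivity + specificity - 1."""
    return max(roc_points(calls), key=lambda p: p[1] + p[2] - 1)
```

For example, `sens_spec(calls, 850)` evaluates the 850-k-mer cut-off for DisCVR scores; the confidence intervals shown in the figure would need a separate procedure (for example bootstrapping), which this sketch does not attempt.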
Results of the second hits in the respiratory samples.
| Sample | RT-PCR diagnosis | DisCVR top hit (no.)ᵃ | DisCVR second hit (no.)ᵃ |
|---|---|---|---|
| Top hit with <850 k-mers matching | | | |
| 1G2 | PIV-3 | PIV-3 (366) | HRV-A (149) |
| 1I5 | HRV | HRV-A (749) | HRV-C (470) |
| 2B6 | RSV | RSV (742) | IFA H3N2 (262) |
| Second hit with ≥850 k-mers matching | | | |
| 1B5 | PIV-3 | | |
| 1D3 | HCoV NL63 | | |
| Second hit with <850 k-mers matching | | | |
| 1C2 | HRV | | HRV-A (269) |
| 1E5 | RSV | | RSV (415) |
| 1F8 | HCoV NL63 | | HCoV NL63 (724) |
| 2B9 | HRV | | HRV-C (94) |
| 2A2 | HCoV 229E | HRV-C (770) | HCoV 229E (176) |
| 2C4 | HCoV 229E | HRV-A (264) | HCoV 229E (5) |
| 2D3 | HCoV OC43 | HRV-A (438) | HCoV OC43 (135) |
| 1F7 | HRV | hMPV (27) | HRV-B (20) |
| 1G1 | ADV/HRV | HCoV OC43 (163) | HRV-B (118) |
| Not detected | | | |
| 1C9 | hMPV | | Enterovirus D (7) |
| 2D4 | PIV-2 | HRV-A (579) | HCoV OC43 (225) |
ᵃ Number of k-mers matching the classification. Hits with ≥850 k-mers are shown in bold.
Coverage of reference genomes by the top hits detected in false-positive respiratory samples.
| Sample | Virus detected by DisCVR | Matched k-mersᵃ | Genome coverage (%) | No. mapped reads (%)ᵇ |
|---|---|---|---|---|
| 1B3 | HRV-A | 3,431 | 7.6 | 4 (0.00) |
| 1B4 | HRV-A | 3,652 | 9.39 | 14 (0.00) |
| 1B6 | HRV-A | 2,872 | 6.38 | 16 (0.00) |
| 1B9 | HRV-A | 1,041 | 2.15 | 1,404 (0.10) |
| 1C8 | HRV-A | 2,781 | 8.21 | 8 (0.00) |
| 1D2 | HRV-A | 2,974 | 9.38 | 13 (0.00) |
| 1D5 | HRV-C | 901 | 3.63 | 8 (0.00) |
| 1D6 | HRV-C | 1,103 | 3.27 | 5 (0.99) |
| 1E2 | HRV-C | 1,299 | 1.51 | 1 (0.00) |
| 1E4 | HRV-C | 1,813 | 4.8 | 7 (0.00) |
| 1E9 | HRV-B | 4,306 | 13.69 | 27 (0.01) |
| 1G7 | HRV-B | 1,447 | 1.76 | 5 (0.00) |
| 1H5 | HRV-B | 932 | 3.84 | 4 (0.00) |
| 1I7 | HRV-C | 1,234 | 1.51 | 1 (0.00) |
| 1I9 | HRV-C | 1,845 | 3.1 | 9 (0.00) |
| 2A1 | RSV | 2,123 | 13.37 | 172 (0.02) |
| 2B5 | RSV | 927 | 13.56 | 69 (0.01) |
| 2B8 | RSV | 1,406 | 8.64 | 101 (0.01) |
| 2D1 | HRV-C | 1,620 | 1.59 | 2 (0.00) |
ᵃ Number of matching k-mers identified by the classification module.
ᵇ Percentage of total reads mapped by the validation module.