Josefin Olausson, Sofia Brunet, Diana Vracar, Yarong Tian, Sanna Abrahamsson, Sri Harsha Meghadri, Per Sikora, Maria Lind Karlberg, Hedvig E Jakobsson, Ka-Wei Tang.
Abstract
Infection in the central nervous system is a severe condition associated with high morbidity and mortality. Despite ample testing, the majority of encephalitis and meningitis cases remain undiagnosed. Metagenomic sequencing of cerebrospinal fluid has emerged as an unbiased approach to identify rare microbes and novel pathogens. However, several major hurdles remain, including establishment of individual limits of detection, removal of false positives and implementation of universal controls. Twenty-one cerebrospinal fluid samples, in which a known pathogen had been positively identified by available clinical techniques, were subjected to metagenomic DNA sequencing. Fourteen samples contained minute levels of Epstein-Barr virus. The detection threshold for each sample was calculated using the total leukocyte content of the sample and the environmental contaminants found by the bioinformatic classifiers. Virus sequences were detected in all ten samples in which more than one read was expected according to the calculations. Conversely, no viral reads were detected in seven out of eight samples in which fewer than one read was expected. False positive pathogens of computational or environmental origin were readily identified using a commonly available cell control. For bacteria, additional filters, including a comparison between classifiers, removed the remaining false positives and facilitated pathogen identification. Here we show a generalizable method for identification of pathogen species using DNA metagenomic sequencing. The choice of bioinformatic method mainly affected the efficiency of pathogen identification, but not the sensitivity of detection. Identification of pathogens requires multiple filtering steps, including read distribution, sequence diversity and complementary verification of pathogen reads.
Year: 2022 PMID: 35233021 PMCID: PMC8888594 DOI: 10.1038/s41598-022-07260-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. DNA metagenomic sequencing workflow. DNA from cerebrospinal fluid specimens, containing leukocytes and pathogens, was extracted, followed by library preparation and sequencing. Datasets generated by the Ion S5 were processed by four different bioinformatic classifiers to profile the microbiome. BLAST was used for verification. Flowchart for identification of pathogens by removing false positive species. Viral contaminants can be removed by comparing sample datasets with controls, which identify environmental contaminants and bioinformatic misclassifications. Phages can be disregarded, as these viruses do not infect human cells. A final manual examination of the remaining viral reads is required for coverage and sequence analysis. Bacterial contaminants were removed by applying a cutoff filter and a comparison between classifiers and controls, followed by a manual examination.
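The filtering flowchart described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the data layout (species name mapped to read count) and the rule that a bacterial species must pass the cutoff in every classifier are assumptions of this sketch. The 0.01% fraction cutoff is the value quoted in the Figure 3 caption.

```python
FRACTION_CUTOFF = 0.0001  # 0.01% of the dataset (cutoff quoted in the Figure 3 caption)


def filter_viral_candidates(sample_reads, control_species, phage_species):
    """Drop viral species that also appear in controls, and drop phages.

    sample_reads: dict mapping species name -> read count (hypothetical layout).
    """
    return {
        species: reads
        for species, reads in sample_reads.items()
        if species not in control_species and species not in phage_species
    }


def filter_bacterial_candidates(reads_by_classifier, total_reads, control_species):
    """Keep species above the fraction cutoff in every classifier and absent from controls.

    Requiring agreement across all classifiers is an assumption of this sketch;
    the source only states that classifiers were compared.
    """
    passing_sets = []
    for counts in reads_by_classifier.values():
        passing = {
            species
            for species, reads in counts.items()
            if reads / total_reads >= FRACTION_CUTOFF
        }
        passing_sets.append(passing)
    if not passing_sets:
        return set()
    agreed = set.intersection(*passing_sets)
    return agreed - set(control_species)


# Made-up placeholder data, for illustration only.
viral = filter_viral_candidates(
    {"virus_A": 97, "virus_B": 40, "phage_C": 12},
    control_species={"virus_B"},
    phage_species={"phage_C"},
)
bacterial = filter_bacterial_candidates(
    {
        "classifier_1": {"bacterium_X": 5000, "bacterium_Y": 3, "bacterium_Z": 900},
        "classifier_2": {"bacterium_X": 4800, "bacterium_Y": 400, "bacterium_Z": 950},
    },
    total_reads=1_000_000,
    control_species={"bacterium_Z"},
)
print(viral, bacterial)
```

Whatever survives both filters would still go to the manual examination of coverage and sequence that the caption calls for; the sketch only narrows the candidate list.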
Metagenomic sequencing pipeline results.
| Sample | Verified pathogen | Clinical method | qPCR (Geq/ml) | PaRCA (reads) | Kraken2 (reads) | Centrifuge (reads) | CosmosID (reads) | BLAST (reads) | Calculated reads | Range (ppm) | Leukocytes (× 10⁶/l) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | HSV1 | qPCR | 1.0 × 10⁴ | 97 | 105 | 107 | 107 | 108 | 90 | 6.2–7.2 | 41 |
| 2 | VZV | qPCR | 3.9 × 10⁵ | 213 | 219 | 223 | 211 | 213 | 365 | 14.9–16.0 | 272 |
| 3 | VZV | qPCR | 1.9 × 10⁵ | 2196 | 2234 | 2251 | 2170 | 2197 | 3072 | 134.8–147.1 | 17 |
| 4 | JCV | qPCR | 1.9 × 10⁵ | 23,766 | 24,018 | 24,190 | 22,318 | 23,847 | N/A | 1757–2096 | N/A |
| 5 | JCV | qPCR | 4.3 × 10³ | 496 | 512 | 515 | 484 | 498 | N/A | 39.8–57.1 | N/A |
| 6 | | Cultivation/16S rRNA | N/A | 766,744 | 699,662 | 575,646 | 701,304 | 643,083 | N/A | 30,704–60,611 | 55 |
| 7 | | 16S rRNA | N/A | 12,988 | 11,762 | 12,511 | 12,277 | 12,274 | N/A | 679–804 | 1064 |
| 7 | | qPCR | 3.7 × 10² | – | – | – | – | – | 0.1 | Undet. | 1064 |
| 8 | Enterovirus | qPCR | 6.6 × 10⁴ | – | – | – | – | – | N/A | Undet. | 95 |
| 9 | Enterovirus | qPCR | 5.8 × 10⁴ | – | – | – | – | – | N/A | Undet. | 814 |
| 9 | EBV | qPCR | 4.1 × 10² | – | – | – | – | – | 0.1 | Undet. | 814 |
| 10 | EBV | qPCR | 1.9 × 10³ | 10 | 9 | 9 | 8 | 9 | 2.5 | 0.8–1.1 | 181 |
| 10 | VZV | qPCR | 4.7 × 10³ | 7 | 7 | 7 | 7 | 7 | 4.5 | 0.7–0.8 | 181 |
| 11 | EBV | qPCR | 5.0 × 10¹ | – | – | – | – | – | 0.1 | Undet. | 90 |
| 11 | VZV | qPCR | 2.9 × 10³ | 15 | 15 | 15 | 12 | 15 | 5.5 | 1.2–1.7 | 90 |
| 12 | EBV | qPCR | 9.1 × 10² | – | – | – | – | – | 0.2 | Undet. | 164 |
| 12 | Yeast sp. | Cultivation/filmarray | N/A | – | – | – | – | – | N/A | Undet. | 164 |
| 13 | EBV | qPCR | 1.9 × 10³ | 81 | 85 | 82 | 79 | 82 | 20.5 | 6.7–7.5 | 26 |
| 14 | EBV | qPCR | 3.7 × 10² | – | – | – | – | – | 0.6 | Undet. | 253 |
| 15 | EBV | qPCR | 3.2 × 10² | 6 | 6 | 6 | 6 | 6 | 2.5 | 0.4–0.5 | 44 |
| 16 | EBV | qPCR | 2.7 × 10² | 232 | 228 | 225 | 213 | 223 | 18.5 | 21.2–22.8 | 4 |
| 17 | EBV | qPCR | 1.6 × 10² | 11 | 10 | 11 | 11 | 11 | 0.3 | 1.0–1.2 | 148 |
| 18 | EBV | qPCR | 1.6 × 10² | – | – | – | – | – | N/A | Undet. | < 4 |
| 19 | EBV | qPCR | 8.1 × 10¹ | – | – | – | – | – | 0.6 | Undet. | 31 |
| 20 | EBV | qPCR | 5.0 × 10¹ | – | 1 | 1 | – | 1 | 0.99 | 0–0.1 | 14 |
| 21 | EBV | qPCR | 5.0 × 10¹ | 8 | 8 | 8 | 8 | 9 | 1.5 | 0.7–0.8 | 9 |
Reads from each classifier for the verified pathogen. Calculated reads were computed in accordance with the presented algorithm. N/A: the leukocyte count is missing for samples 4 and 5, the leukocyte count for sample 18 is below the reference value, and the calculation is not applicable to bacteria, fungi and RNA viruses.
16S rRNA: 16S rRNA gene Sanger sequencing; HSV1: herpes simplex virus 1; VZV: varicella zoster virus; JCV: JC polyomavirus; EBV: Epstein-Barr virus.
Figure 2. Pathogen genome alignment. Coverage density plot of sequencing reads from the respective sample and control, detected in PaRCA and aligned to the reference genomes of HSV1 (a), VZV (b), JCV (c), S. pneumoniae (d) and EBV (e, f). The number of reads (y-axis) at each nucleotide position of the genome (x-axis) is depicted in blue. Dark blue represents peak, bright blue average and light blue minimum coverage for the respective sections of the genome.
Figure 3. Calculated pathogen reads and detected pathogens in bioinformatic classifiers. Samples containing HSV1 (dark blue), VZV (blue) and EBV (light blue) with a calculated value of more than one read (a) were plotted against the number of reads detected by PaRCA. The regression line depicts a direct proportionality between the calculated and observed values. Spearman's rank correlation coefficient and p-value are indicated. Mean number ± SEM of viral (b) and bacterial (c) species classified in samples and controls using the different bioinformatic classifiers. Dark blue bars show the total number of species classified, bright blue bars show the number of bacterial species over the fraction cutoff (≥ 0.01% of the dataset), and light blue bars show the number of species not removed using controls. Purple bars show controls and light gray bars show controls over the fraction cutoff (≥ 0.01%). Ordinary one-way ANOVA with Tukey’s multiple comparisons; *p value < 0.05, **p value < 0.01, ***p value < 0.001, ****p value < 0.0001.
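As a rough illustration of the rank correlation mentioned for panel (a), the sketch below computes Spearman's rho from the table above, pairing the calculated reads with the PaRCA reads for every virus entry whose calculated value exceeds one read. Which ten pairs the authors actually plotted is an assumption based on that reading of the table, and the helper names are hypothetical. Ties receive average ranks, and rho is taken as the Pearson correlation of the two rank vectors; no p-value is computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Table rows with calculated reads > 1, in order:
# sample 1 HSV1, 2 VZV, 3 VZV, 10 EBV, 10 VZV, 11 VZV, 13 EBV, 15 EBV, 16 EBV, 21 EBV
calculated = [90, 365, 3072, 2.5, 4.5, 5.5, 20.5, 2.5, 18.5, 1.5]
parca_reads = [97, 213, 2196, 10, 7, 15, 81, 6, 232, 8]

rho = spearman_rho(calculated, parca_reads)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based coefficient suits these data because the read counts span more than three orders of magnitude, so a few high-abundance samples would otherwise dominate a linear correlation.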
Figure 4. Viral species identified in datasets. Heatmap showing the ten most abundant viral species (y-axis) in each of the 21 samples and 10 controls (x-axis) detected using PaRCA. AcMNPV: Autographa californica multiple nucleopolyhedrovirus. Controls: P, P3HR1; N, Namalwa; W, water.