Malik Alawi1,2, Lia Burkhardt1, Daniela Indenbirken1, Kerstin Reumann1, Maximilian Christopeit3, Nicolaus Kröger3, Marc Lütgehetmann4, Martin Aepfelbacher4, Nicole Fischer5,6, Adam Grundhoff7,8.
Abstract
We describe DAMIAN, an open-source bioinformatics tool designed to identify pathogenic microorganisms in diagnostic samples. Using authentic clinical samples and comparing our results with those from established analysis pipelines and conventional diagnostics, we demonstrate that DAMIAN rapidly identifies pathogens in different diagnostic entities and accurately classifies viral agents down to the strain level. We further show that DAMIAN can assemble full-length viral genomes even in samples co-infected with multiple virus strains, an ability that is of considerable advantage when investigating outbreak scenarios. Like other pipelines, DAMIAN analyzes single samples to classify sequences according to their likely taxonomic origin, but it also includes a tool for cohort-based analysis. This tool uses cross-sample comparisons to identify sequence signatures that are frequently present in a sample group of interest (e.g., a disease-associated cohort) but occur less frequently in control cohorts. Because this approach does not require homology searches in databases, it can in principle identify not only known but also completely novel pathogens. Using samples from a meningitis outbreak, we demonstrate the feasibility of this approach by identifying enterovirus as the causative agent.
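The cohort tool described above ranks sequence signatures by how much more often they occur in a case cohort than in controls. The exact scoring rule is not given in this abstract, so the Python sketch below assumes a simple difference of fractions: the share of case samples that contribute contigs to a cluster minus the share of control samples that do. The functions `signature_score` and `rank_clusters`, and all cluster and sample IDs, are hypothetical illustrations rather than DAMIAN's own code.

```python
from typing import Dict, List, Set, Tuple


def signature_score(contributing: Set[str], cases: Set[str], controls: Set[str]) -> float:
    """Fraction of case samples contributing contigs minus fraction of controls.

    ASSUMPTION: an illustrative difference-of-fractions score, not DAMIAN's formula.
    +1 means every case and no control contributes; negative values mean the
    signature is more common among controls than among cases.
    """
    if not cases or not controls:
        raise ValueError("both cohorts must be non-empty")
    case_frac = len(contributing & cases) / len(cases)
    control_frac = len(contributing & controls) / len(controls)
    return case_frac - control_frac


def rank_clusters(clusters: Dict[str, Set[str]], cases: Set[str],
                  controls: Set[str]) -> List[Tuple[str, float]]:
    """Return (cluster_id, score) pairs, most case-specific first."""
    scored = [(cid, signature_score(samples, cases, controls))
              for cid, samples in clusters.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical cohorts: 5 case samples and 22 control samples.
cases = {f"case{i}" for i in range(1, 6)}
controls = {f"ctrl{i}" for i in range(1, 23)}
clusters = {
    "cluster_a": set(cases),                          # every case, no control
    "cluster_b": {"case1", "ctrl1", "ctrl2"},         # mixed presence
    "cluster_c": {f"ctrl{i}" for i in range(1, 11)},  # controls only
}
for cluster_id, score in rank_clusters(clusters, cases, controls):
    print(f"{cluster_id}\t{score:+.2f}")
```

Under this assumed score, `cluster_a` reaches the maximum of +1.00, while `cluster_c`, present in 10 of 22 controls and no case, scores −0.45. Because no database lookup is involved, an unknown sequence can still rank highly if it is case-specific.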
Year: 2019 PMID: 31727957 PMCID: PMC6856179 DOI: 10.1038/s41598-019-52881-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of the data processing steps performed by DAMIAN. Depicted are the individual modules in the DAMIAN workflow, starting with FASTQ input files. Individual samples are first processed independently. An optional cohort analysis can later be performed on any number of previously processed samples.
Time frame in which clinically relevant results were obtained by DAMIAN.
| sample ID^a | diagnostic entity | detected pathogen(s) | time (min)^b |
|---|---|---|---|
| 104 | bronchoalveolar lavage | Influenza A | 73 |
| 3157 | bronchoalveolar lavage | Influenza A | 40 |
| 4505 | bronchoalveolar lavage | Chlamydophila psittaci | n/a |
| 9790 | stool | human Parechovirus | 44 |
| 9792 | stool | Sapovirus | 38 |
| | | human Parechovirus | 38 |
| 1 | stool | Norwalk virus | 162 |
| 7653 | cerebrospinal fluid | Enterovirus B | 17 |
| SRR1553464 | serum | Zaire Ebolavirus | 13 |
| SRR533978 | serum | Bas-Congo virus | 18 |
| SRR1564804 | plasma | Chlamydophila psittaci | n/a |
^a Diagnostic sample or publicly available dataset (SRR1553464, SRR533978, SRR1564804) analyzed by DAMIAN.
^b Time (in minutes) until the first report of a putative pathogen was received.
n/a: not applicable; time frames were calculated only for viral contigs, since DAMIAN prioritizes viral sequences. The analysis was performed using 12 threads on a server with two Intel Xeon E5-2687W v3 CPUs.
Comparison of BAL sample analysis results obtained by DAMIAN, Taxonomer BETA, PathoScope and metaMix.
| | 104 | 3157 | 4505 |
|---|---|---|---|
| DAMIAN | 23,828,285 reads; 99.23% human sequences ✓ • H1N1, all 8 segments (8 contigs), 97–100% id. ○ Influenza A ○ A/Singapore/TT198/2011 (H1N1) ○ A/Swine/France/71-130116/2013 (H1N1) ○ A/Swine/France/71-130116/2013 (H1N1) ○ A/Singapore/TT198/2011 (H1N1) ○ A/Santa Clara/YGA_03065/2013 (H1N1) ○ A/Arizona/M2/2012 (H1N1) ○ A/Swine/France/71-130116/2013 (H1N1) ✓ | 3,265,314 reads; 51.74% human sequences ✓ • H3N2, all 8 segments (8 contigs), 99% id. ○ PB2, A/Connecticut/Flu140/2013 (H3N2) ○ PB1, A/Connecticut/Flu140/2013 (H3N2) ○ PA, A/Connecticut/Flu140/2013 (H3N2) ○ HA, A/Connecticut/Flu140/2013 (H3N2) ○ NP, A/Connecticut/Flu140/2013 (H3N2) ○ NA, A/Connecticut/Flu140/2013 (H3N2) ○ M2, M1, A/Connecticut/Flu140/2013 (H3N2) ○ NEP, NS1, A/Connecticut/Flu140/2013 (H3N2) ✓ ✓ ✓ | 2,370,210 reads; 98.13% human sequences ✓ • Chlamydophila psittaci 6BC, 4 contigs, 16S and 23S rRNA • Chlamydophila VS225, 1 contig, 16S rRNA • Chlamydophila Mat116, 1 contig, 16S rRNA |
| Taxonomer BETA | 9,900,000 reads sampled*; 5% classified; Bacteria: 62,128 reads; Viruses: 10,723 reads; Fungi: 86 reads ✓ • H1N1 (1,227 reads) ✓ α-retrovirus (9,677 reads) ✓ dsDNA virus (505 reads) | 3,200,000 reads sampled; 13% classified; Bacteria: 300,309 reads; Viruses: 7,686 reads; Fungi: 17,878 reads ✓ • H3N2 (354 reads) ✓ ✓ α-retrovirus (907 reads) ✓ Caudovirales (1,299 reads) ✓ ✓ | 2,300,000 reads sampled; 4% classified; Bacteria: 24,960 reads; Viruses: 950 reads ✓ • • ✓ Proteobacteria (510 reads) ✓ Firmicutes (33 reads) ✓ α-retrovirus (169 reads) ✓ Herpesviridae (52 reads) |
| PathoScope | 45,576 aligned reads; 2,234 hits ✓ • Subtypes H3N2; H5N1; H1N1; H9N2; H2N2 ✓ Hepatitis C (224 reads) • Genotype 2; 1; 6 ✓ Encephalomyocarditis virus (115 reads) | 184,719 aligned reads; 2,434 hits ✓ ✓ Subtypes H3N2 ✓ Avian leukosis virus (1,256 reads) ✓ ✓ Veillonella parvula (130,363 reads) ✓ Enterococcus faecium (21,178 reads) | 4,408 aligned reads; 1,752 hits ✓ |
| metaMix | 26 hits; 7,328,046 human reads ✓ • H1N1, 1 contig; A/Canela/LACENRS-418/2013 ✓ Bacteria, 3 contigs (372 reads) | 44 hits; 201,764 human reads ✓ • H3N2, 2 contigs; A/Bage/LACENRS-205/2013; A/Porto Alegre/LACENRS-275/2013 ✓ Bacteria, 15 contigs (109,048 reads) | 16 hits; 142,965 human reads ✓ |
*Files >5 GB are not supported by the Taxonomer BETA version; 10,000,000 reads were randomly sampled to meet the 5 GB maximum upload size. # Fractional read abundance given by PathoScope.
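The footnote does not state how the 10,000,000 reads were drawn. One standard single-pass way to take a fixed-size, uniformly random subset of a FASTQ file is reservoir sampling (Algorithm R), sketched below in Python under the assumption of plain, uncompressed FASTQ with exactly four lines per record. The helpers `read_fastq` and `reservoir_sample` are hypothetical and not necessarily what was used for the Taxonomer uploads.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def reservoir_sample(records: Iterable[Record], k: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Algorithm R: keep a uniform random sample of k records in one pass."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < k:
                reservoir[j] = record  # record i is kept with probability k / (i + 1)
    return reservoir
```

Applied to an open file handle as `reservoir_sample(read_fastq(handle), 10_000_000, seed=42)`, this reads the file once and holds only the sampled records in memory. For paired-end libraries, the two mate streams would have to be sampled jointly (for example by zipping them and sampling pairs) so that mates stay matched.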
Comparison of analysis results for publicly available serum and plasma datasets obtained by DAMIAN, Taxonomer BETA, PathoScope and metaMix.
| | SRR533978 | SRR1553464 | SRR1564804 |
|---|---|---|---|
| DAMIAN | 2,538,346 reads; 7.7% human sequences ✓ ✓ Paraburkholderia tropica (335,121 reads, 23.9%) ✓ Paraburkholderia fungorum (186,569 reads, 13.3%) | 1,752,608 reads; 1.34% human sequences ✓ ✓ Ralstonia (8,482 reads, 489 contigs) | 627,013 reads; 0.83% human sequences ✓ ✓ ✓ |
| Taxonomer BETA | 196 K reads sampled; 23% classified; threshold 50 reads ✓ ✓ ✓ Neisseria meningitidis (6,334 reads) ✓ Mycobacteria (2,814 reads) ✓ Microbacterium laevaniformans (13,314 reads) ✓ Candida albicans (199 reads) | 179 K reads sampled; 67% classified; threshold 50 reads ✓ ✓ Bradyrhizobium (1,595 reads) ✓ Actinobacteria (1,776 reads) | 184 K reads sampled; 50% classified; threshold 50 reads ✓ ✓ ✓ Actinobacteria (7,830 reads) ✓ Bacilli (4,025 reads) ✓ Alphaproteobacteria (10,169 reads) |
| PathoScope | 537 hits ✓ Burkholderia gladioli BSR 3 (88,777 reads) ✓ Staphylococcus epidermidis (15,834 reads) ✓ Acidovorax sp. JS42 (15,306 reads) ✓ Hepatitis C virus (58 reads) | 843 hits ✓ ✓ Ralstonia pickettii (12,823 reads) ✓ HHV-4 (4 reads) ✓ Human adenovirus C (1 read) | 802 hits ✓ ✓ Ralstonia pickettii (30,029 reads) ✓ Staphylococcus aureus (10,436 reads) ✓ ✓ |
| metaMix | 109 hits; 11,130 human reads ✓ ✓ Paraburkholderia (11 contigs; 89,485 reads) ✓ Burkholderia (24 contigs; 79,270 reads) | 141 hits; 406 human reads ✓ ✓ Ralstonia (9 contigs; 6,090 reads) ✓ Bradyrhizobium (11 contigs; 2,334 reads) | 31 hits; 123 human reads ✓ ✓ ✓ |
Figure 2. Application of DAMIAN to RNA-Seq libraries from diagnostic BAL samples from patients with viral respiratory infections. Donut-shaped charts represent the distribution of host (grey) versus non-host (orange) reads. The pie charts illustrate the taxonomic classification of non-host reads, showing the relative abundance of contigs assigned to each species. Reads not aligning to sequences in the NCBI database are indicated in black, bacterial sequences in yellow and viral contaminants in pink. The pathogen most likely contributing to the clinical symptoms is indicated in red. In each sample, the contigs of the putative pathogen are aligned to the closest relative: (A) Influenza A, H1N1 (full-length segments); (B) Influenza A, H3N2 (full-length segments); PIV3 and HSV-1.
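The donut charts contrast host and non-host reads. As a worked example, the read totals and human percentages reported for DAMIAN in the BAL comparison table above convert into approximate non-host read counts as follows. The percentages are themselves rounded to two decimals, so the results are estimates; the helper name `non_host_reads` is hypothetical.

```python
# (sample ID, total reads, % human reads) as reported for DAMIAN in the BAL comparison table.
BAL_SAMPLES = [
    ("104", 23_828_285, 99.23),
    ("3157", 3_265_314, 51.74),
    ("4505", 2_370_210, 98.13),
]


def non_host_reads(total_reads: int, human_percent: float) -> int:
    """Approximate non-host read count from total reads and the rounded % human."""
    if not 0.0 <= human_percent <= 100.0:
        raise ValueError("human_percent must be between 0 and 100")
    return round(total_reads * (100.0 - human_percent) / 100.0)


for sample_id, total, human in BAL_SAMPLES:
    print(f"{sample_id}: ~{non_host_reads(total, human):,} non-host reads ({100 - human:.2f}%)")
```

Sample 104 leaves roughly 183,478 non-host reads out of almost 24 million, sample 3157 about 1,575,841 and sample 4505 about 44,323, which shows how little material remains for pathogen detection in host-dominated BAL samples.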
Figure 3. Application of DAMIAN to RNA-Seq libraries from diagnostic BAL samples from patients with bacterial respiratory infections. Similar to Fig. 2, the pie charts represent the distribution of host and non-host reads (left) and the taxonomic classification of non-host reads (right). The contigs of the putative pathogen identified are aligned to the closest relative, Chlamydophila psittaci.
Figure 4. Application of DAMIAN to RNA-Seq libraries from diagnostic stool samples from patients with acute gastroenteric disease. Sequences are depicted as described in Fig. 2. The putative pathogens identified are (A) Sapovirus 1 and Parechovirus 1 and (B) Parechovirus 6.
Figure 5. Identification of full-length genomes of three different Norovirus strains from a stool sample from a patient with acute gastroenteric disease. Primate Norovirus: GenBank entry KX396056 is identical to NC_031324, which describes a human norovirus in diarrheic chimps; the next closest assignment is NC_039897.1, human Norovirus GI, with 92% sequence identity.
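The caption quotes 92% sequence identity to the next closest assignment. Percent identity is computed over a pairwise alignment, and the Python sketch below uses one common convention: identical, non-gap columns divided by the total number of alignment columns. Tools differ in how they count gaps and which length they normalize by, so this illustrates the measure rather than reproducing the calculation behind the reported value; `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical, non-gap columns divided by all alignment columns, times 100.

    ASSUMPTION: one common convention; other tools exclude gap columns or
    normalize by the shorter sequence, which yields different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
```

The two sequences must already be aligned (same length, gaps as `-`); producing that alignment is a separate step.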
Comparison of stool sample analysis results obtained by DAMIAN, Taxonomer, PathoScope and metaMix.
| | 9790 | 9792 | 1 |
|---|---|---|---|
| DAMIAN | 1,667,291 reads; 0.64% human sequences ✓ ✓ Bacteroides ✓ Bifidobacterium | 1,347,375 reads; 0.16% human sequences ✓ • ✓ ✓ Bifidobacterium | 23,292,070 reads; 1.36% human sequences ✓ • ✓ Bacteria |
| Taxonomer BETA | 1,600,000 reads sampled; 77% classified; Bacteria: 1,207,356 reads; Viruses: 2,493 reads; Fungi: 419 reads ✓ • • ✓ α-retrovirus (1,166 reads) ✓ dsDNA viruses (1,334 reads) ✓ ssDNA viruses (353 reads) ✓ Bacteroidetes (492,974 reads) ✓ Actinobacteria (99,321 reads) ✓ Proteobacteria (89,830 reads) ✓ Firmicutes (388,188 reads) | 1,300,000 reads sampled; 86% classified; Bacteria: 925,978 reads; Viruses: 13,005 reads ✓ • ✓ ✓ Pandoravirus (247 reads) ✓ Actinobacteria (538,844 reads); Bifidobacteriales (463,853 reads) ✓ Firmicutes (101,073 reads) | 9,900,000 reads sampled*; 85% classified; Bacteria: 8,200,128 reads; Viruses: 100,803 reads; Fungi: 86 reads ✓ • G1/10360/2010/NM (950 reads) • GI/DH1751/2009/IND (30 reads) • GI.3/13440/2007/RJ/BRA (80 reads) • GI.3/C9/GF/1978 (10 reads) • GI.4/1643/2008/US (70 reads) • GI.4/15waterBS/T11/ITA (10 reads) • GII.4 Bejing (40 reads) ✓ α-retrovirus (150 reads) ✓ Parvovirus NIH-CQV (10 reads) Bacteria |
| PathoScope | 1,226,383 aligned reads; 2,210 hits ✓ α-retrovirus (113 reads) ✓ HHV-8 (7 reads) ✓ Pepper mild mottle virus (2 reads) ✓ Actinobacteria, Bifidobacteriaceae (166,773 reads) ✓ Bacteroidetes (784,119 reads) ✓ Firmicutes, Clostridiales (138,760 reads) | 1,116,813 aligned reads; 2,205 hits ✓ ✓ ✓ α-retrovirus (16 reads) ✓ Human herpesvirus 6A (2 reads); >900,000 reads Bifidobacterium | 16,713,832 aligned reads; 2,558 hits ✓ ✓ Human papillomavirus (8 reads) ✓ Polyomavirus (4 reads) ✓ Hepatitis C virus (35 reads) ✓ Human herpesvirus (89 reads) ✓ α-retrovirus (2,423 reads); >5,000,000 reads Bacteroides |
| metaMix | 88 hits; 1,771 human reads; Bacteria: 658,685 reads (Bacteroides, Bifidobacterium) | 45 hits; 0 human reads ✓ Bifidobacterium | 138 hits; 0 human reads ✓ ✓ Circoviridae (1 contig; 5,433 reads) |
*Files >5 GB are not supported by the Taxonomer BETA version; 10,000,000 reads were randomly sampled to meet the 5 GB maximum upload size. # GenBank entry KX396056 is identical to NC_031324, which describes a human norovirus in diarrheic chimps; the next closest assignment is NC_039897.1, human Norovirus GI, with 92% sequence identity.
Figure 6. Application of DAMIAN to RNA-Seq libraries from diagnostic stool samples from patients with encephalitis. Sequences are depicted as described in Fig. 2. Two contigs with significant sequence homology to Echovirus 30 were identified.
Comparison of CSF sample analysis results obtained by DAMIAN, Taxonomer BETA, PathoScope and metaMix.
| | 7653 |
|---|---|
| DAMIAN | 1,618,480 reads; 86.86% human sequences ✓ • |
| Taxonomer BETA | 1,600,000 reads sampled; 5% classified ✓ • Coxsackievirus B2 (10 reads) ✓ α-retrovirus (742 reads) ✓ Caudovirales (745 reads) ✓ Herpesvirales (57 reads) • HHV-6A (21 reads) |
| PathoScope | 6,021 aligned reads; 2,234 hits ✓ • Enterovirus 107 (30 reads) • Enterovirus 100 (4 reads) • Enterovirus B (5 reads) ✓ Encephalomyocarditis virus (85 reads) ✓ Adenovirus (1 read) • Adenovirus F (1 read) ✓ Hepatitis C (1 read) • Genotype 1 (1 read) ✓ α-retrovirus (398 reads) |
| metaMix | its; 182,276 human reads ✓ |
Figure 7. Cluster analysis of CSF samples from encephalitis and control cases. (A) Schematic depiction of the cohort analysis performed on five samples derived from an enterovirus outbreak and 22 unrelated control samples. Single-linkage cluster analysis produced 13,457 clusters from ~16,500 individual contigs. Depending on which samples do or do not contribute contigs to a given cluster, each cluster can be assigned to one of a total of 267 observed 'signature' patterns. The lower panel schematically depicts the highest-scoring (score = +1), lowest-scoring (−0.45) and neutral (0) signature patterns, with filled (dark green or grey for encephalitis or control samples, respectively) or empty squares symbolizing samples that do or do not contribute contigs, respectively. The total number of clusters assigned to each of the three signatures is shown on the left. (B) Distribution map and frequencies of observed signature patterns. Each row depicts one of the 267 observed signature patterns as described under (A). Signatures (black and light gray rectangles for positive and negative samples, respectively) are ordered by their score (plotted on the right). The ten signatures in which all encephalitis samples contribute contigs are shown enlarged at the top. The colored heat map bar on the left indicates the number of clusters that share a given signature pattern. The taxonomic annotation (lowest common ancestor of individual contig assignments, or 'unknown' if contigs have no significant hits) of the 15 clusters with the highest-scoring pattern is indicated at the top.
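The caption describes contigs grouped by single-linkage clustering, with each cluster assigned a signature pattern according to which samples contribute contigs to it. The criterion for linking two contigs is not stated here, so the Python sketch below takes the linked contig pairs as given (for example, pairs above some sequence-similarity threshold; an assumption) and computes single-linkage clusters as connected components using a union-find structure. The function names and all contig and sample IDs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression, keyed by contig ID."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # an unseen contig starts as its own root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point the path directly at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage_clusters(contigs: Iterable[str],
                            linked_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the contig graph, i.e. single-linkage clusters
    at whatever threshold decided which pairs are linked."""
    uf = UnionFind()
    for contig in contigs:
        uf.find(contig)  # register singletons
    for a, b in linked_pairs:
        uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for contig in list(uf.parent):
        groups[uf.find(contig)].append(contig)
    return [sorted(members) for members in groups.values()]


def cluster_signatures(clusters: List[List[str]],
                       contig_sample: Dict[str, str]) -> List[FrozenSet[str]]:
    """Signature of a cluster: the set of samples contributing contigs to it."""
    return [frozenset(contig_sample[c] for c in members) for members in clusters]


# Hypothetical contigs from three samples; a1-b1 and b1-c1 pass the similarity test.
contig_sample = {"a1": "S1", "a2": "S1", "b1": "S2", "c1": "S3", "c2": "S3"}
pairs = [("a1", "b1"), ("b1", "c1")]
clusters = single_linkage_clusters(contig_sample, pairs)
pattern_counts = Counter(cluster_signatures(clusters, contig_sample))
for signature, n_clusters in pattern_counts.items():
    print(sorted(signature), n_clusters)
```

Because linkage is transitive, `a1` and `c1` end up in the same cluster even though they were never compared directly, which is the defining property of single linkage. Counting clusters per signature yields the kind of per-pattern frequencies that the heat map bar summarizes, and each signature can then be ranked by a case-versus-control score.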