Tae-Hyuk Ahn, Juanjuan Chai, Chongle Pan.
Abstract
MOTIVATION: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis.
Year: 2014 PMID: 25266224 PMCID: PMC4287953 DOI: 10.1093/bioinformatics/btu641
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Conceptual overview of the Sigma algorithm. The inputs are metagenomic reads and user-defined reference genomes (top panel). The alignment of reads to genomes is used to define a probabilistic model of metagenomic sequencing (middle panel). Genomes are detected with hypothesis testing, quantified with confidence interval estimation, and scanned for sequence variations (bottom panel).
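The workflow in Fig. 1 can be illustrated with a toy mixture model. The sketch below is not Sigma's implementation. It assumes a hypothetical matrix `read_likelihoods`, where entry `[i][j]` is the probability of read `i` given genome `j`, estimates relative abundances by expectation-maximization, and scores one genome with a likelihood-ratio statistic. All function names and example values are illustrative assumptions.

```python
import math


def estimate_abundances(read_likelihoods, n_iter=200):
    """Estimate mixture weights (relative abundances) by EM.

    Assumes every read has a nonzero likelihood under at least one genome.
    """
    n_genomes = len(read_likelihoods[0])
    weights = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        expected_counts = [0.0] * n_genomes
        for row in read_likelihoods:
            total = sum(w * p for w, p in zip(weights, row))
            # E-step: share each read among genomes by posterior probability.
            for j in range(n_genomes):
                expected_counts[j] += weights[j] * row[j] / total
        # M-step: new weights are the normalized expected read counts.
        n_reads = sum(expected_counts)
        weights = [c / n_reads for c in expected_counts]
    return weights


def log_likelihood(read_likelihoods, weights):
    """Log-likelihood of all reads under the mixture model."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row)))
        for row in read_likelihoods
    )


def likelihood_ratio_statistic(read_likelihoods, genome_index):
    """2 * (log-likelihood with the genome - log-likelihood without it)."""
    full_weights = estimate_abundances(read_likelihoods)
    reduced = [
        [p for j, p in enumerate(row) if j != genome_index]
        for row in read_likelihoods
    ]
    reduced_weights = estimate_abundances(reduced)
    return 2.0 * (
        log_likelihood(read_likelihoods, full_weights)
        - log_likelihood(reduced, reduced_weights)
    )


# Hypothetical toy data: 10 reads, 3 candidate genomes.
HIGH, LOW = 0.9, 1e-6  # assumed likelihoods for a good and a poor alignment
reads = (
    [[HIGH, LOW, LOW]] * 6      # reads matching only genome 0
    + [[LOW, HIGH, LOW]] * 3    # reads matching only genome 1
    + [[HIGH, HIGH, LOW]]       # one read shared by genomes 0 and 1
)

abundances = estimate_abundances(reads)
stat_present = likelihood_ratio_statistic(reads, 0)  # genome supported by reads
stat_absent = likelihood_ratio_statistic(reads, 2)   # genome with no support
```

In this toy setting, a genome supported by many reads yields a large statistic, while an unsupported genome yields a statistic near zero. Turning the statistic into a p-value would require a reference distribution, which this sketch leaves out.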
Comparison of Sigma, Pathoscope, MetaPhlAn, and MEGAN in strain-level identification and quantification using a 5-genome synthetic community. (RA%: relative abundance in percentage)
| Input strain | Input RA (%) | Sigma strain | Sigma RA (%) | Pathoscope strain | Pathoscope RA (%) | MEGAN strain/species | MEGAN RA (%) | MetaPhlAn species | MetaPhlAn RA (%) |
|---|---|---|---|---|---|---|---|---|---|
| S. enterica serovar Paratyphi A strain ATCC 9150 | 60 | S. enterica serovar Paratyphi A strain ATCC 9150 | 59.85 | S. enterica serovar Paratyphi A strain ATCC 9150 | 60.11 | | 1.58 | | 31.05 |
| | | | | | | | 45.49 | | |
| | | | | | | | 7.78 | | |
| E. fergusonii ATCC 35469 | 27 | E. fergusonii ATCC 35469 | 26.94 | E. fergusonii ATCC 35469 | 27.37 | E. fergusonii ATCC 35469 | 17.49 | | 28.84 |
| E. coli K12 substr MG1655 | 9 | E. coli K12 substr MG1655 | 8.91 | E. coli K12 substr MG1655 | 9.64 | E. coli K-12 | 0.02 | | 13.36 |
| E. coli O157:H7 TW14359 | 3 | E. coli O157:H7 TW14359 | 3.11 | E. coli O157:H7 TW14359 | 2.6 | | 2.19 | | |
| E. coli O157:H7 Sakai | 1 | E. coli O157:H7 Sakai | 0.94 | E. coli O157:H7 Sakai | 0.018 | | 0.02 | | |
| | | | | | | | 25.26 | | 26.76 |
Summary statistics of the identification and quantification results of a 100-genome synthetic community
| | | Sigma | Pathoscope | MEGAN | MetaPhlAn (a) |
|---|---|---|---|---|---|
| Expected Genomes/Species | | 100 | 100 | 100 | 91 |
| True Positive (b) | Accurate RA | 100 | 73 | 23 | 6 |
| | Inaccurate RA | 0 | 25 | 67 | 71 |
| False Positive | | 0 | 0 | 1 | 18 |
| False Negative | | 0 | 2 | 10 | 14 |

(a) MetaPhlAn performed species-level analysis. Statistics in this column are numbers of species.

(b) True positive identifications may have either accurate RA estimations within 0.95 to 1.05 of their expected RAs or inaccurate RA estimations outside this range.
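The accuracy criterion in the footnote can be written as a short check. The sketch below reads "within 0.95 to 1.05 of their expected RAs" as a bound on the ratio of estimated to expected RA, which is an assumption. The function name `classify_ra` is illustrative, and the values are the Sigma and Pathoscope estimates from the 5-genome community table, used here only as sample inputs. The criterion itself is stated for the 100-genome community.

```python
def classify_ra(estimated, expected, lower=0.95, upper=1.05):
    """Label an RA estimate by its ratio to the expected RA (assumed reading)."""
    ratio = estimated / expected
    return "accurate" if lower <= ratio <= upper else "inaccurate"


# Expected and estimated RA (%) values from the 5-genome community table.
expected = [60, 27, 9, 3, 1]
estimates = {
    "Sigma": [59.85, 26.94, 8.91, 3.11, 0.94],
    "Pathoscope": [60.11, 27.37, 9.64, 2.6, 0.018],
}

# Classify every estimate, then count the accurate ones per tool.
labels = {
    tool: [classify_ra(est, exp) for est, exp in zip(values, expected)]
    for tool, values in estimates.items()
}
accurate_counts = {
    tool: tool_labels.count("accurate") for tool, tool_labels in labels.items()
}
```

Treating the bounds as inclusive is a design choice of this sketch. The footnote does not say how estimates exactly at 0.95 or 1.05 are handled.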
Comparison of analysis time and peak memory usage
| Tool | Alignment wall-clock time (hr) | Abundance estimation wall-clock time (hr) | Abundance estimation memory (GB) | Total wall-clock time (hr) |
|---|---|---|---|---|
| Sigma | 18 | 1 | 62 | 19 |
| Pathoscope | 70 | 13 | 118 | 83 |
| MEGAN | 70 | 12 | 93 | 82 |
| MetaPhlAn | N/A | 0.2 | 1 | 0.2 |
Fig. 2. Identification of a Salmonella enterica strain at a serial dilution of relative abundances in a human fecal microbiota background. (a) Likelihood ratios of all aligned Salmonella enterica strains. Only the correct strain (outlined in red) is identified with statistical significance (p-value < 0.01) down to the 0.001% dataset. (b) Estimated and expected relative abundances (RA) of the spike-in Salmonella enterica strain. Point estimates (red dots) are bracketed by 95% confidence intervals (blue error bars) with small relative standard deviations (RSD) down to 0.001% (0.027X coverage depth).