Jeremy W Cox, Richard A Ballweg, Diana H Taft, Prakash Velayutham, David B Haslam, Aleksey Porollo.
Abstract
BACKGROUND: Metagenomics is a rapidly emerging field that aims to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies are attracting more interest. To make metatranscriptomic data sufficient for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data.
Keywords: Altered Schaedler flora; Assembly of shotgun reads; Metagenome; Metataxonomics; Metatranscriptome; Microbiome; RNAseq
Year: 2017 PMID: 28103917 PMCID: PMC5244565 DOI: 10.1186/s40168-016-0219-5
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Comparison of the selected metataxonomics workflows on detection of genera within a set of simulated datasets (Table 1). IMSA and Kraken identify too many taxa. Both versions of MEGAN CE find too few taxa, most likely because the weighted LCA that filters out noise also filters out the weak signal of organisms that are present.
Simulated datasets used for evaluating and optimizing the IMSA+A protocol
| Experiment | Parameters used to vary coverage | Other parameters controlled for this experiment | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Read length/coverage | Read length | Bacteria coverage | Bacteria seq depth [b] | Bacteria gene selection | Bacteria species [a] | Fungi coverage | Fungi seq depth [b] | Fungi gene selection | Fungi species [a] | Virus coverage | Virus seq depth [b] | Virus gene selection | Virus strains [a] | Human coverage | Human gene coverage | Human seq depth [b] | Human reads after subtraction |
| 50 low | 50 | 0.25 | 1.0 | 100% | 30 | 1.1 | 2.4 | 50% | 15 | 16.2 | 0.1 | 100% | 10 | 14.19 | 100% | 66.5 | 0.078 |
| 50 med | 50 | 1.11 | 1.0 | 25% | 30 | 1.1 | 2.4 | 50% | 15 | 16.2 | 0.1 | 100% | 10 | 14.19 | 100% | 66.5 | 0.078 |
| 50 high | 50 | 4.44 | 4.0 | 25% | 30 | 1.1 | 2.4 | 50% | 15 | 16.2 | 0.1 | 100% | 10 | 14.19 | 100% | 66.5 | 0.078 |
| 100 low | 100 | 0.87 | 1.0 | 100% | 30 | 2.3 | 2.4 | 50% | 15 | 106.4 | 0.1 | 100% | 10 | 28.31 | 100% | 66.5 | 0.074 |
| 100 med | 100 | 2.22 | 1.0 | 25% | 30 | 2.3 | 2.4 | 50% | 15 | 106.4 | 0.1 | 100% | 10 | 28.31 | 100% | 66.5 | 0.074 |
| 100 high | 100 | 8.88 | 4.0 | 25% | 30 | 2.3 | 2.4 | 50% | 15 | 106.4 | 0.1 | 100% | 10 | 28.31 | 100% | 66.5 | 0.074 |
| 150 low | 150 | 1.30 | 1.0 | 100% | 30 | 3.4 | 2.4 | 50% | 15 | 159.5 | 0.1 | 100% | 10 | 42.46 | 100% | 66.5 | 0.054 |
| 150 med | 150 | 3.33 | 1.0 | 25% | 30 | 3.4 | 2.4 | 50% | 15 | 159.5 | 0.1 | 100% | 10 | 42.46 | 100% | 66.5 | 0.054 |
| 150 high | 150 | 13.33 | 4.0 | 25% | 30 | 3.4 | 2.4 | 50% | 15 | 159.5 | 0.1 | 100% | 10 | 42.46 | 100% | 66.5 | 0.054 |
[a] Simulated organisms were the same across experiments as an experimental control
[b] Sequencing depth in millions
Fig. 2 Example of processing alignments to generate reports. Alignment to a virus does not contribute to the species count, as there is no corresponding assignment in the taxonomy tree.
Fig. 3 Overview of the IMSA+A protocol
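The counting behaviour described in the Fig. 2 caption can be illustrated with a minimal sketch. It assumes each alignment has already been mapped to a lineage, represented here as a dictionary from rank name to taxon name. The function `count_by_rank`, the lineage layout, and the placeholder genus `Examplevirus` are hypothetical and are not taken from the IMSA+A code. A hit whose lineage lacks a rank simply adds nothing to that rank's tally.

```python
from collections import Counter


def count_by_rank(lineages, ranks=("species", "genus")):
    """Tally hits per taxon at each rank, skipping ranks a lineage lacks.

    `lineages` is an iterable of dicts mapping rank -> taxon name
    (an assumed, illustrative representation).
    """
    counts = {rank: Counter() for rank in ranks}
    for lineage in lineages:
        for rank in ranks:
            name = lineage.get(rank)
            if name is not None:  # no assignment at this rank: no count
                counts[rank][name] += 1
    return counts


# Two bacterial hits with full lineages, one viral hit without a species rank.
hits = [
    {"species": "Escherichia coli", "genus": "Escherichia"},
    {"species": "Escherichia coli", "genus": "Escherichia"},
    {"genus": "Examplevirus"},  # hypothetical placeholder taxon
]

counts = count_by_rank(hits)
print(dict(counts["species"]))
print(dict(counts["genus"]))
```

In this toy input the viral hit appears in the genus tally but leaves the species tally untouched, which mirrors the caption's point about missing assignments in the taxonomy tree.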
Average taxonomic classification performance by counting scheme
| Counting scheme | Bacteria species TPR | Bacteria species FDR | Bacteria genus TPR | Bacteria genus FDR | Fungi species TPR | Fungi species FDR | Fungi genus TPR | Fungi genus FDR | Virus first taxon TPR | Virus first taxon FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| Unique count >0 | 0.77 ± 0.12 | 0.45 ± 0.20 | 0.84 ± 0.13 | 0.20 ± 0.19 | 0.88 ± 0.11 | 0.62 ± 0.26 | 0.92 ± 0.08 | 0.56 ± 0.26 | 0.97 ± 0.10 | 0.07 ± 0.09 |
| IMSA count >0 | 0.78 ± 0.11 | 0.79 ± 0.16 | 0.84 ± 0.12 | 0.58 ± 0.20 | 0.88 ± 0.11 | 0.70 ± 0.21 | 0.92 ± 0.08 | 0.64 ± 0.23 | 0.97 ± 0.10 | 0.14 ± 0.20 |
| p value | 0.376 |  | 0.620 |  | 0.985 | 0.106 | 0.971 | 0.178 | 1.000 | 0.126 |
TPR and FDR are averaged across all 36 experiments (see Additional file 2: Table S1 and Additional file 3: Table S2 for details). Statistically significant results are highlighted in italics.
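For readers less familiar with the two metrics reported in these tables, the following minimal sketch shows how a true positive rate (TPR) and a false discovery rate (FDR) can be computed from a set of expected taxa and a set of detected taxa. It uses the standard definitions, TPR = TP / (TP + FN) and FDR = FP / (TP + FP). The function name `tpr_fdr` and the example genera are illustrative assumptions, not part of the published protocol.

```python
def tpr_fdr(expected, detected):
    """Return (TPR, FDR) for a detected taxon set against the expected set."""
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)   # expected taxa that were detected
    fn = len(expected - detected)   # expected taxa that were missed
    fp = len(detected - expected)   # detected taxa that were not expected
    tpr = tp / (tp + fn) if expected else 0.0
    fdr = fp / (tp + fp) if detected else 0.0
    return tpr, fdr


# Toy example: three of four expected genera found, plus one spurious genus.
expected = {"Escherichia", "Bacillus", "Clostridium", "Lactobacillus"}
detected = {"Escherichia", "Bacillus", "Clostridium", "Shigella"}
print(tpr_fdr(expected, detected))  # (0.75, 0.25)
```

A high TPR with a high FDR, as in several table cells, means that most expected taxa are recovered but many of the reported taxa are spurious.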
Average classification performance by metagenome database used
| Database | Bacteria species TPR | Bacteria species FDR | Bacteria genus TPR | Bacteria genus FDR | Fungi species TPR | Fungi species FDR | Fungi genus TPR | Fungi genus FDR | Virus first taxon TPR | Virus first taxon FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| RefSeq | 0.76 ± 0.12 | 0.56 ± 0.20 | 0.83 ± 0.13 | 0.34 ± 0.19 | 0.78 ± 0.05 | 0.79 ± 0.19 | 0.89 ± 0.08 | 0.72 ± 0.23 | 0.95 ± 0.14 | 0.05 ± 0.07 |
| Custom | 0.78 ± 0.12 | 0.34 ± 0.12 | 0.84 ± 0.13 | 0.07 ± 0.07 | 0.98 ± 0.05 | 0.45 ± 0.21 | 0.96 ± 0.08 | 0.41 ± 0.19 | 0.99 ± 0.03 | 0.08 ± 0.10 |
| p value |  |  | 0.507 |  |  |  |  |  | 0.353 | 0.378 |
TPR and FDR are averaged across 18 experiments each. Statistically significant results are highlighted in bold.
Fig. 4 The number of genera identified by IMSA+A using different read assemblers. TP and FP counts are averaged over the nine simulated datasets (Table 1). *Viral genera are counted using the first defined taxon count (see Methods for details).
Average classification performance by the assembler used
| Assembler | Bacteria species TPR | Bacteria species FDR | Bacteria genus TPR | Bacteria genus FDR | Fungi species TPR | Fungi species FDR | Fungi genus TPR | Fungi genus FDR | Virus first taxon TPR | Virus first taxon FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| Inchworm | 0.82 ± 0.02 | 0.40 ± 0.10 | 0.88 ± 0.03 | 0.10 ± 0.09 | 1.00 ± 0.00 | 0.56 ± 0.11 | 0.98 ± 0.03 | 0.52 ± 0.11 | 1.00 ± 0.00 | 0.13 ± 0.12 |
| Oases | 0.74 ± 0.17 | 0.28 ± 0.12 | 0.80 ± 0.17 | 0.05 ± 0.05 | 0.96 ± 0.07 | 0.33 ± 0.23 | 0.93 ± 0.10 | 0.30 ± 0.21 | 0.98 ± 0.04 | 0.03 ± 0.05 |
| p value |  |  |  |  | 0.356 |  | 0.365 |  |  | 0.057 |
TPR and FDR are averaged across 9 experiments each. Statistically significant results are highlighted in italics.
Measures of assembly characteristics by the assembler program
| Assembler | Read length | Number of contigs (thousands) | N50 contig length | Median contig length |
|---|---|---|---|---|
| Inchworm | 50 | 385.7 | 68 | 62 |
| Oases | 50 | 6.5 | 409 | 195 |
| Inchworm | 100 | 310.7 | 315 | 192 |
| Oases | 100 | 119.3 | 584 | 283 |
| Inchworm | 150 | 248.6 | 689 | 305 |
| Oases | 150 | 173.4 | 1047 | 501 |
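The N50 and median contig lengths in the table above summarize an assembly in different ways. As a minimal sketch, N50 is computed here with the commonly used definition: the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions; the source does not state which tool produced its statistics.

```python
from statistics import median


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:  # half of the assembly is now covered
            return length


# Toy assembly of five contigs (lengths in bases).
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))     # 400
print(median(contigs))  # 300
```

Because N50 weights contigs by their length, it is never smaller than the median for the same assembly, which is consistent with every row of the table.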
Classification performance of IMSA+A (Oases) on a simulated dataset with variable gene expression and relative abundance
| Gene expression and relative abundance | Bacteria species TPR | Bacteria species FDR | Bacteria genus TPR | Bacteria genus FDR | Fungi species TPR | Fungi species FDR | Fungi genus TPR | Fungi genus FDR | Virus first taxon TPR | Virus first taxon FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| Fixed [a] | 0.74 ± 0.17 | 0.28 ± 0.12 | 0.80 ± 0.17 | 0.05 ± 0.05 | 0.96 ± 0.07 | 0.33 ± 0.23 | 0.93 ± 0.10 | 0.30 ± 0.21 | 0.98 ± 0.04 | 0.03 ± 0.05 |
| Variable | 0.77 | 0.33 | 0.87 | 0.04 | 1.00 | 0.12 | 1.00 | 0.06 | 0.90 | 0.18 |
[a] Average of all previous simulated experiments
Fig. 5 Genera identified by the sum of unique hit counts for all 12 samples. Genera known to be in the samples are highlighted with a green background. Groupings of the lowest common ancestors are shown using sections with dashed lines.
Summary of the comparison of various tools on the ASF data sample
| Method | Total genera detected | False positives | True positives | Correct next relative [a] |
|---|---|---|---|---|
| IMSA+A | 19 | 2 | 6 | 11 |
| MEGAN CE DIAMOND | 13 | 6 | 5 | 2 |
| MEGAN CE BLASTN | 15 | 9 | 5 | 1 |
| Kraken | 72 | 55 | 6 | 11 |
[a] The number of genera representing organisms closely related to the ASF bacteria without sequenced genomes
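As a worked check on the ASF comparison table, the short sketch below confirms that, for each method, the total number of genera detected equals the sum of false positives, true positives, and correct next relatives. It then derives the fraction of detected genera that are false positives. That fraction is a derived quantity computed here for illustration and is not reported in the source; the variable names are assumptions.

```python
# (total genera, false positives, true positives, correct next relatives),
# copied from the ASF comparison table.
asf_results = {
    "IMSA+A": (19, 2, 6, 11),
    "MEGAN CE DIAMOND": (13, 6, 5, 2),
    "MEGAN CE BLASTN": (15, 9, 5, 1),
    "Kraken": (72, 55, 6, 11),
}

fp_fraction = {}
for method, (total, fp, tp, next_rel) in asf_results.items():
    # Each detected genus falls into exactly one of the three categories.
    assert total == fp + tp + next_rel, method
    fp_fraction[method] = round(fp / total, 3)

for method, fraction in fp_fraction.items():
    print(f"{method}: {fraction}")
```

Under this derived measure, IMSA+A and Kraken recover the same numbers of true positives and next relatives, but about 11% of IMSA+A's detected genera are false positives compared with about 76% for Kraken.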