Muhammad Zohaib Anwar1, Anders Lanzen2,3, Toke Bang-Andreasen1,4, Carsten Suhr Jacobsen1.
Abstract
BACKGROUND: Metatranscriptomics has been widely used to investigate and quantify the activity of microbial communities in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure. Metatranscriptomics typically uses short sequence reads, which can either be directly aligned to external reference databases ("assembly-free approach") or first assembled into contigs before alignment ("assembly-based approach"). We also compare CoMW (assembly-based implementation) with an assembly-free alternative workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate their accuracy in terms of precision and recall using generic and specialized hierarchical protein databases.
Keywords: alignment; assembly; benchmarking; false-positive results; metatranscriptomics; precision; recall
Year: 2019 PMID: 31363751 PMCID: PMC6667343 DOI: 10.1093/gigascience/giz096
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Flow chart illustrating the evaluation and benchmarking scheme used for the comparison of alternative approaches. Red path indicates the full-length genes workflow, green indicates the steps in the assembly-based workflow CoMW, and blue indicates the steps in the assembly-free approach.
Comparison of precision, recall, F-score, and FDR for the assembly-free and the CoMW (assembly-based) approaches using all 3 databases based on best F-score
| Database | Approach | Threshold | Threshold category | Recall | Precision | F-score | FDR (%) |
|---|---|---|---|---|---|---|---|
| eggNOG | Assembly-free | BTS 120 | Strict (TH) | | 0.9540 | 0.9707 | 4.5977 |
| eggNOG | CoMW | 1.00E−15 | Strict (TH) | 0.9851 | | | |
| CAZy | Assembly-free | BTS 110 | Strict (TH) | 0.3510 | 0.5325 | 0.4231 | 46.7433 |
| CAZy | CoMW | 1.00E−08 | Medium (TM) | | | | |
| NCycDB | Assembly-free | BTS 150 | Strict (TH) | 0.1666 | 0.0581 | 0.0862 | 94.1860 |
| NCycDB | CoMW | 1.00E−14 | Strict (TH) | | | | |
Full table for both approaches and databases can be seen in Tables S1–S3. Boldface emphasizes better precision, recall, F-score, and FDR in each database between both approaches.
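The four metrics reported in the table are related to one another, and the relationship can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-score as the harmonic mean of the two, FDR = FP / (TP + FP)); it is not taken from the CoMW code base, and the function names `f_score`, `classification_metrics`, and `best_by_f_score` are hypothetical. The last helper mirrors the idea of reporting, per database, the threshold with the best F-score.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision, F-score and FDR (%) from raw counts.

    tp, fp, fn are true-positive, false-positive and false-negative
    counts (assumed inputs; how they are counted is up to the caller).
    """
    predicted = tp + fp  # everything the workflow reported
    actual = tp + fn     # everything that was truly present
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    fdr_percent = 100.0 * fp / predicted if predicted else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score(precision, recall),
        "fdr_percent": fdr_percent,
    }


def best_by_f_score(rows: list) -> dict:
    """Return the row (one per threshold) with the highest F-score."""
    return max(rows, key=lambda row: row["f_score"])


# Consistency check against the CAZy assembly-free row of the table:
# precision 0.5325 and recall 0.3510 should give an F-score near 0.4231.
cazy_f = f_score(precision=0.5325, recall=0.3510)
print(round(cazy_f, 4))
```

Because FDR is the complement of precision, each FDR (%) value in the table is approximately 100 × (1 − precision); small differences arise because the tabulated precision is rounded to four decimals.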
Comparison of precision, recall, F-score, and FDR for the assembly-free and CoMW (assembly-based) approaches using the selective removal of functional subsystems from eggNOG database (segmented cross-validation) to evaluate the consistency of both approaches
| Removed subsystem | Approach | Recall | Precision | F-score | FDR (%) |
|---|---|---|---|---|---|
| Cell wall/membrane/envelope biogenesis [M] | Assembly-free | 0.8726 | 0.9580 | 0.9133 | 4.1958 |
| Cell wall/membrane/envelope biogenesis [M] | CoMW | | | | |
| Replication, recombination, and repair [L] | Assembly-free | 0.8734 | 0.9588 | 0.9141 | 4.1166 |
| Replication, recombination, and repair [L] | CoMW | | | | |
| Amino acid transport and metabolism [E] | Assembly-free | 0.8750 | 0.9589 | 0.9150 | 4.1095 |
| Amino acid transport and metabolism [E] | CoMW | | | | |
| General function prediction only and Function unknown [R], [S] | Assembly-free | 0.8933 | 0.9281 | 0.9104 | 7.1856 |
| General function prediction only and Function unknown [R], [S] | CoMW | | | | |
Boldface emphasizes better consistency compared with full-length genes.
Figure 2: Differential expression comparison of the assembly-free and the CoMW (assembly-based) approaches using the (A) eggNOG, (B) CAZy, and (C) NCycDB databases.
Figure 3: Relative abundance of eggNOG functional subsystems in Arctic permafrost soil, identified and quantified using both CoMW and the assembly-free approach, comparing the functional dynamics observed with each. Blue dotted line represents trends using CoMW (assembly-based), whereas red solid line represents the assembly-free approach.
Figure 4: Relative abundance of eggNOG functional subsystems over time in ash-deposited Danish forest soil, identified using both CoMW and the assembly-free approach. Blue dotted line represents trends using CoMW (assembly-based), whereas red solid line represents the assembly-free approach.
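The relative abundances plotted in Figures 3 and 4 can be illustrated with a simple normalization: each subsystem's count divided by the sample total. The sketch below assumes this plain proportion; the normalization actually used for the figures is not stated here and may differ. The function name `relative_abundance` and the example counts are hypothetical.

```python
def relative_abundance(counts: dict) -> dict:
    """Convert per-subsystem counts into fractions of the sample total.

    counts maps a functional subsystem label (e.g. an eggNOG category
    letter) to a non-negative count. Returns fractions summing to 1,
    or all zeros if the sample is empty.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Hypothetical counts for three eggNOG categories in one sample.
sample = {"M": 30, "L": 10, "E": 60}
fractions = relative_abundance(sample)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

Computing these fractions per sample and per workflow gives the two series that such a figure would compare.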