Connor Morgan-Lang, Ryan McLaughlin, Zachary Armstrong, Grace Zhang, Kevin Chan, Steven J Hallam.
Abstract
MOTIVATION: Microbial communities drive matter and energy transformations integral to global biogeochemical cycles, yet many taxonomic groups facilitating these processes remain poorly represented in biological sequence databases. Due to this missing information, taxonomic assignment of sequences from environmental genomes remains inaccurate.
Year: 2020 PMID: 32637989 PMCID: PMC7695126 DOI: 10.1093/bioinformatics/btaa588
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The workflow of the current study. Sequences for building reference packages were sourced from the NCBI and FunGene databases. Sequences were downloaded from EggNOG for validating reference packages and benchmarking TreeSAPP against GraftM. IMG/M metagenomes were used to explore the global diversity of Mcr
Fig. 2. Classification performance of TreeSAPP, GraftM and DIAMOND as evaluated by the Matthews correlation coefficient (MCC). TreeSAPP was run both with (TreeSAPP-BMGE) and without MSA trimming using BMGE. TreeSAPP-BMGE-Raw represents the classification performance of TreeSAPP with BMGE but without the linear-model-based rank recommendation. Distance from optimal rank is the taxonomic distance accepted for a classified sequence to be considered a true positive. Sequences that failed to meet the distance from optimal rank were included in the MCC calculation as false positives
Fig. 3. Average taxonomic distance across all taxa evaluated for 12 functional anchors. Colours correspond to the taxonomic rank evaluated by clade exclusion analysis and serve as a proxy for sequence divergence. Dashes along the y-axis show the distribution of points on a single plane. P_amoA is a reference package with sequences containing both PmoA and AmoA
Fig. 4. Phylogenetic and metabolic analysis of IMG metagenome-derived McrA sequences. (A) All predicted metagenome-derived McrA sequences (14 919) from IMG/M (as of January 10, 2017) were classified using TreeSAPP and visualized in iTOL. The tree shown here contains 228 reference McrA sequences including most newly described lineages from the “divergent McrA” clade hypothesized to be involved in oxidizing higher alkanes. A version of the tree with leaf labels is available as Supplementary Figure S12. (B) Proportion of sequences assigned at each taxonomic rank. (C) Putative methanogenesis and methanotrophic metabolisms supported in each ecosystem category as inferred by their placement on the reference McrA tree. Sequences that mapped deeply and converged across multiple annotated metabolisms were omitted
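
The scoring rule described in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not the TreeSAPP implementation: the record layout, the function names `tally` and `mcc`, and the toy data are assumptions made for illustration. It shows only how a classified sequence whose taxonomic distance exceeds the accepted distance from optimal rank is counted as a false positive before the standard MCC formula is applied.

```python
import math


def tally(records, max_distance):
    """Count (tp, fp, tn, fn) for one accepted distance from optimal rank.

    Each record is a hypothetical tuple (is_homolog, classified, distance):
      is_homolog -- True if the query truly belongs to the reference package
      classified -- True if the tool assigned the query to the package
      distance   -- taxonomic distance of the assignment from the optimal
                    rank, or None when no taxonomic assignment applies
    """
    tp = fp = tn = fn = 0
    for is_homolog, classified, distance in records:
        if classified:
            within = distance is not None and distance <= max_distance
            if is_homolog and within:
                tp += 1
            else:
                # Non-homologs, and homologs placed too far from the optimal
                # rank, are both scored as false positives.
                fp += 1
        elif is_homolog:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy data (invented for illustration, not taken from the study).
records = [
    (True, True, 0),       # homolog, classified at the optimal rank
    (True, True, 3),       # homolog, classified three ranks away
    (True, False, None),   # homolog, missed
    (False, False, None),  # non-homolog, correctly ignored
    (False, True, None),   # non-homolog, wrongly classified
]

for accepted in (1, 3):
    counts = tally(records, accepted)
    print(accepted, counts, round(mcc(*counts), 3))
```

Relaxing the accepted distance moves the second record from the false-positive to the true-positive count, which is why the MCC in this toy example rises as the threshold is loosened.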