Matteo Palù, Arianna Basile, Guido Zampieri, Laura Treu, Alessandro Rossi, Maria Silvia Morlino, Stefano Campanaro.
Abstract
Background: The rapid accumulation of sequencing data from metagenomic studies is enabling the generation of huge collections of microbial genomes, which poses new challenges for mapping their functional potential. In particular, metagenome-assembled genomes are typically incomplete and harbor partial gene sequences, which can limit their annotation by traditional tools. New scalable solutions are thus needed to facilitate the evaluation of functional potential in microbial genomes.
Keywords: Gene annotation; Genome-scale metabolic model; Hidden Markov model; Metabolic pathway; Microbial genome
Year: 2022 PMID: 35422973 PMCID: PMC8976094 DOI: 10.1016/j.csbj.2022.03.015
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Workflow of KEMET reporting the input files, outputs, and main parameters for all the tasks that can be executed: KEGG Module evaluation, identification of missing KOs, and integration of identified KOs in GSMMs. On the right side, the rationale of each task is visually outlined.
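As a rough illustration of the first two tasks named in the caption (KEGG Module evaluation and identification of missing KOs), the following minimal sketch scores a module by the fraction of its blocks that are covered by a genome's KO annotation. It is not KEMET's implementation: the simplified module representation (a list of blocks, each a set of alternative KOs), the function names, and the KO identifiers are all hypothetical placeholders chosen for this example.

```python
from typing import Iterable, List, Set


def module_completeness(blocks: List[Set[str]], annotated_kos: Iterable[str]) -> float:
    """Fraction of module blocks covered by at least one annotated KO.

    Assumption: a module is simplified to an ordered list of blocks, and
    each block is a set of alternative KOs (any one of them fills the block).
    """
    if not blocks:
        raise ValueError("a module needs at least one block")
    present = set(annotated_kos)
    covered = sum(1 for alternatives in blocks if alternatives & present)
    return covered / len(blocks)


def missing_blocks(blocks: List[Set[str]], annotated_kos: Iterable[str]) -> List[Set[str]]:
    """Blocks with no annotated KO; their KOs are candidates for a targeted search."""
    present = set(annotated_kos)
    return [alternatives for alternatives in blocks if not (alternatives & present)]


# Placeholder identifiers with no biological meaning (hypothetical module).
toy_module = [{"K00001"}, {"K00002", "K00003"}, {"K00004"}, {"K00005"}]
genome_kos = {"K00001", "K00003", "K00005"}

completeness = module_completeness(toy_module, genome_kos)
gaps = missing_blocks(toy_module, genome_kos)
print(f"completeness = {completeness:.2f}, missing blocks = {gaps}")
```

In this toy case three of the four blocks are covered, so the single uncovered block marks the KO that a follow-up search would target.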
Fig. 2. Results of KEMET quality tests. (A) Comparison between KEMET and METABOLIC in terms of KEGG Module block structure with respect to the original KEGG Modules obtained through KEGG Mapper. The plot shows the intersections among the Module datasets for the three tools, together with the total number of Modules evaluated by each of them. (B) True positive rate for gene sequence identification by HMMs. Results for both isolated genomes (red) and MAGs (blue) are reported. Gene deletions of different extents were performed prior to running KEMET. When deletions were performed, gene annotation recovery was evaluated both with the gene prediction resulting from the original sequences and from those truncated, in order to account for the impact of deletions on gene prediction. (C) Fraction of correct metabolic phenotypes predicted by GSMMs reconstructed from microbial MAGs (green), the same MAGs with an expanded annotation through KEMET (orange), and the corresponding genomes from isolates (purple), based on the literature. The lines track the performance of individual GSMMs corresponding to the same strain. For readability purposes, only lines between points having performance differences across the datasets were drawn. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
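The true positive rate in panel (B) can be read as the share of deliberately deleted gene annotations that are found again afterwards. The sketch below shows that calculation under stated assumptions: the function name and the KO identifiers are hypothetical, and the exact evaluation protocol used in the study is not reproduced here.

```python
from typing import Iterable


def true_positive_rate(deleted_kos: Iterable[str], recovered_kos: Iterable[str]) -> float:
    """Share of deleted KOs that were recovered: TP / (TP + FN).

    Assumption: `deleted_kos` are the annotations removed before the run and
    `recovered_kos` are the annotations reported after it. Recovered KOs that
    were never deleted do not enter the rate.
    """
    expected = set(deleted_kos)
    if not expected:
        raise ValueError("at least one deleted KO is required")
    true_positives = expected & set(recovered_kos)
    return len(true_positives) / len(expected)


# Placeholder identifiers with no biological meaning.
deleted = {"K00001", "K00002", "K00003", "K00004"}
recovered = {"K00001", "K00003", "K00004", "K00009"}

tpr = true_positive_rate(deleted, recovered)
print(f"TPR = {tpr:.2f}")
```

Here three of the four deleted KOs are recovered; the extra KO in `recovered` is ignored because it was not part of the deleted set.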