Wai Kit Ong1, Peter E Midford1, Peter D Karp1. 1. Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA.
Abstract
MOTIVATION: The increasing availability of annotated genome sequences enables construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, due to incomplete genome annotations, draft metabolic models contain gaps that must be filled in a time-consuming process before they are usable. Optimization-based algorithms that fill these gaps have been developed, however, gap-filling algorithms show significant error rates and often introduce incorrect reactions. RESULTS: Here, we present a new gap-filling method that computes the costs of candidate gap-filling reactions from a universal reaction database (MetaCyc) based on taxonomic information. When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost for reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The assumption behind this method is that different taxonomic groups are biased toward using different metabolic reactions. Evaluation of the new gap-filler on randomly degraded variants of the EcoCyc metabolic model for E.coli showed an increase in the average F1-score to 99.0 (when using the variable weights by frequency method at the phylum level), compared to 91.0 using the previous MetaFlux gap-filler and 80.3 using a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements. AVAILABILITY AND IMPLEMENTATION: The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: The increasing availability of annotated genome sequences enables construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, due to incomplete genome annotations, draft metabolic models contain gaps that must be filled in a time-consuming process before they are usable. Optimization-based algorithms that fill these gaps have been developed, however, gap-filling algorithms show significant error rates and often introduce incorrect reactions. RESULTS: Here, we present a new gap-filling method that computes the costs of candidate gap-filling reactions from a universal reaction database (MetaCyc) based on taxonomic information. When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost for reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The assumption behind this method is that different taxonomic groups are biased toward using different metabolic reactions. Evaluation of the new gap-filler on randomly degraded variants of the EcoCyc metabolic model for E.coli showed an increase in the average F1-score to 99.0 (when using the variable weights by frequency method at the phylum level), compared to 91.0 using the previous MetaFlux gap-filler and 80.3 using a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements. AVAILABILITY AND IMPLEMENTATION: The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Peter D Karp; Mario Latendresse; Suzanne M Paley; Markus Krummenacker; Quang D Ong; Richard Billington; Anamika Kothari; Daniel Weaver; Thomas Lee; Pallavi Subhraveti; Aaron Spaulding; Carol Fulcher; Ingrid M Keseler; Ron Caspi Journal: Brief Bioinform Date: 2015-10-10 Impact factor: 11.622
Authors: Zachary A King; Justin Lu; Andreas Dräger; Philip Miller; Stephen Federowicz; Joshua A Lerman; Ali Ebrahim; Bernhard O Palsson; Nathan E Lewis Journal: Nucleic Acids Res Date: 2015-10-17 Impact factor: 16.971
Authors: Ron Caspi; Richard Billington; Carol A Fulcher; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Peter E Midford; Quang Ong; Wai Kit Ong; Suzanne Paley; Pallavi Subhraveti; Peter D Karp Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971