Carlos A Ruiz-Perez (1), Roth E Conrad (2), Konstantinos T Konstantinidis (3, 4, 5, 6).
1. School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
2. Ocean Science and Engineering, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
3. School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332, USA. kostas.konstantinidis@gatech.edu.
4. Ocean Science and Engineering, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
5. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
6. Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
Abstract
BACKGROUND: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, they are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require, for high-throughput analysis, large amounts of computing power that are not available to the average user.
RESULTS: Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes. It combines results from several reference protein databases and returns the matching annotations together with key metadata, such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among multiple query genomes and to cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under the open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator.
CONCLUSIONS: We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools. We showed that using multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome than faster tools that use reduced databases, while remaining computationally efficient for use on personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines, while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.
Keywords:
Comparative genomics; Genome annotation; Metabolic potential; Protein annotation