Pedro Queirós (1), Francesco Delogu (1), Oskar Hickl (2), Patrick May (2), Paul Wilmes (1)
(1) Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg.
(2) Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg.
Abstract
BACKGROUND: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. These datasets make it possible to infer the roles of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to stop limiting findings to a single reference and instead to coalesce knowledge from different data sources, thereby overcoming some of the limitations of relying too heavily on computationally generated data from single sources.
RESULTS: We implemented a protein annotation tool, Mantis, which uses the intersection of database identifiers and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also producing high-coverage, high-quality protein function annotations.
CONCLUSIONS: Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.