Carlos P Cantalapiedra, Ana Hernández-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas.
Abstract
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.
Keywords: bioinformatics; computational genomics; functional annotation; metagenomics
Year: 2021 PMID: 34597405 PMCID: PMC8662613 DOI: 10.1093/molbev/msab293
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Workflow and new features of eggNOG-mapper v2. (A) The gene prediction stage uses Prodigal to perform protein prediction from assembled contigs. (B) During the search stage, HMMER3, Diamond, or MMseqs2 can be used to align the input proteins to eggNOG v5. (C) During the orthology inference stage, a report of orthologs is generated based on the desired taxonomic scope. (D) Finally, protein annotations and domains are transferred from orthologs to the queries and reported as tabular and GFF files.
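As a rough illustration of how stages (A) to (D) might map onto a single standalone invocation, the sketch below only assembles and prints a command line; it does not run anything. The flag names (`--itype`, `--genepred`, `-m`, `--report_orthologs`, `--decorate_gff`) and the file names `contigs.fna` and `sample1` are assumptions that are not given in the text above and should be checked against `emapper.py --help` for the installed version.

```python
import shlex


def build_emapper_command(contigs_fasta, output_prefix, cpus=10):
    """Assemble an emapper.py command line for stages A-D (nothing is executed).

    All flag names are assumptions to verify against `emapper.py --help`.
    """
    return [
        "emapper.py",
        "-i", contigs_fasta,        # input: assembled contigs
        "--itype", "genome",        # treat the input as contigs, not proteins
        "--genepred", "prodigal",   # (A) gene prediction with Prodigal
        "-m", "diamond",            # (B) search stage with Diamond
        "--report_orthologs",       # (C) write the ortholog report
        "--decorate_gff", "yes",    # (D) emit a GFF decorated with annotations
        "-o", output_prefix,        # prefix for the output files
        "--cpu", str(cpus),
    ]


# Hypothetical input and output names, for illustration only.
command = build_emapper_command("contigs.fna", "sample1")
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.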
Fig. 2. Performance of eggNOG-mapper v2. (A) Average minutes to annotate input proteomes: eggNOG-mapper v2 (blue) against eggNOG-mapper v1 (red). (B) Average minutes to annotate input genomes: eggNOG-mapper v2 (blue) against Prokka (green). (C) Average minutes (in log scale) to annotate input proteins: MMseqs2 (-s 2,4,6; black) against Diamond (iterate/sensitive mode; orange). (D) Specificity (Sp), recall (Re), and F1 score of PFAM domain annotation, either from direct transfer from orthologs or after realignment. Full de novo realignment results were used as reference. (E) Average minutes for PFAM domain annotation, using either PFAM full de novo (brown) or realignment to ortholog domains (blue) modes. Benchmark setup: tests in (A) and (B) were done on 20 sets of 1–100 random proteomes (A) or genomes (B) from Almeida et al. (2021), and executed using 10 CPUs and 80 GB of RAM. Tests in (C) were done on 35 random sets of 10–10,000,000 proteins from Progenomes v2 (Mende et al. 2020), using 30 CPUs and 240 GB of RAM. Tests in (D) and (E) as in (C), only for sets of 10–100,000 proteins.
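The scores in panel (D) can be illustrated with a small set-based calculation, treating the full de novo realignment as the reference. This is a minimal sketch, not the authors' benchmark code: it assumes that each annotation is a (protein, PFAM accession) pair and that the caption's "Sp" is computed as precision (the fraction of predicted pairs that also appear in the reference), since the F1 score is the harmonic mean of precision and recall. The identifiers in the example are made up.

```python
def domain_annotation_scores(predicted, reference):
    """Return (precision, recall, f1) for two sets of (protein, domain) pairs."""
    true_pos = len(predicted & reference)    # pairs found by both methods
    false_pos = len(predicted - reference)   # predicted but absent from reference
    false_neg = len(reference - predicted)   # reference pairs that were missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the reference stands in for full de novo
# realignment, the prediction for domains transferred from orthologs.
reference = {("q1", "PF00001"), ("q1", "PF00002"), ("q2", "PF00003"), ("q3", "PF00004")}
predicted = {("q1", "PF00001"), ("q2", "PF00003"), ("q3", "PF00005")}

precision, recall, f1 = domain_annotation_scores(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here two of the three predicted pairs match the reference and two of the four reference pairs are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.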