Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork.
Abstract
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has remained without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.
Year: 2019 PMID: 30418610 PMCID: PMC6324079 DOI: 10.1093/nar/gky1085
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Taxonomic levels for which OGs have been independently computed based on (A) prokaryotic, (B) eukaryotic and (C) viral genomes. Names in blue indicate new taxonomic levels with respect to previous eggNOG versions. Numbers indicate the number of OGs per level (red), the number of species covered (black) and the functional annotation coverage (green).
Figure 2. Visualization of the phylogeny associated with the OG ENOG5048VVQ at the vertebrate level (A), extracted from the eggNOG website. Target orthologs were restricted to primates in the phylogenetic tree to facilitate exploration (B). Duplication nodes (in-paralogies) are labeled in red, and speciation events in blue (C). The functional profile of each orthologous sequence is shown in the presence/absence matrix (D). Functional differences can be seen on either side of the duplication event separating EPX from MPO sequences (E), in both GO Slim terms (red squares in matrix D) and KEGG Modules (blue squares in matrix D), although the sequences have similar domain architectures (F).
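The duplication/speciation labeling shown in Figure 2 can be illustrated with a small sketch. This record does not describe how eggNOG assigns these labels, so the code below assumes the widely used species-overlap heuristic: an internal node is called a duplication if the species sets under its child branches share at least one species, and a speciation otherwise. The `Node` class, the `label_events` function and the four-leaf toy tree are hypothetical and are not taken from eggNOG or from the actual ENOG5048VVQ phylogeny.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Minimal gene-tree node (hypothetical); leaves carry a species label."""
    name: str = ""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    event: Optional[str] = None  # "D" = duplication, "S" = speciation (internal nodes only)


def label_events(node: Node) -> Set[str]:
    """Label internal nodes bottom-up; return the set of species under `node`.

    Assumption: species-overlap heuristic, not necessarily eggNOG's own method.
    """
    if not node.children:
        return {node.species}
    seen: Set[str] = set()
    overlap = False
    for child in node.children:
        child_species = label_events(child)
        if seen & child_species:  # a species occurs under more than one child branch
            overlap = True
        seen |= child_species
    node.event = "D" if overlap else "S"
    return seen


# Toy tree (illustrative only): two paralogous clades, each with the same two species.
epx = Node("EPX_clade", children=[Node("EPX_human", "human"), Node("EPX_chimp", "chimp")])
mpo = Node("MPO_clade", children=[Node("MPO_human", "human"), Node("MPO_chimp", "chimp")])
root = Node("root", children=[epx, mpo])

label_events(root)
for n in (root, epx, mpo):
    print(n.name, n.event)
```

In this toy tree the root is labeled as a duplication because both child clades contain human and chimp sequences, while each clade's own split is labeled as a speciation. Genes separated only by speciation nodes would be treated as orthologs, and genes separated by a duplication node as paralogs.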