Sean Powell, Kristoffer Forslund, Damian Szklarczyk, Kalliopi Trachana, Alexander Roth, Jaime Huerta-Cepas, Toni Gabaldón, Thomas Rattei, Chris Creevey, Michael Kuhn, Lars J Jensen, Christian von Mering, Peer Bork.
Abstract
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. Here we present the fourth version of the eggNOG database (available at http://eggnog.embl.de), which derives nonsupervised orthologous groups (NOGs) from complete genomes and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements to the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Year: 2013 PMID: 24297252 PMCID: PMC3964997 DOI: 10.1093/nar/gkt1253
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Taxonomic levels for which orthologous groups are provided, with functional annotation coverage displayed. This tree shows the levels of the Tree of Life for which eggNOG v4 provides orthologous groups. For internal nodes, the size of the orange circle increases with the number of species in the core and periphery sets, respectively, that fall under this taxonomic level. Blue dot markers or circles denote the 67 of 107 taxonomic levels that are new in eggNOG v4 relative to eggNOG v3. The bar charts displayed at the edge show what fraction of orthologous groups have meaningful free-text descriptions or COG/KOG/arCOG functional categories assigned, respectively.
Figure 2. Benchmarking and comparing eggNOGv4 and eggNOGv3. (A) The performance of the eggNOG database was evaluated at two levels, the gene level (identifying false and missing assignments) and the group level (identifying fusions and fissions), using the Reference Orthologous Groups (RefOGs). Initially, we mapped the reference orthologs to the bilaterian-specific orthologous groups (biNOGs). We scored eggNOG performance using (i) all orthologous groups (‘All OGs’) to identify the number of fissions and fusions for every RefOG and (ii) the orthologous group with the largest overlap with the RefOG (‘Single OG’, i.e. OG1). Then, we calculated how many genes were predicted accurately (true assignments, TA, black box), how many genes were not predicted as orthologs (missing assignments, MA, striped white box) and how many genes were erroneous orthology predictions (false assignments, FA, white box). The numbers of true, missing and false assignments depend on whether the database is evaluated in the ‘Single OG’ or the ‘All OGs’ manner. (B) Comparison of the two most recent eggNOG versions (v3 and v4) in terms of %RefOG coverage (number of true assignments per total number of reference orthologs). The Venn diagram shows the number of species in the two database releases; the 47 overlapping species include the 12 animals used in the benchmarking data set. (C) Comparison of eggNOGv3 and eggNOGv4 at the gene level (false and missing assignments). Larger bars indicate a larger number of errors. (D) Comparison of eggNOGv3 and eggNOGv4 at the group level (fusion and fission events). Larger bars indicate a larger number of errors.
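The gene-level counts described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' benchmarking code: the function name `score_refog`, the gene and group identifiers, and the toy data are hypothetical, and two conventions are assumptions made for illustration only, namely that the fission count is the number of overlapping groups minus one, and that every non-reference gene in a selected group counts as a false assignment.

```python
from typing import Dict, Set


def score_refog(refog: Set[str], ogs: Dict[str, Set[str]], mode: str = "single") -> Dict[str, float]:
    """Score one reference group (RefOG) against predicted orthologous groups.

    mode="single": use only the predicted group with the largest overlap (OG1).
    mode="all":    use the union of every predicted group that overlaps the RefOG.
    """
    # Keep only the predicted groups that share at least one gene with the RefOG.
    overlapping = {name: genes for name, genes in ogs.items() if genes & refog}

    if not overlapping:
        predicted: Set[str] = set()
    elif mode == "single":
        # Ties are broken by group name so the result is deterministic.
        best = max(sorted(overlapping), key=lambda name: len(overlapping[name] & refog))
        predicted = overlapping[best]
    elif mode == "all":
        predicted = set().union(*overlapping.values())
    else:
        raise ValueError("mode must be 'single' or 'all'")

    ta = len(refog & predicted)   # true assignments
    ma = len(refog - predicted)   # missing assignments
    fa = len(predicted - refog)   # false assignments (assumed convention)
    return {
        "TA": ta,
        "MA": ma,
        "FA": fa,
        # Coverage as in panel B: true assignments per reference ortholog.
        "coverage": ta / len(refog) if refog else 0.0,
        # Assumed convention: a RefOG split over k groups counts as k - 1 fissions.
        "fissions": max(len(overlapping) - 1, 0),
    }


# Hypothetical toy data: one RefOG of five genes split over two predicted groups.
refog = {"a", "b", "c", "d", "e"}
ogs = {"OG1": {"a", "b", "c", "x"}, "OG2": {"d", "y"}, "OG3": {"z"}}

single = score_refog(refog, ogs, mode="single")
all_ogs = score_refog(refog, ogs, mode="all")
print(single)
print(all_ogs)
```

On this toy input the ‘Single OG’ mode recovers three of the five reference genes, whereas the ‘All OGs’ mode recovers four at the cost of one more false assignment, which is why the caption notes that the counts depend on the evaluation mode.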
Figure 3. Web site screenshots. The navigation tool has been improved to help users find relevant orthologous groups in a simple and intuitive way. Related groups are displayed inline using chord diagrams. The thickness of the link (chord) between two groups represents the number of proteins mapped between those two orthologous groups. The tooltips on the outer edge and on the chords display the number of proteins mapped from a group and between groups, respectively.