J Muller, D Szklarczyk, P Julien, I Letunic, A Roth, M Kuhn, S Powell, C von Mering, T Doerks, L J Jensen, P Bork.
Abstract
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), a 2-fold increase relative to the previous version. The pipeline yielded 224,847 OGs, including 9724 extended versions of the original COGs and KOGs. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains and the functional categories originally defined for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2,242,035 proteins (built from 2,590,259 proteins) and provides a broad functional description for at least 1,966,709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at http://eggnog.embl.de.
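The abstract names two pipeline steps: reciprocal best BLAST matches and triangular linkage clustering. The Python sketch below illustrates only the clustering step, under stated assumptions. Reciprocal best hits (RBHs) are taken as precomputed input. Any three proteins from three different genomes that are pairwise RBHs form a triangle, and triangles that share an edge are merged into one group, in the spirit of the original COG procedure. The function name `triangular_linkage`, the input format and all protein and species identifiers are hypothetical. The real eggNOG pipeline includes further steps that are not shown.

```python
from itertools import combinations


def triangular_linkage(rbh_pairs, species_of):
    """Cluster proteins into groups by triangular linkage (illustrative sketch).

    rbh_pairs: iterable of (a, b) protein IDs assumed to be reciprocal best hits.
    species_of: dict mapping protein ID -> genome ID.
    Returns a list of sets of protein IDs, one set per merged group.
    """
    # Undirected adjacency map of RBHs between proteins from different genomes.
    adj = {}
    for a, b in rbh_pairs:
        if species_of[a] == species_of[b]:
            continue  # skip within-genome pairs
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # A triangle is three proteins, all pairwise RBHs; because within-genome
    # pairs were skipped, the three proteins come from three distinct genomes.
    triangles = set()
    for a in adj:
        for b, c in combinations(sorted(adj[a]), 2):
            if c in adj[b]:
                triangles.add(frozenset((a, b, c)))

    # Merge triangles that share an edge, using union-find over triangle indices.
    tri_list = list(triangles)
    parent = list(range(len(tri_list)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edge_owner = {}  # edge (sorted protein pair) -> index of first triangle seen
    for i, tri in enumerate(tri_list):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(i)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = i

    groups = {}
    for i, tri in enumerate(tri_list):
        groups.setdefault(find(i), set()).update(tri)
    return list(groups.values())


# Hypothetical toy input: two triangles share the edge h1-m1 and are merged;
# the lone pair h2-w1 forms no triangle and is left out.
species = {"h1": "human", "m1": "mouse", "f1": "fly", "y1": "yeast",
           "h2": "human", "w1": "worm"}
rbh = [("h1", "m1"), ("h1", "f1"), ("m1", "f1"),  # triangle 1
       ("h1", "y1"), ("m1", "y1"),                # triangle 2 (shares h1-m1)
       ("h2", "w1")]                              # isolated pair
groups = triangular_linkage(rbh, species)  # one group: h1, m1, f1, y1
```

Requiring a triangle rather than a single RBH pair means an isolated best-hit pair never becomes a group on its own, which is why `h2` and `w1` are absent from the result.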
Year: 2009 PMID: 19900971 PMCID: PMC2808932 DOI: 10.1093/nar/gkp951
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Annotation statistics at different taxonomic levels
| Level | OG count | Description line: annotated | Description line (%) | Functional categories: annotated | Functional categories (%) |
|---|---|---|---|---|---|
| COG + NOG | 64 370 | 4474 + 14 956 | 30.2 | 2824 + 6262 | 14.1 |
| arNOG | 9809 | 4144 | 42.2 | 4540 | 46.3 |
| KOG + euNOG | 22 695 | 4288 + 7566 | 52.2 | 3514 + 4120 | 33.6 |
| fuNOG | 9976 | 5661 | 56.7 | 5775 | 57.9 |
| meNOG | 22 691 | 16 636 | 73.3 | 13 490 | 59.5 |
| inNOG | 8049 | 5034 | 62.5 | 5810 | 72.2 |
| veNOG | 21 357 | 16 722 | 78.3 | 13 291 | 62.2 |
| fiNOG | 13 674 | 8903 | 65.1 | 9580 | 70.1 |
| maNOG | 20 222 | 16 959 | 83.9 | 13 075 | 64.7 |
| roNOG | 14 038 | 11 918 | 84.9 | 10 547 | 75.1 |
| prNOG | 17 966 | 14 773 | 82.2 | 13 124 | 73.0 |
At the COG (universal) and KOG (eukaryotic) levels, the counts for the original groups and for the additional automatically generated non-supervised orthologous groups (NOGs and euNOGs, respectively) are listed separately.
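Each percentage in the table is the number of annotated OGs divided by the OG count at that level. At the COG and KOG levels, the two listed counts (original groups plus NOGs or euNOGs) are summed first. This short Python check transcribes the table values, recomputes every percentage and asserts that each one matches the reported figure. The names `TABLE` and `percent` are illustrative only.

```python
# Rows transcribed from the table:
# (level, OG count, description-line annotated, reported %,
#  functional-category annotated, reported %).
# At the COG and KOG levels the annotated count is the sum of the two listed parts.
TABLE = [
    ("COG + NOG",   64370, 4474 + 14956, 30.2, 2824 + 6262, 14.1),
    ("arNOG",        9809, 4144,         42.2, 4540,        46.3),
    ("KOG + euNOG", 22695, 4288 + 7566,  52.2, 3514 + 4120, 33.6),
    ("fuNOG",        9976, 5661,         56.7, 5775,        57.9),
    ("meNOG",       22691, 16636,        73.3, 13490,       59.5),
    ("inNOG",        8049, 5034,         62.5, 5810,        72.2),
    ("veNOG",       21357, 16722,        78.3, 13291,       62.2),
    ("fiNOG",       13674, 8903,         65.1, 9580,        70.1),
    ("maNOG",       20222, 16959,        83.9, 13075,       64.7),
    ("roNOG",       14038, 11918,        84.9, 10547,       75.1),
    ("prNOG",       17966, 14773,        82.2, 13124,       73.0),
]


def percent(annotated, total):
    """Share of OGs that are annotated, as a percentage rounded to one decimal."""
    return round(100 * annotated / total, 1)


for level, total, desc, desc_pct, func, func_pct in TABLE:
    assert percent(desc, total) == desc_pct, level
    assert percent(func, total) == func_pct, level
print("all percentages consistent")
```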
Figure 1. Statistics on the content of the eggNOG database. The eggNOG assignments for 630 complete genomes were mapped onto the tree of life. The stacked bar charts outside the tree show the proportion of genes from each genome that can be assigned to a functionally annotated orthologous group (green), an unannotated orthologous group (orange) or no orthologous group (gray). The length of each bar is proportional to the logarithm of the number of genes in the respective genome. The pie charts inside the tree show the fractions of orthologous groups at each level in the hierarchy that could be annotated with a functional category (green for NOGs, light green for extended COGs and KOGs) or not (orange for NOGs, light orange for extended COGs and KOGs). An interactive version is available in the ‘Overview’ section at http://eggnog.embl.de. This figure was made using iTOL.
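The Figure 1 bars combine two encodings: total bar length follows the logarithm of a genome's gene count, and each colored segment shows a fraction of that genome's genes. The sketch below assumes a base-10 logarithm and an arbitrary `scale` factor, because the caption states only that bar length is proportional to the logarithm. The function name `stacked_bar` and the example counts are hypothetical and are not taken from the figure.

```python
import math


def stacked_bar(annotated, unannotated, unassigned, scale=1.0):
    """Return (bar_length, segments) for one genome in a Figure 1-style bar.

    Assumption: bar_length = scale * log10(total genes). Each segment is the
    category's fraction of genes times bar_length, so segments sum to bar_length.
    """
    total = annotated + unannotated + unassigned
    if total <= 0:
        raise ValueError("genome must contain at least one gene")
    bar_length = scale * math.log10(total)
    segments = {
        "annotated OG (green)": annotated / total * bar_length,
        "unannotated OG (orange)": unannotated / total * bar_length,
        "no OG (gray)": unassigned / total * bar_length,
    }
    return bar_length, segments


# Hypothetical genome with 1000 genes: the bar is about 3 scale units long,
# and the green segment covers 80% of it.
length, segs = stacked_bar(800, 100, 100)
```

With a logarithmic length, a genome with 10 times as many genes gets a bar only one `scale` unit longer, so small and large genomes stay readable in the same figure.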
Figure 2. Screenshot of the detailed results page. The eggNOG database was queried for the term ‘mTERF’, the mitochondrial precursor of transcription termination factor 1. The navigation tree at the top of the page lets the user switch to more coarse-grained orthologous groups, for example the mammalian orthologous groups. The tab menu shown here provides several in-depth views of the new data (e.g. MSAs or phylogenetic trees, shown here with SMART domains).
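The navigation tree in Figure 2 lets the user move from a fine-grained level to coarser-grained ones. The Python sketch below models this with a parent map over the levels listed in the annotation table. The nesting is an assumption based on standard taxonomy (for example, rodents and primates within mammals, and mammals and fishes within vertebrates), not on eggNOG's own data files. `PARENT` and `coarser_levels` are hypothetical names.

```python
# Hypothetical parent map between the OG levels named in the annotation table.
# The nesting follows standard taxonomy and is an assumption, not eggNOG data.
PARENT = {
    "arNOG": "COG + NOG",        # archaea -> universal
    "KOG + euNOG": "COG + NOG",  # eukaryotes -> universal
    "fuNOG": "KOG + euNOG",      # fungi -> eukaryotes
    "meNOG": "KOG + euNOG",      # metazoa -> eukaryotes
    "inNOG": "meNOG",            # insects -> metazoa
    "veNOG": "meNOG",            # vertebrates -> metazoa
    "fiNOG": "veNOG",            # fishes -> vertebrates
    "maNOG": "veNOG",            # mammals -> vertebrates
    "roNOG": "maNOG",            # rodents -> mammals
    "prNOG": "maNOG",            # primates -> mammals
}


def coarser_levels(level):
    """Return the chain of increasingly coarse-grained levels above `level`."""
    path = []
    while level in PARENT:
        level = PARENT[level]
        path.append(level)
    return path


print(coarser_levels("prNOG"))
# ['maNOG', 'veNOG', 'meNOG', 'KOG + euNOG', 'COG + NOG']
```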