Kira S Makarova, Eugene V Koonin.
Abstract
Archaea comprise one of the three distinct domains of life (with bacteria and eukaryotes). With 16 complete archaeal genomes sequenced to date, comparative genomics has revealed a conserved core of 313 genes that are represented in all sequenced archaeal genomes, plus a variable 'shell' that is prone to lineage-specific gene loss and horizontal gene exchange. The majority of archaeal genes have not been experimentally characterized, but novel functional pathways have been predicted.
Year: 2003 PMID: 12914651 PMCID: PMC193635 DOI: 10.1186/gb-2003-4-8-115
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Table 1. Completely sequenced archaeal genomes
| Species | Abbreviation | Optimal growth temperature (°C) | Lifestyle and other features | Number of proteins* | Number (%) of proteins in COGs | Date of genome release | Reference |
| | | 83 | Anaerobic, sulfate-reducing chemolitho- or chemorgano-autotroph, motile | 2,420 | 1,953 (81%) | 1997 | [124] |
| | | 37 | Aerobic chemorganotroph, obligate halophile, with a cell envelope; motile; two extrachromosomal elements | 2,622 | 1,809 (69%) | 2000 | [125] |
| | | 85 | Chemolithoautotroph, strict anaerobe, methanogen, motile; two extrachromosomal elements | 1,758 | 1,448 (82%) | 1996 | [27] |
| | | 110 | Chemolithoautotroph, strict anaerobe, methanogen, with high cellular salt concentration | 1,691 | 1,253 (74%) | 2002 | [45] |
| | | 37 | Chemolithoautotroph, anaerobe possibly capable of aerobic growth; nitrogen-fixing, versatile methanogen; motile, and able to form multicellular structures | 4,540 | 3,142 (69%) | 2002 | [55] |
| | | 37 | As for | 3,371 | N/A | 2002 | [54] |
| | | 65 | Chemolithoautotroph, strict anaerobe, nitrogen-fixing, methanogen | 1,873 | 1,500 (80%) | 1997 | [126] |
| | | 96 | Anaerobic heterotroph, sulfur enhances growth; motile | 1,801 | 1,425 (79%) | 1998 | [127] |
| | | 96 | As for | 1,769 | 1,506 (85%) | 2001 | [128] |
| | | 96 | As for | 2,065 | N/A | 2001 | [129] |
| | | 59 | Facultative anaerobe, chemorganotroph, thermoacidophilic, anaerobically able to metabolize sulfur; motile, with a plasma membrane | 1,482 | 1,261 (85%) | 2000 | [96] |
| | | 60 | As for | 1,499 | 1,277 (85%) | 2000 | [130] |
| | | 100 | Facultative nitrate-reducing anaerobe | 1,840 | 1,236 (67%) | 2002 | [131] |
| | | 90 | Aerobic chemorganotroph; sulfur enhances growth | 2,605 | 1,529 (59%) | 1999 | [132] |
| | | 80 | Aerobe metabolizing sulfur; thermoacidophilic chemorganotroph; motile | 2,977 | 2,207 (74%) | 2001 | [97] |
| | | 80 | As for | 2,826 | N/A | 2001 | [133] |
*According to the original genome annotation.
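The percentages in the COG column can be recomputed from the two protein counts in each row. The following is a minimal sketch, assuming the percentage is simply the number of proteins in COGs divided by the total number of proteins, rounded to the nearest integer; the function and variable names are illustrative and not taken from the paper.

```python
# (total proteins, proteins in COGs, reported percent), copied from the
# table rows that report a COG count (rows marked N/A are skipped).
ROWS = [
    (2420, 1953, 81),
    (2622, 1809, 69),
    (1758, 1448, 82),
    (1691, 1253, 74),
    (4540, 3142, 69),
    (1873, 1500, 80),
    (1801, 1425, 79),
    (1769, 1506, 85),
    (1482, 1261, 85),
    (1499, 1277, 85),
    (1840, 1236, 67),
    (2605, 1529, 59),
    (2977, 2207, 74),
]


def cog_coverage_percent(total_proteins, proteins_in_cogs):
    """Percentage of annotated proteins assigned to COGs (assumed definition)."""
    return round(100 * proteins_in_cogs / total_proteins)


# Rows whose recomputed percentage differs from the reported one.
mismatches = [
    (total, in_cogs, reported)
    for total, in_cogs, reported in ROWS
    if cog_coverage_percent(total, in_cogs) != reported
]
print(mismatches)
```

Under this assumed definition, every reported percentage agrees with the recomputed value.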
Table 2. The top 15 phyletic patterns in proteins from archaea
| Pattern* | Number of COGs (and of the complementary pattern, CP) | Comments and examples |
| | | Archaeal core, including 200 COGs present in both |
| | | This pattern reflects a large number of genes acquired via HGT† in |
| | | This pattern reflects a substantial amount of HGT in |
| | | This pattern consists of COGs including four methanogens and |
| | | This pattern is specific for four methanogens, including unique pathways for coenzyme M biosynthesis and reduction and 14 uncharacterized proteins, many of which are likely to be unique enzymes involved in biosynthesis of other specific coenzymes and their utilization |
| | | A pattern specific for thermophilic methanogens ( |
| | | This pattern reflects a substantial amount of HGT in |
| | | This reflects a substantial amount of HGT in |
| | | A pattern specific for two mesophilic archaea, probably resulting from independent HGT |
| | | This pattern includes genes that might have been acquired via HGT in |
| | | A crenarchaea-specific pattern, including 11 COGs that do not have orthologs outside this lineage. Among genes shared with bacteria but not euryarchaeota are three subunits of aerobic-type CO dehydrogenase and CO dehydrogenase maturation factor. Genes specifically shared with eukaryotes are three ribosomal proteins (S30, S25 and L13E) |
| | | Apparent independent HGT to |
| | | Apparent specific gene loss in the |
| | | Apparent gene loss in |
| | | Apparent HGT in |
*The pattern of appearance within the 13 sequenced archaeal species currently available in the COG database. Species abbreviations are as given in Table 1 and are written vertically. †Abbreviations: A, archaea; B, bacteria; E, eukaryotes; CP, complementary pattern; HGT, horizontal gene transfer.
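A phyletic pattern, as used in the table above, records in which genomes a COG is represented, and the complementary pattern is its inverse. The sketch below shows one way such patterns could be tabulated. The genome labels, COG names, and presence data are hypothetical toy values, and the `+`/`-` string encoding is an assumption made for illustration, not the notation of the COG database.

```python
from collections import Counter

# Hypothetical genome labels, in a fixed column order.
GENOMES = ["g1", "g2", "g3", "g4"]

# Hypothetical presence data: COG name -> genomes in which it is represented.
COG_PRESENCE = {
    "COG_A": {"g1", "g2", "g3", "g4"},
    "COG_B": {"g1", "g2"},
    "COG_C": {"g3", "g4"},
    "COG_D": {"g1", "g2"},
    "COG_E": {"g4"},
}


def phyletic_pattern(present_in, genomes=GENOMES):
    """Encode presence ('+') or absence ('-') of a COG in each genome."""
    return "".join("+" if g in present_in else "-" for g in genomes)


def complementary_pattern(pattern):
    """Swap presence and absence in every position of the pattern."""
    return pattern.translate(str.maketrans("+-", "-+"))


# Number of COGs that share each pattern.
pattern_counts = Counter(phyletic_pattern(s) for s in COG_PRESENCE.values())


def top_patterns(counts, n):
    """Return (pattern, number of COGs, number of COGs with the CP)."""
    return [
        (p, c, counts.get(complementary_pattern(p), 0))
        for p, c in counts.most_common(n)
    ]


print(top_patterns(pattern_counts, 1))
```

In this toy data set the most frequent pattern is `++--`, shared by two COGs, and one COG has the complementary pattern `--++`.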
Figure 1. The archaeal gene core: changes resulting from the appearance of new genome sequences. Black bars indicate the current set of pan-archaeal genes (313 COGs); gray indicates COGs that are not part of the current pan-archaeal core but are seen to be conserved after the addition of the given genome sequence. The genomes are listed from left to right in chronological order of release of the complete sequence; species name abbreviations are as in Table 1.
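The quantity plotted in this figure can be thought of as a running intersection of COG sets. The following is a minimal sketch, assuming that the core after each release is the set of COGs present in every genome released so far; the genome labels and COG sets are hypothetical toy data.

```python
# Hypothetical genomes in chronological order of release, each with the
# set of COGs represented in it.
RELEASES = [
    ("g1", {"COG_A", "COG_B", "COG_C", "COG_D"}),
    ("g2", {"COG_A", "COG_B", "COG_C"}),
    ("g3", {"COG_A", "COG_B", "COG_D"}),
    ("g4", {"COG_A", "COG_B", "COG_C"}),
]


def core_history(releases):
    """Core (COGs shared by all genomes so far) after each release."""
    history = []
    core = None
    for name, cogs in releases:
        core = set(cogs) if core is None else core & cogs
        history.append((name, core))
    return history


def core_breakdown(releases):
    """Per release: (genome, size of the final core, COGs still conserved
    at that point that later drop out of the core)."""
    history = core_history(releases)
    final_core = history[-1][1]
    return [
        (name, len(final_core), len(core - final_core))
        for name, core in history
    ]


for row in core_breakdown(RELEASES):
    print(row)
```

The second value in each tuple corresponds to the black bars (the current core) and the third to the gray portion, which shrinks as further genomes are added.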
Figure 2. Functional breakdown of genes within the conserved archaeal core. 'Universal' indicates genes with orthologs in both bacteria and eukaryotes; 'eukaryotic', genes with orthologs only in eukaryotes; 'bacterial', genes with orthologs only in bacteria; 'archaeal', genes without non-archaeal orthologs. The data on orthology and functional classification are derived from the COGs.
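The four categories defined in this caption follow directly from whether a core COG has orthologs in bacteria, in eukaryotes, in both, or in neither. A minimal sketch of that rule is given below; the COG names and ortholog flags are hypothetical, and the sketch covers only the phylogenetic categories, not the functional classification.

```python
from collections import Counter


def classify_core_cog(in_bacteria, in_eukaryotes):
    """Assign a core COG to one of the four categories in the caption."""
    if in_bacteria and in_eukaryotes:
        return "universal"
    if in_eukaryotes:
        return "eukaryotic"
    if in_bacteria:
        return "bacterial"
    return "archaeal"


# Hypothetical core COGs: name -> (has bacterial orthologs, has eukaryotic orthologs).
CORE_COGS = {
    "COG_A": (True, True),
    "COG_B": (False, True),
    "COG_C": (True, False),
    "COG_D": (False, False),
    "COG_E": (True, True),
}

# Number of core COGs in each category.
breakdown = Counter(classify_core_cog(b, e) for b, e in CORE_COGS.values())
print(dict(breakdown))
```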
Figure 3. The most parsimonious scenario for the evolution of the main lineages of life. The red numbers in ovals near the internal nodes show the size of the reconstructed gene sets of the respective ancestral forms. Green numbers show gene gains and brown numbers gene losses assigned to each of the branches in the tree. LUCA, last universal common ancestor.
Figure 4. Functional breakdown of genes in each of the sequenced archaeal genomes. The data are from COGs; species name abbreviations are as in Table 1.
Table 3. Examples of computational and experimental discovery of unexpected functions in archaea.
| COG numbers [37,38] | Function and comments | References |
| 0012, 1325, 1603, 1369, 0638, 1500, 1097, 0689, 2123, 1996, 2136, 2892, 0618, 1782, 1096, 3286, 1761 and more | Archaeal exosome. Orthologs of eukaryotic exosome subunits form the largest conserved superoperon in archaea, after the ribosomal superoperon, suggesting the existence of a physical complex | [88] |
| 1769, 1336, 3337, 1583, 1367, 1604, 1517, 1857, 1688, 1203, 1468, 1518, 2254, 1343, 1353, 1421, 1337, 1567, 1332, 4343 | DNA repair system represented primarily in thermophiles | [59] |
| 0358 | Bacterial-type DNA primase (DnaG orthologs) | [24] |
| 1311 | Small subunit of euryarchaeal DNA polymerase II, predicted PHP family phosphohydrolase (probably phosphatase); eukaryotic homologs appear to be inactivated | [123] |
| 1833 | Uri superfamily endonuclease | [136] |
| 1628 | Endonuclease V homologs | K.S.M. and E.V.K., unpublished observations |
| 1679, 1786 | Aconitase catalytic core and an interacting 'swiveling domain' | K.S.M. and E.V.K., unpublished observations |
| 1711 | Possible subunit of the DNA replication machinery | K.S.M. and E.V.K., unpublished observations |
| 1310 | Zn2+-dependent hydrolase homologous to the eukaryotic ubiquitin isopeptidase contained in the proteasome and COP9 signalosome | [137,138] |
| 1708 | 'Minimal' nucleotidyltransferases | [100,139] |
| 1830 | Fructose-1,6-bisphosphate aldolases (DhnA family) | [76,77] |
| 1351 | Thymidylate synthase | [61,64] |
| 1685 | Shikimate kinase (predicted on the basis of operon organization) | [140] |
| 3635 | Phosphoglycerate mutase | [24,141] |
| 1384 | Class I lysyl-tRNA synthetase | [62] |
| 1933 | DNA polymerase II | [104] |
| 1980 | Fructose 1,6-bisphosphatase | [142] |
| 1630 | NurA, a novel 5'-3' nuclease encoded next to Rad50 and Mre11 orthologs; present in all sequenced archaeal genomes and some bacteria | [143] and K.S.M. and E.V.K., unpublished observations |
| 1812 | | [144] |
| 1591 | Holliday junction resolvase | [101] |
| 1581 | Alba, a major DNA-binding chromatin protein in Crenarchaeota | [106] |
| 1945 | Pyruvoyl-dependent arginine decarboxylase (PvlArgDC), involved in polyamine biosynthesis | [145] |
Figure 5. Prediction of gene functions in archaea by genomic context analysis. (a) The superoperon coding for the predicted archaeal exosome (see [88]). (b) The partially conserved gene neighborhood coding for the predicted repair system found in archaeal and bacterial thermophiles (see [59] for details). (c-e) Predicted operons containing uncharacterized genes in the neighborhood of genes from the following COGs: COG1594, DNA-directed RNA polymerase, subunit M, and transcription elongation factor TFIIS (RPB9); COG0592, encoding a DNA polymerase sliding clamp subunit (PCNA ortholog); COG1631, ribosomal protein L44E; COG1095, DNA-directed RNA polymerase, subunit E' (RPB7); COG2093, DNA-directed RNA polymerase, subunit E" (RPE2); COG2004, ribosomal protein S24E; COG1709, transcriptional regulator; COG3425, 3-hydroxy-3-methylglutaryl CoA synthase (PksG); COG0183, acetyl-CoA acetyltransferase (FadA/PaaJ orthologs). UC, uncharacterized, shown by white arrows. Species abbreviations are as in Table 1. Genes are not shown to scale and are denoted by their respective gene names (some are discussed further in the text); arrows indicate the direction of transcription. A solid line connects genes in a predicted operon. Species that have the same operon organization as the listed species are indicated in parentheses. Orthologous genes are aligned. Genes with similar general functions are shown by the same shading. Broken lines show that genes are in the same predicted operon but are not adjacent. Small arrows indicate the presence of additional functionally related genes in the same predicted operon; these genes are not shown for lack of space.
Figure 6. Lineage-specific expansions of paralogous gene families in archaea. The vertical axis shows the number of members of the indicated COGs. (a) COG0477, permeases of the major facilitator superfamily; COG0531, amino-acid transporters. (b) COG1145, ferredoxin. (c) COG2101, TATA-box binding protein (TBP), a component of transcription initiation factors TFIID and TFIIIB; COG1405, Brf1 subunit of transcription-initiation factor TFIIIB and transcription-initiation factor TFIIB. (d) COG1708, 'minimal' nucleotidyltransferase catalytic subunit; COG2250, 'minimal' nucleotidyltransferase accessory subunit. Species abbreviations are as in Table 1.
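The counts on the vertical axis of this figure are the number of proteins per genome assigned to a given COG. The sketch below tabulates such counts and flags possible expansions. The genome labels and protein assignments are hypothetical, and the flagging rule (at least twice the median family size across genomes) is an assumption chosen for illustration, not the criterion used in the paper.

```python
from collections import Counter
from statistics import median

GENOMES = ["g1", "g2", "g3"]  # hypothetical genome labels

# Hypothetical (genome, COG) assignments, one entry per protein.
ASSIGNMENTS = (
    [("g1", "COG0477")] * 5
    + [("g2", "COG0477"), ("g3", "COG0477")]
    + [("g1", "COG1145"), ("g2", "COG1145"), ("g3", "COG1145")]
)


def family_sizes(assignments, genomes):
    """Number of members of each COG in each genome."""
    counts = Counter(assignments)
    cogs = sorted({cog for _, cog in assignments})
    return {cog: {g: counts[(g, cog)] for g in genomes} for cog in cogs}


def expansions(sizes, fold=2.0):
    """Flag (COG, genome, size) where a family is at least `fold` times
    the median size across genomes (illustrative threshold only)."""
    flagged = []
    for cog, per_genome in sizes.items():
        baseline = max(median(per_genome.values()), 1)
        for genome, n in per_genome.items():
            if n >= fold * baseline:
                flagged.append((cog, genome, n))
    return flagged


sizes = family_sizes(ASSIGNMENTS, GENOMES)
print(expansions(sizes))
```

With the toy data, only the five-member COG0477 family in `g1` is flagged; a real analysis would need a criterion that accounts for genome size and phylogeny.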