Abstract
BACKGROUND: The Archaea are highly diverse in terms of their physiology, metabolism and ecology. At present, very few molecular characteristics are known that are uniquely shared by all archaea or by the main groups within the archaea. The evolutionary relationships among the different groups within the Euryarchaeota branch are also not clearly understood.
Year: 2007 PMID: 17394648 PMCID: PMC1852104 DOI: 10.1186/1471-2164-8-86
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Genome sizes, protein numbers and GC content of sequenced archaeal strains.
| Strain Name | Order | Temperature Range | Genome Size (Mb) | GC content (%) | Protein Number |
| | | H | 2.22 | 52 | 2,605 |
| | | H | 1.67 | 67 | 1,841 |
| | | A | 2.23 | 36.7 | 2,223 |
| | | A | 2.99 | 35.8 | 2,977 |
| | | A | 2.69 | 32.8 | 2,825 |
| | | H | 2.09 | 52 | 2,306 |
| | | H | 1.77 | 42 | 1,898 |
| | | H | 1.74 | 42 | 1,955 |
| | | H | 1.91 | 42 | 2,125 |
| | | H | 1.69 | 60 | 1,687 |
| | | T | 1.75 | 49.5 | 1,873 |
| | | M | 1.77 | 27.6 | 1,534 |
| | | M | 1.66 | 33.1 | 1,722 |
| | | H | 1.74 | 31.3 | 1,786 |
| | | M | 3.54 | 45.2 | 3,139 |
| | | T | 1.9 | 53.5 | 1,696 |
| | | M | 2.58 | 45.8 | 2,273 |
| | | M | 5.75 | 42.7 | 4,540 |
| | | M | 4.10 | 41.5 | 3,370 |
| | | M | 4.87 | 39.2 | 3,624 |
| | | H | 2.18 | 46 | 2,420 |
| | | M | 2.57 | 65.9 | 2,622 |
| | | M | 4.27 | 61.1 | 4,240 |
| | | M | 3.18 | 47.9 | 2,646 |
| | | M | 2.75 | 63.1 | 2,822 |
| | Thermoplasmatales | A | 1.55 | 36 | 1,535 |
| | | A | 1.56 | 50 | 1,482 |
| | | A | 1.58 | 50 | 1,499 |
| | N/A | H | 0.49 | 31.6 | 536 |
Abbreviations for temperature range: H, hyperthermophilic; T, thermophilic; M, mesophilic; A, thermoacidophilic. An asterisk (*) denotes the strain M. thermautotrophicus str. Delta H.
Figure 1. A neighbour-joining distance tree based on a concatenated sequence alignment of 31 widely distributed proteins. The numbers on the nodes indicate bootstrap scores observed in the NJ/ML/MP analyses. The species shaded in yellow were selected as the query genomes for the BLAST searches.
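The caption describes a distance tree built from a concatenated alignment. As a rough illustration of that general workflow (not the authors' code), the Python sketch below joins per-protein alignments into a single supermatrix, computes pairwise distances, and applies the standard Saitou-Nei neighbour-joining algorithm. The function names, the gap padding for missing proteins, and the use of uncorrected p-distances are assumptions made for brevity; a real analysis would typically apply a substitution-model correction and bootstrap resampling, neither of which is shown here.

```python
from itertools import combinations


def concatenate(alignments, species):
    """Join per-protein alignments (dict: species -> aligned sequence) into one
    supermatrix row per species; a species missing from an alignment gets gaps."""
    rows = {sp: [] for sp in species}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either row."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no shared non-gap sites")
    return sum(x != y for x, y in sites) / len(sites)


def neighbour_joining(names, dist):
    """Saitou-Nei neighbour joining; dist(a, b) returns the distance between
    two taxon names. Needs at least three taxa; returns an unrooted Newick string."""
    label = dict(enumerate(names))          # node id -> Newick fragment
    d = {}
    for i, j in combinations(range(len(names)), 2):
        d[i, j] = d[j, i] = dist(names[i], names[j])
    active = list(label)
    next_id = len(names)
    while len(active) > 3:
        n = len(active)
        r = {i: sum(d[i, k] for k in active if k != i) for i in active}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u, next_id = next_id, next_id + 1
        label[u] = f"({label[i]}:{li:.4f},{label[j]}:{lj:.4f})"
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Resolve the last three nodes as a trifurcation (three-point formula).
    a, b, c = active
    la = 0.5 * (d[a, b] + d[a, c] - d[b, c])
    lb = 0.5 * (d[a, b] + d[b, c] - d[a, c])
    lc = 0.5 * (d[a, c] + d[b, c] - d[a, b])
    return f"({label[a]}:{la:.4f},{label[b]}:{lb:.4f},{label[c]}:{lc:.4f});"


if __name__ == "__main__":
    # Hypothetical toy alignments for two proteins (not data from the paper).
    alignments = [{"sp1": "MKLV", "sp2": "MKIV", "sp3": "MRIA", "sp4": "MRIA"},
                  {"sp1": "GTE", "sp2": "GTE", "sp3": "GSD"}]  # sp4 lacks protein 2
    taxa = ["sp1", "sp2", "sp3", "sp4"]
    matrix = concatenate(alignments, taxa)
    print(neighbour_joining(taxa, lambda a, b: p_distance(matrix[a], matrix[b])))
```

Ending with a three-way join rather than a final two-node join is a common way to emit an unrooted tree, which matches the unrooted trees shown in the later figures.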
Proteins that are specific for all Archaea
| Protein ID | Accession | Function | COG/CDD | Protein ID | Accession | Function | COG/CDD |
| PAB0063 | [NP_125796] | Cca | COG1746 | PAB0247 | [NP_126062] | DNA binding | COG1571 |
| PAB0252 | [NP_126069] | RNA-binding | CDD16214 | PAB0439 | [NP_126328] | | COG1308 |
| PAB0316 | [NP_126166] | DNA primase | COG0358 | PAB0475 | [NP_126376] | regulator | COG1709 |
| PAB1633 | [NP_126790] | PilT ATPase | COG1855 | PAB1040 | [NP_127251] | SpoU | CDD6631 |
| PAB1716 | [NP_126666] | NMD3 | CDD16276 | PAB1106 | [NP_127361] | | CDD9578 |
| PAB2291 | [NP_125771] | | CDD6629 | PAB1706 | [NP_126677] | | COG1634 |
| PAB0018a | [NP_125721] | RNA binding | COG2888 | PAB2062 | [NP_126118] | | CDD16190 |
| PAB0075¹ | [NP_125817] | dehydratase | CDD23288 | PAB2104 | [NP_126058] | HTH | COG1395 |
| PAB0301 | [NP_126142] | SK | COG1685 | PAB7388 | [NP_127197] | Ribosomal_LX | CDD2437 |
| PAB0654 | [NP_126650] | | CDD8168 | PAB0469.1n | [NP_877631] | | CDD8674 |
| PAB0950 | [NP_127106] | TFIIE | CDD480 | PAB0547 | [NP_126484] | | COG1759 |
| PAB1112 | [NP_127373] | | CDD5727 | PAB0552 | [NP_126501] | Hjr | CDD29957 |
| PAB1135 | [NP_127406] | | CDD8168 | PAB0623 | [NP_126611] | | CDD9586 |
| PAB1241 | [NP_127355] | | CDD9682 | PAB1272 | [NP_127310] | | COG1759 |
| PAB1387 | [NP_127161] | flaJ | COG1955 | PAB1429 | [NP_127105] | | COG2433 |
| PAB1715 | [NP_126667] | | CDD9801 | PAB1721 | [NP_126657] | | COG2248 |
| PAB1906 | [NP_126377] | | CDD2531 | PAB2342² | [NP_125707] | | CDD15774 |
| PAB7094 | [NP_126085] | Alba | CDD25844 | PAB7309 | [NP_126897] | | CDD2523 |
These proteins were identified by BLASTP searches, and their specificity was further confirmed by PSI-BLAST searches (see the Methods section for details). Protein IDs beginning with PAB denote query proteins from the genome of P. abyssi GE5, which was used as the probe for the BLAST searches. Accession numbers are shown in square brackets. Possible cellular functions and COG or CDD numbers are noted where available; for the remaining proteins, the cellular functions are not known.
Note 1. Two low-scoring homologs to PAB0075 were also found in Dehalococcoides ethenogenes 195 (Chloroflexi) and Dehalococcoides sp. CBDB1.
Note 2. Homologs to PAB2342 are also found in Oenococcus oeni PSU-1, Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 and Clostridium perfringens str. 13.
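The specificity criterion described for these tables can be sketched as a simple filter over BLASTP hits. The Python below is a hypothetical illustration only: the `Hit` record, the lineage labels and the E-value cutoff of 1e-3 are assumptions rather than the thresholds used in the study, and the PSI-BLAST confirmation step is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit for a query protein (hypothetical record layout)."""
    genome: str    # genome the subject sequence comes from
    lineage: str   # lineage label assigned to that genome, e.g. "Archaea"
    evalue: float


def is_lineage_specific(hits, lineage, lineage_genomes, evalue_cutoff=1e-3):
    """Signature test: every significant hit lies inside `lineage`, and every
    genome listed in `lineage_genomes` has at least one significant hit."""
    significant = [h for h in hits if h.evalue <= evalue_cutoff]
    if any(h.lineage != lineage for h in significant):
        return False   # a significant homolog outside the lineage
    found_in = {h.genome for h in significant}
    return set(lineage_genomes) <= found_in


if __name__ == "__main__":
    # Toy hits; the genome names are real organisms but the scores are invented.
    archaea = ["P. abyssi GE5", "A. pernix K1", "M. maripaludis S2"]
    hits = [Hit("P. abyssi GE5", "Archaea", 1e-80),
            Hit("A. pernix K1", "Archaea", 1e-20),
            Hit("M. maripaludis S2", "Archaea", 1e-15),
            Hit("Escherichia coli", "Bacteria", 0.5)]   # not significant
    print(is_lineage_specific(hits, "Archaea", archaea))  # True
```

A strict filter like this would reject any query with a significant outside hit, so isolated low-scoring homologs of the kind listed in the notes would need to be handled by a separate, looser rule.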
Proteins that are specific for Crenarchaeota
| Protein ID | Accession | Annotation | Protein ID | Accession | Annotation |
| APES019 | [NP_147243] | ribonuclease p3 | APE1241² | [NP_147816] | COG4343 |
| APE0488 | [NP_147273] | COG4914 | APE1561 | [NP_148025] | COG4900 |
| APE0503 | [NP_147284] | COG4755 | APE1627³ | [NP_148064] | CDD26669 |
| APE0505¹ | [NP_147285] | CDD26165 | APE1644 | [BAA80645] | |
| APE0623 | [NP_147373] | COG4888 | APE1701 | [NP_148108] | COG5494 |
| APE0975 | [NP_147640] | COG4879 | | | |
| APE0143 | [NP_146996] | COG5491 | APE1848 | [NP_148210] | COG1259 |
| APE0145 | [NP_146997] | | APE1936 | [BAA80945] | |
| APE0168 | [NP_147017] | | APE1966 | [NP_148294] | |
| APE0238 | [NP_147072] | | APE1996 | [NP_148313] | |
| APE0429 | [NP_147222] | | APE2102 | [NP_148384] | |
| APE0663 | [NP_147399] | COG5431 | APE2195 | [NP_148451] | COG2083 |
| APE0902 | [NP_147588] | | APE2325 | [NP_148539] | |
| APE1113 | [NP_147720] | | APE2340 | [NP_148552] | |
| APE1364 | [NP_147897] | | APE2435 | [NP_148607] | COG4920 |
| APE1626 | [NP_148063] | | APE2454 | [BAA81469] | |
| APE1817 | [NP_148186] | COG5399 | APE2463 | [NP_148628] | |
| APE0106 | [NP_146969] | | APE1230 | [NP_147806] | |
| APE0730 | [NP_147451] | | APE1236 | [NP_147812] | |
| APE0874 | [NP_147564] | | APE2409 | [NP_148589] | |
| APE1194 | [NP_147776] | COG5625 | APE2602 | [NP_148718] | |
| APE1228 | [NP_147804] | | | | |
| Saci_0004 | [YP_254727] | | Saci_1129 | [YP_255774] | |
| Saci_0005 | [YP_254728] | | Saci_1813 | [YP_256412] | COG4113 |
| Saci_0035 | [YP_254758] | | Saci_1883 | [YP_256481] | = Saci_1813 |
| Saci_0223 | [YP_254935] | CDD46009 | Saci_2070 | [YP_256657] | |
| Saci_0224 | [YP_254936] | = Saci_0223 | Saci_2080 | [YP_256667] | = Saci_1813 |
| Saci_0660 | [YP_255337] | | Saci_2195 | [YP_256774] | = Saci_0223 |
| Saci_0857 | [YP_255517] | | Saci_2357 | [YP_256931] | = Saci_0223 |
Protein IDs beginning with APE and Saci denote query proteins from the genomes of A. pernix K1 and S. acidocaldarius DSM 639, respectively. "=" indicates a paralog of the listed protein.
Note 1. A low-scoring homolog to APE0505 is also found in Ferroplasma acidarmanus Fer1.
Note 2. A low-scoring homolog to APE1241 is also found in Archaeoglobus fulgidus DSM 4304.
Note 3. A low-scoring homolog to APE1627 is also found in Aquifex aeolicus VF5.
Proteins that are specific for Euryarchaeota
| Protein ID | Accession | Function | COG/CDD | Protein ID | Accession | Function | COG/CDD |
| PAB0082 | [NP_125825] | Tgt | COG1549 | PAB2435 | [NP_126297] | | CDD25834 |
| MMP0243* | [NP_987363] | | CDD9595 | PAB0315 | [NP_126165] | | CDD29150 |
| PAB1089 | [NP_127334] | | COG2150 | Ta0062* | [NP_393541] | | CDD26662 |
| PAB2404¹ | [NP_125813] | Pol II | COG1933 | | | | |
| PAB0161 | [NP_125931] | | COG1326 | PAB1338 | [NP_127222] | | CDD9842 |
| PAB0172 | [NP_125944] | ATPase | COG2117 | PAB1517 | [NP_126975] | | COG1356 |
| PAB0188¹ | [NP_125970] | | CDD8172 | PAB1804 | [NP_126517] | | CDD15772 |
| PAB0951 | [NP_127107] | | COG4044 | PAB2224 | [NP_125887] | | CDD5728 |
| PAB1055² | [NP_127280] | | COG4743 | VNG1263c* | [AAG19620] | | CDD2419 |
| PAB1284 | [NP_127297] | RecJ | COG1107 | VNG2408c* | [AAG20496] | | COG3365 |
| MMP1287* | [NP_988407] | | CDD2419 | | | | |
Protein IDs beginning with MMP, Ta and VNG denote query proteins from the genomes of M. maripaludis S2, T. acidophilum and Halobacterium sp. NRC-1, respectively. An asterisk (*) indicates that the protein is missing from the genomes of four Thermococci species.
Note 1. Homologs to PAB2404 and PAB0188 are also found in Nanoarchaeum equitans Kin4-M.
Note 2. Homologs to PAB1055 are also found in Dehalococcoides sp. CBDB1 and D. ethenogenes 195.
Proteins that are specific for methanogens (Methanoarchaeota)
| Protein ID | Accession | Function | COG/CDD | Protein ID | Accession | Function | COG/CDD |
| MMP0001 | [NP_987121] | | COG4014 | MMP1346 | [NP_988466] | MtrX | COG4002 |
| MMP0021⁵ | [NP_987141] | | COG4079 | MMP1555 | [NP_988675] | MCR_B | CDD25889 |
| MMP0143 | [NP_987263] | | COG4069 | MMP1556 | [NP_988676] | MCR_D | CDD3015 |
| MMP0154 | [NP_987274] | | COG4070 | MMP1557 | [NP_988677] | MCR_C | CDD15906 |
| MMP0311⁵ | [NP_987431] | | COG4048 | MMP1558 | [NP_988678] | MCR_G | CDD29638 |
| MMP0312 | [NP_987432] | | COG4050 | MMP1559 | [NP_988679] | MCR_A | CDD8362 |
| MMP0337 | [NP_987457] | | COG4029 | MMP1560 | [NP_988680] | MtrE | CDD9765 |
| MMP0421 | [NP_987541] | | COG4052 | MMP1561 | [NP_988681] | MtrD | CDD9766 |
| MMP0563⁵ | [NP_987683] | | COG4090 | MMP1562 | [NP_988682] | MtrC | CDD17461 |
| MMP0642 | [NP_987762] | | COG4020 | MMP1563 | [NP_988683] | MtrB | CDD23666 |
| MMP0656 | [NP_987776] | | COG4051 | MMP1564⁴ | [NP_988684] | MtrA | COG4063 |
| MMP0665 | [NP_987785] | | COG4066 | MMP1566 | [NP_988686] | MtrG | CDD9769 |
| MMP0698⁵ | [NP_987818] | | COG4033 | MMP1593 | [NP_988713] | | COG1571 |
| MMP0701⁵ | [NP_987821] | | COG4081 | MMP1644 | [NP_988764] | | COG4022 |
| MMP1223 | [NP_988343] | | COG4065 | MMP1704 | [NP_988824] | | COG4008 |
| MMP1309⁵ | [NP_988429] | | COG4073 | | | | |
| MMP0372 | [NP_987492] | MTD | CDD2518 | MMP0962 | [NP_988082] | | COG4855 |
| MMP0400¹ | [NP_987520] | | COG1707 | MMP0976⁵ | [NP_988096] | | COG1810 |
| MMP0499⁵ | [NP_987619] | ArsR | CDD28947 | MMP0984⁵ | [NP_988104] | CO_dh | CDD3060 |
| MMP0607² | [NP_987727] | NrpR | COG1693 | MMP1499⁵ | [NP_988619] | HTH | COG4800 |
| MMP0961⁵ | [NP_988081] | | CDD15263 | MMP1567³ | [NP_988687] | MtrH | CDD25859 |
| Mbur_0042 | [YP_564815] | | | Mbur_0546 | [YP_565273] | | |
| Mbur_0348 | [YP_565093] | | | Mbur_0652 | [YP_565373] | | |
| Mbur_0350 | [YP_565095] | | | Mbur_0992 | [YP_565682] | | |
| Mbur_0545 | [YP_565272] | | | Mbur_1754 | [YP_566394] | | CDD48145 |
| Mbur_0387 | [YP_565131] | | CDD28974 | Mbur_1911 | [YP_566543] | | |
Protein IDs beginning with Mbur denote query proteins from the genome of M. burtonii. "=" indicates paralogous genes.
Note 1. Homologs to MMP0400 are found in Solibacter usitatus Ellin6076 and Rubrobacter xylanophilus DSM 9941.
Note 2. Homologs to MMP0607 are found in Dehalococcoides sp. CBDB1 and D. ethenogenes 195.
Note 3. Homologs to MMP1567 are found in two Desulfitobacterium hafniense strains (Firmicutes), and the CmuB proteins from three species of the Rhizobiales (α-proteobacteria) also show strong similarity to MtrH.
Note 4. A homolog to MMP1564 is also found in Dechloromonas aromatica RCB.
Note 5. These 10 proteins are absent from the genome of Methanosphaera stadtmanae DSM 3091.
Proteins that are specific to certain subgroups of methanogens
| Protein ID | Accession | Annotation | Protein ID | Accession | Annotation | Protein ID | Accession | Annotation |
| MMP0125 | [NP_987245] | COG4018 | MMP1451 | [NP_988571] | EhaD COG4039 | | | |
| MMP0935 | [NP_988055] | CDD26896 | MMP1452 | [NP_988572] | EhaE COG4038 | | | |
| MMP1243 | [NP_988363] | CDD30112 | MMP1453 | [NP_988573] | EhaF COG4037 | | | |
| MMP1449 | [NP_988569] | EhaB COG4041 | MMP1454 | [NP_988574] | EhaG COG4036 | | | |
| MMP1450 | [NP_988570] | EhaC COG4040 | MMP1498 | [NP_988618] | CDD26800 | | | |
| MMP0127 | [NP_987247] | Hmd CDD8560 | MMP1459 | [NP_988579] | EhaL COG4035 | | | |
| MMP0267 | [NP_987387] | COG4053 | MMP1497 | [NP_988617] | COG4019 | | | |
| MMP0618 | [NP_987738] | COG4075 | MMP1598 | [NP_988718] | CDD15766 | | | |
| MMP1217 | [NP_988337] | COG4024 | MMP1664 | [NP_988784] | COG4071 | | | |
| MMP1448 | [NP_988568] | EhaA COG4042 | MMP1716 | [NP_988836] | HmdII CDD8560 | | | |
| MK0046 | [NP_613333] | | MK0502 | [NP_613787] | | MK0927 | [NP_614210] | |
| MK0108 | [NP_613395] | | MK0749 | [NP_614033] | | MK1599 | [NP_614882] | = MK0927 |
| MK0147 | [NP_613434] | | MK0750 | [NP_614034] | | MK1282 | [NP_614565] | = MK0502 |
| MK0241 | [NP_613528] | | MK0751 | [NP_614035] | | MK1513 | [NP_614796] | |
| MK0431 | [NP_613716] | | MK0854 | [NP_614137] | COG0707 | MK1541 | [NP_614824] | |
| Mbur_0178 | [YP_564939] | | Mbur_1314 | [YP_565982] | | Mbur_1890 | [YP_566523] | |
| Mbur_0218 | [YP_564978] | | Mbur_1506 | [YP_566163] | | Mbur_1953 | [YP_566584] | |
| Mbur_0544 | [YP_565271] | | Mbur_1512 | [YP_566169] | COG4742 | Mbur_1956 | [YP_566587] | |
| Mbur_0997 | [YP_565686] | | Mbur_1689 | [YP_566333] | | Mbur_2254 | [YP_566865] | |
| Mbur_1283 | [YP_565952] | | Mbur_1863 | [YP_566496] | | | | |
| MMP0124 | [NP_987244] | | MMP1073 | [NP_988193] | COG1320 | MMP1460 | [NP_988580] | EhaM |
| MMP0223 | [NP_987343] | | MMP1110 | [NP_988230] | CDD2427 | MMP1633 | [NP_988753] | |
| MMP0940 | [NP_988060] | | | | | | | |
| MMP1065 | [NP_988185] | | MMP1467 | [NP_988587] | EhaT | MMP1568 | [NP_988688] | COG4010 |
| MMP1118 | [NP_988238] | CDD28974 | | | | | | |
| Mbur_0145 | [YP_564912] | | Mbur_1977 | [YP_566606] | | Mbur_2094 | [YP_566718] | |
| Mbur_1266 | [YP_565937] | | Mbur_2017 | [YP_566644] | | Mbur_2402 | [YP_567003] | |
| Mbur_1788 | [YP_566426] | | | | | | | |
Protein IDs beginning with MK denote query proteins from the genome of M. kandleri AV19. "=" indicates paralogous genes.
Proteins restricted to several archaeal lineages
| Protein ID | Accession | Annotation | Protein ID | Accession | Annotation |
| PAB0076 | [NP_125818] | CDD15620 | PAB1291 | [NP_127284] | CDD41906 |
| PAB0138 | [NP_125896] | CDD9576 | PAB1584 | [NP_126876] | COG4072 |
| PAB0965 | [NP_127127] | CDD15705 | PAB1860 | [NP_126440] | |
| PAB1927¹ | [NP_126347] | CDD29323 | PAB0813 | [NP_126902] | COG1630 |
| PAB1994 | [NP_126245] | CDD9568 | PAB0853 | [NP_126970] | |
| PAB0036 | [NP_125764] | | PAB1251 | [NP_127332] | endonuclease COG3780 |
| PAB0054 | [NP_125787] | CDD41919 | PAB1779 | [NP_126559] | CDD43950 |
| PAB0176 | [NP_125948] | CDD43579 | PAB1806² | [NP_126515] | CDD43599 |
| PAB1127 | [NP_127394] | CDD30177 | PAB2413 | [NP_126288] | COG1710 |
| PAB0981 | [NP_127155] | | PAB1672 | [NP_126731] | |
| PAB0982 | [NP_127156] | | PAB3017 | [NP_125737] | |
| PAB0985 | [NP_127159] | | PAB7298 | [NP_126858] | |
| VNG0240C³ | [AAG18840] | COG4031 | VNG2315H | [AAG20425] | MC1 CDD45747 |
| VNG1236C | [AAG19598] | | VNG2508C | [AAG20570] | Cyo COG4083 |
| VNG1611C | [AAG19875] | COG4749 | VNG2524H | [AAG20585] | |
| VNG1670C | [AAG19921] | COG3612 | VNG2669G | [AAG20696] | |
| VNG1891H | [AAG20086] | | | | |
| Ta0035 | [NP_393514] | COG5592 | Ta1440 | [NP_394894] | |
| Ta0164 | [NP_393642] | | Ta1453 | [NP_394906] | |
| Ta0165 | [NP_393643] | | Ta1507 | [NP_394957] | CDD29645 |
| Ta0267 | [NP_393747] | CDD43623 | Saci_0040 | [YP_254763] | |
| Ta0308 | [NP_393788] | | Saci_0054 | [YP_254777] | |
| Ta0347 | [NP_393826] | TauA CDD31059 | Saci_0055 | [YP_254778] | |
| Ta0547 | [NP_394021] | | Saci_0322 | [YP_255031] | |
| Ta0548m⁴ | [NP_394022] | | Saci_0323 | [YP_255032] | |
| Ta0583 | [NP_394007] | | Saci_0979 | [YP_255633] | SdhD |
| Ta0759 | [NP_394223] | | Saci_1065 | [YP_255715] | |
| Ta0793a | [NP_394256] | | Saci_1491 | [YP_256105] | CDD40171 |
| Ta0938 | [NP_394396] | | Saci_1560 | [YP_256166] | |
| Ta0939 | [NP_394397] | PQQC CDD45213 | Saci_1747 | [YP_256346] | SoxE CDD46414 |
| Ta1156 | [NP_394612] | | Saci_1952 | [YP_256548] | |
| Ta1345 | [NP_394801] | | Saci_2078 | [YP_256665] | |
Note 1. A homolog to PAB1927 is also found in Rubrobacter xylanophilus DSM 9941.
Note 2. A homolog to PAB1806 is also found in Aquifex aeolicus VF5.
Note 3. A homolog to VNG0240c is also found in Methanopyrus kandleri.
Note 4. Two low-scoring homologs to Ta0548 are also found in Gloeobacter violaceus PCC 7421.
Figure 2. Interpretive diagrams showing the suggested evolutionary stages at which genes for some of the signature proteins specific to the Crenarchaeota and Euryarchaeota, as well as to some Crenarchaeota subgroups, likely originated. The top diagram (A) shows the evolutionary interpretation of the signature proteins in the absence of any other information, whereas the lower diagram (B) shows our interpretation of these data taking into account other relevant information discussed in the text. The branching pattern shown is unrooted, and the proteins shared by all archaea were introduced in a common ancestor of all archaea. The dotted line for N. equitans in (B) indicates that its placement within the Euryarchaeota lineage is uncertain. The abbreviations T and AF in this and other figures refer to Tables and Additional files, respectively.
Figure 3. An interpretive diagram showing the evolutionary stages at which genes for different proteins specific to methanogenic archaea likely originated. The 10 proteins uniquely shared by A. fulgidus and various methanogenic archaea indicate that this lineage is the closest ancestor of all methanogens.
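The inference in this caption rests on tallying which groups carry each protein. A minimal Python sketch of such a tally follows; the species-to-group mapping and the toy presence data are hypothetical, and the function records only which groups carry a protein, not whether every member of a group does.

```python
from collections import Counter


def distribution_patterns(presence, group_of):
    """Tally proteins by the set of groups in which a homolog was detected.

    presence: protein id -> set of species carrying a homolog
    group_of: species -> group label (a hypothetical grouping)
    """
    return Counter(
        frozenset(group_of[sp] for sp in species) for species in presence.values()
    )


if __name__ == "__main__":
    # Hypothetical groups and presence data, for illustration only.
    groups = {"A. fulgidus": "Archaeoglobales", "M. maripaludis": "methanogens",
              "M. kandleri": "methanogens", "P. abyssi": "Thermococcales"}
    presence = {"p1": {"A. fulgidus", "M. maripaludis", "M. kandleri"},
                "p2": {"M. maripaludis", "M. kandleri"},
                "p3": {"A. fulgidus", "M. kandleri"},
                "p4": {"P. abyssi", "M. kandleri"}}
    counts = distribution_patterns(presence, groups)
    print(counts[frozenset({"Archaeoglobales", "methanogens"})])  # 2 (p1 and p3)
```

Keying the tally on a frozenset of group labels makes patterns such as "Archaeoglobales plus methanogens, and nothing else" directly countable.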
Figure 4. A summary diagram showing the branching order of different groups within the archaea, based on the species distribution patterns of various archaeal-specific proteins. The arrows mark the suggested evolutionary stages at which proteins uniquely shared by the indicated groups were introduced. Details of these proteins are given in the indicated tables (T) or Additional files (AF). The branching pattern shown is unrooted. The dotted line for N. equitans indicates that its placement within the Euryarchaeota is uncertain. The dotted line extending from the proteins found in all archaea indicates that these proteins cannot be used to root the archaeal tree.
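Because these diagrams are unrooted, placing a protein's origin amounts to asking whether the set of species carrying it matches one side of a branch (a bipartition, or split) of the tree. The Python sketch below checks that condition on a hypothetical five-leaf topology; the function names and tree are illustrative assumptions. A set containing every species matches no internal branch, which mirrors the point that proteins found in all archaea cannot be used to root the tree. Because a split and its complement describe the same branch, the sketch also cannot tell on which side a protein arose without a root.

```python
from collections import defaultdict


def internal_splits(edges):
    """Return the non-trivial bipartitions of an unrooted tree.

    edges: iterable of (node, node) pairs; degree-1 nodes are taken as leaves.
    Each split is a frozenset holding the two leaf sets on either side of an edge.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    leaves = frozenset(n for n, nbrs in adj.items() if len(nbrs) == 1)

    def leaves_beyond(node, parent):
        # Leaves reachable from `node` without stepping back to `parent`.
        if node in leaves:
            return {node}
        found = set()
        for nxt in adj[node]:
            if nxt != parent:
                found |= leaves_beyond(nxt, node)
        return found

    splits = set()
    for a, b in edges:
        side = frozenset(leaves_beyond(b, a))
        if 1 < len(side) < len(leaves) - 1:   # skip edges leading to a single leaf
            splits.add(frozenset({side, leaves - side}))
    return splits


def matches_branch(species, splits):
    """True if `species` is exactly one side of some internal branch."""
    s = frozenset(species)
    return any(s in split for split in splits)


if __name__ == "__main__":
    # Hypothetical topology: two crenarchaeotes and three euryarchaeotes.
    tree = [("x", "Cren1"), ("x", "Cren2"), ("x", "y"), ("y", "Eury1"),
            ("y", "z"), ("z", "Eury2"), ("z", "Eury3")]
    splits = internal_splits(tree)
    print(matches_branch({"Eury1", "Eury2", "Eury3"}, splits))  # True
    print(matches_branch({"Cren1", "Eury2"}, splits))           # False
```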