Takuro Nunoura, Yoshihiro Takaki, Jungo Kakuta, Shinro Nishi, Junichi Sugahara, Hiromi Kazama, Gab-Joo Chee, Masahira Hattori, Akio Kanai, Haruyuki Atomi, Ken Takai, Hideto Takami.
Abstract
The domain Archaea has historically been divided into two phyla, the Crenarchaeota and Euryarchaeota. Environmental genomics and cultivation efforts have recently revealed two novel phyla/divisions in the Archaea, the 'Thaumarchaeota' and 'Korarchaeota', whose members had been regarded as Crenarchaeota based on small subunit rRNA phylogeny. Here, we show the genome sequence of Candidatus 'Caldiarchaeum subterraneum', which represents an uncultivated crenarchaeotic group. A composite genome was reconstructed from a metagenomic library previously prepared from a microbial mat at a geothermal water stream of a sub-surface gold mine. The genome was found to be clearly distinct from those of the known phyla/divisions, Crenarchaeota (hyperthermophiles), Euryarchaeota, Thaumarchaeota and Korarchaeota. The unique traits suggest that this crenarchaeotic group can be considered as a novel archaeal phylum/division. Moreover, C. subterraneum harbors a ubiquitin-like protein modifier system consisting of Ub, E1, E2 and a small Zn RING finger family protein with structural motifs specific to eukaryotic system proteins, a system clearly distinct from the prokaryote-type system recently identified in Haloferax and Mycobacterium. The presence of such a eukaryote-type system is unprecedented in prokaryotes, and indicates that a prototype of the eukaryotic protein modifier system is present in the Archaea.
Year: 2010 PMID: 21169198 PMCID: PMC3082918 DOI: 10.1093/nar/gkq1228
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 2. Sequence alignments of Ub, E1, E2 (super-) and JAMM family proteins. (A) Sequence alignments of eukaryotic and archaeal Ub superfamily proteins; proteins from Saccharomyces cerevisiae; S.cere Smt3 (6320718) and S.cere Rpl40 (6322043), from human; Human sumo2 (54792071), Human sumo1 (54792065), Human NEDD8 (5453760) and Human Ufm1 (7705300), from Cyanidioschyzon merolae; C.mero smt3 (CME004C), C.mero ubl (CML042C) and C.mero Rps27 (CMN125C), from Tetrahymena thermophila; T.ther ubl1 (229594936) and T.ther ubl2 (118367859), from Cryptosporidium parvum; C.parv ubl1 (126654302), C.parv Rps27 (66357428) and C.parv ubl2 (66363058), from Giardia lamblia; G.lamb sumo (159114790), G.lamb Epl40 (159108136), G.lamb ub1 (159112981) and G.lamb ub2 (159111413), from Trypanosoma brucei; T.bruc ub (72387960) and T.bruc ubl (72387818), from C. subterraneum; eukaryote-type Ubl (CSUB_C1474) and prokaryote-type Ubls (ThiS/MoaD) (CSUB_C0525, CSUB_C0702, CSUB_C1012, CSUB_C1603), from H. volcanii; SAMPs, HVO_0202 (302595884) and HVO_2619 (302595883), from Bacillus subtilis; B.sub ThiS (CAB13025), from Streptomyces avermitilis; S.aver ThiS (BAC73805), from Nitrosomonas europaea; N.euro ThiS (CAD84196), from Escherichia coli; E.coli MoaB (AAN79339), from Pyrococcus furiosus; P.furi MoaB (1VJK_A), from Methanosarcina acetivorans; M.acet MoaB (AAM05120), from Aromatoleum aromaticum; A.arom NrfH (CAI07579) and from Pseudomonas syringae; P.syri NrfH (AAY39230). Asterisks indicate the C-terminal Gly-Gly motif. (B) Sequence alignments of adenylation and catalytic cysteine domains in E1 superfamily proteins; proteins from human; Human E1L (23510338), Human sumoE1 (60594167), Human UBA1 (23510338), Human UBA2 (4885649), Human UBA3 (38045942), Human UBA5 (13376212), Human ATG7 (119584500) and Human MOCS3 (7657339), from Schizosaccharomyces pombe; S.pomb E1L (162312305) and S.pomb UBA3 (19113852), from S. cerevisiae; S.cere Aos1 (6325438), S.cere UBA1 (6322639), S.cere UBA2 (6320598), S.cere ATG7 (6321965), S.cere UBA4 (6321903) and S.cere YgdLl (6322825), from T. thermophila; T.ther E1L (118383519), T.ther E1B (118351055), T.ther UBA4 (118351953) and T.ther YgdLl (118400480), from Trypanosoma cruzi; T.cruz E1 (71411317), from Plasmodium yoelii; P.yoel UBA2 (82595829) and P.yoel MoeB (83315401), from Trichomonas vaginalis; T.vagi APG7 (123446747), from C. subterraneum; E1l (CSUB_C1476) and MoeB (CSUB_C1135), from H. volcanii; HVO_0558 (292654724), from Cupriavidus metallidurans; C.meta ThiF (4039868), from Clostridium perfringens; C.perf (86559649), from Shewanella sp. ANA3; S.ANA3 (117676291), from Rhizobium etli; R.etli (86359719), from Anabaena variabilis; A.vari (ABA25158), from Polaromonas naphthalenivorans; P.naph (121605347), from Nostoc sp. PCC7120; Nostoc (BAB77147), from Xanthomonas axonopodis; X.axon MoeB (21242767), from E. coli; E.coli MoeB (1JW9_B), from C. symbiosum; C.symb ThiF (ABK78649), from P. furiosus; P.furi MoeB (18977661), from Geobacillus kaustophilus; G.kaus MoeBl (56419161), from Desulfuromonas acetoxidans; D.acet ThiF (95930339), from Desulfovibrio desulfuricans; D.desu ThiF (78357502), from Bacteroides thetaiotaomicron; B.thet (29349047), from M. tuberculosis; M.tube Rv (15609475), from Cytophaga hutchinsonii; C.hutc (110639176), and from Bacillus thuringiensis; B.thur (110639176). Asterisks and plus signs indicate adenylation active sites and the thiolating cysteine, respectively. Mg2+ chelating motifs (CxxC) are shown by octothorpes.
(C) Alignment of E2 superfamily proteins; proteins from human; Human E2A (32967280), Human E2D (5454146), Human E2N (61175265), Human E2G1 (13489085), Human E2G2 (29893557), Human E2K (163660385), Human E2H (4507783), Human E2M (4507791), Human E2J2 (37577124), Human E2J (37577122) and Human Tsg101 (5454140), from Arabidopsis thaliana; A.thal E2I (15230881), A.thal E2C (18403097) and A.thal E2J (18401338), from Chlamydomonas reinhardtii; C.rein E2K (159463008), from C. merolae; C.mero E2D (CMB015C) and C.mero E2N (CMR010C), from Plasmodium falciparum; P.fal E2D (124805463), from S. cerevisiae; S.cere E2A (6321380), S.cere E2D (6319556), S.cere E2N (6320297), S.cere E2I (6320139), S.cere E2C (6324915), S.cere E2G2 (6323664), S.cere E2K (6320382), S.cere E2H (6579192), S.cere E2M (6323337) and S.cere E2J2 (6320947), from S. pombe; S.pomb E2G1 (6323664), from T. thermophila; T.ther E2M (118382495), from T. vaginalis; T.vagi E2M (123484378), from G. lamblia; G.lamb E2D (159111264), from C. subterraneum; CSUB_C1475, from Ruegeria sp.; Rueger (22726448), from Arthrobacter sp.; Arthro (A0AW81), from E. coli; E.coli (37927532), from Syntrophus aciditrophicus; S.acid (85859492), from Rhodobacter sphaeroides; R.spha (77387013), from Clostridium perfringens; C.perf (86559649), from Dechloromonas aromatica; D.arom (71847775), from Anabaena variabilis; A.vari (75705484), from Bacteroides thetaiotaomicron; B.thet (29339960), from Synechocystis sp. PCC6803; Synech (38423903), from Burkholderia cepacia; B.cepa (A4JA91), and from Rhizobium sp. NGR234; Rhizob (2496664). An asterisk and octothorpes indicate the catalytic cysteine residue and residues forming a conserved stabilizing contact in E2 from eukaryotes, respectively. Flap histidine and asparagine residues are shown by plus signs. Identical and similar amino acids are shaded in black and gray, respectively. (D) Sequence alignment of JAMM family proteins; proteins from human; Human COPS5 (12654695) and Human PSMD14 (5031981), from A. thaliana; A.thal CSN5A (15219970), from S. cerevisiae; S.cere RPN11 (14318526), from T. brucei; T.bruc RPN11 (18463065) and T.bruc SCN5 (72393165), from G. lamblia; G.lamb RPN11 (159114272), from S. pombe; S.pomb AMSHP (19115685), from C. subterraneum; CSUB_C1473, from Archaeoglobus fulgidus; A.flugi JAB (11499780), from Pyrococcus horikoshii; P.hori JAB (3257912), from Pseudomonas aeruginosa; P.aeru JAB (15597298), from Pyrobaculum aerophilum; Py.aer JAB (18313041), from E. coli; E.coli RadC (15801143), from B. subtilis; B.subt RadC (16079856), from M. acetivorans; M.acet RadC (20090827), from Thermotoga maritima; T.mari RadC (15644305), from Aquifex aeolicus; A.aeol (2984019), from Deinococcus radiodurans; D.radi (15805429), from Pseudomonas putida; P.puti (84994017), from Salinibacter ruber; S.rubb (83814538), from M. tuberculosis; M.tube (13880984), from Nocardia farcinica; N.farc (54014564), from Wolinella succinogenes; W.succ, and from Geobacter metallireducens; G.meta. Asterisks indicate the JAMM motif residues. Identical and similar amino acids are shaded in black and gray, respectively.
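The two sequence features flagged in the Figure 2 caption, a C-terminal Gly-Gly motif and CxxC motifs, can be located with simple string matching. The sketch below is only an illustration: the function names and the toy sequences are hypothetical, and it is not the alignment procedure used for the figure.

```python
import re
from typing import List


def has_c_terminal_gly_gly(protein: str) -> bool:
    """Return True if the sequence ends with two glycines (one-letter code 'GG')."""
    return protein.strip().upper().endswith("GG")


def find_cxxc(protein: str) -> List[int]:
    """Return 0-based start positions of every CxxC motif, including overlapping ones."""
    # A lookahead lets the scan report motifs that overlap each other.
    return [m.start() for m in re.finditer(r"(?=C..C)", protein.upper())]


# Toy sequences, invented for illustration only.
print(has_c_terminal_gly_gly("MQIFVKTLRLRGG"))
print(find_cxxc("AACAACAAACGGC"))
```

Precursor forms of Ub-like proteins can carry residues after the Gly-Gly pair, so a real scan would probably need to tolerate a short C-terminal extension.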
Figure 1. Circular representation of the C. subterraneum composite genome. From the inside, the first and second circles show the GC skew (values >0 or <0 are indicated in green and pink, respectively) and the G+C percent content (values greater or smaller than the average percentage in the overall chromosome are shown in blue and sky blue, respectively) in a 10-kb window with a 100-bp step. The third and fourth circles show the presence of RNAs (rRNA and tRNA); CDSs aligned in the clockwise and counterclockwise directions are indicated on the upper and lower sides of the circle, respectively. Colors of CDSs indicate their functional categories: red for information storage and processing, green for metabolism, blue for cellular processes and signaling, and gray for poorly characterized function.
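The sliding-window statistics plotted in the two inner circles of Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the usual definition of GC skew, (G − C)/(G + C), and assumes that windows wrap around the end of the circular sequence; the function name `gc_windows` is hypothetical.

```python
from typing import Iterator, Tuple


def gc_windows(
    seq: str, window: int = 10_000, step: int = 100
) -> Iterator[Tuple[int, float, float]]:
    """Yield (window start, GC skew, G+C fraction) along a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    # Assumption: the chromosome is circular, so windows near the end wrap around.
    wrapped = seq + seq[: window - 1]
    for start in range(0, n, step):
        chunk = wrapped[start : start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # GC skew is (G - C) / (G + C); report 0.0 when the window has no G or C.
        skew = (g - c) / (g + c) if g + c else 0.0
        yield start, skew, (g + c) / window


# Toy example with a 4-bp window and a 2-bp step.
for start, skew, gc in gc_windows("GGGGCCAATT", window=4, step=2):
    print(start, skew, gc)
```

Coloring a window as in the figure would then amount to comparing each skew with zero and each G+C fraction with the chromosome-wide average.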
Distribution patterns of representative components for DNA replication/repair, cell division, translation and transcription among Crenarchaeota, Euryarchaeota, Thaumarchaeota, Korarchaeota and C. subterraneum
| Major DNA polymerases | BII, D | BI, BII | BI, D | BII, D | BI, BII, D |
| Chromosome segregation ATPase | + | − | + | + | + |
| ERCC4-like helicase (COG01111) | + | − | + | − | |
| Topoisomerase I | IA, IB | IA | IA | IA | IA |
| FtsZ | + | − | + | + | + |
| Histone | + | − | + | + | + |
| RNA polymerase RpoA | fusion | split | split | fusion | fusion |
| RNA polymerase RpoB | fusion | split | split/fusion | fusion | fusion |
| RNA polymerase RPB8 | − | + | − | − | + |
| Ribosomal protein S25, S26, S30 | + | + | − | + | + |
| Ribosomal protein L14e, 34e | + | + | + (some) | − | + |
| Ribosomal protein L13e | − | + | − | (+) | + |
| Ribosomal protein LXa | − | + | + (most) | − | − |
| Ribosomal protein L39e | − | + | + | + | − |
+, present; −, absent.
(a) Characterization of DNA polymerase is based on Ref. (47).
(b) Only the C-terminal domain is found in C. symbiosum and N. maritimus.
(c) Only found in Thermofilum pendens and Caldivirga maquilingensis.
(d) Fusion form is observed in Thermococcales and Thermoplasmatales.
(e) Only found in N. gargensis.
Figure 3. The gene cluster of the Ub-like protein modifier system in C. subterraneum. CDSs without gene annotation encode hypothetical proteins. The CDSs rpn11l (CSUB_C1473), ubl (CSUB_C1474), e2l (CSUB_C1475), e1l (CSUB_C1476) and srfp (CSUB_C1477) encode eukaryotic RPN11, Ubl, E2l, E1l and a small RING finger protein, respectively.
Figure 4. Phylogenetic analyses of Archaea including C. subterraneum. (A) Maximum likelihood phylogenetic tree of concatenated (SSU+LSU) rRNA genes using 3063 identical nucleotide positions. Bacterial sequences were used as the outgroup. Numbers indicate bootstrap values from 100 replications. (B) Maximum likelihood phylogenetic tree of 45 concatenated universally conserved ribosomal proteins and nine RNA polymerase subunits using 5993 aligned identical amino acid residues. Eukaryotic sequences were used as the outgroup. Numbers indicate bootstrap values (%) from 200 replications. (C) Maximum likelihood phylogenetic tree of archaeal translation EF2 proteins based on 590 identical residues. Numbers indicate bootstrap values (%) from 200 replications.
Figure 5. Maximum likelihood phylogenetic tree of concatenated (SSU+LSU) DNAP. The number of identical amino acid residues used was 829. Numbers indicate bootstrap values (%) from 200 replications.
Figure 6. Venn diagrams presenting the number of arCOGs among crenarchaeotic lineages; Caldiarchaeum, Korarchaeum and Thaumarchaeota. (A, B and C) Venn diagrams presenting the number of arCOGs that represent genome core genes of hyperthermophilic Crenarchaeota (HC: red) and Euryarchaeota (E: blue) and are found in the genomes of the novel crenarchaeal lineages; Caldiarchaeum subterraneum (Caldi), Thaumarchaeota (Thaum) and K. cryptofilum (Kor). A total of 11 hyperthermophilic-crenarchaeal and 27 euryarchaeal genomes in the arCOG database were used in this analysis. (A) Genes that are represented in all sequenced genomes used in arCOG from the represented division, but that are missing in at least some organisms of the other division. (B) Genes present in more than two-thirds of the genomes from one division and absent in the other division. (C) Genes that are present in at least one representative of each order of one division, but are absent from all genomes in the other division. (D) A Venn diagram presenting the number of arCOGs shared among three crenarchaeotic lineages; Caldiarchaeum, Korarchaeum and Thaumarchaeota.
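The three core-gene criteria in panels (A) to (C) of Figure 6 map naturally onto set operations. The sketch below is one possible reading of the caption, not the authors' code: the presence tables, the genome-to-order mapping and all function names are hypothetical, and the toy identifiers are invented.

```python
from collections import Counter
from typing import Dict, Set

Presence = Dict[str, Set[str]]  # genome name -> arCOG identifiers found in it


def criterion_a(focal: Presence, other: Presence) -> Set[str]:
    """(A) In every focal genome, but missing from at least one genome of the other division."""
    return set.intersection(*focal.values()) - set.intersection(*other.values())


def criterion_b(focal: Presence, other: Presence) -> Set[str]:
    """(B) In more than two-thirds of focal genomes and absent from the other division."""
    in_any_other = set.union(*other.values())
    counts = Counter(cog for cogs in focal.values() for cog in cogs)
    return {
        cog
        for cog, k in counts.items()
        if 3 * k > 2 * len(focal) and cog not in in_any_other
    }


def criterion_c(focal: Presence, other: Presence, order_of: Dict[str, str]) -> Set[str]:
    """(C) In at least one genome of each focal order and absent from the other division."""
    by_order: Dict[str, Set[str]] = {}
    for genome, cogs in focal.items():
        by_order.setdefault(order_of[genome], set()).update(cogs)
    return set.intersection(*by_order.values()) - set.union(*other.values())


# Toy data, invented for illustration only.
hc = {"g1": {"a", "b", "c"}, "g2": {"a", "b"}, "g3": {"a", "b", "d"}, "g4": {"a", "c"}}
eu = {"e1": {"a", "x"}, "e2": {"x", "c"}}
orders = {"g1": "O1", "g2": "O1", "g3": "O2", "g4": "O2"}
print(criterion_a(hc, eu), criterion_b(hc, eu), criterion_c(hc, eu, orders))
```

Under this reading, the counts in each Venn diagram would come from intersecting one of these core sets with the arCOGs found in a Caldiarchaeum, Thaumarchaeota or Korarchaeum genome.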