Christoph Lederer, Dominik Heider, Johannes van den Boom, Daniel Hoffmann, Jonathan W Mueller, Peter Bayer.
Abstract
Peptidyl-prolyl cis/trans isomerases (PPIases) are enzymes assisting protein folding and protein quality control in organisms of all kingdoms of life. In contrast to the other sub-classes of PPIases, the cyclophilins and the FK-506 binding proteins, little was formerly known about the parvulin type of PPIase in Archaea. Recently, the first solution structure of an archaeal parvulin, the PinA protein from Cenarchaeum symbiosum, was reported. Investigation of the occurrence and frequency of PPIase sequences in numerous archaeal genomes now revealed a strong tendency for thermophilic microorganisms to reduce the number of PPIases. Single-domain parvulins were mostly found in the genomes of recently proposed deep-branching archaeal subgroups, the Thaumarchaeota and the ARMANs (archaeal Richmond Mine acidophilic nanoorganisms). Hence, we used the parvulin sequence to reclassify available archaeal metagenomic contigs, thereby adding new members to these subgroups. A combination of genomic background analysis and phylogenetic analysis of parvulin sequences suggested that the assigned sequences belong to at least two distinct groups of Thaumarchaeota. Finally, machine learning approaches were applied to identify amino acid residues that separate archaeal and bacterial parvulin proteins from each other. When mapped onto the recent PinA solution structure, most of these positions form a cluster at one site of the protein, possibly indicating a different functionality of the two groups of parvulin proteins.
Keywords: PPIase; Pin1; Thaumarchaeota; archaeal protein; single-domain parvulin
Year: 2011 PMID: 22065628 PMCID: PMC3204937 DOI: 10.4137/EBO.S7683
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Occurrence of prolyl isomerases in archaeal genomes.
| 8 | 1 per genome | None | None | None | (Hyper) thermophilic | ||
| 8 | 1 per genome | None | None | None | (Hyper) thermophilic | ||
| 12 | 1 per genome | None | None | None | (Hyper) thermophilic | ||
| 3 | 1 per genome | None | None | None | (Hyper) thermophilic | ||
| 4 | 1 per genome | 1 in Picrophilus | None | None | thermophilic | ||
| 9 | 1 in Pyrococcus | None | None | None | Pyrococcus: hyperthermophilic; | ||
| 13 | 1–3 per genome | 1 per genome | None | None | Mesophilic/moderately thermophilic | ||
| 12 | 2 in Methanocaldococcus; | None in Methanocaldococcus; | None | None | Methanococcus: mesophilic | ||
| 1 | 4 | 1 | None | None | Mesophilic | ||
| 1 | 1 | None | None | None | thermophilic | ||
| 7 | 4 in Methanosarcina; | 1 per genome; 2 in | None | none; 1 in | Mesophilic | ||
| 8 | 1 per genome | 1 per genome | None | none | Mesophilic | ||
| 6 | 1–4 per genome | 1–2 per genome | None | 1 in 4 genomes | Mesophilic | ||
| 3 | 2 in Micrarchaeum; | None | None | 1 in two genomes | Mesophilic | ||
| – | 3 | 1 per genome | 1 per genome | 1 per genome; | 1 per genome | Mesophilic/psychrophilic | |
| – | 1 | 1 | None | None | None | (Hyper) thermophilic | |
| – | 1 | None | None | None | None | (Hyper) thermophilic | |
| 1 | 1, 2 or 3 (ambiguously annotated) | None | None | None | (Hyper) thermophilic | ||
Notes: Refer to Supplementary Table 1 for details.
Please note that additional genome sequences became available very recently.
Figure 1. Genomic context analysis of the archaeal parvulin locus. The genetic background was analysed as described in the main text. White-backed arrows indicate genes occurring only once. Other colour codes are indicated within the figure. All abbreviations in this schematic are given below. The 12 parvulin-containing contigs starting with “AACY” derive from the Sorcerer II voyage [49]; ACXJ01008586 derives from samples collected from thick floating biofilms in the Richmond Mine [6]. ABEF01053500, from a subtropical gyre at 4000 m depth, was deposited by Ed DeLong and colleagues from the Hawaiian research station ALOHA. The parvulin-containing fosmid AD1000-56-E4 derives from plankton collected at 1000 m depth in the Adriatic Sea [8]. The metagenomic contigs AACY023784421 and ACXJ01008586 were added to their groups on the basis of their parvulin primary sequence; for ACXJ01008586 this agrees with its origin. The marine metagenomic contig AACY023450473 contains 341 amino acids of the N-terminus of an aminotransferase class I/II. This sequence shows highest similarity to the protein YP_001737635 from Candidatus Korarchaeum cryptofilum OPF8 (39% amino acid identity over 331 amino acids). Hence, this contig may belong to the Korarchaeota. AACY023772022, AACY020521263, AACY020172942 and AACY020179599 were clustered with Nitrosopumilus maritimus because they share the same PPOX gene preceding parvulin. Similarly, ABEF01053500 was grouped with Cenarchaeum symbiosum because of an antisense hypothetical protein DUF2203 preceding the parvulin gene. With 80% sequence identity of their parvulin proteins, Nitrosopumilus maritimus and Cenarchaeum symbiosum were grouped together in Thaumarchaeota I. AACY022114635, AACY020912937, AACY023104196 and AACY020565072 were also clustered because of the gene directly upstream of parvulin, the hypothetical protein homologous to nmar_0940. AACY021994642 was also grouped with these contigs because of its high parvulin primary sequence similarity. AACY023721900 and umc-AD1000-56-E4 have a totally different upstream region, but they can clearly be classified as Thaumarchaeota II due to the typical downstream DHCP reading frame and their parvulin primary sequence.
Abbreviations: 1, hypothetical protein censya_1187 (Cenarchaeum specific), [GeneID: 6371367], Cenarchaeum symbiosum A, [Ref.Seq.: NC_014820.1]; 2, hypothetical protein censya_1186 (Cenarchaeum specific), [GeneID: 6371366], Cenarchaeum symbiosum A, [Ref.Seq.: NC_014820.1]; ADC, acetolactate decarboxylase, [GeneID: 5411158], Methanoregula boonei 6A8, [Ref.Seq.: NC_009712.1]; AEF, auxin efflux carrier, [GeneID: 7271583], Methanosphaerula palustris E1-9c, [Ref.Seq.: NC_011832.1]; AKR, adenylate kinase related protein, [GenBank: EET90508.1], Candidatus Micrarchaeum acidiphilum ARMAN-2, [GenBank: GG697236.1]; AM, antibiotic biosynthesis monooxygenase, [GenBank: EEZ92921.1], Candidatus Parvarchaeum acidiphilum ARMAN-4, [GenBank: GG730045.1]; AT1/2, aminotransferase class I and II, n.a., n.a., [GenBank: AACY023450473.1]; ATS, asparagyl-tRNA-synthetase, n.a., n.a., [GenBank: ACXJ01008586.1]; Aumc, hypothetical protein (uncultured marine crenarchaeota-umc specific), n.a., n.a., [GenBank: ABEF01053500.1]; C2, cupin 2 conserved barrel, [GeneID: 9742629], Methanoplanus petrolearius DSM 11571, [Ref.Seq.: NC_014507.1]; CK, carbamate kinase, [GeneID: 9742627], Methanoplanus petrolearius DSM 11571, [Ref.Seq.: NC_014507.1]; DAM, DNA adenine methylase, [GeneID: 9742631], Methanoplanus petrolearius DSM 11571, [Ref.Seq.: NC_014507.1]; DOS, dihydropterate synt, [GeneID: 5411815], Methanoregula boonei 6A8, [Ref.Seq.: NC_009712.1]; GCN5, GCN5-related N-acetyltransferase, [GeneID: 7271586], Methanosphaerula palustris E1-9c, [Ref.Seq.: NC_011832.1]; h1, hypothetical protein nmar_0943 (Nitrosopumilus specific), [GeneID: 5773171], Nitrosopumilus maritimus SCM1, [Ref.Seq.: NC_010085.1]; h2, hypothetical protein nmar_0945 (Nitrosopumilus specific), [GeneID: 5774572], Nitrosopumilus maritimus SCM1, [Ref.Seq.: NC_010085.1]; hA1, hypothetical protein UNLARM2_0040 (ARMAN specific), [GenBank: EET90506.1], Candidatus Micrarchaeum acidiphilum ARMAN-2, [GenBank: GG697236.1]; HAD, HAD-superfamily hydrolase, [GenBank: EET90509.1], Candidatus Micrarchaeum acidiphilum ARMAN-2, [GenBank: GG697236.1]; HEAT, HEAT-repeat containing protein, [GeneID: 4795375], Methanocorpusculum labreanum Z, [Ref.Seq.: NC_008942.1]; HIT, histidine triad protein, [GenBank: ACF09643.1], uncultured marine crenarchaeote AD1000-56-E4, [GenBank: EU686623.2]; HMCS, 3-hydroxy-3-methylglutaryl CoA synthase, [GenBank: EET90505.1], Candidatus Micrarchaeum acidiphilum ARMAN-2, [GenBank: GG697236.1]; hP1, hypothetical protein BJBARM4_0439 (Parvarchaeum specific), [GenBank: EEZ92926.1], Candidatus Parvarchaeum acidiphilum ARMAN-4, [GenBank: GG730045.1]; hP2, hypothetical protein BJBARM4_0438 (Parvarchaeum specific), [GenBank: EEZ92925.1], Candidatus Parvarchaeum acidiphilum ARMAN-4, [GenBank: GG730045.1]; hP3, hypothetical protein BJBARM4_0436 (Parvarchaeum specific), [GenBank: EEZ92923.1], Candidatus Parvarchaeum acidiphilum ARMAN-4, [GenBank: GG730045.1]; hy1, hypothetical protein (specific for Thaumarchaeota and umcs), [GenBank: ACF09649.1], uncultured marine crenarchaeote AD1000-56-E4, [GenBank: EU686623.2]; hy2, hypothetical protein (specific for Thaumarchaeota and umcs), [GenBank: ACF09648.1], uncultured marine crenarchaeote AD1000-56-E4, [GenBank: EU686623.2]; hy3, hypothetical protein (umc specific), [GenBank: ACF09644.1], uncultured marine crenarchaeote AD1000-56-E4, [GenBank: EU686623.2]; hyp, hypothetical protein with putative conserved domain DUF726, n.a., n.a., [GenBank: AACY020565072.1]; hyp1, hypothetical protein MBOO_0211 (no blastp-hit), [GeneID: 5411157], Methanoregula boonei 6A8, [Ref.Seq.: NC_009712.1]; hyp2, hypothetical protein MBOO_0214 (no blastp-hit), [GeneID: 5411814], Methanoregula boonei 6A8, [Ref.Seq.: NC_009712.1]; M10, methan mark 10, [GeneID: 4795594], Methanocorpusculum labreanum Z, [Ref.Seq.: NC_008942.1]; MCM, MCM family protein, [GeneID: 7271582], Methanosphaerula palustris E1-9c, [Ref.Seq.: NC_011832.1]; MCST, methyl-accepting chemotaxis sensory transducer with Pas/Pac sensor, [GeneID: 7271585], Methanosphaerula palustris E1-9c, [Ref.Seq.: NC_011832.1]; MDP, metal-dependent protease (COG 1310), [GenBank: ACF09647.1], uncultured marine crenarchaeote AD1000-56-E4, [GenBank: EU686623.2]; NED, NAD-dependent epimerase/dehydratase, [GenBank: EET90510.1], Candidatus Micrarchaeum acidiphilum ARMAN-2, [GenBank: GG697236.1]; OCT, ornithine carbamoyltransferase, [GeneID: 9742628], Methanoplanus petrolearius DSM 11571, [Ref.Seq.: NC_014507.1]; PDP, pirin-domain protein, [GenBank: EEZ92922.1], Candidatus Parvarchaeum acidiphilum ARMAN-4, [GenBank: GG730045.1]; RIII, ribonuclease III, [GeneID: 4795607], Methanocorpusculum labreanum Z, [Ref.Seq.: NC_008942.1]; S1, S1-tex like protein, n.a., n.a., [GenBank: AACY023784421.1]; SdM, SAM dependent methyltransferase, [GeneID: 4795613], Methanocorpusculum labreanum Z, [Ref.Seq.: NC_008942.1]; SMC, SMC-domain containing protein, [GeneID: 4795178], Methanocorpusculum labreanum Z, [Ref.Seq.: NC_008942.1]; TolB, TolB-like protein, [GeneID: 9742632], Methanoplanus petrolearius DSM 11571, [Ref.Seq.: NC_014507.1].
Figure 2. Mean Shannon entropy of the archaeal parvulin and its genomic neighbours. At the parvulin locus of N. maritimus, we found the following neighbouring proteins to be present in at least 5 different contigs: UbiA, DUF, hyp, PPOX, and DHCP. The mean Shannon entropy (unit: bit) of these sequences was calculated as a measure of sequence diversity and compared with the same measure for sdPar from the corresponding organisms.
Abbreviations: UbiA, UbiA prenyltransferase; DUF, hypothetical protein of unknown function (COG4911/DUF2203); hyp, hypothetical protein nmar_0940; PPOX, pyridoxamine 5′-phosphate oxidase-related FMN-binding protein; Par, parvulin; DHCP, DEAD/DEAH box containing protein.
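The mean Shannon entropy used in Figure 2 can be illustrated with a minimal sketch. It assumes that the entropy is computed per alignment column with a base-2 logarithm and then averaged over all columns, and that gap characters are counted as ordinary symbols; the caption does not state these details, so both are assumptions. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_shannon_entropy(aligned_seqs):
    """Average the per-column entropy over an alignment.

    Assumption: all sequences are already aligned to the same length and
    gaps ('-') are treated like any other symbol.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return sum(column_entropy(col) for col in zip(*aligned_seqs)) / length


# Toy alignment (invented for illustration, not taken from the paper).
toy_alignment = ["ACDE", "ACDF", "ACEF", "AADF"]
toy_mean_entropy = mean_shannon_entropy(toy_alignment)
```

A fully conserved alignment gives a mean entropy of 0 bit, and the value grows as the columns become more variable, which is why it can serve as a measure of sequence diversity.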
Figure 3. Archaeal branch of an MLP tree combined with genomic context. The figure displays an expanded section of the maximum likelihood phylogeny tree from Figure 3. The outgroup has been omitted and the whole bacterial clade has been collapsed for clarity. Red signs indicate three deletion events suggested by the genomic context: the deletion of the hyp0940 and PPOX genes is described in the main text. The putative PPOX deletion seems to be a basal event for the Thaumarchaeota II subgroup. The large genetic rearrangement concerning the uncultured marine Crenarchaeota fosmid AD1000-56-E4 makes this sequence unique in the group of Thaumarchaeota. Next to the Thaumarchaeota, the corresponding parvulin loci with the available genomic contexts are displayed. The groups predicted from the genomic context are also well defined in the MLP tree.
Figure 4. Proposed primers for further metagenomic analyses. To obtain these primers, the nucleotide sequences of the genes surrounding parvulin were aligned with ClustalW. Several requirements were applied (length between 18 and 24 bases, average GC content over 40%, average salt-adjusted melting temperature between 50 °C and 65 °C). The resulting primers for genes surrounding the thaumarchaeal parvulin are shown in this figure. For ambiguous positions, the respective IUPAC codes for degenerate bases were used: A or C, M; A or G, R; A or T, W; G or C, S; C or T, Y; G or T, K; A, G or C, V; A, C or T, H; A, G or T, D; G, C or T, B; A, G, C or T, N.
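The way ambiguous alignment positions translate into degenerate bases can be sketched as follows. The IUPAC mapping is the one listed in the caption; everything else is an assumption made for illustration: the function names are hypothetical, the input is taken to be gap-free uppercase DNA of equal length, and the "average GC content" is interpreted as the mean over all sequences a degenerate primer can encode. The melting temperature check is left out because the caption does not say which salt-adjusted formula was used.

```python
# IUPAC degenerate-base codes, keyed by the set of bases seen at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H", frozenset("AGT"): "D",
    frozenset("CGT"): "B", frozenset("ACGT"): "N",
}
# Reverse lookup: code -> set of bases it stands for.
EXPANSION = {code: bases for bases, code in IUPAC.items()}


def degenerate_consensus(aligned_seqs):
    """Collapse gap-free, equal-length DNA sequences into one IUPAC string."""
    return "".join(IUPAC[frozenset(col)] for col in zip(*aligned_seqs))


def mean_gc_fraction(primer):
    """Mean GC fraction over all sequences the degenerate primer encodes."""
    per_position = [
        sum(base in "GC" for base in EXPANSION[code]) / len(EXPANSION[code])
        for code in primer
    ]
    return sum(per_position) / len(per_position)


def passes_length_and_gc(primer, min_len=18, max_len=24, min_gc=0.40):
    """Check the length and GC requirements from the caption (Tm not checked)."""
    return min_len <= len(primer) <= max_len and mean_gc_fraction(primer) > min_gc


# Toy alignment (invented sequences, not the published primers).
toy_sites = [
    "ATGCGTACCGGATTCGAA",
    "ATGCGCACCGGTTTCGAA",
    "ATGAGTACCGGATTCGAA",
]
toy_primer = degenerate_consensus(toy_sites)  # "ATGMGYACCGGWTTCGAA"
```

In this toy case three positions vary, so the consensus carries the codes M, Y and W at those positions while the remaining 15 bases stay unambiguous.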
Figure 5. A random forest (RF) can be trained to discriminate between bacterial and archaeal single-domain parvulins. Two descriptors (hydrophobicity and net charge) were used to describe the protein sequences. The same dataset used for the MLP tree was used here. The prediction scale runs from “0.0”, bacterial, to “1.0”, archaeal. The separation according to the MLP tree is represented in green (archaeal) and red (bacterial). For the hydrophobicity descriptor, the RF perfectly separates the two classes (F1 score = 1.0). For the net charge descriptor, the RF reaches an F1 score of 0.979 (cut-off = 0.2).
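How an F1 score depends on the prediction cut-off can be shown with a short sketch. It assumes that the archaeal class (label 1) is the positive class and that a sequence is called archaeal when its score is at or above the cut-off; neither convention is stated in the caption. The function name is hypothetical and the scores below are invented toy values, not the published predictions.

```python
def f1_at_cutoff(scores, labels, cutoff):
    """F1 score of the positive class (1 = archaeal) at a given cut-off.

    Assumption: a score >= cutoff is predicted as archaeal.
    """
    predicted = [1 if score >= cutoff else 0 for score in scores]
    pairs = list(zip(predicted, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)  # true positives
    fp = sum(p == 1 and y == 0 for p, y in pairs)  # false positives
    fn = sum(p == 0 and y == 1 for p, y in pairs)  # false negatives
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: prediction scores on the 0.0 (bacterial) to 1.0 (archaeal) scale.
toy_scores = [0.05, 0.10, 0.25, 0.15, 0.60, 0.90, 0.95]
toy_labels = [0, 0, 0, 1, 1, 1, 1]
toy_f1 = f1_at_cutoff(toy_scores, toy_labels, cutoff=0.2)
```

With these toy values, one bacterial sequence scores above the cut-off and one archaeal sequence scores below it, so the F1 score drops below 1.0; a descriptor that separates the classes perfectly would give 1.0 at a suitable cut-off.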
Figure 6. Random forest approach to identify special features of archaeal parvulins. A random forest was used to distinguish archaeal and bacterial parvulins. (A) Importance values were derived from the random forest analysis and plotted on the primary sequence of the parvulin protein from Cenarchaeum symbiosum. (B) The identified positions of all 23 archaeal proteins were used to create a protein logo. (C) Mapping of important residues on the structure [PDB:2QRS] of the C. symbiosum parvulin.