
Open Issues for Protein Function Assignment in Haloferax volcanii and Other Halophilic Archaea.

Friedhelm Pfeiffer, Mike Dyall-Smith.

Abstract

BACKGROUND: Annotation ambiguities and annotation errors are a general challenge in genomics. While a reliable protein function assignment can be obtained by experimental characterization, this is expensive and time-consuming, and the number of such Gold Standard Proteins (GSP) with experimental support remains very low compared to proteins annotated by sequence homology, usually through automated pipelines. Even a GSP may give a misleading assignment when used as a reference: the homolog may be close enough to support isofunctionality, but the substrate of the GSP is absent from the species being annotated. In such cases, the enzymes cannot be isofunctional. Here, we examined a variety of such issues in halophilic archaea (class Halobacteria), with a strong focus on the model haloarchaeon Haloferax volcanii.
RESULTS: Annotated proteins of Hfx. volcanii were identified for which public databases tend to assign a function that is probably incorrect. In some cases, an alternative, probably correct, function can be predicted or inferred from the available evidence, but this has not been adopted by public databases because experimental validation is lacking. In other cases, a probably invalid specific function is predicted by homology, and while there is evidence that this assigned function is unlikely, the true function remains elusive. We listed 50 of those cases, each with detailed background information, so that a conclusion about the most likely biological function can be drawn. For reasons of brevity and comprehension, only the key aspects are listed in the main text, with detailed information being provided in a corresponding section of the Supplementary Materials.
CONCLUSIONS: Compiling, describing and summarizing these open annotation issues and functional predictions will benefit the scientific community in the general effort to improve the evaluation of protein function assignments and more thoroughly detail them. By highlighting the gaps and likely annotation errors currently in the databases, we hope this study will provide a framework for experimentalists to systematically confirm (or disprove) our function predictions or to uncover yet more unexpected functions.

Keywords:  Gold Standard Protein; Haloferax volcanii; annotation error; genome annotation; haloarchaea

Year:  2021        PMID: 34202810      PMCID: PMC8305020          DOI: 10.3390/genes12070963

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Haloferax volcanii is a model organism for halophilic archaea [1,2,3,4,5,6], for which an elaborate set of genetic tools has been developed [7,8,9]. Its genome has been sequenced and carefully annotated [1,10,11]. A plethora of biological aspects have been successfully tackled in this species, with examples including DNA replication [4]; cell division and cell shape [12,13,14,15,16]; metabolism [17,18,19,20,21,22,23,24,25]; protein secretion [26,27,28,29]; motility and biofilms [30,31,32,33,34,35]; mating [36]; signaling [37]; virus defense [38]; proteolysis [39,40,41,42,43,44]; posttranslational modification (N-glycosylation; SAMPylation) [45,46,47,48,49,50]; gene regulation [21,25,51,52,53,54,55]; microproteins [56,57,58] and small noncoding RNAs (sRNAs) [59,60,61,62]. Genome annotations are frequently compromised by annotation errors [11,63,64,65]. Many of these errors are caused by an invalid annotation transfer between presumed homologs, which, once introduced, are further spread by annotation robots. This problem can be partially overcome by using a Gold Standard Protein (GSP)-based annotation strategy [11]. Since the GSP has itself been subjected to an experimental analysis, its annotation cannot be caused by an invalid annotation transfer process. The GSP strategy was already applied to a detailed analysis of the metabolism of halophilic archaea [66]. However, with a decreasing level of sequence identity, the assumption of isofunctionality becomes increasingly uncertain. Although this may be counterbalanced by additional evidence, e.g., gene clustering, experimental confirmation would be the best option for validation of the annotation. There are additional and much more subtle genome annotation problems. In some cases, GSPs are true homologs, and the annotated function in the database is correct. 
Nevertheless, the biological context in the query organism makes it unlikely that the homologs are isofunctional, e.g., when the substrate of the GSP is lacking in the query organism. Additionally, paralogs may have distinct but related functions that cannot be assigned by a sequence analysis but may be assigned based on phylogenetic considerations. Here, again, experimental confirmation is the preferred option for validation of the annotation. A lack of experimental confirmation may keep high-level databases like KEGG or the SwissProt section of UniProt from adopting assignments based on well-supported bioinformatic analyses, so that the database entries continue to provide information that is probably incorrect. We refer to annotation problems in these databases solely to underscore that the biological issues raised by us are far from trivial. There is no intention to question the exceedingly high quality of the SwissProt and KEGG databases [67,68] and their tremendous value for the scientific community. We have actively supported them by providing feedback and encourage others to do the same, e.g., with the recently implemented “Add a publication” functionality in the UniProt entries that allows users to connect a protein to a publication that describes its experimental characterization (https://community.uniprot.org/bbsub/bbsubinfo.html). In this study, we describe a number of annotation issues for haloarchaea, with a strong emphasis on Hfx. volcanii. We denote such cases as “open annotation issues” with the hope of attracting members of the Haloferax community and other groups working with halophilic archaea to apply experimental analyses to elucidate the true function(s) of these proteins. This will increase the number of Gold Standard Proteins that originate from Hfx. volcanii or other haloarchaea, reduce genome annotation ambiguities and perhaps uncover novel metabolic processes.

2. Materials and Methods

2.1. Curation of Genome Annotation and Gold Standard Protein Identification

The Gold Standard Protein-based curation of haloarchaeal genomes has been described previously [11] (see, also, next paragraph). Since then, a systematic comparison to the KEGG data was performed for a subset of the curated genomes [69]. The Hfx. volcanii genome annotation is continuously scrutinized, especially when a closely related genome is annotated [70]. In brief, the core rule of Gold Standard Protein-based genome annotation is to assign a specific function only when a homologous protein has been confirmed experimentally to have this function. Two types of data must be available for that homolog: (a) a reference describing the experimental characterization and (b) an entry in a sequence database, so that the level of sequence similarity can be determined. The decision on whether isofunctionality can be assumed at this level of sequence similarity and, thus, if the annotation can be transferred represents an informed prediction by the annotator based on the available evidence. This decision may be taken only once for a set of closely related orthologs, such as those from halophilic archaea.
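The core rule can be read as a two-step check: the required evidence must exist, and the annotator then decides whether the level of sequence similarity supports isofunctionality. The following Python sketch illustrates that logic only. The class, field and function names are hypothetical, and the callback stands in for the annotator's informed judgment, which the text does not reduce to a fixed threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GspCandidate:
    """Experimentally characterized homolog (all field names are illustrative)."""
    function: str
    reference: Optional[str]   # (a) publication describing the experiment
    accession: Optional[str]   # (b) entry in a sequence database
    seq_identity: float        # percent identity to the query protein


def transferable_function(
    gsp: GspCandidate,
    annotator_accepts: Callable[[GspCandidate], bool],
) -> Optional[str]:
    """Return the function to assign, or None if no specific function applies."""
    # Both types of data must be available for the homolog.
    if not gsp.reference or not gsp.accession:
        return None
    # Whether isofunctionality can be assumed is the annotator's decision.
    return gsp.function if annotator_accepts(gsp) else None


# Example with values taken from Table 1 (HVO_2995 and its GSP OE4217R, 88%).
fdx = GspCandidate("ferredoxin", "[86]", "B0R7I9", 88.0)
# The 40% cutoff below is an arbitrary placeholder, not a rule from the text.
print(transferable_function(fdx, lambda g: g.seq_identity >= 40.0))
```

Passing the decision in as a callback keeps the evidence check separate from the judgment call, which mirrors the distinction drawn in the paragraph above.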

2.2. Additional Bioinformatics Tools

The key databases were UniProtKB/SwissProt [68], InterPro [71], KEGG [67] and OrthoDB [72]. The SyntTax server was used for inspecting the conservation of the gene neighborhood [73]. As general tools, the BLAST suite of programs [74,75] was used for sequence comparisons.
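Sequence comparisons of this kind yield the %seq_id values reported in the tables. As an illustration only, the sketch below computes percent identity from two sequences that have already been aligned. It is not the BLAST algorithm, the function name is hypothetical, the toy sequences are made up, and counting gap columns in the denominator is an assumed convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Made-up six-column alignment with one mismatch and one gap.
print(round(percent_identity("MKT-LV", "MKSALV"), 1))
```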

3. Results

The open issues are organized below under Section 3.1, the respiratory chain and oxidative decarboxylation; Section 3.2, amino acid metabolism; Section 3.3, heme and cobalamin biosynthesis; Section 3.4, coenzyme F420; Section 3.5, tetrahydrofolate as opposed to methanopterin; Section 3.6, NAD and riboflavin; Section 3.7, lipid metabolism; Section 3.8, genetic information processing and Section 3.9, stand-alone (miscellaneous) cases. We collected this set of open annotation issues during our continuous efforts to keep the Hfx. volcanii genome annotation up to date since its initial publication in 2010 [1]. Not covered in this study are enigmatic reactions and pathways (e.g., archaeal signal peptidase II or the haloarchaeal O-glycosylation pathway) for which no support from experimentally characterized homologs (GSPs) is available.

3.1. The Respiratory Chain and Oxidative Decarboxylation

In the respiratory chain, the coenzymes that were reduced during catabolism (e.g., glycolysis) are reoxidized, with the energy being saved as an ion gradient. The textbook examples of a respiratory chain are the five mitochondrial complexes [76,77]: complex I (NADH dehydrogenase), complex II (succinate dehydrogenase), complex III (cytochrome bc1 complex), complex IV (cytochrome-c oxidase as a prototype for a terminal oxidase) and complex V (F-type ATP synthase). In mitochondria, a significant part of the NADH that feeds into the respiratory chain originates from oxidative decarboxylation: the conversion of pyruvate to acetyl-CoA by the pyruvate dehydrogenase complex and conversion of α-ketoglutarate to succinyl-CoA by the homologous 2-oxoglutarate dehydrogenase complex. While complexes I and II transfer reducing elements to a lipid-embedded two-electron carrier (ubiquinone), the bc1 complex transfers the electrons to the one-electron carrier cytochrome-c, a heme (and, thus, iron) protein, which then transfers electrons to the terminal oxidase. Bacteria like Escherichia coli and Paracoccus denitrificans have related complexes and enzymes: NADH dehydrogenase (encoded by the nuo operon), succinate dehydrogenase (encoded by sdhABCD) and the related fumarate reductase (encoded by frdABCD) [78], several terminal oxidases (e.g., products of cyoABCDE and cydABC) and an F-type ATP synthase (encoded by atp genes). E. coli lacks a bc1 complex, which, however, occurs in Paracoccus denitrificans [79]. E. coli contains the canonical complexes of oxidative decarboxylation (the pyruvate dehydrogenase complex, encoded by aceEF+lpdA, and the 2-oxoglutarate dehydrogenase complex, encoded by sucAB+lpdA). The respiratory chain of Hfx. volcanii and other haloarchaea deviates considerably from those of mitochondria and bacteria such as Paracoccus and E. coli (reviewed by [80]), and a number of questions remain unresolved. 
We focus on the equivalents of complexes I, III and IV, because these have unresolved issues. We also cover some aspects relevant for the NADH levels (oxidative decarboxylation enzymes and type II NADH dehydrogenase). We do not cover complexes that have already been studied in haloarchaea: complex II (succinate dehydrogenase) [81,82,83] and complex V (ATP synthase) [84,85].
(a) In haloarchaea, oxidative decarboxylation is not linked to the reduction of NAD to NADH but to the reduction of a ferredoxin (encoded by fdx, e.g., OE_4217R, HVO_2995), which has a redox potential similar to that of the NAD/NADH pair [86]. The enzymes for oxidative decarboxylation are pyruvate–ferredoxin oxidoreductase (porAB, e.g., OE_2623R/2622R and HVO_1305/1304) and 2-oxoglutarate–ferredoxin oxidoreductase (korAB, e.g., OE_1711R/1710R and HVO_0888/0887), and these have been characterized from Halobacterium salinarum [87,88,89].
(b) It is as yet unresolved how ferredoxin Fdx is reoxidized, but this might be achieved by the Nuo complex. This ferredoxin may well be involved in additional metabolic processes. In Hfx. volcanii, ferredoxin Fdx (HVO_2995) plays an essential role in nitrate assimilation [90]. However, in Hbt. salinarum, this metabolic process for Fdx reoxidation does not exist.
(c) The nuo cluster of haloarchaea resembles that of E. coli, which encodes a type I NADH dehydrogenase, with the genes and gene order highly conserved and just a few domain fissions and fusions. However, haloarchaea lack NuoEFG [91], which is a subcomplex that mediates interaction with NADH [92,93]. Thus, the haloarchaeal nuo complex is unlikely to function as NADH dehydrogenase, despite its annotation as such in KEGG (as of April 2021).
(d) Other catabolic enzymes generate NADH, which must also be reoxidized. Based on inhibitor studies, NADH is not reoxidized by a type I but, rather, by a type II NADH dehydrogenase in Hbt. salinarum [82]. A tentative gene assignment has been made for Natronomonas pharaonis [66]. However, for reasons detailed in Supplementary Text S1 Section S1, this assignment is highly questionable, so this issue calls for an experimental analysis.
(e) About one-third of the haloarchaea, especially the Natrialbales, do not code for a complex III equivalent (the cytochrome bc1 complex encoded by petABC), according to OrthoDB analysis. The bc1 complex is required to transfer electrons from the lipid-embedded two-electron carrier (menaquinone in haloarchaea) to the one-electron carrier associated with terminal oxidases (probably halocyanin). How electrons flow in the absence of a complex III equivalent is currently unresolved. The haloarchaeal petABC genes resemble those of the chloroplast b6-f complex rather than those of the mitochondrial bc1 complex (see Supplementary Text S1 Section S1 for more details).
(f) A bc cytochrome was purified from Nmn. pharaonis, but with an atypical 1:1 ratio between the b-type and c-type hemes [81]. The complex is heterodimeric, with subunits of 18 kDa and 14 kDa. The 18-kDa subunit carries the covalently attached heme group [81]. An attempt was made to identify the genes coding for these subunits [94] (for details, see Supplementary Text S1 Section S1). Two approaches were used to obtain protein sequence data, one being the N-terminal protein sequencing of the two subunits extracted from an SDS-polyacrylamide gel. In the other approach, peptides from the purified complex were separated by HPLC, and a peptide that absorbed at 280 nm (protein), as well as at 400 nm (heme), was isolated. Absorption at 400 nm clearly indicates that the isolated peptide contains a covalently attached heme group. The sequences from the two approaches overlapped and resulted in a contiguous sequence of 41 aa, with only the penultimate position remaining undefined [94]. Based on this information, a PCR probe was generated (designated “cyt-C Sonde”) that allowed the gene to be identified and sequenced, including its genomic neighborhood. It turned out that the genes coding for the four subunits of succinate dehydrogenase (sdhCDBA) were isolated. The obtained protein sequence corresponds to the N-terminal region of SdhD (with the initiator methionine cleaved off), with only two sequence discrepancies in addition to the unresolved penultimate residue. In the PhD thesis [94], this unambiguous result was rated a failure (and the data were never formally published). The reason is that SdhD is free of cysteine residues, while standard textbooks state that a pair of cysteines is required for covalent heme attachment [95]. The lack of the required cysteine pair was taken to indicate that the results were incorrect and that the identified genes did not encode the cytochrome bc that the study was seeking [94]. In contrast, we speculate that the results were completely correct, despite being in conflict with the cysteine pair paradigm. In our opinion, a paradigm shift is required. The obtained results call for a yet-unanticipated novel mode of covalent heme attachment, exemplified by SdhD, the 18-kDa subunit of Natronomonas succinate dehydrogenase. It should be noted that the 41-aa protein sequence that was obtained turned out to contain three histidine residues upon translation of the gene, but none of these were detected upon Edman degradation. In Halobacterium, a small c-type cytochrome was purified (cytochrome c552, 14.1 kDa) [96]. Heme staining after SDS-PAGE indicated a covalent heme attachment, but no sequence or composition data were reported, so that it was not possible to identify the protein based on the available information. We speculate that the Halobacterium cytochrome c552 also represents SdhD (as detailed in Supplementary Text S1 Section S1). In that case, the proposed novel type of covalent heme attachment would not be restricted to Nmn. pharaonis but might be a general property of haloarchaea. This would also solve the “Halobacterium paradox” [95].
(g) The haloarchaeal one-electron carrier is the copper protein halocyanin rather than the iron-containing heme protein cytochrome-c. A halocyanin from Nmn. pharaonis (NP_3954A) was characterized, including its redox potential [97,98,99]. A gene fusion supports the close connection of a halocyanin with a subunit of a terminal oxidase. For further details, see Supplementary Text S1 Section S1.
(h) Terminal oxidases are highly diverse in haloarchaea, and we restricted our analysis to three species (Nmn. pharaonis, Hfx. volcanii and Hbt. salinarum), because in each of these, at least one terminal oxidase has been experimentally studied (Table 1). The details are described in Supplementary Text S1 Section S1, with subunits of all analyzed terminal oxidases listed in Supplementary Table S1.
Table 1

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.1).

The columns isofunc, %seq_id, Locus tag and UniProt refer to the Gold Standard Protein.
Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment
1a | HVO_1305, HVO_1304 | porAB | yes | 67%, 80% | OE2623R, OE2622R | B0R4X6, B0R4X5 | [87][88][89] | 1555599, 6266826, 6266827 |
1a | HVO_0888, HVO_0887 | korAB | yes | 77%, 77% | OE1711R, OE1710R | B0R3G0, B0R3F9 | [88][89] | 6266826, 6266827 |
1a/1b | HVO_2995 | fdx | yes | 88% | OE4217R | B0R7I9 | [100][101][86] | 964365, 188650, 201489 | role in oxidative decarboxylation
1a/1b | HVO_2995 (cont.) | | self | | | D4GY89 | [90] | 22103537 | role in nitrate assimilation
1c | HVO_0979 (complex) | nuoB | possibly | 50% | tlr0705 | Q8DKZ4 | [102][103][104] | 15910282, 30573545, 32001694 | reoxidizes ferredoxin
1c | HVO_0979 (cont.) | | no | 48% | b2287 | P0AFC7 | [92][93] | 7607227, 9485311 | reoxidizes NADH in E. coli
1d | NP_3508A | ndh1 | special | 26% (N-term 140 aa) | - | Q7ZAG8 | | | function of Q7ZAG8 was reassigned (from ndh1 to sqr) after annotation transfer
1d | NP_3508A (cont.) | | possibly | 30% | BpOF4_04810 | A7LKG4 | [105] | 18359284 | type II NADH dehydrogenase
1e | HVO_2620, HVO_0842, HVO_0841 | petABD | yes | 39% | SYNPCC7002_A0842 | P28056 | [106] | 11245788 | HVO_0842 (petB) related to cytochrome b6
1f | HVO_2810 | sdhD | yes | 66% | NP_4268A | Q3INS7 | [81][94] | 9109654, PhD_Mattar |
1g | HVO_0943 | cbaD | yes | 57% | NP_2966A | A0A1U7EWW4 | [107] | 9428682 |
 | HVO_0943 (cont.) | | - | 63% | OE_4073R (C-term) | B0R7A9 | - | | halocyanin/cbaD fusion protein, uncharacterized
1g | HVO_2150 | hcpG | - | 44% | OE_4073R (N-term) | B0R7A9 | - | | halocyanin/cbaD fusion protein, uncharacterized
1h | HVO_0945 (complex) | cbaA | yes | 64% | NP_2966A | A0A1U7EWW4 | [107] | 9428682 |
1h | HVO_0907 (complex) | coxA1 | self | | | | [108] | 11790755 |
1h | HVO_0907 (cont.) | | yes | 70% | VNG_0657G (OE_1979R) | P33588 | [109][110] | 2542239, 1659810 |
1h | HVO_1645 (complex) | coxAC2 | yes | 43% | APE_0793.1 | Q9YDX6 | [111] | 12471503 |
1h | HVO_0462, HVO_0461 | cydAB | yes | 32%, 24% | -, - | Q09049, Q05780 | [112] | 1655703 |
1h | HVO_0462, HVO_0461 (cont.) | | yes | 30%, 27% | b0733, b0734 | P0ABJ9, P0ABK2 | [113] | 6307994 |
1h | NP_4296A, NP_4294A | coxA3, coxB3 | yes | 28%, 33% | TTHA1135, TTHA1134 | Q5SJ79, Q5SJ80 | [114][115] | 2842747, 7657607 |
1i | HVO_2958, HVO_2959 | oadhAB1 | self | | | D4GY15, D4GY17 | [116] | 19910413 | Ile indirectly assigned as substrate
1i | HVO_2958, HVO_2959 (cont.) | | self | | | | [117][118][119] | 10832633, 17571210, 17906130 | no substrate was identified; pyruvate and alphaKG excluded
1i | HVO_2595, HVO_2596 | oadhAB2 | self | | | | [120][119][116] | 12003954, 17906130, 19910413 | no substrate was identified; pyruvate and alphaKG excluded
1i | HVO_0669, HVO_0668 | oadhAB3 | self | | | | [119][116] | 17906130, 19910413 | no substrate was identified; pyruvate and alphaKG excluded
1i | HVO_2209 | oadhA4 | self | | | | | | not yet analyzed experimentally
1i | HVO_2958, HVO_2959 (cont.) | | yes/no | 38%, 52% | TA1438, TA1437 | Q9HIA3, Q9HIA4 | [121] | 17894823 | substrates are Ile, Leu, Val
1i | HVO_2595, HVO_2596 (cont.) | | no | 41%, 41% | -, - | Q57102, Q57041 | [122] | 1898934 | substrate is acetoin
1i | HVO_2595, HVO_2596 (cont.) | | unknown | 40%, 43% | BSU08060, BSU08070 | O31404, O34591 | [123] | 10368162 | substrate is acetoin
1i | HVO_0669, HVO_0668 (cont.) | | unknown | 54%, 47% | BSU08060, BSU08070 | O31404, O34591 | [123] | 10368162 | substrate is acetoin
1i | HVO_0669, HVO_0668 (cont.) | | unknown | 49%, 43% | -, - | Q57102, Q57041 | [122] | 1898934 | substrate is acetoin
1i | HVO_2209 (cont.) | | unknown | 38% | TA1438 | Q9HIA3 | [121] | 17894823 | substrates are Ile, Leu, Val

The column Section refers to the table listing the protein and to the section in the Results and in Supplementary Text S1. As an example, 2c covers topic (c) from the decimal-numbered Results Section 3.2. Amino Acid Metabolism. In Supplementary Text S1, this is covered under Section S2 subsection S2.c. The corresponding proteins are listed in Table 2. For a few proteins, two sections are indicated (e.g., 1a/1b).
The column Code refers to a haloarchaeal protein by its locus tag, which is mainly from Haloferax volcanii (HVO) but also from Halobacterium salinarum (OE), Natronomonas pharaonis (NP) and Halohasta litchfieldiae (halTADL). When the reconstruction of a complete pathway is presented, the unassigned genes are indicated as a “pathway gap”. In one case, we indicate the absence of a haloarchaeal ortholog by a dash. In the case of a complex, we either list more than one code or we list only one subunit together with the term (complex). All subunits of these complexes are listed groupwise in Table S1. A protein may be shown in more than one row. From the 2nd row onwards, this is indicated by the term (cont.).
The column Gene lists the assigned gene or a dash if no gene has been assigned. The assigned gene is only indicated in the first row of a protein.
A set of four columns is used to relate a query protein to an experimentally characterized homolog, a GSP (Gold Standard Protein) (isofunc, %seq_id, Locus tag, UniProt). The column isofunc indicates if the query protein and its Gold Standard Protein homolog are isofunctional. The meanings of the terms used in this column in Tables 1 to 9 (yes, no, yes/no, probably, possibly, unclear, unknown, prediction, special and “-”) are described at the end of this legend. The column %seq_id indicates the protein sequence identity between the query protein and the homologous GSP. The column Locus tag contains the locus tag, if assigned. The column UniProt contains the UniProt accession of the GSP.
GSPs are experimentally characterized as described in a publication. The column Reference links to the reference list of the manuscript. The column PMID lists the PubMed ID of the publication, if available. Otherwise, this is indicated as “not in PubMed”. Additionally, one PhD thesis is indicated (PhD_Mattar). The column Comment provides various types of additional information.
The terms used in the column isofunc in Tables 1 to 9 have the following meanings: The term “yes” indicates that we consider the two proteins as isofunctional and annotate the query protein accordingly. The term “no” is used when we conclude that the proteins differ in function. Additional terms are used for more difficult cases. The term “yes/no” is used for GSPs that are multifunctional, and we assign only a subset of these functions to the query protein. The term “probably” is used when we consider it likely that the proteins are isofunctional and annotated the query protein accordingly (with the term probable added to the protein name). The term “possibly” is used when we see a good chance that the proteins are isofunctional but consider it too speculative to annotate the protein accordingly. The term “unclear” is used when we consider it likely that the same overall reaction is catalyzed but when reaction details, e.g., the energy-providing compound, are unresolved. The term “unknown” is used when it is not possible to predict the substrate of the query protein. The term “prediction” is used if a function assignment is based on bioinformatic analyses but not yet on an experimentally characterized homologous protein. The term “special” is used when multiple arguments have to be considered, with the full details provided in the corresponding section of Supplementary Text S1. Finally, a hyphen (“-”) is used when isofunctionality does not apply, e.g., when a homologous Gold Standard Protein could not be identified.
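The isofunc vocabulary can be read as a small controlled vocabulary in which only two terms lead directly to an annotation. The Python sketch below encodes that reading. The function and constant names are hypothetical, placing “probable” as a prefix is an assumption (the legend only states that the term is added to the protein name), and the value “self”, which appears in Table 1 but is not defined in the legend, is deliberately left out.

```python
from typing import Optional

# Terms for which the legend describes no direct transfer of the GSP name;
# "yes/no" is included because only a curated subset of functions is assigned.
NO_DIRECT_TRANSFER = frozenset(
    {"no", "yes/no", "possibly", "unclear", "unknown", "prediction", "special", "-"}
)


def annotated_name(isofunc: str, gsp_protein_name: str) -> Optional[str]:
    """Return the protein name implied by an isofunc term, or None."""
    if isofunc == "yes":
        return gsp_protein_name
    if isofunc == "probably":
        # Assumption: "probable" is placed in front of the protein name.
        return "probable " + gsp_protein_name
    if isofunc in NO_DIRECT_TRANSFER:
        return None
    raise ValueError(f"term not defined in the legend: {isofunc!r}")


print(annotated_name("probably", "histidinol-phosphatase"))
```

Raising an error for undefined terms, rather than returning None, keeps a typo in the table from being silently treated as a valid "no transfer" case.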

(i) NAD-dependent oxidative decarboxylation is a canonical reaction to convert pyruvate into acetyl-CoA and α-ketoglutarate into succinyl-CoA. In haloarchaea, the conversion of pyruvate to acetyl-CoA and α-ketoglutarate to succinyl-CoA is dependent on ferredoxin, not on NAD (see above). Nevertheless, most haloarchaeal genomes also code for homologs of enzymes catalyzing NAD-dependent oxidative decarboxylation, such as the E. coli pyruvate dehydrogenase complex. In most cases, the substrates could not be identified, an exception being a paralog involved in isoleucine catabolism [116]. In several cases, the enzymes were found not to show catalytic activity with pyruvate or α-ketoglutarate (see Supplementary Text S1 Section S1 for details). Additionally, a conditional lethal porAB mutant was unable to grow on glucose or pyruvate, thus excluding that alternative enzymes for the conversion of pyruvate to acetyl-CoA exist in Hfx. volcanii [22]. Nonetheless, despite experimental results to the contrary, pyruvate has been assigned as a substrate for some of the homologs of the pyruvate dehydrogenase complex in KEGG (as of April 2021).

3.2. Amino Acid Metabolism

While most amino acid biosynthesis and degradation pathways can be reliably reconstructed, a few open issues remain, which are discussed below. (a) The first and last steps of arginine biosynthesis deal with blocking and unblocking of the α-amino group of the substrate (glutamate) and a product intermediate (ornithine). As detailed in Supplementary Text S1 Section S2, it is highly likely that glutamate is attached to the γ-carboxyl group of a carrier protein, and ornithine is released from that carrier protein. This is based on characterized proteins from Thermus thermophilus [124], Thermococcus kodakarensis [125] and Sulfolobus acidocaldarius [126]. The assignment is strongly supported by clustering of the arginine biosynthesis genes. Some of the homologs are bifunctional, being involved in arginine biosynthesis but, also, in lysine biosynthesis via the prokaryotic variant of the α-aminoadipate pathway. This ambiguity is not assumed to occur in haloarchaea, which use the diaminopimelate pathway for lysine biosynthesis [127] (see Supplementary Text S1 Section S2 for further discussion of this issue). Expanding the above, we provided full details underlying our reconstruction of arginine and lysine biosynthesis in Hfx. volcanii in Table 2.
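The clustering argument can be made concrete with the locus tags listed in Table 2. The Python sketch below groups genes whose locus tag numbers are consecutive. It assumes that consecutive HVO locus tag numbers correspond to neighboring genes on the chromosome, which is a simplification for illustration; the function names are hypothetical, and plasmid-encoded genes (e.g., HVO_A0634) are not handled.

```python
import re
from typing import Dict, List

# Gene-to-locus-tag pairs copied from Table 2 (chromosomal HVO genes only).
GENES: Dict[str, str] = {
    "argF": "HVO_0041", "argE": "HVO_0042", "argD": "HVO_0043",
    "argB": "HVO_0044", "argC": "HVO_0045", "argX": "HVO_0046",
    "argW": "HVO_0047", "argH": "HVO_0048", "argG": "HVO_0049",
    "lysC": "HVO_0008", "asd": "HVO_2487",
    "dapE": "HVO_1096", "dapF": "HVO_1097", "lysA": "HVO_1098",
    "dapD": "HVO_1099", "dapB": "HVO_1100", "dapA": "HVO_1101",
}


def locus_number(tag: str) -> int:
    """Extract the numeric part of a chromosomal HVO locus tag."""
    match = re.fullmatch(r"HVO_(\d{4})", tag)
    if match is None:
        raise ValueError(f"unsupported locus tag: {tag}")
    return int(match.group(1))


def adjacent_runs(genes: Dict[str, str], max_gap: int = 1) -> List[List[str]]:
    """Group genes into runs whose locus numbers differ by at most max_gap."""
    ordered = sorted(genes, key=lambda g: locus_number(genes[g]))
    runs: List[List[str]] = []
    previous = None
    for gene in ordered:
        number = locus_number(genes[gene])
        if previous is not None and number - previous <= max_gap:
            runs[-1].append(gene)   # extends the current run
        else:
            runs.append([gene])     # starts a new run
        previous = number
    return runs


for run in adjacent_runs(GENES):
    print(run)
```

Under these assumptions, the nine arg genes fall into one run and the dap genes together with lysA into another, whereas lysC and asd stand alone.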
Table 2

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.2). For a description of this table, see the legend to Table 1.

The columns isofunc, %seq_id, Locus tag and UniProt refer to the Gold Standard Protein.
Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment
2a | HVO_0047 | argW | no | 54% | TT_C1544 | Q72HE5 | [128] | 25392000 | for Arg, not for Lys biosynthesis
2a | HVO_0047 (cont.) | | yes/no | 39% | Saci_0753 | Q4JAQ0 | | | only for Arg, not for Lys biosynthesis
2a | HVO_0047 (cont.) | | yes/no | 61% | TK0279 | Q5JFV9 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0046 | argX | no | 44% | TT_C1543 | Q72HE6 | [124] | 19620981 | for Arg, not for Lys biosynthesis
2a | HVO_0046 (cont.) | | yes | 30% | Saci_1621 | Q4J8E7 | | | only for Arg, not for Lys biosynthesis
2a | HVO_0046 (cont.) | | yes/no | 37% | TK0278 | Q5JFW0 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0044 | argB | no | 41% | TT_C1541 | O50147 | [124][128] | 19620981, 25392000 | for Arg, not for Lys biosynthesis
2a | HVO_0044 (cont.) | | yes/no | 33% | Saci_0751 | Q4JAQ2 | [126] | 23434852 | only for Arg, not for Lys biosynthesis
2a | HVO_0044 (cont.) | | yes/no | 32% | TK0276 | Q5JFW2 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0045 | argC | no | 48% | TT_C1542 | O50146 | [124][129] | 19620981, 26966182 | for Arg, not for Lys biosynthesis
2a | HVO_0045 (cont.) | | yes/no | 42% | Saci_0750 | Q4JAQ3 | [126] | 23434852 | only for Arg, not for Lys biosynthesis
2a | HVO_0045 (cont.) | | yes/no | 46% | TK0277 | Q5JFW1 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0043 | argD | no | 45% | TT_C1393 | Q93R93 | [130] | 11489859 | for Arg, not for Lys biosynthesis
2a | HVO_0043 (cont.) | | yes/no | 40% | Saci_0755 | Q4JAP8 | [126] | 23434852 | only for Arg, not for Lys biosynthesis
2a | HVO_0043 (cont.) | | yes/no | 42% | TK0275 | Q5JFW3 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0042 | argE | no | 36% | TT_C1396 | Q8VUS5 | [124][131] | 19620981, 28720495 | for Arg, not for Lys biosynthesis
2a | HVO_0042 (cont.) | | yes/no | 29% | Saci_0756 | Q4JAP7 | [126] | 23434852 | only for Arg, not for Lys biosynthesis
2a | HVO_0042 (cont.) | | yes/no | 37% | TK0274 | Q5JFW4 | [125] | 27566549 | only for Arg, not for Lys biosynthesis
2a | HVO_0041 | argF | yes | 50% | BSU11250 | P18186 | [132] | 4216455 |
2a | HVO_0041 (cont.) | | yes | 47% | OE_5205R | B0R9X3 | [133] | 7868583 |
2a | HVO_0049 | argG | yes | 35% | - | P00966 | [134] | 8792870 | human
2a | HVO_0049 (cont.) | | yes | 23% | b3172 | P0A6E4 | [135] | 10666579 | E. coli
2a | HVO_0048 | argH | yes | 38% | MMP0013 | O74026 | [136] | 10220900 |
2a | HVO_0008 | lysC | yes | 32% | BSU28470 | P08495 | [137] | 15033471 |
2a | HVO_2487 | asd | yes | 51% | MJ0205 | Q57658 | [138] | 16225889 |
2a/9e | HVO_1101 | dapA | yes | 45% | PA1010 | Q9I4W3 | [139] | 21396954 |
2a | HVO_1100 | dapB | yes | 33% | b0031 | P04036 | [140] | 7893644 |
2a | HVO_1099 | dapD | yes | 32% | b0166 | P0A9D8 | [141] | 6365916 |
2a | HVO_1096 | dapE | yes | 29% | b2472 | P0AED7 | [142] | 3276674 | function supported by gene clustering
2a | HVO_1097 | dapF | yes | 35% | b3809 | P0A6K1 | [143] | 6378903 |
2a | HVO_1098 | lysA | yes | 38% | b2838 | P00861 | [144] | 14343156 |
2a | HVO_A0634 | - | unknown | 25% | b2472 | P0AED7 | [142] | 3276674 | function assigned to HVO_1096 in dap cluster
2b | HVO_0790 | fba2 | special | 67% | OE_1472F | B0R334 | [145] | 25216252 | EC 2.2.1.10 activity of OE_1472F not yet confirmed in vitro
2b | HVO_0790 (cont.) | | special | 45% | MJ0400 | Q57843 | [146] | 15182204 | substrate uncertain
2b | HVO_0792 | aroB | yes | 69% | OE_1475F | B0R336 | [145] | 25216252 | OE_1475F only partially characterized
2b | HVO_0792 (cont.) | | yes | 44% | MJ1249 | Q58646 | [146] | 15182204 |
2b | HVO_0602 | aroD1 | yes | 44% | OE_1477R | B0R338 | [145] | 25216252 |
2b | HVO_0602 (cont.) | | yes | 31% | MMP1394 | Q6LXF7 | [147] | 15262931 |
2c | HVO_0009 | tnaA | yes | 41% | b3708 | P0A853 | [148][149] | 265959014284727 |
2d | HVO_A0559 | hutH | yes | 42% | BSU39350 | P10944 | [150][151] | 245491314066617 |
2d | HVO_A0562 | hutU | yes | 62% | BSU39360 | P25503 | [152] | 4990470 |
2d | HVO_A0560 | hutI | yes | 42% | BSU39370 | P42084 | [153] | 16990261 |
2d | HVO_A0561 | hutG | yes | 33% | BSU39380 | P42068 | [152] | 4990470 |
2e | HVO_0431 | - | - | | | | | | no GSP available
2e | HVO_0644 | leuA1 | yes/no | 47% | MJ1392 | Q58787 | [154] | 9864346 | HVO_0644 monofunc (CimA) or bifunc (CimA+LeuA); MJ1392 CimA
2e | HVO_0644 (cont.) | | unclear | 44% | MJ1195 | Q58595 | [155] | 9665716 | HVO_0644 monofunc (CimA) or bifunc (CimA+LeuA); MJ1195 LeuA
2e/2f | HVO_1510 | leuA2 | yes | 47% | MJ1195 | Q58595 | [155] | 9665716 | HVO_1510 LeuA; MJ1195 LeuA
2e/2f | HVO_1510 (cont.) | | no | 41% | MJ1392 | Q58787 | [154] | 9864346 | HVO_1510 LeuA; MJ1392 CimA
2e | HVO_A0489 | - | no | 31% | MJ1392 | Q58787 | [154] | 9864346 | HVO_A0489 general function only; MJ1392 CimA
2e | HVO_A0489 (cont.) | | no | 30% | MJ1195 | Q58595 | [155] | 9665716 | HVO_A0489 general function only; MJ1195 LeuA
2e | HVO_1153 | - | - | | | | | | function unassigned; no GSP
Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.2). For a description of this table, see the legend to Table 1.

(b) Archaea use a different precursor for aromatic amino acid biosynthesis than the classical pathway. This has been resolved for Methanocaldococcus jannaschii and for Methanococcus maripaludis [146,156]. However, the initial steps may differ from those reported for Methanocaldococcus in that fructose 1,6-bisphosphate, rather than 6-deoxy-5-ketofructose, might be a substrate [145]. A clean deletion of the genes for the corresponding enzymes and confirmation with in vitro assays have not yet been achieved (for details, see Supplementary Text S1 Section S2).

(c) The gene for tryptophanase (tpa) is stringently regulated in Haloferax, which is the basis for using its promoter in the toolbox for regulated gene expression [157]. The shutdown of this gene avoids tryptophan degradation when supplies are scarce. Tryptophanase cleaves tryptophan into indole, pyruvate and ammonia. The fate of indole, however, is still unresolved.

(d) A probable histidine utilization cluster exists, based on the characterized homologs from Bacillus subtilis, but has not yet been experimentally verified.

(e) Among the 16 auxotrophic mutants observed in a Hfx. volcanii transposon insertion library [9], some could grow only in the presence of one (or several) supplied amino acids. In many cases, the affected genes were known to be involved in the corresponding pathway, but the others may lead to novel function assignments. Disruption of one gene resulted in histidine auxotrophy, and the product of this gene (HVO_0431) is an interesting candidate. The InterPro domain assignment (HAD family hydrolase) fits into the only remaining pathway gap in histidine biosynthesis (histidinol-phosphatase). In this context, it should be noted that the enzyme that catalyzes the preceding reaction (encoded by hisC) is part of a highly conserved three-gene operon involved in polar lipid biosynthesis (see below). For details, see Supplementary Text S1 Section S2. Disruption of another gene resulted in isoleucine auxotrophy. The product of this gene (HVO_0644) is currently annotated to catalyze two reactions, one being an early step in isoleucine biosynthesis (EC 2.3.1.182) and the other being the first step after leucine biosynthesis branches off from valine biosynthesis (EC 2.3.3.13) (see below, (f)) (for details, see Supplementary Text S1 Section S2).

(f) Hfx. volcanii codes for two paralogs with an attributed function as 2-isopropylmalate synthase (EC 2.3.3.13). This is the first reaction specific to leucine biosynthesis when the pathway branches off valine biosynthesis. One paralog, HVO_0644, is annotated as bifunctional, also catalyzing a chemically similar reaction that is an early step in isoleucine biosynthesis (EC 2.3.1.182). When the gene encoding HVO_0644 is disrupted by transposon integration, cells cannot grow in the absence of isoleucine. It is unclear whether the protein is truly bifunctional and thus also involved in leucine biosynthesis, catalyzing the reaction of EC 2.3.3.13. The other paralog, HVO_1510, belongs to an ortholog set with major problems concerning the start codon assignment. The ortholog set from the 16 genomes listed in Supplementary Table S2 was analyzed. When only canonical start codons are considered (ATG, GTG and TTG), the orthologs from Haloferax mediterranei, Nmn. pharaonis, Natronomonas moolapensis and Halohasta litchfieldiae either lack a long highly conserved N-terminal region or they are disrupted (pseudogenes), being devoid of a potential start codon. The gene from Hfx. volcanii has a start codon (GTG) that is consistent with that of Haloferax gibbonsii strain LR2-5 (but a GTA in Hfx. gibbonsii strain ARA6). In this region, the gene from Hfx. mediterranei is closely related but has in-frame stop codons. HVO_1510 is considerably longer than the orthologs from Haloquadratum walsbyi, Haloarcula hispanica and Natrialba magadii. The first alternative start codon for HVO_1510 codes for Met-93. This protein was proteomically identified in three ArcPP datasets [2], and peptides upstream of Met-93 were detected. This gene might be translated from an atypical start codon, either an in-frame CTG or an out-of-frame ATG, which would require ribosomal slippage (for details, see Supplementary Text S1 Section S2 and Supplementary Figure S1). It is tempting to speculate that translation occurs only when leucine is not available.
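The start codon reasoning in topic (f) can be illustrated with a minimal sketch. It scans one reading frame of an upstream region, records canonical (ATG, GTG, TTG) and atypical (CTG) start codon candidates, and discards every candidate that is followed by an in-frame stop codon. The function name `start_candidates` and the toy sequence are hypothetical and chosen for illustration only; this is not the analysis pipeline used for HVO_1510, and the out-of-frame ATG scenario (ribosomal slippage) is not modeled.

```python
# Hypothetical sketch: in-frame start codon candidates in an upstream region.
CANONICAL_STARTS = {"ATG", "GTG", "TTG"}  # canonical start codons named in the text
ATYPICAL_STARTS = {"CTG"}                 # atypical in-frame candidate named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}


def start_candidates(dna, frame=0):
    """Return (position, codon, kind) for each potential start codon that can
    be read through to the 3' end of `dna` without hitting an in-frame stop."""
    dna = dna.upper()
    hits = []
    for pos in range(frame, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        if codon in STOP_CODONS:
            hits.clear()  # a stop invalidates every start upstream of it
        elif codon in CANONICAL_STARTS:
            hits.append((pos, codon, "canonical"))
        elif codon in ATYPICAL_STARTS:
            hits.append((pos, codon, "atypical"))
    return hits


# Toy sequence (made up): CTG GCA GTG AAA TAA GCA CTG GGA ATG
toy = "CTGGCAGTGAAATAAGCACTGGGAATG"
for position, codon, kind in start_candidates(toy):
    print(position, codon, kind)
```

In the toy sequence, the first CTG and the GTG are discarded because an in-frame TAA follows them, which mirrors the situation described for the Hfx. mediterranei ortholog, whose upstream region contains in-frame stop codons.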

3.3. Coenzymes I: Cobalamin and Heme

The classical heme biosynthesis pathway branches off cobalamin biosynthesis at the level of uroporphyrinogen III. A second pathway exists in bacteria (CPD pathway). Haloarchaea use the alternative heme biosynthesis pathway [158], which has an additional common step with cobalamin biosynthesis, the conversion of uroporphyrinogen III to precorrin-2. For heme biosynthesis, precorrin-2 is converted into siroheme. This pathway was reconstructed [159], except for the iron insertion step. For de novo cobalamin biosynthesis, haloarchaea use the cobalt-early pathway. A key reaction in this pathway variant, catalyzed by CbiG, is cobalt-dependent. Thus, cobalt must be inserted early and is present in all intermediates [160]. Several aspects of heme and cobalamin biosynthesis in haloarchaea have yet to be resolved. This is illustrated in Figure 1.
Figure 1

Illustration of the haloarchaeal cobalamin and heme biosynthesis pathways and of the major cobalamin biosynthesis gene cluster. (A) Biosynthesis pathways. This illustration is based on the corresponding KEGG map 00860. Small circles represent pathway intermediates and have their names assigned. Pathway intermediates upstream of precorrin-2 are not displayed. The circle for sirohydrochlorin is highlighted in red, as this is the branchpoint for heme and cobalamin biosynthesis in haloarchaea. Enzymatic reactions are shown by arrows, the EC numbers being provided in rectangular boxes. Rectangles are colored when the enzyme has been reconstructed for haloarchaea (blue: heme biosynthesis; dark yellow: de novo cobalamin biosynthesis; light yellow: late cobaltochelatase, which may be a salvage reaction). Gene names in green are adopted from KEGG and represent those from bacterial model pathways. Consecutive arrowheads indicate reaction series that are not shown in detail for space reasons. Additionally, some enzymes of the heme biosynthesis pathway are omitted for space reasons. For enzymatic reactions that are considered to be open issues, Hfx. volcanii locus tags are provided. For two pathway gaps (white boxes in the cobalt-early pathway), the type of reaction is indicated (oxidoreductase and ~CH3, indicating a methylation reaction). The question mark after HVO_B0058 indicates that this protein, currently co-attributed to EC 2.1.1.272, is a candidate for the yet-unassigned EC 2.1.1.195 reaction. We note that haloarchaea might use a deviating biosynthesis pathway, e.g., by swapping the methylation and oxidoreductase reactions (not illustrated). (B) The major cobalamin cluster, encoded on megaplasmid pHV3. Arrows are used to indicate the coding strand and are roughly drawn to scale. If assigned, the gene name is provided in addition to the Hfx. volcanii locus tag. Locus tags in red indicate genes that are part of the cobalamin cluster.

(a) Hfx. volcanii contains two annotated cbiX genes. For the reasons detailed in Supplementary Text S1 Section S3, we predict that one is a cobaltochelatase, involved in cobalamin biosynthesis, while the other is a ferrochelatase, responsible for the conversion of precorrin-2 to siroheme in the alternative heme biosynthesis pathway.

(b) De novo cobalamin biosynthesis has been extensively reconstructed upon curation of the genome annotation [11]. All enzymes of the pathway and their associated GSPs are listed in Table 3. Only two pathway gaps remained, and because these are consecutive, it is possible that the haloarchaeal pathway is noncanonical and proceeds via a novel biosynthetic intermediate. There are only four genes with yet-unassigned functions in the Hfx. volcanii cobalamin gene cluster, and their synteny is well-conserved in the majority of haloarchaeal genomes. Thus, these genes are obvious candidates for filling the pathway gaps (for details, see Supplementary Text S1 Section S3).
Table 3

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.3). For a description of this table, see the legend to Table 1.

| Section | Code | Gene | Isofunc | %seq_id | GSP Locus Tag | GSP UniProt | GSP Reference | GSP PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 3a | HVO_B0054 | cbiX1 | yes | 30% | - | O87690 | [163] | 12408752 | cobaltochelatase |
| 3a | HVO_B0054 (cont.) | | yes | 27% | MTH_1397 | O27448 | [164] | 12686546 | cobaltochelatase |
| 3a | HVO_1128 | cbiX2 | no | 29% | AF0721 | O29537 | [165] | 16835730 | cobaltochelatase |
| 3a | HVO_1128 (cont.) | | no | 28% | MTH_1397 | O27448 | [164] | 12686546 | cobaltochelatase |
| 3a | HVO_1128 (cont.) | | no | 29% | AF0721 | O29537 | [165] | 16835730 | cobaltochelatase |
| 3a | NP_0734A | cbiX3 | - | | | | | | function unassigned; no GSP; distantly related to paralogs |
| 3a | HVO_2312 | sirC | yes/no | 31% | Mbar_A1461 | Q46CH4 | [166] | 21197080 | precorrin-2 DH; no analysis for Fe-chelatase |
| 3a | HVO_2312 (cont.) | | yes/no | 29% | STM3477 | P25924 | [167][168] | 14595395; 32054833 | matches to the N-term domain which is bifunctional as precorrin-2 DH and Fe-chelatase |
| 3a | HVO_2312 (cont.) | | yes/no | 29% | - | P61818 | [163][169] | 12408752; 18588505 | precorrin-2 DH; devoid of Fe-chelatase activity |
| 3b | HVO_B0061 | cbiL | no | 32% | STM2024 | Q05593 | [170] | 1451790 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0057 | cbiH2 | yes | 45% | - | O87689 | [160] | 23922391 | corresponds to N-term of O87689 which has a C-term extension |
| 3b | HVO_B0057 (cont.) | | no | 40% | STM2027 | Q05590 | [171][172] | 9331403; 16198574 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0058 | cbiH1 | special | 32% | - | O87689 | [160] | 23922391 | corresponds to N-term of O87689 which has a C-term extension; more distant to O87689 than CbiH2 |
| 3b | HVO_B0058 (cont.) | | no | 30% | STM2027 | Q05590 | [171][172] | 9331403; 16198574 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0060 | cbiF | no | 40% | STM2029 | P0A2G9 | [170][173] | 1451790; 16866557 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0060 (cont.) | | yes | 38% | - | O87686 | [160] | 23922391 | |
| 3b | HVO_B0059 | cbiG | yes | 24% | - | O87687 | [160] | 23922391 | |
| 3b | pathway gap | | | | | | | | EC 2.1.1.195 |
| 3b | pathway gap | | | | | | | | EC 1.3.1.106 |
| 3b | HVO_B0062 | cbiT | yes | 36% | - | O87694 | [160] | 23922391 | corresponds to the C-term of bifunctional O87694 |
| 3b | HVO_B0048 | cbiE | yes | 28% | - | O87694 | [160] | 23922391 | corresponds to the N-term of bifunctional O87694 |
| 3b | HVO_B0049 | cbiC | yes | 33% | - | O87692 | [160] | 23922391 | |
| 3b | HVO_A0487 | cbiA | no | 37% | STM2035 | P29946 | [174] | 15311923 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0052 | - | - | | | | | | function unassigned; no GSP |
| 3b | HVO_B0053 | - | - | | | | | | function unassigned; no GSP |
| 3b | HVO_B0055 | - | - | | | | | | function unassigned; no GSP |
| 3b | HVO_B0056 | - | - | | | | | | function unassigned; no GSP |
| 3c | HVO_A0488 | cobA | yes | 31% | MM_3138 | Q8PSE1 | [175] | 16672609 | |
| 3c | HVO_A0488 (cont.) | | yes | 30% | STM1718 | P31570 | [176] | 12080060 | |
| 3c | HVO_2395 | pduO | yes | 37% | - | Q9XDN2 | [177] | 11160088 | PduO and CobA are isofunctional; in Q9XDN2, the PduO domain (N-term) is fused to a DUF336 domain |
| 3c | HVO_A0553 | cbiP | yes | 63% | VNG_1576G; OE_3246F | Q9HPL5; B0R5X2 | [178] | 14645280 | |
| 3c | HVO_0587 | cbiB | yes | 58% | VNG_1578H; OE_3253F | Q9HPL3; B0R5X4 | [178] | 14645280 | |
| 3c | HVO_0592 | cbiZ | yes | 57% | VNG_1583C; OE_3261F | Q9HPL3; B0R5X8 | [179] | 14990804 | |
| 3c | HVO_0589 | cobY | yes | 47% | VNG_1581C; OE_3257F | Q9HPL1; B0R5X6 | [180] | 12486068 | |
| 3c | HVO_0588 | cobS | yes | 30% | STM2017 | Q05602 | [181] | 17209023 | |
| 3c | - | | | | STM0643 | P39701 | [182] | 7929373 | EC 3.1.3.73; CobC; no homolog in haloarchaea |
| 3c | HVO_0586 | - | prediction | - | - | - | [161] | 12869542 | EC 3.1.3.73; prediction for HSL01294 (VNG_1577C) |
| 3c | pathway gap | | | | | | | | EC 2.7.1.177 |
| 3c | HVO_0591 | cobD1 | yes | 31% | STM0644 | P97084 | [183] | 9446573 | |
| 3c | HVO_0593 | cobD2 | yes | | | | | | no GSP; 51% seq_id to HVO_0591 (cobD1) |
| 3c | HVO_0590 | cobT | prediction | | | | [161] | 12869542 | prediction for VNG_1572C |
| 3c | halTADL_3045 | cobT | yes | 39% | STM0644 | Q05603 | [184] | 8206834 | |
| 3d | HVO_B0051 | cobN | yes | 34% | - | P29929 | [185] | 1429466 | |
| 3d | HVO_B0051 (cont.) | | no | 29% | - | Q55284 | [186][187] | 8663186; 9716491 | Mg chelatase |
| 3d | HVO_B0050 | chlID | no | 46% | slr1030 | P51634 | [186][187] | 8663186; 9716491 | match to N-term; Mg chelatase |
| 3d | HVO_B0050 (cont.) | | no | 33% | slr1777 | P52772 | [186][187] | 8663186; 9716491 | match to complete sequence, incl distant match to N-term; Mg chelatase |
| 3e | HVO_2227 | ahbA | yes | 35% | - | I6UH61 | [158] | 21969545 | |
| 3e | HVO_2313 | ahbB | yes | 32% | - | I6UH61 | [158] | 21969545 | |
| 3f | HVO_1121 | ahbC | yes | 47% | Mbar_A1793 | Q46BK8 | [158][188] | 21969545; 24669201 | |
| 3f | HVO_2144 | ahbD | self | | | | [162] | 29284023 | EC 1.3.98.6 |
| 3f | HVO_2144 (cont.) | | yes | 42% | Mbar_A1458 | Q46CH7 | [188] | 24669201 | |
| 3f | HVO_1871 | chdC | self | | | | [162] | 29284023 | EC 1.3.98.5 |
| 3f | HVO_1871 (cont.) | | yes | 46% | BSU37670 | P39645 | [189] | 28123057 | |
(c) The cobalamin biosynthesis and salvage reactions (those beyond ligand cobyrinate a,c diamide) involve “adenosylation of the corrin ring, attachment of the aminopropanol arm, and assembly of the nucleotide loop that bridges the lower ligand dimethylbenzimidazole and the corrin ring” [161]. The enzymes of these branches of cobalamin biosynthesis and their associated GSPs are listed in Table 3. Only two pathway gaps remain open. For one of these, a candidate was proposed upon a detailed bioinformatic analysis [161] (for further details, see Supplementary Text S1 Section S3).

(d) In the cobalt-late (aerobic) pathway variant, the intermediates are cobalt-free, and cobalt is inserted only late in the pathway. Even though haloarchaea do not use the cobalt-late pathway, so that a late cobaltochelatase is not required, they code for a homolog of the large subunit of a characterized heterotrimeric late cobaltochelatase. The adjacent gene is homologous to small subunits of other chelatases. We speculate that this late cobaltochelatase may be involved in cobalamin salvage. The chelatase has a mosaic subunit structure, as also reported previously [161] (see Supplementary Text S1 Section S3 for details).

(e) In the alternative heme biosynthesis pathway, siroheme is decarboxylated to 12,18-didecarboxysiroheme, which is attributed to the proteins encoded by ahbA and ahbB. These are homologous to each other and are organized as two two-domain proteins. It is unclear if AhbA and AhbB function independently or if they form a complex.

(f) Two of the three heme biosynthesis pathways (AHB and CPD) share a common last step (decarboxylation of Fe-coproporphyrin III to protoheme (heme b)). They use, however, distinct types of enzymes (AHB: ahbD, EC 1.3.98.6, adenosylmethionine-dependent heme synthase, a radical SAM enzyme; CPD: chdC, EC 1.3.98.5, peroxide-dependent heme synthase). Nearly all haloarchaea contain a chdC gene, and two-thirds also contain an ahbD gene. Hfx. volcanii was shown to use AhbD under anaerobic conditions and ChdC under aerobic conditions [162].

3.4. Coenzymes II: Coenzyme F420

Even though coenzyme F420 is predominantly associated with methanogenic archaea [190,191], it occurs also in bacteria, and a small amount of this coenzyme has been detected in non-methanogenic archaea, including halophiles [192]. The genes required for the biosynthesis of this coenzyme are encoded in haloarchaeal genomes, but the origin and attachment of the phospholactate moiety are not completely resolved (see below). To the best of our knowledge, only a single coenzyme F420-dependent enzymatic reaction has yet been reported for halophilic archaea [193]. Thus, the importance of this coenzyme in haloarchaeal biology is currently enigmatic and awaits experimental analysis.

(a) The pathway that creates the carbon backbone of this coenzyme has been reconstructed. We list the enzymes with their associated GSPs in Table 4. Coenzyme F420 contains a phospholactate moiety, which was reported to originate from 2-phospho-lactate [194], but this compound is metabolically not well-connected. As summarized in Supplementary Text S1 Section S4, there are various new insights regarding this pathway from recent studies in other prokaryotes [195,196]. To the best of our knowledge, the haloarchaeal coenzyme F420 biosynthesis pathway has never been experimentally analyzed.
Table 4

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.4). For a description of this table, see the legend to Table 1.

| Section | Code | Gene | Isofunc | %seq_id | GSP Locus Tag | GSP UniProt | GSP Reference | GSP PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 4a | HVO_2198 | cofH | yes | 35% | MJ1431 | Q58826 | [198][199] | 14593448; 25781338 | |
| 4a | HVO_2201 | cofG | yes | 43% | MJ0446 | Q57888 | [198][200][199] | 14593448; 23072415; 25781338 | |
| 4a | HVO_2202 | cofC | yes | 25% | MJ0887 | Q58297 | [194][195][196] | 18260642; 30952857; 31469543 | |
| 4a | HVO_2479 | cofD | yes | 39% | MM_1874 | Q8PVT6 | [201][196] | 18252724; 31469543 | |
| 4a | HVO_2479 (cont.) | | yes | 32% | MJ1256 | Q58653 | [202] | 11888293 | |
| 4a | HVO_1936 | cofE | yes | 47% | AF_2256 | O28028 | [203] | 17669425 | |
| 4a | HVO_1936 (cont.) | | yes | 38% | MJ0768 | Q58178 | [204] | 12911320 | |
| 4b | HVO_0433 | npdG | yes | 38% | AF_0892 | O29370 | [205] | not in PubMed | |
| 4b | HVO_B0113 | - | no | 27% | Rv0132c | P96809 | [206] | 24349169 | too distant to assume isofunctionality |
| 4b | HVO_B0342 | - | unknown | 29% | - | O93734 | [207][208] | 8706724; 15016352 | too distant to assume isofunctionality |
| 4b | NP_1902A | - | no | 28% | - | Q9UXP0 | [209][210] | 1735436; 9933933 | too distant to assume isofunctionality |
| 4b | NP_4006A | - | no | 27% | MJ0870 | Q58280 | [211] | 16048999 | too distant to assume isofunctionality |
| 4c/5c | HVO_1937 | mer | no | 38% | MTH_1752 | O27784 | [212][213][214] | 2298726; 7649177; 10891279 | |
| 4d | HVO_2911 | phr2 | yes | 62% | VNG_1335G; OE_2907R | Q9HQ46; B0R5D6 | [215][216] | 2681164; 12773185 | |
| 4d | HVO_2843 | phr1 | no | 45% | sll1629 | P77967 | [217] | 12535521 | sll1629 implicated in transcription regulation |
| 4d | HVO_2843 (cont.) | | possibly | 45% | At5g24850 | Q84KJ5 | [218][219] | 12834405; 17062752 | mediates photo-repair of ssDNA |
| 4d | HVO_1234 | phr3 | possibly | 40% | Atu4765 | A9CH39 | [220] | 23589886 | |
(b) The prediction of coenzyme F420-specific oxidoreductases in Mycobacterium and actinobacteria has been reported [197], leading to patterns and domains that are also found in haloarchaea. Several such enzymes are described in Supplementary Text S1 Section S4.

(c) HVO_1937 might be a coenzyme F420-dependent 5,10-methylenetetrahydrofolate reductase (see, also, below: C1 metabolism, and Supplementary Text S1 Section S4).

(d) The precursor for coenzyme F420 may be used by a photolyase involved in DNA repair.

3.5. Coenzymes III: Coenzymes of C1 Metabolism: Tetrahydrofolate in Haloarchaea and Methanopterin in Methanogens

Halophilic and methanogenic archaea use distinct coenzymes as one-carbon carriers (C1 metabolism): tetrahydrofolate in haloarchaea and methanopterin in methanogens [221,222]. Several characterized methanogenic proteins that act on or with methanopterin have comparably close homologs in haloarchaea (Table 5), which results in the misannotation of haloarchaeal proteins (e.g., in SwissProt) as being involved in methanopterin biology. We assume that the haloarchaeal proteins function with the haloarchaeal one-carbon carrier tetrahydrofolate and that this shift in coenzyme specificity is possible due to the structural similarity between methanopterin and tetrahydrofolate (a near-identical core structure consisting of a pterin heterocyclic ring linked via a methylene bridge to a phenyl ring) (Figure 2). A detailed review on the many variants of the tetrahydrofolate biosynthetic pathway is available [223].
Table 5

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.5). For a description of this table, see the legend to Table 1.

| Section | Code | Gene | Isofunc | %seq_id | GSP Locus Tag | GSP UniProt | GSP Reference | GSP PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 5a | HVO_0709 | pabA | no | 47% | TTHA1843 | P05379 | [225] | 2844259 | Trp biosynthesis |
| 5a | HVO_0709 (cont.) | | yes/no | 39% | BSU00750 | P28819 | [226] | 2123867 | TrpG works with TrpE and with PabB |
| 5a | HVO_0710 | pabB | no | 46% | TTHA1844 | P05378 | [225] | 2844259 | Trp biosynthesis |
| 5a | HVO_0710 (cont.) | | yes | 44% | BSU00740 | P28820 | [227] | 19275258 | PabB; para-aminobenzoate biosynthesis |
| 5a | HVO_0708 | pabC | no | 36% | AF_0933 | O29329 | [228] | 30733943 | branched-chain amino acids |
| 5b | HVO_2348 | mptA | self | | | | [229] | 19478918 | gene deletion phenotypes |
| 5b | HVO_2348 (cont.) | | yes | 41% | MJ0775 | Q58185 | [230] | 17497938 | common part of methanopterin and tetrahydrofolate biosynthesis |
| 5b | HVO_A0533 | - | unknown | 27% | MJ0837 | Q58247 | [231] | 19746965 | if isofunctional would resolve a pathway gap |
| 5b | HVO_2628 | - | no | 31% | AF_2089 | O28190 | [232] | 12142414 | first committed step to methanopterin biosynthesis |
| 5b | HVO_2628 (cont.) | | no | 26% | MJ1427 | Q58822 | [233] | 15262968 | first committed step to methanopterin biosynthesis |
| 5c | HVO_2573 | mch | no | 45% | MK0625 | P94954 | [234] | 9676239 | acts on a one-carbon attached to methanopterin |
| 4c/5c | HVO_1937 | mer | no | 38% | MTH_1752 | O27784 | [212][213][214] | 2298726; 7649177; 10891279 | acts on a one-carbon compound attached to methanopterin |
Figure 2

The structure of the C1 coenzymes tetrahydrofolate and methanopterin and two enzymes that act on the attached C1 compound. (A) The structures of tetrahydromethanopterin (top) and tetrahydrofolate (bottom) illustrate the similarities and differences between these C1 coenzymes. The common pteridine-based ring system is highlighted in yellow, and the initial biosynthesis step that generates this ring system is catalyzed by homologous enzymes (topic (b)). Two methanopterin-specific methyl groups are outlined by dashed ovals. N5 and N10, which are involved in the binding of the C1 compound, are colored red. (B) Two enzymatic reactions that alter the oxidation level of the C1 compound are illustrated. The methanogenic and haloarchaeal enzymes are homologous, even though they use distinct C1 coenzymes (topic (c)). It should be noted that MTH_1752 uses coenzyme F420 (not illustrated, Section 3.4, topic (c)), and this might also hold true for HVO_1937.

(a) Folate biosynthesis requires aminobenzoate. We proposed candidates for a pathway from chorismate to para-aminobenzoate [66,224] (for details, see Supplementary Text S1 Section S5). However, these predictions have not been adopted by KEGG (accessed April 2021), and without experimental confirmation, this is unlikely to ever happen.

(b) GTP cyclohydrolase MptA (HVO_2348) catalyzes a reaction in the common part of tetrahydrofolate and methanopterin biosynthesis. The enzymes specific for methanopterin biosynthesis are absent from haloarchaea, and thus, the assignment of HVO_2348 to the methanopterin biosynthesis pathway in UniProt is invalid (accessed March 2021). The next common pathway step (EC 3.1.4.56) has been resolved in M. jannaschii (MJ0837) but is still a pathway gap in halophilic archaea. MJ0837 is very distantly related to HVO_A0533, which is a promising candidate for experimental analysis. HVO_2628 shows 30% protein sequence identity with the enzyme catalyzing the first committed step to methanopterin biosynthesis. As detailed in Supplementary Text S1 Section S5, we consider it likely that it does not catalyze that reaction.

(c) Two enzymes that alter the oxidation level of the coenzyme-attached one-carbon compound probably function with tetrahydrofolate, even though their methanogenic homologs function with methanopterin. In contrast to their assignments in KEGG and UniProt (as of March 2021), their probable functions are thus methenyltetrahydrofolate cyclohydrolase (HVO_2573) and 5,10-methylenetetrahydrofolate reductase (HVO_1937) (see Figure 2 and Supplementary Text S1 Section S5).
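The argument in topics (b) and (c), that a function cannot be transferred from a GSP when the coenzyme it acts on is absent from the target lineage, can be written as a simple rule. The following is a minimal sketch under stated assumptions: the names `GspHit`, `C1_CARRIERS` and `transfer_annotation` are hypothetical, the carrier table only encodes what this section states (tetrahydrofolate in haloarchaea, methanopterin in methanogens), and sequence similarity is deliberately not evaluated.

```python
from dataclasses import dataclass

# Assumption for illustration: C1 carriers per lineage, as stated in this section.
C1_CARRIERS = {
    "haloarchaea": {"tetrahydrofolate"},
    "methanogens": {"methanopterin"},
}


@dataclass(frozen=True)
class GspHit:
    query: str        # locus tag of the protein being annotated
    gsp: str          # locus tag of the Gold Standard Protein (GSP)
    gsp_carrier: str  # C1 carrier the GSP was characterized with


def transfer_annotation(hit: GspHit, lineage: str) -> str:
    """Return 'yes' if the GSP's C1 carrier occurs in the target lineage,
    otherwise 'no' plus the carrier the homolog would have to use instead."""
    carriers = C1_CARRIERS[lineage]
    if hit.gsp_carrier in carriers:
        return "yes"
    return "no (probable carrier: " + ", ".join(sorted(carriers)) + ")"


# Row 5c of Table 5: HVO_2573 versus the methanopterin-dependent GSP MK0625.
mch = GspHit(query="HVO_2573", gsp="MK0625", gsp_carrier="methanopterin")
print(transfer_annotation(mch, "haloarchaea"))
```

For the haloarchaeal target, the rule rejects isofunctionality and points to tetrahydrofolate, which corresponds to the "no" entries for HVO_2573 and HVO_1937 in Table 5.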

3.6. Coenzymes IV: NAD and FAD (Riboflavin)

(a) The energy source for NAD kinase may be ATP or polyphosphate. This is unresolved for the two paralogs of probable NAD kinase (HVO_2363, nadK1 and HVO_0837, nadK2). These show only 25% protein sequence identity to each other (see Supplementary Text S1 Section S6). Polyphosphate was not found in exponentially growing Hfx. volcanii cells [235], and thus ATP is the more likely energy source.

(b) HVO_0781 is encoded in nearly all haloarchaeal genomes, according to OrthoDB, and shows very strong syntenic coupling with the adjacent gene, HVO_0782, according to SyntTax analysis. Characterized homologs to HVO_0781 cleave S-adenosyl-methionine into methionine and adenosine, a reaction that seems wasteful. If so, then this gene would not be expected to be retained in most species and neither would it maintain a strongly conserved gene clustering (see Supplementary Text S1 Section S6). HVO_0782 is an enzyme involved in NAD biosynthesis, which is encoded in most haloarchaeal and archaeal genomes. Thus, HVO_0781 is also a candidate for being involved in NAD biosynthesis.

(c) We described the reconstruction of riboflavin biosynthesis based on a detailed bioinformatic analysis [236]. The enzymes and their associated GSPs are listed in Table 6. Three pathway gaps remain, with candidate genes predicted for two of these [236] (for details, see Supplementary Text S1 Section S6).
Table 6

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.6). For a description of this table, see the legend to Table 1.

| Section | Code | Gene | Isofunc | %seq_id | GSP Locus Tag | GSP UniProt | GSP Reference | GSP PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 6a | HVO_2363 | nadK1 | unclear | 37% | Rv1695 | P9WHV7 | [237] | 11006082 | can use ATP and PP |
| 6a | HVO_2363 (cont.) | | unclear | 31% | AF_2373 | O30297 | | | ATP or PP usage unresolved |
| 6a | HVO_0837 | nadK2 | unclear | 28% | Rv1695 | P9WHV7 | | | can use ATP and PP |
| 6a | HVO_0837 (cont.) | | unclear | partial | AF_2373 | O30297 | | | ATP or PP usage unresolved |
| 6b | HVO_0782 | nadM | yes | 53% | MJ0541 | Q57961 | [238][239] | 9401030; 10331644 | |
| 6b | HVO_0781 | - | unknown | 42% | Sare_1364 | A8M783 | [240] | 18720493 | |
| 6b | HVO_0781 (cont.) | | unknown | 35% | PH0463 | O58212 | [241] | 18551689 | |
| 6c | HVO_0327 | ribB | yes | 43% | MJ0055 | Q60364 | [242] | 12200440 | |
| 6c | HVO_0974 | ribH | yes | 45% | MJ0303 | Q57751 | [243] | 12603336 | |
| 6c | HVO_1284 | arfA | self | | | | [244] | 21999246 | gene deletion leads to riboflavin auxotrophy |
| 6c | HVO_1284 (cont.) | | yes | 44% | MJ0145 | Q57609 | [245] | 12475257 | |
| 6c | HVO_1235 | - | prediction | | | | [236] | 28073944 | arfB candidate |
| 6c | HVO_1341 | arfC | yes | 36% | MJ0671 | Q58085 | [246][247] | 11889103; 18671734 | |
| 6c | HVO_2483 | - | prediction | 34% | MJ0699 | Q58110 | [236] | 28073944 | also predicted for MJ0699 |
| 6c | pathway gap | | | | | | | | EC 3.1.3.104 |
| 6c | HVO_0326 | rbkR | yes | 37% | TA1064 | Q9HJA6 | [236] | 28073944 | bifunctional as gene regulator and enzyme |
| 6c | HVO_0326 (cont.) | | yes/no | 32% | MJ0056 | Q60365 | [248] | 18073108 | enzyme only; lacks an N-terminal HTH domain |
| 6c | HVO_1015 | ribL | yes | 50% | MJ1179 | Q58579 | [249] | 20822113 | |

3.7. Biosynthesis of Membrane Lipids, Bacterioruberin and Menaquinone

Archaeal membrane lipids contain ether-linked isoprenoid side chains (see [250] and the references cited therein). The isoprenoid precursor isopentenyl diphosphate is synthesized in haloarchaea by a modified version of the mevalonate pathway [251]. Isoprenoid units are then linearly condensed into the C20 compound geranylgeranyl diphosphate. The haloarchaeal core lipid, archaeol, consists of 2,3-sn-glycerol with two C20 isoprenoid side chains attached by ether linkages. In some archaea, especially alkaliphiles, C25 isoprenoids are also found (see, e.g., [252,253]). Additionally, a number of distinct headgroups are found in polar lipids (phospholipids) (reviewed in [250]) (Figure 3). Even though polar lipids are used as important taxonomic markers [254], their biosynthetic pathways are not completely resolved.
Figure 3

Biosynthesis of polar lipids. A key intermediate is CDP-archaeol, which is generated from archaeol (displayed as fully saturated) by CarS. Members of the InterPro:IPR000462 family then transfer the CDP-archaeol to the hydroxyl group (alcohol group) of the target molecule (backbone: serine, glycerol and myo-inositol). Subsequent modifications contribute to the diversity of polar lipids.

Haloarchaea typically have a red color, which is due to carotenoids, mainly the C50 carotenoid bacterioruberin [255,256,257]. For carotenoid biosynthesis, two molecules of geranylgeranyl diphosphate, a C20 compound, are linked head-to-head to generate phytoene, which is desaturated to lycopene [66,258]. The pathway from lycopene to the C50 compound bacterioruberin has been experimentally characterized [257,259].

(a) We assigned HVO_2725 (idsA1, ortholog of NP_3696A) and HVO_0303 (idsA2, ortholog of NP_0604A) for the linear isoprenoid condensation reactions, resulting in a C20 isoprenoid (EC 2.5.1.10 and EC 2.5.1.29, short-chain isoprenyl diphosphate synthase) (see, also, Supplementary Text S1 Section S7). Some archaea, mainly haloalkaliphiles, also contain C25 isoprenoid side chains. Geranylfarnesyl diphosphate synthase, the enzyme that generates the C25 isoprenoids, has been purified and enzymatically characterized from Nmn. pharaonis [260], but data required for the assignment to a specific gene have not been collected. Three paralogous genes from Nmn. pharaonis are candidates for this function (NP_0604A, NP_3696A and NP_4556A). Since NP_0604A and NP_3696A have orthologs in Hfx. volcanii, a species devoid of C25 lipids, we assigned the synthesis of C25 isoprenoids (geranylfarnesyl diphosphate synthase activity) to the third paralog, NP_4556A. UniProt assigned C25 biosynthesis activity to NP_3696A for undescribed reasons (as of April 2021), and KEGG does not make this assignment for any of the three paralogs (as of April 2021). Our assignments are supported by an analysis of the key residues that determine the length of the isoprenoid chain [261]. These authors labeled the cluster containing NP_3696A (WP011323557.1) as “C15/C20” and the cluster containing NP_4556A (WP011323984.1) as “C20->C25->C30?”.

(b) Typical polar lipids in haloarchaea (Figure 3) are phosphatidylglycerophosphate methyl ester (PGP-Me) and phosphatidylglycerol (PG) but, also, phosphatidylglycerosulfate (PGS) [261,262,263]. Other polar lipids are archaetidylserine and its decarboxylation product archaetidylethanolamine, both of which are found in rather low quantities in Haloferax [264]. A third group of polar lipids has a headgroup derived from myo-inositol. The biosynthetic pathway of the headgroup is only partially resolved. One CDP-archaeol 1-archaetidyltransferase that belongs to a highly conserved three-gene operon may attach either glycerol phosphate or myo-inositol phosphate. In Supplementary Text S1 Section S7, we summarize the arguments in favor of each of these candidates, but the true function can only be decided by experimental analysis.

(c) Carotenoid biosynthesis involves the head-to-head condensation of the C20 isoprenoid geranylgeranyl diphosphate to phytoene, which is desaturated to lycopene [66,258]. The crtB gene product (e.g., HVO_2524) catalyzes the head-to-head condensation. It is still uncertain which gene product is responsible for the desaturation of phytoene to lycopene. The further pathway from lycopene to bacterioruberin has been experimentally characterized in Haloarcula japonica [257]. A three-gene cluster (crtD-lyeJ-cruF) codes for the three enzymes of this pathway. The synteny of this three-gene cluster is strongly conserved, according to SyntTax analysis. Several genes that are certainly or possibly involved in carotenoid biosynthesis are encoded in the vicinity of this cluster (for details, see Supplementary Text S1 Section S7).

(d) Halophilic archaea contain menaquinone as a lipid-based two-electron carrier of the respiratory chain [264,265]. We described the reconstruction of the menaquinone biosynthesis pathway (Table 7), with two pathway gaps remaining open (see Supplementary Text S1 Section S7 for details).
Table 7

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.7). For a description of this table, see the legend to Table 1.

| Section | Code | Gene | Isofunc | %seq_id | GSP Locus Tag | GSP UniProt | GSP Reference | GSP PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 7a | NP_0604A | idsA2 | yes | 32% | GACE_1337 | A0A0A7GEY4 | [266] | 30062607 | ortholog of HVO_0303 (66%); produces a C20 isoprenoid (same assignment for NP_0604A) |
| 7a | NP_0604A (cont.) | idsA2 | no | 30% | APE_1764 | Q9YB31; Q9UWR6 | [267] | 10632701 | produces a C25 isoprenoid (C20 assigned to NP_0604A) |
| 7a | NP_3996A | idsA3 | yes | 44% | GACE_1337 | A0A0A7GEY4 | [266] | 30062607 | ortholog of HVO_2725 (67%); produces a C20 isoprenoid (same assignment for NP_3996A) |
| 7a | NP_3996A (cont.) | idsA2 | no | 36% | APE_1764 | Q9YB31; Q9UWR6 | [267] | 10632701 | produces a C25 isoprenoid (C20 assigned to NP_3996A) |
| 7a | NP_4556A | idsA1 | no | 34% | GACE_1337 | A0A0A7GEY4 | [266] | 30062607 | no ortholog in Hfx. volcanii; produces a C20 isoprenoid (C25 assigned to NP_4556A) |
| 7a | NP_4556A (cont.) | idsA1 | yes | 29% | APE_1764 | Q9YB31; Q9UWR6 | [267] | 10632701 | produces a C25 isoprenoid (same assignment for NP_4556A) |
| 7b | HVO_0332 | carS | yes | 45% | AF_1740 | O28537 | [268] | 25219966 | |
| 7b | HVO_1143 | assA | yes | 32% | MTH_1027 | O27106 | [269] | 12562787 | gene synonym: pgsA3 |
| 7b | HVO_1297 | aisA | yes | 25% | MTH_1691 | O27726 | [270] | 19740749 | gene synonym: pgsA2 |
| 7b | HVO_1136 | pgsA1 | - | | | | | | only distant partial matches to GSPs |
| 7b | HVO_1971 | pgsA4 | unclear | 26% | MTH_1027 | O27106 | [269] | 12562787 | MTH_1027 is less distant to HVO_1143 |
| 7b | HVO_0146 | asd | no | 39% | SMc00551 | Q9FDI9 | [271] | 18708506 | equivalent function for the bacterial lipid |
| 7b | HVO_1295 | hisC | self | | | | [272] | 2345144 | complements a His auxotrophy mutant |
| 7b | HVO_1295 (cont.) | | yes | 31% | b2021 | P06986 | [273] | 2999081 | weak support, see text |
| 7b | HVO_1296 | adk2 | unclear | 34% | PAB0757 | Q9UZK4 | [274] | 24823650 | Pyrococcus: involved in ribosome biogenesis |
| 7b | HVO_1296 (cont.) | | unclear | 32% | - | Q9Y3D8 | [275] | 15630091 | human: adenylate kinase; HVO_1296 may be inositol kinase |
| 7b | HVO_2496 | adk1 | yes | 45% | BSU01370 | P16304 | [276] | 31111079 | Bacillus: adenylate kinase |
| 7b | HVO_B0213 | - | yes | 43% | AF_1794 | O28480 | [277][278] | 11015222; 22261071 | Archaeoglobus: adenylate kinase |
| 7b | HVO_1135 | - | - | | | | | | a SAM-dependent methyltransferase |
| 7c | HVO_2524 | crtB | self | | | | [9][279] | 25488358; 29038254 | crtB mutants are colorless |
| 7c | HVO_2524 (cont.) | | yes | 32% | Synpcc7942_1984 | P37269 | [280] | 1537409 | |
| 7c | HVO_2527 | lyeJ | self | | | | [259] | 21840984 | |
| 7c | HVO_2527 (cont.) | | yes | 65% | VNG_1682C; OE_3380R | Q9HPD9; B0R651 | [259] | 21840984 | |
| 7c | HVO_2527 (cont.) | | yes | 61% | C444_12922 | M0L7V9 | [257] | 25712483 | |
| 7c | HVO_2528 | crtD | self | | | | [279] | 29038254 | a HVO_2528 mutant was white |
| 7c | HVO_2528 (cont.) | | yes | 71% | C444_12917 | A0A0A1GKA2 | [257] | 25712483 | |
| 7c | HVO_2526 | cruF | yes | 59% | C444_12927 | A0A0A1GNF2 | [257] | 25712483 | |
| 7d | HVO_1470 | menF | yes | 38% | PA4231 | Q51508 | [281] | 7500944 | |
| 7d | HVO_1469 | menD | yes | 37% | BSU30820 | P23970 | [282] | 20600129 | |
| 7d | pathway gap | | | | | | | | EC 4.2.99.20 |
| 7d | HVO_1461 | menC | no | 29% | BSU12980 | O34508 | [283] | 11747447 | Ala/Glu epimerase |
| 7d | HVO_1461 (cont.) | | yes | 24% | BSU30780 | O34514 | [284] | 10194342 | o-succinylbenzoate synthase |
| 7d | HVO_1375 | menE | yes | 36% | BSU30790 | P23971 | [285] | 27933791 | |
| 7d | HVO_1465 | menB | yes | 66% | Rv0548c | P9WNP5 | [286] | 20643650 | |
| 7d | pathway gap | | | | | | | | EC 3.1.2.28 |
| 7d | HVO_1462 | menA | yes | 37% | b3930 | P32166 | [287] | 9573170 | |
| 7d | HVO_0309 | menG | yes/no | 44% | At3g63410 | Q9LY74 | [288] | 14508009 | A. thaliana enzyme also involved in tocopherol biosynthesis |
| 7d | HVO_0309 (cont.) | | yes | 27% | - | O86169 | [289] | 9139683 | |
Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.7). For a description of this table, see the legend to Table 1.

3.8. Issues Concerning RNA Polymerase, Protein Translation Components and Signal Peptide Degradation

(a) Haloarchaeal RNA polymerase consists of a set of canonical subunits (encoded by rpoA1A2B1B2DEFHKLNP). Hbt. salinarum and a subset of other haloarchaea contain an additional subunit called epsilon [290,291]. Purified RNA polymerase containing the epsilon subunit transcribes native templates efficiently, in contrast to RNA polymerase devoid of this subunit [291]. The biological relevance of this subunit is enigmatic (see Supplementary Text S1 Section S8). (b) Two distant paralogs of ribosomal protein S10 (uS10) are found in nearly all haloarchaeal genomes. It is uncertain whether both occur in the ribosome and, if so, whether they occur together or are mutually exclusive. The latter distribution would result in heterogeneity of the ribosomes. Alternatively, one of the paralogs may exclusively have a non-ribosomal function. In a subset of haloarchaea (ca. 20% of the genomes, e.g., Nmn. pharaonis), two distant paralogs are also found for ribosomal protein S14 (uS14). For more details, see Supplementary Text S1 Section S8. (c) The ribosomal protein L43e (eL43) shows heterogeneity with respect to the presence of the C2–C2-type zinc finger motif. This zinc finger is found in L43e from all Halobacteriales and in all euryarchaeal proteins outside the class Halobacteria but is absent from Haloferacales and very rare in Natrialbales. Eukaryotic orthologs (e.g., from rat and yeast) contain this zinc finger, and its biological importance has been experimentally shown for the yeast protein [292] (for details, see Supplementary Text S1 Section S8). (d) Diphthamide is a complex covalent modification of a histidine residue of translation elongation factor a-EF2. The biosynthetic pathway has been reconstructed (Table 8) based on distant homologs (enzymes encoded by dph2 and dph5) and on a detailed bioinformatic analysis (enzyme encoded by dph6) [293] (for details, see Supplementary Text S1 Section S8). These uncertain function assignments await experimental confirmation.
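The presence or absence of a zinc finger, as discussed for L43e in item (c), can be screened for with a simple pattern search. The following is a minimal sketch only: the regular expression (two Cys-X2-Cys pairs separated by a linker of 5 to 25 residues), the function name and the toy sequences are illustrative assumptions and are not taken from the analysis described here.

```python
import re

# Assumed pattern for a four-cysteine (C2-C2) zinc finger: two Cys-X2-Cys
# pairs separated by a variable linker. The linker bounds are illustrative.
ZINC_FINGER = re.compile(r"C.{2}C.{5,25}C.{2}C")


def has_c2c2_zinc_finger(protein_seq: str) -> bool:
    """Return True if the sequence contains the assumed C2-C2 pattern."""
    return ZINC_FINGER.search(protein_seq.upper()) is not None


# Artificial toy sequences (hypothetical, not real L43e proteins).
toy_with_finger = "MKTAYCAACGGSGSGSGSGCTTCKKA"
toy_without_finger = "MKTAYSAASGGSGSGSGSGSTTSKKA"
toy_single_pair = "MKCAACGGSGSGG"

for name, seq in [
    ("with finger", toy_with_finger),
    ("without finger", toy_without_finger),
    ("single Cys pair", toy_single_pair),
]:
    print(name, has_c2c2_zinc_finger(seq))
```

A pattern match of this kind only flags candidates; whether the cysteines actually coordinate zinc cannot be decided from the sequence alone.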
Table 8

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.8). For a description of this table, see the legend to Table 1.

Gold Standard Protein
Section | Code | Gene | Isofunc | %seq_id | Locus Tag | UniProt | Reference | PMID | Comment
8a | OE_1279R | rpoeps | self | | | | [290][291] | 2495365, 6852054 |
8b | HVO_0360 | rps10a | yes | 94% | rrnAC2405 | P23357 | [296] | 1764513 |
8b | HVO_1392 | rps10b | - | | | | | | no GSP; 24% seq_id to HVO_0360 (rps10a)
8b | NP_4882A | rps14a | yes | 72% | rrnAC1597.1 | P26816 | [297] | 1832208 | full-length similarity; Haloarcula protein was not isolated or characterized
8b | NP_4882A (cont.) | | yes | 57% | YDL061C | P41058 | [298] | 18782943 | yeast YS29B; N-term 20 aa divergent
8b | NP_1768A | rps14b | unclear | 80% | rrnAC1597.1 | P26816 | [297] | 1832208 | N-term 20 aa divergent
8c | OE_1373R | rpl43e | yes | 69% | rrnAC1669 | P60619 | [299] | 10937989 |
8c | OE_1373R (cont.) | | yes | 39% | YPR043W | P0CX25 | [292][300] | 10588896, 11866512 |
8c | HVO_0654 | rpl43e | yes | 54% | rrnAC1669 | P60619 | [299] | 10937989 | Haloarcula: has zinc finger; Haloferax: lacks zinc finger
8d | HVO_1631 | dph2 | yes | 35% | PH1105 | O58832 | [301] | 20931132 |
8d | HVO_0916 | dph5 | yes | 39% | PH0725 | O58456 | [302] | 20873788 |
8d | HVO_1077 | dph6 | yes | 31% | YLR143W | Q12429 | [303][304] | 23169644, 23468660 |
8e | HVO_0881 | sppA1 | yes | 33% | BSU19530 | O34525 | [305][306] | 10455123, 22472423 |
8e | HVO_1987 | sppA2 | probably | 23% | BSU19530 | O34525 | [305][306] | 10455123, 22472423 |
8e | HVO_1107 | - | prediction | | | | | | no GSP
(e) N-terminal signal sequences target proteins to the secretion machinery. Subsequent to membrane insertion or transmembrane transfer, the signal sequence is cleaved off by a signal peptidase. After cleavage, the signal peptide must be degraded to avoid clogging of the membrane. Degradation is catalyzed by signal peptide peptidase. Candidates for this activity have been predicted from two protein families [294,295] (for details, see Supplementary Text S1 Section S8).

3.9. Miscellaneous Metabolic Enzymes and Proteins with Other Functions

Here, we list a few other enzymatic or nonenzymatic functions for which candidate genes have been assigned but without experimental validation. (a) Ketohexokinase from Haloarcula vallismortis has been experimentally characterized [307]. However, the activity was not assigned to a gene. Detailed bioinformatic analyses have been performed [308,309] and point to a small set of orthologs represented by Hmuk_2662, the ortholog of HVO_1812 (for further details, see Supplementary Text S1 Section S9). (b) The assignment of fructokinase activity to the Hht. litchfieldiae candidate gene halTADL_1913 (UniProt:A0A1H6QYL4) is based on a differential proteomic analysis [309] (see Supplementary Text S1 Section S9 for details). Very close homologs are rare in haloarchaea. For this protein family (carbohydrate kinase), it is unclear whether more distant homologs (with about 50% protein sequence identity) are isofunctional. (c) A candidate gene for glucoamylase is HVO_1711, for the reasons described in Supplementary Text S1 Section S9. The enzyme from Halorubrum sodomense has been characterized [310], but the activity has not yet been assigned to a gene. (d) A strong candidate for having glucose-6-phosphate isomerase activity is Hfx. volcanii HVO_1967 (pgi), based on 36% protein sequence identity to the characterized enzyme from M. jannaschii (MJ1605) [311] (Table 9).
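Several of the assignments above rest on percent protein sequence identity values. As a hedged illustration of how such a value can be derived from a pairwise alignment, the sketch below counts identical residues over the aligned (non-gap) columns. The choice of denominator is an assumption (alignment tools differ in this convention), the function name is hypothetical, and the aligned strings are toy data, not the sequences discussed here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns in which neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment, with '-'
    marking gaps. Using aligned columns as the denominator is an assumed
    convention; other tools divide by alignment or sequence length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    aligned_columns = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gap columns
        aligned_columns += 1
        if res_a == res_b:
            identical += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns


# Toy alignment (hypothetical): 6 aligned columns, 5 of them identical.
toy_a = "MKT-AILV"
toy_b = "MRTGAI-V"
print(round(percent_identity(toy_a, toy_b), 1))
```

The value alone does not establish isofunctionality; as the carbohydrate kinase case shows, the identity level at which function is conserved depends on the protein family.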
Table 9

Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.9). For a description of this table, see the legend to Table 1.

Gold Standard Protein
Section | Code | Gene | Isofunc | %seq_id | Locus Tag | UniProt | Reference | PMID | Comment
9a | HVO_1812 | - | prediction | | | | | | no GSP
9b | halTADL_1913 | - | yes | 37% | - | P26984 | [312] | 1809835 |
9b | halTADL_1913 (cont.) | - | yes | 31% | OCC_03567 | Q7LYW8, H3ZP68 | [313] | 15138858 |
9c | HVO_1711 | - | probably | 33% | - | P29761 | [314] | 1633799 | P29761 matches to C-term half of HVO_1711
9c | HVO_1711 (cont.) | - | probably | 51% | SAMN04487937_2677 | A0A1I6HD35 | [310] | 8305855 | correlation between PMID:8305855 and A0A1I6HD35 likely (see text)
9d | HVO_1967 | pgi | yes | 36% | MJ1605 | Q59000 | [311] | 14655001 |
9e | OE_1665R | kdgA | no | 31% | PA1010 | Q9I4W3 | [139] | 21396954 | GSP for dapA (see under 2a)
9e | OE_1665R (cont.) | | probably | 30% | TTX_1156.1, TTX_1156a | G4RJQ2 | [315] | 15869466 |
9e | OE_1665R (cont.) | | probably | 25% | SSO3197 | Q97U28 | [315] | 15869466 |
9f | HVO_1692 | ludB | self | | | | [21] | 30707467 |
9f | HVO_1692 (cont.) | | probably | 35% | BSU34040 | O07021 | [316] | 19201793 | matches up to HVO_1692 pos 490 of 733
9f | HVO_1692 (cont.) | | probably | 35% | PST_3338 | O4VPR6 | [317] | 25917905 | matches up to HVO_1692 pos 400 of 733
9f | HVO_1693 | ludC | self | | | | [21] | 30707467 |
9f | HVO_1693 (cont.) | | probably | 30% | BSU34030 | O32259 | [316] | 19201793 |
9f | HVO_1693 (cont.) | | probably | 33% | PST_3339 | O4VPR7 | [317] | 25917905 | partial match
9f | HVO_1697 | - | unclear | 24% | PST_3340 | O4VPR8 | [317] | 25917905 |
9f | HVO_1696 | lctP | probably | 44% | PST_3336 | O4VPR4 | [317] | 25917905 |
9g | HVO_B0300 | pucL1 | yes | 49% | BSU32450 | O32141 | [318] | 20168977 | Bacillus: bifunctional, matches to C-term
9g | HVO_B0299 | pucM | yes | 43% | BSU32460 | O32142 | [319] | 16098976 |
9g | HVO_B0301 | pucL2 | yes | 43% | BSU32450 | O32141 | [320] | 17567580 | Bacillus: bifunctional, matches to N-term
9g | HVO_B0302 | pucH1 | no | 33% | - | Q8VTT5 | [321] | 12148274 | paper in Chinese, abstract in English; pyrimidine degradation
9g | HVO_B0302 (cont.) | | yes | 30% | STM0523 | Q7CR08 | [322] | 23287969 | purine degradation
9g | HVO_B0302 (cont.) | | yes | 29% | BSU32410 | O32137 | [323] | 11344136 | purine degradation
9g | HVO_B0306 | amaB4 | no | 39% | - | Q53389 | [324] | 22904279 | carbamoyl-AA hydrolysis
9g | HVO_B0306 (cont.) | | yes | 34% | At5g43600 | Q8VXY9 | [325][326] | 19935661, 23940254 | purine degradation
9g | HVO_B0308 | coxS | no | 46% | Saci_2270 | Q4J6M5 | [327] | 10095793 | GAPDH
9g | HVO_B0308 (cont.) | | no | 41% | - | P19915 | [328] | 10482497 | CO-DH
9g | HVO_B0308 (cont.) | | yes | 39% | b2868 | Q46801 | [329] | 10986234 | xanthine DH
9g | HVO_B0309 | coxL | yes | 33% | b2866 | Q46799 | [329] | 10986234 | xanthine DH
9g | HVO_B0309 (cont.) | | no | 28% | - | P19913 | [328] | 10482497 | CO-DH
9g | HVO_B0309 (cont.) | | no | 26% | Saci_2271 | Q4J6M3 | [327] | 10095793 | GAPDH
9g | HVO_B0310 | coxM | no | 31% | Saci_2269 | Q4J6M6 | [327] | 10095793 | GAPDH
9g | HVO_B0310 (cont.) | | no | 31% | - | P19914 | [328] | 10482497 | CO-DH
9g | HVO_B0310 (cont.) | | yes | 25% | b2867 | Q46800 | [329] | 10986234 | xanthine DH
9g | HVO_B0303 | uraA4 | yes | 38% | b3654 | P0AGM9 | [330] | 16096267 |
9h | HVO_0197 | - | possibly | 39% | lp_0105 | F9UST0 | [331] | 27114550 | LarB family protein
9h | HVO_2381 | - | possibly | 31% | lp_0106/lp_0107 | F9UST1 | [331] | 27114550 | LarC family protein
9h | HVO_0190 | - | possibly | 34% | lp_0109 | F9UST4 | [331] | 27114550 | LarE family protein
9i | HVO_1660 | dacZ | self | | | | [37] | 30884174 |
9i | HVO_0756 | - | prediction | | | | [332] | 32095817 |
9i | HVO_0990 | - | prediction | | | | [332] | 32095817 |
9i | HVO_1690 | - | prediction | | | | [332] | 32095817 |
9j | HVO_2763 | - | self | | | | [333] | 22350204 | no function could be assigned
9j | HVO_2763 (cont.) | | no | 27% | HVO_0144 | D4GZ88 | [334] | 18437358 | RNase Z
9k | HVO_2410 | dabA | yes | 33% | Hneap_0211 | D0KWS7 | [335] | 31406332 |
9k | HVO_2411 | dabB | yes | 31% | Hneap_0212 | D0KWS8 | [335] | 31406332 |
(e) A candidate gene for specifying an enzyme with 2-dehydro-3-deoxy-(phospho)gluconate aldolase activity is Hbt. salinarum kdgA (OE_1665R). It is rather closely related (36% protein sequence identity) to Hfx. volcanii HVO_1101 (encoded by dapA), which is involved in lysine biosynthesis, a biosynthetic pathway that is absent from Hbt. salinarum. The function assignment is based on distant homologs from Saccharolobus (Sulfolobus) solfataricus and Thermoproteus tenax, which have been characterized [315] (for details, see Supplementary Text S1 Section S9). (f) Haloarchaea may contain an NAD-independent L-lactate dehydrogenase, LudBC (HVO_1692 and HVO_1693). The deletion of this gene pair impairs growth on rhamnose, which is catabolized to pyruvate and lactate [21]. There is a very distant relationship (for details, see Supplementary Text S1 Section S9) to the LldABC subunits of the characterized L-lactate dehydrogenase from Pseudomonas stutzeri A1501 [317] and to the LutABC proteins from B. subtilis, which have been shown to be involved in lactate utilization [316]. (g) Hfx. volcanii may be able to convert urate into allantoin using the gene cluster HVO_B0299-HVO_B0302. This could be part of a complete degradation pathway for purines, but this has to be considered highly speculative (see Supplementary Text S1 Section S9 and Supplementary Figure S2). (h) Hfx. volcanii may contain an enzyme having a “nickel-pincer cofactor”. The biogenesis of this cofactor may be catalyzed by larBCE (as detailed in Supplementary Text S1 Section S9). (i) Cyclic di-AMP (c-di-AMP) is an important nucleotide signaling molecule in bacteria and archaea. It is generated from two molecules of ATP by diadenylate cyclase (encoded by dacZ) and is degraded to pApA by phosphodiesterases [336].
The level of this signaling molecule is strictly controlled [337,338], thus requiring a sophisticated interplay of cyclase and phosphodiesterase. DacZ from Hfx. volcanii has been characterized, and it was shown that the c-di-AMP levels must be tightly regulated [37]. The degrading enzyme has not yet been identified in Haloferax, although candidates have been proposed [332,336,339] (see Supplementary Text S1 Section S9). (j) HVO_2763 is distantly related to RNase Z (HVO_0144, rnz). The experimental characterization of HVO_2763 [333] excluded activity as an exonuclease but did not reveal its physiological function. Upon transcriptome analysis, the downregulation of several genes was detected. Several of these were uncharacterized at the time of the experiment but have since been shown to be involved in the minor N-glycosylation pathway that was initially detected under low-salt conditions (see Supplementary Text S1 Section S9 for further details). (k) A pair of genes (dabAB, HVO_2410 and HVO_2411) is predicted to function as a carbon dioxide transporter, based on the identification of such transporters in Halothiobacillus neapolitanus [335]. Being a member of the proton-conducting membrane transporter family, this protein may be misannotated as a subunit of the nuo or mrp complexes (see Supplementary Text S1 Section S9 for further details).

4. Conclusions

We described a large number of cases in which protein function cannot be correctly predicted when computational analysis alone is considered and the biological context is not taken into account. One example is the switch from methanopterin to tetrahydrofolate as a C1 carrier in haloarchaea. Homologous enzymes, inherited from the common ancestor, have adapted to the new C1 carrier rather than being replaced by non-homologous proteins. Function prediction tools may therefore misannotate haloarchaeal proteins as working with methanopterin. Another example is the nuo complex and its misannotation as a type I NADH dehydrogenase. In other cases, even a distant sequence similarity may allow a valid function prediction if additional evidence (e.g., from a gene neighborhood analysis or from a detailed evaluation of the metabolic pathway gaps) is taken into account. Examples include cobalamin cluster proteins, which probably close the two residual pathway gaps, and the predicted degradation pathway for purines. In all these cases, we presented reasonable hypotheses based on current knowledge; in many cases, these are so well supported as to be compelling, but experimental data are required for certainty. With this overview, we attempted to arouse the curiosity of our colleagues, hoping that they will confirm or disprove our speculations and, thus, advance the knowledge about haloarchaeal biology. Hfx. volcanii is a model species for halophilic archaea, and the more completely and correctly its genome is annotated, the higher its value will be for systems biology analyses (modeling), synthetic biology (metabolic engineering) and biotechnology.
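The reasoning summarized above, that a homology-based function should not be transferred when the compound required by the Gold Standard Protein is absent from the annotated species, can be written down as a simple decision rule. The sketch below is an illustration under stated assumptions, not a method used in this study: the class and function names, the 30% identity cutoff and the example hit are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GspHit:
    """A homology hit to a Gold Standard Protein (hypothetical record)."""
    gsp_function: str
    seq_identity: float                 # percent protein sequence identity
    required_compounds: FrozenSet[str]  # substrates/cofactors of the GSP


def assess_transfer(hit: GspHit,
                    species_compounds: FrozenSet[str],
                    min_identity: float = 30.0) -> str:
    """Apply the biological-context check before the similarity check.

    min_identity is an illustrative cutoff, not a recommended value.
    """
    missing = hit.required_compounds - species_compounds
    if missing:
        # The GSP substrate or cofactor is absent, so the proteins
        # cannot be isofunctional regardless of sequence similarity.
        return "not isofunctional: lacks " + ", ".join(sorted(missing))
    if hit.seq_identity < min_identity:
        return "unclear: needs additional evidence"
    return "candidate: " + hit.gsp_function


# Hypothetical example: a methanopterin-dependent GSP matched in a
# species that uses tetrahydrofolate as its C1 carrier.
hit = GspHit("methanopterin-dependent C1 enzyme", 45.0,
             frozenset({"methanopterin"}))
haloarchaeon = frozenset({"tetrahydrofolate"})
print(assess_transfer(hit, haloarchaeon))
```

In this toy case, the hit is rejected despite 45% identity; a low-identity hit whose required compounds are present is reported as unclear rather than rejected, reflecting that distant similarity may still support a prediction when additional evidence is available.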