Friedhelm Pfeiffer, Mike Dyall-Smith.
Abstract
BACKGROUND: Annotation ambiguities and annotation errors are a general challenge in genomics. While a reliable protein function assignment can be obtained by experimental characterization, this is expensive and time-consuming, and the number of such Gold Standard Proteins (GSP) with experimental support remains very low compared to proteins annotated by sequence homology, usually through automated pipelines. Even a GSP may give a misleading assignment when used as a reference: the homolog may be close enough to support isofunctionality, but the substrate of the GSP is absent from the species being annotated. In such cases, the enzymes cannot be isofunctional. Here, we examined a variety of such issues in halophilic archaea (class Halobacteria), with a strong focus on the model haloarchaeon Haloferax volcanii.
Keywords: Gold Standard Protein; Haloferax volcanii; annotation error; genome annotation; haloarchaea
Year: 2021 PMID: 34202810 PMCID: PMC8305020 DOI: 10.3390/genes12070963
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Table 1. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.1).
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 1a | HVO_1305 | | yes | 67% | OE2623R | B0R4X6 | [ | 1555599 | |
| 1a | HVO_0888 | | yes | 77% | OE1711R | B0R3G0 | [ | 6266826 | |
| 1a/1b | HVO_2995 | | yes | 88% | OE4217R | B0R7I9 | [ | 964365 | role in oxidative decarboxylation |
| 1a/1b | HVO_2995 | | self | | | D4GY89 | [ | 22103537 | role in nitrate assimilation |
| 1c | HVO_0979 | | possibly | 50% | tlr0705 | Q8DKZ4 | [ | 15910282 | reoxidizes ferredoxin |
| 1c | HVO_0979 | | no | 48% | b2287 | P0AFC7 | [ | 7607227 | reoxidizes NADH in |
| 1d | NP_3508A | | special | 26% (N-term 140 aa) | - | Q7ZAG8 | | | function of Q7ZAG8 was reassigned (from ndh1 to sqr) after annotation transfer |
| 1d | NP_3508A | | possibly | 30% | BpOF4_04810 | A7LKG4 | [ | 18359284 | type II NADH dehydrogenase |
| 1e | HVO_2620 | | yes | 39% | SYNPCC7002_ | P28056 | [ | 11245788 | HVO_0842 ( |
| 1f | HVO_2810 | | yes | 66% | NP_4268A | Q3INS7 | [ | 9109654 | |
| 1g | HVO_0943 | | yes | 57% | NP_2966A | A0A1U7EWW4 | [ | 9428682 | |
| | HVO_0943 | | - | 63% | OE_4073R | B0R7A9 | - | | halocyanin/cbaD fusion protein, uncharacterized |
| 1g | HVO_2150 | | - | 44% | OE_4073R | B0R7A9 | - | | halocyanin/cbaD fusion protein, uncharacterized |
| 1h | HVO_0945 | | yes | 64% | NP_2966A | A0A1U7EWW4 | [ | 9428682 | |
| 1h | HVO_0907 | | self | | | | [ | 11790755 | |
| 1h | HVO_0907 | | yes | 70% | VNG_0657G (OE_1979R) | P33588 | [ | 2542239 | |
| 1h | HVO_1645 | | yes | 43% | APE_0793.1 | Q9YDX6 | [ | 12471503 | |
| 1h | HVO_0462 | | yes | 32% | - | Q09049 | [ | 1655703 | |
| 1h | HVO_0462 | | yes | 30% | b0733 | P0ABJ9 | [ | 6307994 | |
| 1h | NP_4296A | | yes | 28% | TTHA1135 | Q5SJ79 | [ | 2842747 | |
| 1i | HVO_2958 | | self | | | D4GY15 | [ | 19910413 | Ile indirectly assigned as substrate |
| 1i | HVO_2958 | | self | | | | [ | 10832633 | no substrate was identified; pyruvate and alphaKG excluded |
| 1i | HVO_2595 | | self | | | | [ | 12003954 | no substrate was identified; pyruvate and alphaKG excluded |
| 1i | HVO_0669 | | self | | | | [ | 17906130 | no substrate was identified; pyruvate and alphaKG excluded |
| 1i | HVO_2209 | | self | | | | | | not yet analyzed experimentally |
| 1i | HVO_2958 | | yes/no | 38% | TA1438 | Q9HIA3 | [ | 17894823 | substrates are Ile, Leu, Val |
| 1i | HVO_2595 | | no | 41% | - | Q57102 | [ | 1898934 | substrate is acetoin |
| 1i | HVO_2595 | | unknown | 40% | BSU08060 | O31404 | [ | 10368162 | substrate is acetoin |
| 1i | HVO_0669 | | unknown | 54% | BSU08060 | O31404 | [ | 10368162 | substrate is acetoin |
| 1i | HVO_0669 | | unknown | 49% | - | Q57102 | [ | 1898934 | substrate is acetoin |
| 1i | HVO_2209 | | unknown | 38% | TA1438 | Q9HIA3 | [ | 17894823 | substrates are Ile, Leu, Val |
The column Section refers to the table listing the protein and to the section in the Results and in Supplementary Text S1. As an example, 2c covers topic (c) from the decimal-numbered Results Section 3.2 (Amino Acid Biosynthesis). In Supplementary Text S1, this is covered under Section S2, subsection S2.c. The corresponding proteins are listed in Table 2. For a few proteins, two sections are indicated (e.g., 1a/1b).
The column Code refers to a haloarchaeal protein by its locus tag, which is mainly from Haloferax volcanii (HVO) but also from Halobacterium salinarum (OE), Natronomonas pharaonis (NP) and Halohasta litchfieldiae (halTADL). When the reconstruction of a complete pathway is presented, the unassigned genes are indicated as a “pathway gap”. In one case, we indicate the absence of a haloarchaeal ortholog by a dash. In the case of a complex, we either list more than one code or we list only one subunit together with the term (complex). All subunits of these complexes are listed groupwise in Table S1. A protein may be shown in more than one row. From the 2nd row onwards, this is indicated by the term (cont.). The column Gene lists the assigned gene or a dash if no gene has been assigned. The assigned gene is only indicated in the first row of a protein.
A set of four columns (isofunc, %seq_id, Locus tag, UniProt) is used to relate a query protein to an experimentally characterized homolog, a Gold Standard Protein (GSP). The column isofunc indicates whether the query protein and its Gold Standard Protein homolog are isofunctional. The meanings of the terms used in this column in Tables 1–9 (yes, no, yes/no, probably, possibly, unclear, unknown, prediction, special and “-”) are described at the end of this legend. The column %seq_id indicates the protein sequence identity between the query protein and the homologous GSP. The column Locus tag contains the locus tag, if assigned. The column UniProt contains the UniProt accession of the GSP. GSPs are experimentally characterized as described in a publication. The column Reference links to the reference list of the manuscript. The column PMID lists the PubMed ID of the publication, if available. Otherwise, this is indicated as “not in PubMed”. Additionally, one PhD thesis is indicated (PhD_Mattar). The column Comment provides various types of additional information.
The terms used in the column isofunc in Tables 1–9 have the following meanings. The term “yes” indicates that we consider the two proteins as isofunctional and annotate the query protein accordingly. The term “no” is used when we conclude that the proteins differ in function. Additional terms are used for more difficult cases. The term “yes/no” is used for GSPs that are multifunctional, and we assign only a subset of these functions to the query protein. The term “probably” is used when we consider it likely that the proteins are isofunctional and annotate the query protein accordingly (with the term probable added to the protein name). The term “possibly” is used when we see a good chance that the proteins are isofunctional but consider it too speculative to annotate the protein accordingly. The term “unclear” is used when we consider it likely that the same overall reaction is catalyzed but when reaction details, e.g., the energy-providing compound, are unresolved. The term “unknown” is used when it is not possible to predict the substrate of the query protein. The term “prediction” is used if a function assignment is based on bioinformatic analyses but not yet on an experimentally characterized homologous protein. The term “special” is used when multiple arguments have to be considered, with the full details provided in the corresponding section of Supplementary Text S1. Finally, a hyphen (“-”) is used when isofunctionality does not apply, e.g., when a homologous Gold Standard Protein could not be identified.
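The conventions described in this legend can be illustrated with a small Python sketch. It is not part of the original work: the names `ISOFUNC_TERMS`, `SectionRef`, `parse_section` and `describe_isofunc` are hypothetical, and the mapping from a Section code such as 2c to Table 2, Results Section 3.2 and Supplementary subsection S2.c is generalized from the single example given in the legend. The term “self” occurs in the isofunc column of the tables but is not defined in the legend, so its description below is only an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Terms of the isofunc column, paraphrased from the legend.
ISOFUNC_TERMS = {
    "yes": "isofunctional; query protein annotated accordingly",
    "no": "proteins differ in function",
    "yes/no": "GSP is multifunctional; only a subset of its functions is assigned",
    "probably": "likely isofunctional; annotated with 'probable' in the protein name",
    "possibly": "good chance of isofunctionality, but too speculative to annotate",
    "unclear": "same overall reaction likely, but reaction details unresolved",
    "unknown": "substrate of the query protein cannot be predicted",
    "prediction": "bioinformatic assignment without a characterized homolog",
    "special": "multiple arguments; see Supplementary Text S1",
    "-": "isofunctionality does not apply",
    # ASSUMPTION: 'self' is used in the tables but not defined in the legend.
    "self": "not defined in the legend (assumed: query protein itself was characterized)",
}

# One section code part: a table number (1-9) followed by a topic letter.
_SECTION_PART = re.compile(r"([1-9])([a-z])")


@dataclass(frozen=True)
class SectionRef:
    table: str            # e.g. "Table 2"
    results_section: str  # e.g. "3.2"
    topic: str            # e.g. "c"
    supplement: str       # e.g. "S2.c"


def parse_section(code: str) -> list[SectionRef]:
    """Resolve a Section entry such as '2c' or '1a/1b' into its cross-references."""
    refs = []
    for part in code.split("/"):
        match = _SECTION_PART.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognized section code: {part!r}")
        number, topic = match.groups()
        refs.append(
            SectionRef(f"Table {number}", f"3.{number}", topic, f"S{number}.{topic}")
        )
    return refs


def describe_isofunc(term: str) -> str:
    """Return the paraphrased meaning of an isofunc term, or raise ValueError."""
    try:
        return ISOFUNC_TERMS[term.strip()]
    except KeyError:
        raise ValueError(f"unknown isofunc term: {term!r}") from None
```

For example, `parse_section("4c/5c")` returns two references, one pointing to Table 4 and one to Table 5, which mirrors how a protein listed under two sections appears in both tables.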
Table 2. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.2). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 2a | HVO_0047 | | no | 54% | TT_C1544 | Q72HE5 | [ | 25392000 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0047 | | yes/no | 39% | Saci_0753 | Q4JAQ0 | | | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0047 | | yes/no | 61% | TK0279 | Q5JFV9 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0046 | | no | 44% | TT_C1543 | Q72HE6 | [ | 19620981 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0046 | | yes | 30% | Saci_1621 | Q4J8E7 | | | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0046 | | yes/no | 37% | TK0278 | Q5JFW0 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0044 | | no | 41% | TT_C1541 | O50147 | [ | 19620981 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0044 | | yes/no | 33% | Saci_0751 | Q4JAQ2 | [ | 23434852 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0044 | | yes/no | 32% | TK0276 | Q5JFW2 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0045 | | no | 48% | TT_C1542 | O50146 | [ | 19620981 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0045 | | yes/no | 42% | Saci_0750 | Q4JAQ3 | [ | 23434852 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0045 | | yes/no | 46% | TK0277 | Q5JFW1 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0043 | | no | 45% | TT_C1393 | Q93R93 | [ | 11489859 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0043 | | yes/no | 40% | Saci_0755 | Q4JAP8 | [ | 23434852 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0043 | | yes/no | 42% | TK0275 | Q5JFW3 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0042 | | no | 36% | TT_C1396 | Q8VUS5 | [ | 19620981 | for Arg, not for Lys biosynthesis |
| 2a | HVO_0042 | | yes/no | 29% | Saci_0756 | Q4JAP7 | [ | 23434852 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0042 | | yes/no | 37% | TK0274 | Q5JFW4 | [ | 27566549 | only for Arg, not for Lys biosynthesis |
| 2a | HVO_0041 | | yes | 50% | BSU11250 | P18186 | [ | 4216455 | |
| 2a | HVO_0041 | | yes | 47% | OE_5205R | B0R9X3 | [ | 7868583 | |
| 2a | HVO_0049 | | yes | 35% | - | P00966 | [ | 8792870 | human |
| 2a | HVO_0049 | | yes | 23% | b3172 | P0A6E4 | [ | 10666579 | |
| 2a | HVO_0048 | | yes | 38% | MMP0013 | O74026 | [ | 10220900 | |
| 2a | HVO_0008 | | yes | 32% | BSU28470 | P08495 | [ | 15033471 | |
| 2a | HVO_2487 | | yes | 51% | MJ0205 | Q57658 | [ | 16225889 | |
| 2a/9e | HVO_1101 | | yes | 45% | PA1010 | Q9I4W3 | [ | 21396954 | |
| 2a | HVO_1100 | | yes | 33% | b0031 | P04036 | [ | 7893644 | |
| 2a | HVO_1099 | | yes | 32% | b0166 | P0A9D8 | [ | 6365916 | |
| 2a | HVO_1096 | | yes | 29% | b2472 | P0AED7 | [ | 3276674 | function supported by gene clustering |
| 2a | HVO_1097 | | yes | 35% | b3809 | P0A6K1 | [ | 6378903 | |
| 2a | HVO_1098 | | yes | 38% | b2838 | P00861 | [ | 14343156 | |
| 2a | HVO_A0634 | - | unknown | 25% | b2472 | P0AED7 | [ | 3276674 | function assigned to HVO_1096 in |
| 2b | HVO_0790 | | special | 67% | OE_1472F | B0R334 | [ | 25216252 | EC 2.2.1.10 activity of OE_1472F not yet confirmed in vitro |
| 2b | HVO_0790 | | special | 45% | MJ0400 | Q57843 | [ | 15182204 | substrate uncertain |
| 2b | HVO_0792 | | yes | 69% | OE_1475F | B0R336 | [ | 25216252 | OE_1475F only partially characterized |
| 2b | HVO_0792 | | yes | 44% | MJ1249 | Q58646 | [ | 15182204 | |
| 2b | HVO_0602 | | yes | 44% | OE_1477R | B0R338 | [ | 25216252 | |
| 2b | HVO_0602 | | yes | 31% | MMP1394 | Q6LXF7 | [ | 15262931 | |
| 2c | HVO_0009 | | yes | 41% | b3708 | P0A853 | [ | 2659590 | |
| 2d | HVO_A0559 | | yes | 42% | BSU39350 | P10944 | [ | 2454913 | |
| 2d | HVO_A0562 | | yes | 62% | BSU39360 | P25503 | [ | 4990470 | |
| 2d | HVO_A0560 | | yes | 42% | BSU39370 | P42084 | [ | 16990261 | |
| 2d | HVO_A0561 | | yes | 33% | BSU39380 | P42068 | [ | 4990470 | |
| 2e | HVO_0431 | - | - | | | | | | no GSP available |
| 2e | HVO_0644 | | yes/no | 47% | MJ1392 | Q58787 | [ | 9864346 | HVO_0644 monofunc (CimA) or bifunc (CimA+LeuA); |
| 2e | HVO_0644 | | unclear | 44% | MJ1195 | Q58595 | [ | 9665716 | HVO_0644 monofunc (CimA) or bifunc (CimA+LeuA); |
| 2e/2f | HVO_1510 | | yes | 47% | MJ1195 | Q58595 | [ | 9665716 | HVO_1510 LeuA; MJ1195 LeuA |
| 2e/2f | HVO_1510 | | no | 41% | MJ1392 | Q58787 | [ | 9864346 | HVO_1510 LeuA |
| 2e | HVO_A0489 | | no | 31% | MJ1392 | Q58787 | [ | 9864346 | HVO_A0489 general function only; |
| 2e | HVO_A0489 | | no | 30% | MJ1195 | Q58595 | [ | 9665716 | HVO_A0489 general function only; |
| 2e | HVO_1153 | - | - | | | | | | function unassigned; |
Figure 1. Illustration of the haloarchaeal cobalamin and heme biosynthesis pathways and of the major cobalamin biosynthesis gene cluster. (A) Biosynthesis pathways. This illustration is based on the corresponding KEGG map 00860. Small circles represent pathway intermediates and have their names assigned. Pathway intermediates upstream of precorrin-2 are not displayed. The circle for sirohydrochlorin is highlighted in red, as this is the branchpoint for heme and cobalamin biosynthesis in haloarchaea. Enzymatic reactions are shown by arrows, the EC numbers being provided in rectangular boxes. Rectangles are colored when the enzyme has been reconstructed for haloarchaea (blue: heme biosynthesis; dark yellow: de novo cobalamin biosynthesis; light yellow: late cobaltochelatase, which may be a salvage reaction). Gene names in green are adopted from KEGG and represent those from bacterial model pathways. Consecutive arrowheads indicate reaction series that are not shown in detail for space reasons. Additionally, some enzymes of the heme biosynthesis pathway are omitted for space reasons. For enzymatic reactions that are considered to be open issues, Hfx. volcanii locus tags are provided. For two pathway gaps (white boxes in the cobalt-early pathway), the type of reaction is indicated (oxidoreductase and ~CH3, indicating a methylation reaction). The question mark after HVO_B0058 indicates that this protein, currently co-attributed to EC 2.1.1.272, is a candidate for the yet-unassigned EC 2.1.1.195 reaction. We note that haloarchaea might use a deviating biosynthesis pathway, e.g., by swapping the methylation and oxidoreductase reactions (not illustrated). (B) The major cobalamin cluster, encoded on megaplasmid pHV3. Arrows are used to indicate the coding strand and are roughly drawn to scale. If assigned, the gene name is provided in addition to the Hfx. volcanii locus tag. Locus tags in red indicate genes that are part of the cobalamin cluster.
Table 3. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.3). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 3a | HVO_B0054 | | yes | 30% | - | O87690 | [ | 12408752 | cobaltochelatase |
| 3a | HVO_B0054 | | yes | 27% | MTH_1397 | O27448 | [ | 12686546 | cobaltochelatase |
| 3a | HVO_1128 | | no | 29% | AF0721 | O29537 | [ | 16835730 | cobaltochelatase |
| 3a | HVO_1128 | | no | 28% | MTH_1397 | O27448 | [ | 12686546 | cobaltochelatase |
| 3a | HVO_1128 | | no | 29% | AF0721 | O29537 | [ | 16835730 | cobaltochelatase |
| 3a | NP_0734A | | - | | | | | | function unassigned; |
| 3a | HVO_2312 | | yes/no | 31% | Mbar_A1461 | Q46CH4 | [ | 21197080 | precorrin-2 DH; no analysis for Fe-chelatase |
| 3a | HVO_2312 | | yes/no | 29% | STM3477 | P25924 | [ | 14595395 | matches to the N-term domain which is bifunctional as precorrin-2 DH and Fe-chelatase |
| 3a | HVO_2312 | | yes/no | 29% | - | P61818 | [ | 12408752 | precorrin-2 DH; devoid of Fe-chelatase activity |
| 3b | HVO_B0061 | | no | 32% | STM2024 | Q05593 | [ | 1451790 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0057 | | yes | 45% | - | O87689 | [ | 23922391 | corresponds to N-term of O87689 which has a C-term extension |
| 3b | HVO_B0057 | | no | 40% | STM2027 | Q05590 | [ | 9331403 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0058 | | special | 32% | - | O87689 | [ | 23922391 | corresponds to N-term of O87689 which has a C-term extension; more distant to O87689 than CbiH2 |
| 3b | HVO_B0058 | | no | 30% | STM2027 | Q05590 | [ | 9331403 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0060 | | no | 40% | STM2029 | P0A2G9 | [ | 1451790 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0060 | | yes | 38% | - | O87686 | [ | 23922391 | |
| 3b | HVO_B0059 | | yes | 24% | - | O87687 | [ | 23922391 | |
| 3b | pathway gap | | | | | | | | EC 2.1.1.195 |
| 3b | pathway gap | | | | | | | | EC 1.3.1.106 |
| 3b | HVO_B0062 | | yes | 36% | - | O87694 | [ | 23922391 | corresponds to the C-term of bifunctional O87694 |
| 3b | HVO_B0048 | | yes | 28% | - | O87694 | [ | 23922391 | corresponds to the N-term of bifunctional O87694 |
| 3b | HVO_B0049 | | yes | 33% | - | O87692 | [ | 23922391 | |
| 3b | HVO_A0487 | | no | 37% | STM2035 | P29946 | [ | 15311923 | equivalent reaction on cobalt-free substrate |
| 3b | HVO_B0052 | - | - | | | | | | function unassigned; |
| 3b | HVO_B0053 | - | - | | | | | | function unassigned; |
| 3b | HVO_B0055 | - | - | | | | | | function unassigned; |
| 3b | HVO_B0056 | - | - | | | | | | function unassigned; |
| 3c | HVO_A0488 | | yes | 31% | MM_3138 | Q8PSE1 | [ | 16672609 | |
| 3c | HVO_A0488 | | yes | 30% | STM1718 | P31570 | [ | 12080060 | |
| 3c | HVO_2395 | | yes | 37% | - | Q9XDN2 | [ | 11160088 | PduO and CobA are isofunctional; |
| 3c | HVO_A0553 | | yes | 63% | VNG_1576G | Q9HPL5 | [ | 14645280 | |
| 3c | HVO_0587 | | yes | 58% | VNG_1578H | Q9HPL3 | [ | 14645280 | |
| 3c | HVO_0592 | | yes | 57% | VNG_1583C | Q9HPL3 | [ | 14990804 | |
| 3c | HVO_0589 | | yes | 47% | VNG_1581C | Q9HPL1 | [ | 12486068 | |
| 3c | HVO_0588 | | yes | 30% | STM2017 | Q05602 | [ | 17209023 | |
| 3c | - | | | | STM0643 | P39701 | [ | 7929373 | EC 3.1.3.73; CobC; no homolog in haloarchaea |
| 3c | HVO_0586 | | prediction | - | - | - | [ | 12869542 | EC 3.1.3.73; prediction for HSL01294 (VNG_1577C) |
| 3c | pathway gap | | | | | | | | EC 2.7.1.177 |
| 3c | HVO_0591 | | yes | 31% | STM0644 | P97084 | [ | 9446573 | |
| 3c | HVO_0593 | | yes | | | | | | no GSP; 51% seq_id to HVO_0591 ( |
| 3c | HVO_0590 | | prediction | | | | [ | 12869542 | prediction for VNG_1572C |
| 3c | halTADL_3045 | | yes | 39% | STM0644 | Q05603 | [ | 8206834 | |
| 3d | HVO_B0051 | | yes | 34% | - | P29929 | [ | 1429466 | |
| 3d | HVO_B0051 | | no | 29% | - | Q55284 | [ | 8663186 | Mg chelatase |
| 3d | HVO_B0050 | | no | 46% | slr1030 | P51634 | [ | 8663186 | match to N-term; |
| 3d | HVO_B0050 | | no | 33% | slr1777 | P52772 | [ | 8663186 | match to complete sequence, incl distant match to N-term; |
| 3e | HVO_2227 | | yes | 35% | - | I6UH61 | [ | 21969545 | |
| 3e | HVO_2313 | | yes | 32% | - | I6UH61 | [ | 21969545 | |
| 3f | HVO_1121 | | yes | 47% | Mbar_A1793 | Q46BK8 | [ | 21969545 | |
| 3f | HVO_2144 | | self | | | | [ | 29284023 | EC 1.3.98.6 |
| 3f | HVO_2144 | | yes | 42% | Mbar_A1458 | Q46CH7 | [ | 24669201 | |
| 3f | HVO_1871 | | self | | | | [ | 29284023 | EC 1.3.98.5 |
| 3f | HVO_1871 | | yes | 46% | BSU37670 | P39645 | [ | 28123057 | |
Table 4. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.4). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 4a | HVO_2198 | | yes | 35% | MJ1431 | Q58826 | [ | 14593448 | |
| 4a | HVO_2201 | | yes | 43% | MJ0446 | Q57888 | [ | 14593448 | |
| 4a | HVO_2202 | | yes | 25% | MJ0887 | Q58297 | [ | 18260642 | |
| 4a | HVO_2479 | | yes | 39% | MM_1874 | Q8PVT6 | [ | 18252724 | |
| 4a | HVO_2479 | | yes | 32% | MJ1256 | Q58653 | [ | 11888293 | |
| 4a | HVO_1936 | | yes | 47% | AF_2256 | O28028 | [ | 17669425 | |
| 4a | HVO_1936 | | yes | 38% | MJ0768 | Q58178 | [ | 12911320 | |
| 4b | HVO_0433 | | yes | 38% | AF_0892 | O29370 | [ | not in PubMed | |
| 4b | HVO_B0113 | - | no | 27% | Rv0132c | P96809 | [ | 24349169 | too distant to assume isofunctionality |
| 4b | HVO_B0342 | - | unknown | 29% | - | O93734 | [ | 8706724 | too distant to assume isofunctionality |
| 4b | NP_1902A | - | no | 28% | - | Q9UXP0 | [ | 1735436 | too distant to assume isofunctionality |
| 4b | NP_4006A | - | no | 27% | MJ0870 | Q58280 | [ | 16048999 | too distant to assume isofunctionality |
| 4c/5c | HVO_1937 | | no | 38% | MTH_1752 | O27784 | [ | 2298726 | |
| 4d | HVO_2911 | | yes | 62% | VNG_1335G | Q9HQ46 | [ | 2681164 | |
| 4d | HVO_2843 | | no | 45% | sll1629 | P77967 | [ | 12535521 | sll1629 implicated in transcription regulation |
| 4d | HVO_2843 | | possibly | 45% | At5g24850 | Q84KJ5 | [ | 12834405 | mediates photo-repair of ssDNA |
| 4d | HVO_1234 | | possibly | 40% | Atu4765 | A9CH39 | [ | 23589886 | |
Table 5. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.5). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 5a | HVO_0709 | | no | 47% | TTHA1843 | P05379 | [ | 2844259 | Trp biosynthesis |
| 5a | HVO_0709 | | yes/no | 39% | BSU00750 | P28819 | [ | 2123867 | TrpG works with TrpE and with PabB |
| 5a | HVO_0710 | | no | 46% | TTHA1844 | P05378 | [ | 2844259 | Trp biosynthesis |
| 5a | HVO_0710 | | yes | 44% | BSU00740 | P28820 | [ | 19275258 | PabB; para-aminobenzoate biosynthesis |
| 5a | HVO_0708 | | no | 36% | AF_0933 | O29329 | [ | 30733943 | branched-chain amino acids |
| 5b | HVO_2348 | | self | | | | [ | 19478918 | gene deletion phenotypes |
| 5b | HVO_2348 | | yes | 41% | MJ0775 | Q58185 | [ | 17497938 | common part of methanopterin and tetrahydrofolate biosynthesis |
| 5b | HVO_A0533 | - | unknown | 27% | MJ0837 | Q58247 | [ | 19746965 | if isofunctional would resolve a pathway gap |
| 5b | HVO_2628 | - | no | 31% | AF_2089 | O28190 | [ | 12142414 | first committed step to methanopterin biosynthesis |
| 5b | HVO_2628 | | no | 26% | MJ1427 | Q58822 | [ | 15262968 | first committed step to methanopterin biosynthesis |
| 5c | HVO_2573 | | no | 45% | MK0625 | P94954 | [ | 9676239 | acts on a one-carbon compound attached to methanopterin |
| 4c/5c | HVO_1937 | | no | 38% | MTH_1752 | O27784 | [ | 2298726 | acts on a one-carbon compound attached to methanopterin |
Figure 2. The structure of the C1 coenzymes tetrahydrofolate and methanopterin and two enzymes that act on the attached C1 compound. (A) The structures of tetrahydromethanopterin (top) and tetrahydrofolate (bottom) illustrate the similarities and differences between these C1 coenzymes. The common pteridine-based ring system is highlighted in yellow, and the initial biosynthesis step that generates this ring system is catalyzed by homologous enzymes (topic (b)). Two methanopterin-specific methyl groups are outlined by dashed ovals. N5 and N10, which are involved in the binding of the C1 compound, are colored red. (B) Two enzymatic reactions that alter the oxidation level of the C1 compound are illustrated. The methanogenic and haloarchaeal enzymes are homologous, even though they use distinct C1 coenzymes (topic (c)). It should be noted that MTH_1752 uses coenzyme F420 (not illustrated, Section 3.4, topic (c)), and this might also hold true for HVO_1937.
Table 6. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.6). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 6a | HVO_2363 | | unclear | 37% | Rv1695 | P9WHV7 | [ | 11006082 | can use ATP and PP |
| 6a | HVO_2363 | | unclear | 31% | AF_2373 | O30297 | | | ATP or PP usage unresolved |
| 6a | HVO_0837 | | unclear | 28% | Rv1695 | P9WHV7 | | | can use ATP and PP |
| 6a | HVO_0837 | | unclear | partial | AF_2373 | O30297 | | | ATP or PP usage unresolved |
| 6b | HVO_0782 | | yes | 53% | MJ0541 | Q57961 | [ | 9401030 | |
| 6b | HVO_0781 | | unknown | 42% | Sare_1364 | A8M783 | [ | 18720493 | |
| 6b | HVO_0781 | | unknown | 35% | PH0463 | O58212 | [ | 18551689 | |
| 6c | HVO_0327 | | yes | 43% | MJ0055 | Q60364 | [ | 12200440 | |
| 6c | HVO_0974 | | yes | 45% | MJ0303 | Q57751 | [ | 12603336 | |
| 6c | HVO_1284 | | self | | | | [ | 21999246 | gene deletion leads to riboflavin auxotrophy |
| 6c | HVO_1284 | | yes | 44% | MJ0145 | Q57609 | [ | 12475257 | |
| 6c | HVO_1235 | - | prediction | | | | [ | 28073944 | |
| 6c | HVO_1341 | | yes | 36% | MJ0671 | Q58085 | [ | 11889103 | |
| 6c | HVO_2483 | | prediction | 34% | MJ0699 | Q58110 | [ | 28073944 | also predicted for MJ0699 |
| 6c | pathway gap | | | | | | | | EC 3.1.3.104 |
| 6c | HVO_0326 | | yes | 37% | TA1064 | Q9HJA6 | [ | 28073944 | bifunctional as gene regulator and enzyme |
| 6c | HVO_0326 | | yes/no | 32% | MJ0056 | Q60365 | [ | 18073108 | enzyme only; lacks an N-terminal HTH domain |
| 6c | HVO_1015 | | yes | 50% | MJ1179 | Q58579 | [ | 20822113 | |
Figure 3. Biosynthesis of polar lipids. A key intermediate is CDP-archaeol, which is generated from archaeol (displayed as fully saturated) by CarS. Members of the InterPro:IPR000462 family then transfer the CDP-archaeol to the hydroxyl group (alcohol group) of the target molecule (backbone: serine, glycerol and myo-inositol). Subsequent modifications contribute to the diversity of polar lipids.
Table 7. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.7). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 7a | NP_0604A | | yes | 32% | GACE_1337 | A0A0A7GEY4 | [ | 30062607 | ortholog of HVO_0303 (66%); produces a C20 isoprenoid (same assignment for NP_0604A) |
| 7a | NP_0604A | | no | 30% | APE_1764 | Q9YB31 | [ | 10632701 | produces a C25 isoprenoid (C20 assigned to NP_0604A) |
| 7a | NP_3996A | | yes | 44% | GACE_1337 | A0A0A7GEY4 | [ | 30062607 | ortholog of HVO_2725 (67%); produces a C20 isoprenoid (same assignment for NP_3996A) |
| 7a | NP_3996A | | no | 36% | APE_1764 | Q9YB31 | [ | 10632701 | produces a C25 isoprenoid (C20 assigned to NP_3996A) |
| 7a | NP_4556A | | no | 34% | GACE_1337 | A0A0A7GEY4 | [ | 30062607 | no ortholog in |
| 7a | NP_4556A | | yes | 29% | APE_1764 | Q9YB31 | [ | 10632701 | produces a C25 isoprenoid (same assignment for NP_4556A) |
| 7b | HVO_0332 | | yes | 45% | AF_1740 | O28537 | [ | 25219966 | |
| 7b | HVO_1143 | | yes | 32% | MTH_1027 | O27106 | [ | 12562787 | gene synonym: |
| 7b | HVO_1297 | | yes | 25% | MTH_1691 | O27726 | [ | 19740749 | gene synonym: |
| 7b | HVO_1136 | | - | | | | | | only distant partial matches to GSPs |
| 7b | HVO_1971 | | unclear | 26% | MTH_1027 | O27106 | [ | 12562787 | MTH_1027 is less distant to HVO_1143 |
| 7b | HVO_0146 | | no | 39% | SMc00551 | Q9FDI9 | [ | 18708506 | equivalent function for the bacterial lipid |
| 7b | HVO_1295 | | self | | | | [ | 2345144 | complements a His auxotrophy mutant |
| 7b | HVO_1295 | | yes | 31% | b2021 | P06986 | [ | 2999081 | weak support, see text |
| 7b | HVO_1296 | | unclear | 34% | PAB0757 | Q9UZK4 | [ | 24823650 | |
| 7b | HVO_1296 | | unclear | 32% | - | Q9Y3D8 | [ | 15630091 | human: adenylate kinase; HVO_1296 may be inositol kinase |
| 7b | HVO_2496 | | yes | 45% | BSU01370 | P16304 | [ | 31111079 | |
| 7b | HVO_B0213 | - | yes | 43% | AF_1794 | O28480 | [ | 11015222 | |
| 7b | HVO_1135 | | - | | | | | | a SAM-dependent methyltransferase |
| 7c | HVO_2524 | | self | | | | [ | 25488358 | |
| 7c | HVO_2524 | | yes | 32% | Synpcc7942 | P37269 | [ | 1537409 | |
| 7c | HVO_2527 | | self | | | | [ | 21840984 | |
| 7c | HVO_2527 | | yes | 65% | VNG_1682C | Q9HPD9 | [ | 21840984 | |
| 7c | HVO_2527 | | yes | 61% | C444_12922 | M0L7V9 | [ | 25712483 | |
| 7c | HVO_2528 | | self | | | | [ | 29038254 | a HVO_2528 mutant was white |
| 7c | HVO_2528 | | yes | 71% | C444_12917 | A0A0A1GKA2 | [ | 25712483 | |
| 7c | HVO_2526 | | yes | 59% | C444_12927 | A0A0A1GNF2 | [ | 25712483 | |
| 7d | HVO_1470 | | yes | 38% | PA4231 | Q51508 | [ | 7500944 | |
| 7d | HVO_1469 | | yes | 37% | BSU30820 | P23970 | [ | 20600129 | |
| 7d | pathway gap | | | | | | | | EC 4.2.99.20 |
| 7d | HVO_1461 | | no | 29% | BSU12980 | O34508 | [ | 11747447 | Ala/Glu epimerase |
| 7d | HVO_1461 | | yes | 24% | BSU30780 | O34514 | [ | 10194342 | o-succinylbenzoate synthase |
| 7d | HVO_1375 | | yes | 36% | BSU30790 | P23971 | [ | 27933791 | |
| 7d | HVO_1465 | | yes | 66% | Rv0548c | P9WNP5 | [ | 20643650 | |
| 7d | pathway gap | | | | | | | | EC 3.1.2.28 |
| 7d | HVO_1462 | | yes | 37% | b3930 | P32166 | [ | 9573170 | |
| 7d | HVO_0309 | | yes/no | 44% | At3g63410 | Q9LY74 | [ | 14508009 | |
| 7d | HVO_0309 | | yes | 27% | - | O86169 | [ | 9139683 | |
Table 8. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.8). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 8a | OE_1279R | | self | | | | [ | 2495365 | |
| 8b | HVO_0360 | | yes | 94% | rrnAC2405 | P23357 | [ | 1764513 | |
| 8b | HVO_1392 | | - | | | | | | no GSP; 24% seq_id to HVO_0360 ( |
| 8b | NP_4882A | | yes | 72% | rrnAC1597.1 | P26816 | [ | 1832208 | full-length similarity; |
| 8b | NP_4882A | | yes | 57% | YDL061C | P41058 | [ | 18782943 | yeast YS29B; |
| 8b | NP_1768A | | unclear | 80% | rrnAC1597.1 | P26816 | [ | 1832208 | N-term 20 aa divergent |
| 8c | OE_1373R | | yes | 69% | rrnAC1669 | P60619 | [ | 10937989 | |
| 8c | OE_1373R | | yes | 39% | YPR043W | P0CX25 | [ | 10588896 | |
| 8c | HVO_0654 | | yes | 54% | rrnAC1669 | P60619 | [ | 10937989 | |
| 8d | HVO_1631 | | yes | 35% | PH1105 | O58832 | [ | 20931132 | |
| 8d | HVO_0916 | | yes | 39% | PH0725 | O58456 | [ | 20873788 | |
| 8d | HVO_1077 | | yes | 31% | YLR143W | Q12429 | [ | 23169644 | |
| 8e | HVO_0881 | | yes | 33% | BSU19530 | O34525 | [ | 10455123 | |
| 8e | HVO_1987 | | probably | 23% | BSU19530 | O34525 | [ | 10455123 | |
| 8e | HVO_1107 | | prediction | | | | | | no GSP |
Table 9. Proteins with open annotation issues and their Gold Standard Protein homologs (Section 3.9). For a description of this table, see the legend to Table 1.
| Section | Code | Gene | isofunc | %seq_id | Locus tag | UniProt | Reference | PMID | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 9a | HVO_1812 | | prediction | | | | | | no GSP |
| 9b | halTADL_1913 | | yes | 37% | - | P26984 | [ | 1809835 | |
| 9b | halTADL_1913 | | yes | 31% | OCC_03567 | Q7LYW8 | [ | 15138858 | |
| 9c | HVO_1711 | | probably | 33% | - | P29761 | [ | 1633799 | P29761 matches to C-term half of HVO_1711 |
| 9c | HVO_1711 | | probably | 51% | SAMN | A0A1I6HD35 | [ | 8305855 | correlation between PMID:8305855 and A0A1I6HD35 likely (see text) |
| 9d | HVO_1967 | | yes | 36% | MJ1605 | Q59000 | [ | 14655001 | |
| 9e | OE_1665R | | no | 31% | PA1010 | Q9I4W3 | [ | 21396954 | GSP for |
| 9e | OE_1665R | | probably | 30% | TTX_1156.1 | G4RJQ2 | [ | 15869466 | |
| 9e | OE_1665R | | probably | 25% | SSO3197 | Q97U28 | [ | 15869466 | |
| 9f | HVO_1692 | | self | | | | [ | 30707467 | |
| 9f | HVO_1692 | | probably | 35% | BSU34040 | O07021 | [ | 19201793 | matches up to HVO_1692 pos 490 of 733 |
| 9f | HVO_1692 | | probably | 35% | PST_3338 | O4VPR6 | [ | 25917905 | matches up to HVO_1692 pos 400 of 733 |
| 9f | HVO_1693 | | self | | | | [ | 30707467 | |
| 9f | HVO_1693 | | probably | 30% | BSU34030 | O32259 | [ | 19201793 | |
| 9f | HVO_1693 | | probably | 33% | PST_3339 | O4VPR7 | [ | 25917905 | partial match |
| 9f | HVO_1697 | | unclear | 24% | PST_3340 | O4VPR8 | [ | 25917905 | |
| 9f | HVO_1696 | | probably | 44% | PST_3336 | O4VPR4 | [ | 25917905 | |
| 9g | HVO_B0300 | | yes | 49% | BSU32450 | O32141 | [ | 20168977 | |
| 9g | HVO_B0299 | | yes | 43% | BSU32460 | O32142 | [ | 16098976 | |
| 9g | HVO_B0301 | | yes | 43% | BSU32450 | O32141 | [ | 17567580 | |
| 9g | HVO_B0302 | | no | 33% | - | Q8VTT5 | [ | 12148274 | paper in Chinese, abstract in English; |
| 9g | HVO_B0302 | | yes | 30% | STM0523 | Q7CR08 | [ | 23287969 | purine degradation |
| 9g | HVO_B0302 | | yes | 29% | BSU32410 | O32137 | [ | 11344136 | purine degradation |
| 9g | HVO_B0306 | | no | 39% | - | Q53389 | [ | 22904279 | carbamoyl-AA hydrolysis |
| 9g | HVO_B0306 | | yes | 34% | At5g43600 | Q8VXY9 | [ | 19935661 | purine degradation |
| 9g | HVO_B0308 | | no | 46% | Saci_2270 | Q4J6M5 | [ | 10095793 | GAPDH |
| 9g | HVO_B0308 | | no | 41% | - | P19915 | [ | 10482497 | CO-DH |
| 9g | HVO_B0308 | | yes | 39% | b2868 | Q46801 | [ | 10986234 | xanthine DH |
| 9g | HVO_B0309 | | yes | 33% | b2866 | Q46799 | [ | 10986234 | xanthine DH |
| 9g | HVO_B0309 | | no | 28% | - | P19913 | [ | 10482497 | CO-DH |
| 9g | HVO_B0309 | | no | 26% | Saci_2271 | Q4J6M3 | [ | 10095793 | GAPDH |
| 9g | HVO_B0310 | | no | 31% | Saci_2269 | Q4J6M6 | [ | 10095793 | GAPDH |
| 9g | HVO_B0310 | | no | 31% | - | P19914 | [ | 10482497 | CO-DH |
| 9g | HVO_B0310 | | yes | 25% | b2867 | Q46800 | [ | 10986234 | xanthine DH |
| 9g | HVO_B0303 | | yes | 38% | b3654 | P0AGM9 | [ | 16096267 | |
| 9h | HVO_0197 | | possibly | 39% | lp_0105 | F9UST0 | [ | 27114550 | LarB family protein |
| 9h | HVO_2381 | | possibly | 31% | lp_0106/ | F9UST1 | [ | 27114550 | LarC family protein |
| 9h | HVO_0190 | | possibly | 34% | lp_0109 | F9UST4 | [ | 27114550 | LarE family protein |
| 9i | HVO_1660 | | self | | | | [ | 30884174 | |
| 9i | HVO_0756 | | prediction | | | | [ | 32095817 | |
| 9i | HVO_0990 | | prediction | | | | [ | 32095817 | |
| 9i | HVO_1690 | | prediction | | | | [ | 32095817 | |
| 9j | HVO_2763 | | self | | | | [ | 22350204 | no function could be assigned |
| 9j | HVO_2763 | | no | 27% | HVO_0144 | D4GZ88 | [ | 18437358 | RNase Z |
| 9k | HVO_2410 | | yes | 33% | Hneap_0211 | D0KWS7 | [ | 31406332 | |
| 9k | HVO_2411 | | yes | 31% | Hneap_0212 | D0KWS8 | [ | 31406332 | |