Colbie J Reed, Geoffrey Hutinet, Valérie de Crécy-Lagard.
Abstract
Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as "GTP cyclohydrolase I type 2" through electronic propagation based on one study. Here, the annotation status of this protein family was examined through a comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis, combined with functional complementation studies, strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox, and universal stress responses.
Keywords: bioinformatics; comparative genomics; conserved unknowns; function prediction; functional annotation; metabolic reconstruction; orthology
Year: 2021 PMID: 34572495 PMCID: PMC8469502 DOI: 10.3390/biom11091282
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Focal publications featuring members of the DUF34 protein family.
| Name | Organisms | Phenotype, Biological Relevance | Reference |
|---|---|---|---|
| YqfO/BC_4286 | | Inserted domain similar to PII-like/CutA1 family proteins; present in select bacterial clades; domain may regulate catalytic activity | |
| YqfO/BSU_25170 | | With YlxR, coregulates | |
| BmNIF3l | | Translocates to nucleus from cytoplasm upon ATRA treatment; higher transcript levels in differentiating tissues; no expression detected in the egg stage | |
| YbgI/b0710 | | Structure, homohexameric toroid; monomers possess dinuclear metal ion-binding site; putatively involved in DNA repair | |
| | | No survival impairment upon mutant UV treatment; polar localization during cell division (co-localized with PstB, TktA); GlmS putative interaction partner; mutant sensitive to antibiotics affecting cell wall synthesis | |
| XynX | | Negatively regulates expression of | |
| NIF3L1/ALS2CR1/CALS-7/MDS015/My018 | | Ubiquitously expressed during embryonic development; strong over-expression in spermatogonia-derived, teratocarcinoma cell lines; isolated, characterized; cytosolic subcellular localization; highly conserved N-, C-terminal regions; shares inserted region of its murine homolog (CutA1-like) | |
| | | NIF3L1 interacts with splice variant, NIF3L1 BP1 (THOC7), cytosolic colocalization; C-terminal leucine zipper-like domain of variant mediates interaction; not indicated in repression in NIH3T3 cells; binding partner, NIF3L1 BP1, demonstrates additional passive presence in the nucleus | |
| | | Retinoic acid-induced binding, cooperative translocation with Trip15/CSN2 from the cytosol to the nucleus (early neuronal development, silences differentiation suppressor Oct-3/4); ubiquitous expression, important in neuronal development | |
| | | Detected in brain, spinal cord, and lymphocytes; observed as two distinct transcripts with similar patterns of expression; highest levels of both transcripts in heart, skeletal muscle, testis; smaller transcript was expressed at a higher level than the other; no deletions, polymorphisms linked to ALS patients relative to controls; 1 of 6 candidates eliminated for a causative link to ALS2 | |
| | | 1 of 4 hypermethylated, significant differential expression shared between two cancellous bone specimen groups: osteoarthritis, osteoporosis | |
| | | With 14-3-3, co-regulates transcription of Wbscr14 by preventing its nuclear localization via complex formation (Wbscr14 participates in the complex-mediated transcription of lipogenic enzymes, promoting fat accumulation) | |
| | | Included in a 7.5-Mb interstitial deletion on 2q32.3–33.1 (28 genes) in a patient diagnosed with SATB2-Associated 2q32-q33 microdeletion syndrome | |
| | | Significantly associated with triptolide chemosensitivity in lymphoblast cell lines | |
| | | COPS2 point mutations consistent with previously defined NIF3L1-COPS2 co-repression interaction model (limited; pathogenesis associated COPS2 mutations: S120C, N144S, Y159H, R173C) | |
| HP0959 | | GTP-binding, hydrolysis in vitro, biologically irrelevant pH, temperature | |
| HcgD/MJ0927 | | Proposed iron chaperone required for FeGP cofactor biosynthesis | |
| Nif3l1/1110030G24Rik | | Isolated, characterized; ubiquitous expression across tissues; cytosolic localization; highly conserved N-, C-terminal regions; shares inserted region of the human homolog | |
| | | Retinoic acid-induced binding, cooperative translocation with Trip15/CSN2 from the cytosol to the nucleus (early neuronal development, results in the silencing of the differentiation suppressor Oct-3/4); ubiquitous tissue expression, important in neuronal development | |
| WP_046236688 | | (“YqfO03”) small, secreted protein; demonstrated high potency as nematicide against | |
| Nif3/YGL221C | | Determined to have dual/multiple localizations (cytosolic, mitochondrial) | |
| SA1388 | | The central domain of NIF3 homolog has high structural similarity to CutA1 (family linked to cation tolerance, homeostasis) | |
| SP1609 | | Described as a member of the same orthologous group (COG2384) as TrmK, RpoD protein families via structural alignment ( | |
| TTHA1606 | | Binds to ssDNA (very weakly, in vitro) | |
| NIF3-like protein superfamily | NA | (electronic translation) describes family members of model organisms (Eukaryota, Bacteria), structures published prior to 2007 | |
Published structures of DUF34 protein family members.
| Name | Organisms | Ligands | PII Domain | PDB | Phenotype | Reference |
|---|---|---|---|---|---|---|
| YbgI | | (2)Fe³⁺ | No | 1NMO | NA | |
| | | (2)Mg²⁺ | No | 1NMP | | |
| HcgD/MJ0927 | | (1)Cl⁻, (2)Fe³⁺ | No | 3WSD | Weaker Fe1 site under oxidized conditions in vitro | |
| | | (2)Fe²⁺, (1)PO₄³⁻ | No | 3WSE | | |
| | | (1)Fe³⁺, (1)citrate | No | 3WSF | | |
| | | (1)Fe²⁺, (1)citrate | No | 3WSG | | |
| | | (1)Fe³⁺, (1)SO₄²⁻ | No | 3WSH | | |
| | | (1)Fe²⁺, (1)PO₄³⁻ | No | 3WSI | | |
| | | NA | No | 4IWG | Binds to ssDNA, dsDNA in vitro | |
| | | NA | No | 4IWM | | |
| SA1388 | | (2)Zn²⁺, (1)B3P | Yes | 3LNL | Cavity diameter = 38 Å; opening edge length = 20 Å (triangular opening) | |
| | | (2)Zn²⁺ | Yes | 2NYD | | |
| SP1609 | | NA | No | 2FYW | NA | PDB only |
| TTHA1606 | | NA | No | 2YYB | Binds ssDNA not dsDNA in vitro | |
| Sthe_0840 | | (7)Cl⁻ *, (14)FMT *, (1)ACT * | No | 3RXY | NA | PDB only |
| YqfO | | (2)Zn²⁺, (1)HEPES, (1)TRS | Yes | 2GX8 | NA | |
* Asterisk indicates that ion count is per the respective asymmetrical unit as opposed to per monomer.
Figure 1. Dinuclear metal-binding site of the E. coli DUF34 homolog, YbgI. The crystal structure of YbgI (DUF34 homolog, E. coli) illustrates conserved residues of the protein family specific to the monomeric cleft of the active site and its dinuclear metal center. Highly conserved residues that Ladner et al. [26] noted as involved in the structure of the binding pocket are colored orange and labeled by residue identity and position.
Figure 2. Key motifs of Bacteria and Archaea compared to those of Eukaryota. Eukaryotic sequences were aligned separately, whereas bacterial and archaeal sequences were aligned together. A multiple motif method was used to determine and compare family signatures. A full figure illustrating the distinct levels of conservation per superkingdom can be examined in Figure S3.
Figure 3. Inserted domain lengths across model taxa. The lengths of inserted domains were measured for each homolog. The sequences (organisms listed in Data Table S4) were aligned per superkingdom to delimit domains, which then allowed each inserted region (if present) to be measured. An evolutionary tree was generated using PhyloT and iToL and was mapped with the lengths of the inserted domains within each respective homolog. All measured inserted domain lengths were used to generate Figure S5, a histogram illustrating counts by ranges of domain lengths per superkingdom.
Figure 4. COG-InterPro HMM signature profile relationships and defined subgroups across DUF34 family members. The sequences of organisms across the DUF34 protein family, including all fusions and paralogs, were analyzed for co-occurrence relationships of COGs and HMM-determined InterPro family/superfamily/domain annotations. All organism homologs, paralogs, and fusions were validated using eggNOG and KEGG Paralog Search. Sequences missing InterPro annotation were analyzed by NCBI CDD Search and InterProScan Search. See Data Table S5 for categories and respective COG designations/InterPro signature profiles in tabular format. The source organisms considered are those listed in Data Table S4. Groups were designated by differential keystone signatures shown in (a), and select representative sequences of subgroups (A–G) are shown in (b).
Figure 5. Absence–presence of DUF34 architectural domain subgroups. Absence–presence data of COGs and HMM-determined InterPro family/superfamily/domain signature profiles were added to a species tree generated using organisms harboring published homologs and those used in alignments acquired via OrthoInspector (Data Table S4). Proteins are designated as categories A–G, as detailed in Figure 4 and Data Table S5. These homologous domains are classified in the map according to their HMM-defined DUF34 domain identities (see Figure 4a).
Top 20 COGs found to occur in operons containing COG0327.
| Rank | COG | Name/Description | Metal(s) | References (PMID, EC Number) |
|---|---|---|---|---|
| 1 | COG0327 | Putative GTP cyclohydrolase 1 type 2, NIF3 family | Fe²⁺/Fe³⁺, Zn²⁺, Mg²⁺ | |
| 2 | COG1579 | Predicted nucleic acid-binding protein DR0291, contains C4-type Zn-ribbon domain | Zn²⁺ | |
| 3 | COG0568 | DNA-directed RNA polymerase, sigma subunit (sigma70/sigma32) | Zn²⁺, Mg²⁺ | |
| 4 | COG0358 | DNA primase (bacterial type) | Zn²⁺, Mg²⁺, Mn²⁺ | |
| 5 | COG0457 a | Tetratricopeptide (TPR) repeat | | None listed |
| 6 | COG2384 | tRNA A22 N1-methylase | | [2.1.1.217] |
| 7 | COG0079 | Histidinol-phosphate/aromatic aminotransferase or cobyric acid decarboxylase | | |
| 8 | COG0240 | Glycerol-3-phosphate dehydrogenase | | [1.1.1.94] |
| 9 | COG0328 | Ribonuclease HI (RnhA) | Mg²⁺, Mn²⁺, Co²⁺, Ni²⁺ | |
| 10 | COG0500 b | SAM-dependent methyltransferase | | [2.1.1.242] |
| 11 | COG0513 c | Superfamily II DNA and RNA helicase (SrmB/RhlB) | Mg²⁺, Mn²⁺ | [3.6.4.13] |
| 12 | COG0596 | 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase MenH and related esterases, alpha/beta hydrolase fold (MhpC) | | [3.7.1.14] |
| 13 | COG0655 | Multimeric flavodoxin WrbA, includes NAD(P)H:quinone oxidoreductase | Most req. Fe-S cluster; subtypes without Fe-S clusters | [1.6.5.2], [1.6.5.6] |
| 14 | COG0752 | Glycyl-tRNA synthetase, alpha subunit | Mg²⁺, Mn²⁺, Co²⁺ | |
| 15 | COG0826 | 23S rRNA C2501 and tRNA U34 5’-hydroxylation protein RlhA/YrrN/YrrO, U32 peptidase family; ubiquinone biosynthesis protein, UbiU/YhbU | Fe-S cluster/Fe, Ca²⁺ | |
| 16 | COG1028 | NAD(P)-dependent dehydrogenase, short-chain alcohol dehydrogenase family | Co²⁺, Fe/Fe²⁺, Mg²⁺, Mn²⁺, Zn/Zn²⁺ | [1.1.1.2] |
| 17 | COG1897 | Homoserine O-succinyltransferase | | [2.3.1.31], [2.3.1.46] |
| 18 | COG0177 d | Endonuclease III (Nth) | Fe-S cluster, Ca²⁺, Co²⁺, Fe/Fe²⁺, Mg²⁺, Mn²⁺, Ni²⁺, Zn²⁺ | |
| 19 | COG0477 d | MFS family permease (includes anhydromuropeptide permease AmpG, ProP) | | None listed |
| 20 | COG0494 e | 8-oxo-dGTP pyrophosphatase MutT and related house-cleaning NTP pyrophosphohydrolases, NUDIX family | Co²⁺, Mg²⁺, Mn²⁺, Zn²⁺ | [3.6.1.13] |
Exceptions to representative operons relative to table contents: a Proteins containing TPR repeat domains present in archaeal operons. b SAM-dependent methyltransferase domains present (not designated COG0500). c Though not assigned COG0513, helicase domain-containing proteins are present (e.g., Era/COG1159, YhaM/COG3481). d MutY is present (COG1194), another endonuclease family member. e MutM/NUDIX domain containing proteins are present (COG0266).
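A ranking like the one in the table above can be illustrated with a simple co-occurrence count. The sketch below is only a minimal illustration and not the authors' physical clustering pipeline: it assumes each operon is available as a list of COG identifiers, and the function name `rank_cooccurring_cogs` and the toy operons are hypothetical.

```python
from collections import Counter


def rank_cooccurring_cogs(operons, anchor="COG0327", top_n=20):
    """Rank COGs by the number of anchor-containing operons they occur in.

    `operons` is an iterable of COG-identifier lists (an assumed input
    format; the source does not specify one).
    """
    counts = Counter()
    for operon in operons:
        members = set(operon)  # count each COG at most once per operon
        if anchor in members:
            counts.update(members)
    # Sort by descending count, then by COG ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy operons, not data from the study.
toy_operons = [
    ["COG0327", "COG1579", "COG0568"],
    ["COG0327", "COG1579"],
    ["COG0358", "COG0568"],  # lacks the anchor, so it is ignored
    ["COG0327", "COG0358", "COG1579"],
]
ranking = rank_cooccurring_cogs(toy_operons)
for rank, (cog, n) in enumerate(ranking, start=1):
    print(rank, cog, n)
```

Because the anchor COG is present in every operon that is counted, it always ties for the highest count, which is consistent with COG0327 holding rank 1 in the table.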
Figure 6. Metal ion-binding of proteins encoded in representative bacterial and archaeal operons. (a) A radar chart illustrating the proportions of DUF34-operon encoded proteins documented to interact with certain metals or metal-containing moieties. Accounting for the over-representation of magnesium and zinc among available protein structures, a second radar chart (b) was generated to show the same data without proteins found to exclusively bind either or both ions. Bacterial data are shown in blue while archaeal data are shown in red. Data used to generate these figures can be found in Table S4.
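The filtering described for panel (b) can be sketched as a set operation. This is a minimal sketch under stated assumptions, not the authors' code: the input format (a mapping from protein to its documented metals), the function name `metal_proportions`, the choice to use only metal-annotated proteins as the denominator, and the toy data are all hypothetical.

```python
from collections import Counter

# Ions treated as over-represented among available structures (per the caption).
COMMON_IONS = frozenset({"Mg", "Zn"})


def metal_proportions(protein_metals, exclude_common_only=False):
    """Return {metal: fraction of considered proteins annotated with it}.

    Proteins with no documented metal are skipped (an assumption made
    here; the source does not state the denominator used).
    """
    considered = [set(metals) for metals in protein_metals.values() if metals]
    if exclude_common_only:
        # Drop proteins whose documented metals are only Mg and/or Zn.
        considered = [m for m in considered if not m <= COMMON_IONS]
    total = len(considered)
    if total == 0:
        return {}
    counts = Counter(metal for metals in considered for metal in metals)
    return {metal: n / total for metal, n in counts.items()}


# Hypothetical toy annotations, not data from the study.
toy = {
    "p1": ["Mg"],
    "p2": ["Zn", "Mg"],
    "p3": ["Fe", "Mg"],
    "p4": ["Fe-S"],
    "p5": [],
}
panel_a = metal_proportions(toy)
panel_b = metal_proportions(toy, exclude_common_only=True)
print(panel_a)
print(panel_b)
```

Note that a protein binding magnesium together with another metal (such as `p3`) is retained in the filtered set, so magnesium can still appear in the second chart; only proteins binding exclusively magnesium, zinc, or both are removed.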
Figure 7. DUF34 fusions and select gene neighborhoods. (a) Domain architectures of DUF34 fusions. The domain rendering dimensions and positions are approximate. DUF34 domains are rendered in white with black outlines. Domain colors correspond to the key shown in panel b. COGs of fusion domains are listed below each. Fusions deemed “invalid” or “inconclusive” were excluded from panels a and b. (b) Pie chart of DUF34 fusions (126 sequences, total). The outer halo surrounding the chart indicates the superkingdoms in which respective fusions were observed (Eukaryota: black; Archaea: dark gray; Bacteria: light gray). (c) Neighborhoods of select bacterial and archaeal fusions are shown (12 kb, each), all of at least “conditional” validation confidence (Data Table S11). DUF34 is depicted in bright yellow and fusion domains are indicated by hashing or alternative coloring. For DUF34 sequence labels, “YqfO” denotes a sequence also containing the inserted domain, COG3323, while “YbgI” denotes a sequence without the inserted COG3323 domain. Rendered fusion domains do not reflect exact sizes or locations. The color key is divided into two sets of identities (gray boxes): (top) general metabolic theme or specific annotation with bioinformatic precedent; and (bottom) COGs observed in physical clustering analysis (PCA). COGs also observed in PCA (Table 3) are shown in bold. Six minor exceptions to the top-20 rank cut-off are shown in bold with an asterisk (*): COG1196 (top 31st); COG0564 (top 23rd); COG0648 (top 25th); COG0406 (top 48th) in a fusion with COG0328; and COG0041 (top 36th). Others that were observed in representative operons but ranked beyond the “minor exception” threshold (exceeded top-50) in PCA are shown without additional symbols, not bolded: COG0245 (116th) and COG0761 (61st). Finally, one was not observed in PCA (not bolded) but was present in at least one representative operon (double asterisk, **): COG0642 (SAMN05192534_10671 of A. persepolensis; representative operon, Desulfurispirillum indicum S5) (Data Table S7). Note: COG4111 (NUDIX hydrolase), present in panel c (neighborhood of M. rubeus), was absent from PCA (any rank) and representative operons, despite the fusion with COG3323 in F. nucleatum having been resolved in preceding homolog capture and literature review.
Figure 8. DUF34 of E. coli, ybgI, fails complementation in the absence of folE. Plates were imaged after 20 h of growth at 37 °C. (a,b) dT essentiality assay. WT, single-mutant, and double-mutant (folE, ybgI) strains were grown at 37 °C in LB in the absence (a) or presence (b) of 0.3 mM dT. Each curve shown is averaged across 5 replicates. (c) dT essentiality complementation assay. WT, single-mutant, and double-mutant (folE, ybgI) strains containing various derivatives of pBAD24 encoding either E. coli YbgI or FolE were streaked on LB plates supplemented with 100 µg/mL ampicillin, in the presence of either 0.2% glucose to repress gene expression or 0.2% arabinose to overexpress the gene of interest, and in the presence or absence of 0.3 mM dT.