A Maxwell Burroughs, L Aravind.
Abstract
A protein family including mammalian NEMF, Drosophila caliban, yeast Tae2, and bacterial FbpA-like proteins was first defined over a decade ago and found to be universally distributed across the three domains/superkingdoms of life. Since its initial characterization, this family of proteins has been tantalizingly linked to a wide range of biochemical functions. Tapping the enormous wealth of genome information that has accumulated since the initial characterization of these proteins, we perform a detailed computational analysis of the family, identifying multiple conserved domains. Domains identified include an enzymatic domain related to the formamidopyrimidine (Fpg), MutM, and Nei/EndoVIII family of DNA glycosylases, a novel, predicted RNA-binding domain, and a domain potentially mediating protein-protein interactions. Through this characterization, we predict that the DNA glycosylase-like domain catalytically operates on double-stranded RNA, as part of a hitherto unknown base modification mechanism that probably targets rRNAs. At least in archaea, and possibly eukaryotes, this pathway might additionally include the AMMECR1 family of proteins. The predicted RNA-binding domain associated with this family is also observed in distinct architectural contexts in other proteins across phylogenetically diverse prokaryotes. Here it is predicted to play a key role in a new pathway for tRNA 4-thiouridylation along with TusA-like sulfur transfer proteins.
Keywords: DNA glycosylase; FbpA; IscS; NEMF; Tae2; TusA; base modification; caliban; fibronectin-binding; tRNA 4-thiouridylation
Year: 2014 PMID: 24646681 PMCID: PMC4075521 DOI: 10.4161/rna.28302
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. Domain architectures and conserved gene neighborhoods relating to NFACT proteins. Domain architectures are depicted by adjoining polygonal shapes labeled with individual domain names; domain sizes are not drawn precisely to scale. Individual genes within conserved gene neighborhoods are depicted as boxed arrows with the arrowhead pointing toward the 3′ end of the gene. Each architecture/neighborhood is labeled with gene name, GenBank gene identifier (gi) number, and organism name separated by semicolons. Architectures/neighborhoods relating to the NFACT gene family are boxed in purple; those relating specifically to independent contexts of the NFACT-R domain are boxed in orange. The coiled-coil region characteristic of the NFACT gene family is depicted as a light green circle in architectures and a light green box in gene neighborhoods. Abbreviations: CC, coiled-coil; HhH, helix-hairpin-helix; ZnK, zinc knuckle; ZnR, zinc ribbon.

Figure 2. Structural relationship between NFACT-N and FMN-DG domains. (A) Cartoon renderings of solved crystal structures (top) for the NFACT-N (left) and FMN-DG domains (right) accompanied by corresponding topology diagrams (bottom). Residues conserved across the domains are rendered as ball-and-stick in the cartoons; active site/enzymatic residues are colored in red, residues with a likely direct or indirect role in substrate recognition are colored in blue, and the conserved asparagine/histidine found in the HhH domains is colored in yellow. The PDB identifier is provided to the right of both cartoons. The labeling scheme provided below each diagram reflects the spatial/evolutionary conservation of each element as evident from the solved crystal structures and as referred to in the text. (B) Stepwise scenario for the emergence of the FMN-DG domain from the ancestral NFACT-N domain. Domains are depicted as arrays of secondary structure elements to show the wiring between elements.

Figure 3. Multiple sequence alignment of NFACT-N+HhH domains with selected FMN-DG+HhH domain sequences. FMN-DG sequences from solved crystal structures are at the top of the alignment followed by the NFACT-N sequences. Secondary structure elements are depicted as follows: extended loop regions are represented by black lines, β-strands represented by orange arrows, and α-helices represented by purple cylinders. The transition from the core enzymatic domain to the HhH domains is labeled with a black arrow above the alignment. Individual sequences are labeled to the left with gene name, organism name, and gi number separated by an underscore. Gene names are replaced by PDB identifiers where appropriate. Numbers to the left and right of the alignment correspond to amino acid position within the protein encoding the domain. Insert regions are excised and replaced with numbers indicating the length in amino acids of the insert. Due to the “re-wiring” between the core enzymatic domains (Fig. 2), FMN-DG sequences are not presented in linear order; “breaks” in this order are marked at appropriate positions with “x.” Coloring is based on the consensus line at the bottom of the alignment: h, hydrophobic (shaded in yellow); s, small (shaded in green); l, aliphatic (shaded in yellow); -, negatively charged (shaded in purple); p, polar (shaded in blue); +, positively charged (shaded in purple); a, aromatic (shaded in yellow); b, big (shaded in gray); u, tiny (shaded in green); c, charged (shaded in purple). 
Columns corresponding to active site residue positions are shaded in red, colored in white, and marked at the top with “*.” Columns corresponding to positions involved in either direct or indirect substrate recognition are shaded in brown, colored in white, and marked with “^.” The column corresponding to the conserved glutamate/histidine residue in the HhH domains is shaded in red, colored in yellow, and marked with “&.” The column corresponding to the conserved lysine/arginine residue specific to NFACT-N is marked with a “%.” Organism abbreviations are as follows: Aboo, Aciduliprofundum boonei; Alai, Acholeplasma laidlawii; Aory, Aspergillus oryzae; Aper, Aeropyrum pernix; Atha, Arabidopsis thaliana; Bcer, Bacillus cereus; Bthu, Bacillus thuringiensis; CCal, Candidatus Caldiarchaeum; CChl, Candidatus Chloracidobacterium; CKor, Candidatus Korarchaeum; Cele, Caenorhabditis elegans; Cint, Ciona intestinalis; Cowc, Capsaspora owczarzaki; Cpas, Clostridium pasteurianum; Cpha, Chlorobium phaeobacteroides; Csym, Cenarchaeum symbiosum; Dpul, Daphnia pulex; Drer, Danio rerio; Dtur, Dictyoglomus turgidum; Ecol, Escherichia coli; Efae, Enterococcus faecalis; Ehis, Entamoeba histolytica; Fnec, Fusobacterium necrophorum; Glam, Giardia lamblia; Gsp., Geobacillus sp.; Gste, Geobacillus stearothermophilus; Hmag, Hydra magnipapillata; Hsap, Homo sapiens; Hter, Halorubrum terrestre; Hvol, Haloferax volcanii; Klac, Kluyveromyces lactis; Ldel, Lactobacillus delbrueckii; Llac, Lactococcus lactis; Lmon, Listeria monocytogenes; Mbre, Monosiga brevicollis; Mmus, Mus musculus; Mpsy, Methanolobus psychrophilus; Myel, Metallosphaera yellowstonensis; Nfis, Neosartorya fischeri; Ngar, Natrinema gari; Ngru, Naegleria gruberi; Pmar, Prochlorococcus marinus; Psp., Pyrococcus sp.; Ptet, Paramecium tetraurelia; Rnor, Rattus norvegicus; Saci, Sulfolobus acidocaldarius; Saur, Staphylococcus aureus; Scer, Saccharomyces cerevisiae; Sequ, Streptococcus equi; Skow, Saccoglossus kowalevskii; Spur, Strongylocentrotus purpuratus; Tbru, Trypanosoma brucei; Tcru, Trypanosoma cruzi; Tlie, Thermovirga lienii; Tori, Theileria orientalis; Tsp., Thermococcus sp.; Tsp., Thermotoga sp.; Uarc, uncultured archaeon.

Figure 4. Known and predicted reactions. (A) Base removal and ring-opening steps catalyzed by FMN-DNA glycosylases (top) and predicted analogous steps catalyzed by NFACT-N during a potential base-exchange reaction (bottom). The introduced free base in the NFACT-N reaction is labeled with a red “M,” indicating that the base is potentially modified in some way despite being shown here as a uridine. (B) 2-thiouridylation, canonical 4-thiouridylation, and the novel, predicted 4-thiouridylation sulfur relay pathways. A potential intermediate step in the novel 4-thiouridylation pathway, involving transfer to a conserved cysteine on IscU, is shown with dotted lines, reflecting uncertainty as to whether this step is present in all organisms containing the ThiI+NFACT-R fusion or is restricted to a subset of them.