Rajaram Gana, Shruti Rao, Hongzhan Huang, Cathy Wu, Sona Vasudevan.
Abstract
BACKGROUND: The post-genomic era poses several challenges. The biggest is identifying the biochemical function of the protein sequences and structures that result from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. Homology-based methods are useful and work well for sequences with identities above 50%, but they fail for sequences in the twilight zone (<30%) of sequence identity. Where sequence methods fail, structural approaches are often used, on the premise that structure preserves function over longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is based primarily on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented.
Year: 2013 PMID: 23617634 PMCID: PMC3662625 DOI: 10.1186/1472-6807-13-6
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. Ligand-centric approach. This approach involves a multipronged analysis at various sequence and structural levels: the residue level, the protein/domain level, the ligand level, and the family level. At the residue level, the analysis includes identification of conserved binding-site residues, based on structure-guided sequence alignments of representative members of a family, and identification of conserved motifs. At the protein/domain level, the analysis includes examination of SCOP folds, Pfam domains, and protein topologies. At the ligand level, the analysis includes ligand conformations, ribose sugar puckering (when applicable), and identification of conserved ligand-atom interactions. Finally, at the family level, the approach includes phylogenetic analysis.
Figure 2. Fold types of SAM-binding proteins. The folds follow the SCOP classification, except for Helical Bundle, which we have assigned. The 18 folds comprise 9 MTases and 9 non-MTases (indicated by #). Rossmann-fold structures that have evolved into both MTases and non-MTases are indicated with a yellow box. SAM-dependent MTases were previously categorized into five classes by Cheng et al. [38]. We have extended this to a total of nine classes. The four added classes are indicated by **.
Ligfolds and newly classified topological sub-classes
| Topology (strand order) | Count | Ligfold | Topological sub-class |
| --- | --- | --- | --- |
| 3214567 | 351 | SAM_DM_Ia | Class Ia |
| 6754123 | 321 | SAM_DM_Ib | Class Ib |
| 32145 | 2 | SAM_DM_Ic | Class Ic |
| 54123 | 19 | SAM_DM_Id | Class Id |
| 564312 | 29 | SAM_DM_Ie | Class Ie |
| 654321 | 2 | SAM_DM_If | Class If |
| 1762354 | 10 | SAM_DM_Ig | Class Ig |
| 7645321 | 1 | SAM_DM_Ih | Class Ih |
| 7654123 | 12 | SAM_DM_Ii | Class Ii |
| 17865234 | 1 | SAM_DM_Ij | Class Ij |
| 5671432 | 2 | SAM_DM_Ik | Class Ik |
| 6754123/3214567 | 1 | SAM_DM_Il | Class Il |
| 3421567 | 1 | SAM_DM_Im | Class Im |
| 34215687 | 4 | SAM_DM_In | Class In |
All entries belong to the SCOP fold S-adenosyl-L-methionine-dependent methyltransferase (SAM_DM).
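The table above pairs each core strand order with a topological sub-class, so it can be treated as a lookup table. The following is a minimal sketch under that reading, not the authors' implementation; the names `TOPOLOGY_TO_CLASS`, `strand_order`, and `classify_topology` are hypothetical, and the sketch assumes single-digit strand numbers, which holds for every row listed.

```python
# Strand-order strings and sub-class labels copied from the table above.
# The mixed entry for Class Il is kept as the composite string from the table.
TOPOLOGY_TO_CLASS = {
    "3214567": "Class Ia",
    "6754123": "Class Ib",
    "32145": "Class Ic",
    "54123": "Class Id",
    "564312": "Class Ie",
    "654321": "Class If",
    "1762354": "Class Ig",
    "7645321": "Class Ih",
    "7654123": "Class Ii",
    "17865234": "Class Ij",
    "5671432": "Class Ik",
    "6754123/3214567": "Class Il",
    "3421567": "Class Im",
    "34215687": "Class In",
}


def strand_order(topology):
    """Split a strand-order string such as '32145' into [3, 2, 1, 4, 5].

    Assumes each strand number is a single digit (hypothetical helper).
    """
    return [int(ch) for ch in topology]


def classify_topology(topology):
    """Return the sub-class label for a strand order, or None if it is not listed."""
    return TOPOLOGY_TO_CLASS.get(topology)


if __name__ == "__main__":
    print(strand_order("3214567"), classify_topology("3214567"))
```

A strand order that is absent from the table returns `None` rather than a guessed class, which keeps unlisted topologies visibly unclassified.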
Figure 3. Topological classes within fold type I. The classification is based on defining the strands that form the core. The strands are numbered from the N-terminus to the C-terminus and read from left to right. Only the SAM/SAH domain is included. Bound SAM is shown as ball and stick, and the structures are represented as cartoon diagrams. The strands that form the core are colored red. The labels list the PDB-ID followed by the topology. The corresponding two-dimensional topological arrangement is provided in Additional file 3. The figures were generated using PyMOL visualization software (http://www.pymol.org).
Figure 4. Ligand conformations across all 18 fold types. A striking correlation between fold type and ligand conformation was noted. One representative structure, the one with the highest resolution, was selected from each fold. The ligand SAM/SAH is shown as ball and stick, and atoms are labeled. The figure was generated using Chimera visualization software (http://www.cgl.ucsf.edu/chimera/). The ** beside Type VII (PDB-ID: 4A2N) indicates an average temperature factor of >80 Å² for the ligand; this conformation may therefore not be reliable and can be confirmed as more structures become available.
Figure 5. Superposition of the SAM/SAH ligands from representative fold type I structures, one per family, that have a mean B-factor of <80 Å². A. Superposition via the ribose moiety. B. Superposition of all SAM atoms. The figure was generated using Chimera visualization software (http://www.cgl.ucsf.edu/chimera/).
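The captions for Figures 4 and 5 use a mean ligand B-factor of 80 Å² as a reliability cutoff. The sketch below shows one way such a filter could be computed from PDB-format text. It is an illustration, not the authors' pipeline: the function names are hypothetical, it reads only `HETATM` records, it relies on the fixed-column PDB layout (residue name in columns 18-20, temperature factor in columns 61-66), and it treats a mean of exactly 80 as failing, which the source does not specify.

```python
def mean_ligand_b_factor(pdb_lines, ligand_names=("SAM", "SAH")):
    """Average the temperature factors of all HETATM atoms of the named ligands.

    Returns None when no matching ligand atom is found.
    """
    values = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        residue_name = line[17:20].strip()   # PDB columns 18-20
        if residue_name in ligand_names:
            values.append(float(line[60:66]))  # PDB columns 61-66
    if not values:
        return None
    return sum(values) / len(values)


def ligand_passes_cutoff(mean_b, cutoff=80.0):
    """True when a mean B-factor exists and is below the cutoff (in Å²)."""
    # Assumption: a value equal to the cutoff is treated as not passing.
    return mean_b is not None and mean_b < cutoff
```

To apply it, pass the lines of a PDB-format file, for example `mean_ligand_b_factor(open(path).read().splitlines())`, where `path` is a local file of your choosing.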
Figure 6. Structure-guided alignment of representative structures for fold type I. Only the aligned core is shown. The alignment was completed using the Cn3D tool. The structural representation is shown as tubes.
Figure 7. Taxonomic distributions of SAM-binding proteins. Families that have representative members from all three branches of life [Archaea (A), Bacteria (B), and Eukaryotes (E)] are indicated within the rectangular boxes. The corresponding fold type is indicated for each of these families along the circumference of the circle. A total of 29 families, belonging to 10 different fold types, contain members in all three branches of life. This information may help to identify the last universal common ancestor of SAM-binding proteins.
Annotation of uncharacterized proteins based on our ligand-centric approach
| PDB ID | PDB title | Family ID | Taxonomic distribution | Fold type and class | Assigned function |
| --- | --- | --- | --- | --- | --- |
| 2PGX | Crystal structure of UPF0341 protein yhiQ from E. coli | SF016106 | E=0, B=156, A=0, V=0, O=4 | Type I, Class Ia | Putative SAM-dependent rRNA methyltransferase |
| 2O3A | Crystal structure of a protein AF_0751 from Archaeoglobus fulgidus | SF016123 | E=0, B=0, A=134, V=0, O=0 | Type IV, Class IV | Putative SAM-dependent tRNA archaeal methyltransferase |
| 2B78 | A putative SAM-dependent methyltransferase from Streptococcus mutans | SF004981 | E=2, B=403, A=16, V=0, O=5 | Type I, Class Ia | Putative SAM-dependent RNA methyltransferase |
| 3DR5 | Crystal structure of the Q8NRD3_CORGL protein from Corynebacterium glutamicum | SF005841 | E=122, B=346, A=6, V=0, O=3 | Type I, Class Ib | Putative SAM-dependent COMT-type methyltransferase |
| 1XXL | The crystal structure of YcgJ protein from Bacillus subtilis at 2.1 Å resolution | SF006616 | E=6, B=130, A=2, V=0, O=4 | Type I, Class Ib | Putative SAM-dependent Class Ib methyltransferase |
| 1YB2 | Structure of a putative methyltransferase from Thermoplasma acidophilum | SF017269 | E=227, B=418, A=110, V=0, O=2 | Type I, Class Ia | Putative SAM-dependent tRNA methyltransferase |
| 1JSX | Crystal structure of the Escherichia coli glucose-inhibited division protein B (GidB) | SF003078 | E=19, B=4040, A=0, V=0, O=13 | Type I, Class Ib | Putative SAM-dependent rRNA methyltransferase |
Functions are assigned based on the results of the analysis presented in this manuscript. The majority of the structures are from structural genomics initiatives and have unassigned functions.
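The taxonomic distribution strings in the table above (for example `E=2, B=403, A=16, V=0, O=5`) can be parsed to check whether a family has members in all three branches of life, the criterion used in Figure 7. This is a minimal sketch with hypothetical function names. The letters E, B, and A follow the Figure 7 caption (Eukaryotes, Bacteria, Archaea); V and O are not defined in this text, so the sketch carries them through without interpreting them.

```python
def parse_distribution(text):
    """Turn 'E=0, B=156, A=0, V=0, O=4' into {'E': 0, 'B': 156, ...}."""
    counts = {}
    for item in text.split(","):
        key, value = item.strip().split("=")
        counts[key.strip()] = int(value)
    return counts


def in_all_three_branches(counts):
    """True when Eukaryotes (E), Bacteria (B), and Archaea (A) all have members."""
    return all(counts.get(branch, 0) > 0 for branch in ("E", "B", "A"))


if __name__ == "__main__":
    # Distribution strings copied from the table rows for 2PGX and 2B78.
    for pdb_id, text in [("2PGX", "E=0, B=156, A=0, V=0, O=4"),
                         ("2B78", "E=2, B=403, A=16, V=0, O=5")]:
        print(pdb_id, in_all_three_branches(parse_distribution(text)))
```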