Andrew F Neuwald, Stephen F Altschul.
Abstract
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high-dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown, biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Year: 2016 PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 5. The Caenorhabditis elegans glucosamine-6-phosphate N-acetyltransferase (Gna1) complexed with CoA and N-acetylglucosamine-6-phosphate (GlcNAc6p) (pdb_id: 4ag9) [84].
Gna1 was assigned to node 12 of the hierarchy. A. Structural locations of acetylase residues (yellow) and node 12-specific residues (red). B. Node 12-specific residues involved in substrate binding. These residue positions are indicated in (as red dots above column positions). C. Node 12-specific residues associated with the homodimeric interface.
Fig 7. Node 35 structural features implicated in a proposed induced-fit mechanism.
A-C. Conformational changes that involve two conserved residues and that mediate opening and closing of the substrate binding pocket of a putative acetylase from Salmonella typhimurium with and without bound acetyl-CoA (pdb_id: 3dr8 and 3dr6). A. (top) Surface view of the open conformation when CoA is bound to both subunits (pdb_id: 3dr8). The surface of Glu82 is shown in red, of Arg72 in yellow and of the rest of the substrate binding pocket (SBP) in green. (bottom) Close-up view of Glu82 and Arg72 at the dimeric interface (pdb_id: 3dr8); the SBP is indicated. B. (top) Surface view of the closed conformation when neither subunit is bound to CoA (pdb_id: 3dr6). Note that the substrate binding pocket appears inaccessible. (bottom) Close-up of the Glu82-Arg72 salt bridge formed at the subunit interface. C. (top left) Surface side view of the acetyl-CoA bound form (pdb_id: 3dr8) showing the locations of a cluster of Set42-specific residues (shaded yellow). CoA is shown in cyan. (bottom) The same view of 3dr8 as in A but rotated by 90 degrees to show a side view. The expanded box shows the node-42 pattern residue interactions forming a bridge between adjacent loop regions. (top right) A similar view of C. elegans Gna1 (pdb_id: 4ag9) showing the adjacent locations of the CoA and substrate within a channel rather than a pocket as in (A). D-F. Differences between node 40 and node 36 pattern residues. Residues with red sidechains correspond to node 35 pattern residues. See for the contrast alignment showing pattern residues. D. Node 40 pattern residues (orange sidechains) within 3dr6 (an acetylase assigned to node 42). E. Node 40 pattern residues (orange sidechains) within 1vhs (an acetylase assigned to node 41). F. Node 36 pattern residues (light green sidechains) within 4jxr (an acetylase assigned to node 39).