SeSAW: balancing sequence and structural information in protein functional mapping.

Daron M Standley, Reiko Yamashita, Akira R Kinjo, Hiroyuki Toh, Haruki Nakamura.

Abstract

MOTIVATION: Functional similarity between proteins is evident at both the sequence and structure levels. SeSAW is a web-based program for identifying functionally or evolutionarily conserved motifs in protein structures by locating sequence and structural similarities, and quantifying these at the level of individual residues. Results can be visualized in 2D, as annotated alignments, or in 3D, as structural superpositions. An example is given for both an experimentally determined query structure and a homology model.
AVAILABILITY AND IMPLEMENTATION: The web server is located at http://www.pdbj.org/SeSAW/.

Year: 2010 PMID: 20299324 PMCID: PMC2859130 DOI: 10.1093/bioinformatics/btq116

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Sequence alignment and structural alignment are widely used techniques for inferring functional or evolutionary relationships between proteins. However, most alignment methods neither integrate sequence and structural information into one measure of similarity nor describe the similarity at the level of individual residues. We recently introduced a sequence- and structure-based scoring method that employs sequence profile–profile comparisons but is anchored by structural alignments, and showed that the functional information associated with the top-scoring hits found by the method agreed well with expert annotations published in the literature (Standley et al., 2008b). Subsequently, we have shown that this approach can be used to identify functional sites in remote (e.g. 10–20% sequence identity) homology models, even when the structural template used to build the model is itself unannotated (Standley et al., 2008a). That is, a structure without a known function (e.g. a structural genomics target) can be used as an intermediate template to subsequently locate a functionally characterized structure, and thus map putative functional sites onto a distantly related query sequence. Here, we describe a web-based implementation of the method called SeSAW (sequence-derived structural alignment weights) that can automatically perform putative functional residue mapping. We emphasize that such mapping is intended to guide subsequent experiments rather than to serve as a substitute for experimental annotations.

2 ALGORITHM

SeSAW takes as input a PDB-formatted query file, a chain ID and, in the case of a template-based model, the PDB ID and chain ID of the template. As illustrated in Figure 1, a PSI-BLAST position-specific scoring matrix (PSSM) for the query is retrieved or computed, as necessary (we maintain a database of PSSMs for every unique PDB chain). The query is partitioned into unique structural domains using SCOP (Murzin et al., 1995), CATH (Pearl et al., 2005) and Protein Domain Parser (Alexandrov and Shindyalov, 2003). For each domain, SeSAW attempts to construct a list of representative structure neighbors by mapping from a pre-computed list of pairwise structural alignments using PSI-BLAST; if this fails, SeSAW performs direct structural alignment on the representative list using ASH (Standley et al., 2007). The representative neighbors are then expanded to include their sequence homologs. The resulting hits are structurally aligned to the query and ranked by the SeSAW score, which is obtained by adding the ASH structure alignment score to the sum of the per-residue similarity scores S. The per-residue similarity score S is defined in terms of the following quantities: d is the distance between the Cα atoms of the two aligned residues (after superposition of the query and template), dmax is a reference distance (4 Å in all calculations), w is a scalar weight (0.8 in all calculations), SB is the Blosum62 matrix in bit units, aQ and aT are the amino acid types of the query and template, respectively, wP is a scalar weight (1.5 in all calculations), and SQ and ST are the odds column vectors of the query and template PSSMs, respectively. The SeSAW score is reported along with a P-value computed by numerically integrating the known distribution of scores. Functional annotations, extracted regularly at the Protein Data Bank Japan, are then mapped onto the query–template alignment.

Fig. 1.

Outline of the server. The rectangles on the left indicate major steps that are performed in real-time. Those on the right indicate steps that are done offline. Ovals in the center represent external software used for both types of calculations. Colored lines indicate their interconnection.
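The control flow of the algorithm section (neighbor lookup with a fallback, expansion to sequence homologs, ranking by structure score plus summed per-residue scores, and a P-value from numerical integration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's implementation: all function names, identifiers and numbers are hypothetical, the per-residue scores are taken as given rather than computed, and the Gumbel density merely stands in for the score distribution, whose actual form is not specified here.

```python
import math
from typing import Callable, Dict, List, Sequence


def total_score(structure_score: float, residue_scores: Sequence[float]) -> float:
    """Structure alignment score plus the sum of per-residue similarity scores."""
    return structure_score + sum(residue_scores)


def find_neighbors(domain: str,
                   precomputed: Dict[str, List[str]],
                   direct_align: Callable[[str], List[str]]) -> List[str]:
    """Use the pre-computed neighbor list when it has hits; otherwise fall
    back to direct structural alignment (passed in as a callable)."""
    neighbors = precomputed.get(domain, [])
    return list(neighbors) if neighbors else direct_align(domain)


def expand_with_homologs(neighbors: Sequence[str],
                         homologs: Dict[str, List[str]]) -> List[str]:
    """Append the sequence homologs of each representative, without duplicates."""
    expanded: List[str] = []
    for rep in neighbors:
        for chain in [rep, *homologs.get(rep, [])]:
            if chain not in expanded:
                expanded.append(chain)
    return expanded


def upper_tail_p(score: float, density: Callable[[float], float],
                 upper: float, steps: int = 20000) -> float:
    """P(X >= score) by trapezoidal integration of `density` on [score, upper]."""
    if score >= upper:
        return 0.0
    h = (upper - score) / steps
    area = 0.5 * (density(score) + density(upper))
    area += sum(density(score + i * h) for i in range(1, steps))
    return area * h


def gumbel_density(x: float, mu: float = 0.0, beta: float = 1.0) -> float:
    """Assumed score density, for illustration only."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


# Toy data: every identifier and number below is made up.
precomputed = {"domain1": ["repA"]}
homologs = {"repA": ["homA1"], "repB": ["homB1", "homB2"]}
alignments = {  # hit -> (structure alignment score, per-residue scores)
    "repB": (12.0, [0.5, 1.5]),
    "homB1": (10.0, [2.0, 3.0, 1.0]),
    "homB2": (9.0, [0.5]),
}

# "domain2" has no pre-computed neighbors, so the fallback callable is used.
hits = expand_with_homologs(
    find_neighbors("domain2", precomputed, lambda d: ["repB"]), homologs)
ranked = sorted(hits, key=lambda h: total_score(*alignments[h]), reverse=True)
p_value = upper_tail_p(3.0, gumbel_density, upper=60.0)
```

Passing the direct-alignment step in as a callable keeps the fallback explicit: the expensive alignment runs only when the pre-computed lookup returns nothing.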

3 VISUALIZATION

Query–template alignments, with residue-level functional descriptions when available, are displayed with Jalview (Waterhouse et al., 2009). Superpositions can be downloaded or visualized in 3D, together with an interactive table of residue pairs that score highly according to the per-residue similarity score.
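A table of high-scoring residue pairs of the kind described above could be assembled as in the sketch below. The `ResiduePair` record, the `top_pairs` helper, the score threshold and all residue labels and scores are hypothetical, chosen only to show the sorting and filtering step; they are not taken from the server.

```python
from typing import List, NamedTuple


class ResiduePair(NamedTuple):
    query_residue: str      # residue type and number in the query, e.g. "G25"
    template_residue: str   # aligned residue in the template
    score: float            # per-residue similarity score, computed elsewhere


def top_pairs(pairs: List[ResiduePair], n: int = 5,
              min_score: float = 0.0) -> List[ResiduePair]:
    """Return up to n pairs scoring above min_score, highest score first."""
    kept = [p for p in pairs if p.score > min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)[:n]


# Toy alignment: labels and scores are made up for illustration.
toy_pairs = [
    ResiduePair("A10", "A12", 1.2),
    ResiduePair("G25", "G27", 4.8),
    ResiduePair("L40", "V43", -0.5),
    ResiduePair("D61", "D64", 3.1),
]

for pair in top_pairs(toy_pairs, n=3):
    print(f"{pair.query_residue:>5} {pair.template_residue:>5} {pair.score:6.2f}")
```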

4 EXAMPLES

The SeSAW method was used to find templates related to the hypothetical protein TTHA1568 from Thermus thermophilus HB8 (PDB identifier 2czlA), a structural genomics target of unknown function (Standley et al., 2008b). The biochemical function of TTHA1568 has subsequently been determined (Hiratsuka et al., 2008). In our original work, although we were unable to pinpoint the exact biochemical function, our analysis indicated a likely active site near residues S57, T105 and T106, as well as a highly significant glycine (G82) that we proposed would act as a hinge, allowing substrate access. These predictions are supported by recent experimental evidence (Arai et al., 2009). This result is significant because the closest sequence homolog with known function at the time of our prediction, a glutamate transport protein, had a sequence identity of only 15%.
The second example illustrates the use of SeSAW in annotating a homology model. Zc3h12a from Mus musculus is a protein that was found to be required for mRNA stability of inflammatory cytokines. Because of the very low sequence homology to known folds, a number of models were built and submitted to SeSAW, and the model with the highest raw score was retained for further analysis. This model was built on a structural genomics target of unknown function (PDB ID 2qipA). The top two SeSAW hits to this model were to a Mg-dependent hydrolase (2ho4B) and a Mg-dependent phosphatase (1k1e). From these hits, a cluster of conserved aspartic acids that bind Mg could be identified in the query. The third highest hit was to the nuclease domain of the Taq DNA polymerase (1tauA). These three hits are consistent with a possible Mg-dependent ribonuclease function, and this prediction was subsequently demonstrated in vitro and in vivo; moreover, when we mutated one of the predicted Mg-binding aspartic acids to asparagine, the nuclease activity was abolished, confirming the predicted active site location (Matsushita et al., 2009).
These two examples are typical of SeSAW results when only very distant sequence homologs exist. More recently, SeSAW was used to correctly identify the dual (Ser/Thr or Tyr) specificity of the kinase ROP16 from Toxoplasma gondii using a homology model built at a more modest (20%) sequence identity, whereas sequence analysis alone indicated greater similarity to Ser/Thr kinases (Yamamoto et al., 2009). The structural alignment step allows very distantly related templates to be recognized, while the use of profile–profile sequence comparison highlights the residue pairs that are mutually conserved. However, not all such residues are expected to be part of the active site; structurally important amino acids such as proline and some large hydrophobic groups often score highly as well. Another limitation of SeSAW is that, in difficult cases such as these, the exact biochemical function is not automatically revealed, although the residues that make up the active site can often be located. Prediction of the biochemical role of the protein requires some investigation and, ultimately, biochemical experimentation. Nevertheless, SeSAW is a significant improvement over running structural and sequence analysis separately (Standley et al., 2008b), and can thus play an important role in automated functional annotation of structural genomics targets or homology models.

REFERENCES

1. PDP: protein domain parser.

Authors: Nickolai Alexandrov; Ilya Shindyalov
Journal: Bioinformatics Date: 2003-02-12 Impact factor: 6.937

2. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

Authors: Daron M Standley; Hiroyuki Toh; Haruki Nakamura
Journal: Proteins Date: 2008-09

3. An alternative menaquinone biosynthetic pathway operating in microorganisms.

Authors: Tomoshige Hiratsuka; Kazuo Furihata; Jun Ishikawa; Haruyuki Yamashita; Nobuya Itoh; Haruo Seto; Tohru Dairi
Journal: Science Date: 2008-09-19 Impact factor: 47.728

4. SCOP: a structural classification of proteins database for the investigation of sequences and structures.

Authors: A G Murzin; S E Brenner; T Hubbard; C Chothia
Journal: J Mol Biol Date: 1995-04-07 Impact factor: 5.469

5. Crystal structure of MqnD (TTHA1568), a menaquinone biosynthetic enzyme from Thermus thermophilus HB8.

Authors: Ryoichi Arai; Kazutaka Murayama; Tomomi Uchikubo-Kamo; Madoka Nishimoto; Mitsutoshi Toyama; Seiki Kuramitsu; Takaho Terada; Mikako Shirouzu; Shigeyuki Yokoyama
Journal: J Struct Biol Date: 2009-07-12 Impact factor: 2.867

6. Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors: Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal: Bioinformatics Date: 2009-01-16 Impact factor: 6.937

7. Zc3h12a is an RNase essential for controlling immune responses by regulating mRNA decay.

Authors: Kazufumi Matsushita; Osamu Takeuchi; Daron M Standley; Yutaro Kumagai; Tatsukata Kawagoe; Tohru Miyake; Takashi Satoh; Hiroki Kato; Tohru Tsujimura; Haruki Nakamura; Shizuo Akira
Journal: Nature Date: 2009-03-25 Impact factor: 49.962

8. ASH structure alignment package: sensitivity and selectivity in domain classification.

Authors: Daron M Standley; Hiroyuki Toh; Haruki Nakamura
Journal: BMC Bioinformatics Date: 2007-04-04 Impact factor: 3.169

9. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.

Authors: Frances Pearl; Annabel Todd; Ian Sillitoe; Mark Dibley; Oliver Redfern; Tony Lewis; Christopher Bennett; Russell Marsden; Alistair Grant; David Lee; Adrian Akpor; Michael Maibaum; Andrew Harrison; Timothy Dallman; Gabrielle Reeves; Ilhem Diboun; Sarah Addou; Stefano Lise; Caroline Johnston; Antonio Sillero; Janet Thornton; Christine Orengo
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. A single polymorphic amino acid on Toxoplasma gondii kinase ROP16 determines the direct and strain-specific activation of Stat3.

Authors: Masahiro Yamamoto; Daron M Standley; Seiji Takashima; Hiroyuki Saiga; Megumi Okuyama; Hisako Kayama; Emi Kubo; Hiroshi Ito; Mutsumi Takaura; Tadashi Matsuda; Dominique Soldati-Favre; Kiyoshi Takeda
Journal: J Exp Med Date: 2009-11-09 Impact factor: 14.307
