Ulla: a program for calculating environment-specific amino acid substitution tables.

Semin Lee, Tom L Blundell.

Abstract

SUMMARY: Amino acid residues are under various kinds of local environmental restraints, which influence substitution patterns. Ulla, a program for calculating environment-specific substitution tables, reads protein sequence alignments and local environment annotations. The program produces a substitution table for every possible combination of environment features. Sparse data are handled using an entropy-based smoothing procedure to estimate robust substitution probabilities. AVAILABILITY: The Ruby source code is available under a Creative Commons Attribution-Noncommercial License along with additional documentation from http://www-cryst.bioc.cam.ac.uk/ulla. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2009        PMID: 19417059      PMCID: PMC2712337          DOI: 10.1093/bioinformatics/btp300

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

In the evolution of proteins, individual amino acid residues are under various kinds of local environmental restraints such as secondary structure type, solvent accessibility and hydrogen bonding patterns. Previous studies of amino acid substitutions as a function of local environment have shown that there are clear differences among substitution patterns under various environmental restraints (Overington et al., 1992). These distinctive patterns of amino acid substitutions have been successfully exploited to predict the stability of protein mutants (Topham et al., 1993), to identify potential interaction sites (Chelliah et al., 2004; Gong and Blundell, 2008) and to detect remote sequence-structure homology (Chelliah et al., 2005).

However, estimating amino acid substitution probabilities is not a trivial problem, especially when there are very few observations for specific combinations of environments. To cope with this sparse data problem, Sali (1991) developed an algorithm as an extension of the method used by Sippl (1990) to derive robust potentials of mean force. Several variants of the generalized procedure, such as Makesub (Topham et al., 1993) and SUBST (Mizuguchi, unpublished results), have subsequently been implemented for smoothing substitution probabilities. Nevertheless, each lacks crucial features implemented in the other, and they use slightly different smoothing procedures, which may lead to very different amino acid substitution matrices. To overcome these problems, we developed Ulla, a program for calculating environment-specific substitution tables (ESSTs), to unify all the major features of the previously developed programs and to provide additional functionality. The program also generates heat maps from substitution tables to visualize the degree of conservation of amino acids under the environmental restraints.

2 DESCRIPTION

Ulla reads multiple sequence alignments and annotations for local environments in JOY template format (Mizuguchi et al., 1998a). Users can provide their own definitions of environment features, and an environment feature can be constrained so that substitutions are counted only when the environment of the residues is conserved. Ulla also supports restricting the percent identity (PID) range of the sequence pairs to be considered, and uses a BLOSUM-like weighting scheme (Henikoff and Henikoff, 1992) to minimize sampling bias from highly similar sequences. Ulla uses entropy-based smoothing procedures to reduce problems caused by sparse data. Smoothing is an iterative procedure that estimates a probability distribution by perturbing the previous probability distribution with each successive measurement (Sali, 1991; Sippl, 1990). Hence, starting from a uniform frequency distribution, the estimated probability distribution at each step serves as an approximation for the next probability distribution (see Supplementary Material for details).
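The iterative idea described above can be illustrated with a minimal Python sketch. It is not Ulla's implementation (Ulla is written in Ruby, and the exact procedure is given in the Supplementary Material). The function names, the toy four-letter alphabet, the `sigma` parameter and the specific weighting formula are all assumptions made for illustration only. The sketch starts from a uniform distribution and, at each level, blends the previous estimate with the newly observed relative frequencies, trusting the observations more when there are more of them.

```python
from typing import List, Sequence


def smooth_step(prior: Sequence[float], counts: Sequence[float],
                sigma: float = 5.0) -> List[float]:
    """Perturb a prior distribution with one set of observed counts.

    ASSUMPTION: the prior weight 1 / (1 + N / (sigma * n)) is an illustrative
    choice, where N is the number of observations and n the number of classes.
    It is not taken from the article.
    """
    n_classes = len(prior)
    total = float(sum(counts))
    if total == 0.0:
        # No observations at this level: keep the previous estimate unchanged.
        return list(prior)
    w_prior = 1.0 / (1.0 + total / (sigma * n_classes))
    return [w_prior * p + (1.0 - w_prior) * (c / total)
            for p, c in zip(prior, counts)]


def smooth_chain(count_levels: Sequence[Sequence[float]],
                 sigma: float = 5.0) -> List[float]:
    """Apply smooth_step from the least to the most specific count table.

    The chain starts from a uniform distribution, and each estimate serves
    as the prior for the next level.
    """
    n_classes = len(count_levels[0])
    estimate = [1.0 / n_classes] * n_classes
    for counts in count_levels:
        estimate = smooth_step(estimate, counts, sigma)
    return estimate


# Hypothetical counts for replacements of one residue type over a toy
# four-letter alphabet: pooled over all environments, then one sparse
# environment with only three observations.
pooled_counts = [40, 30, 20, 10]
sparse_counts = [3, 0, 0, 0]
smoothed = smooth_chain([pooled_counts, sparse_counts])
print([round(p, 3) for p in smoothed])
```

In this toy case the three observations in the sparse environment shift the estimate towards the first class, but the classes that were never observed there keep non-zero probabilities inherited from the pooled counts.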

3 EXAMPLE USAGE

As an illustration, we generate ESSTs from HOMSTRAD alignments (Mizuguchi et al., 1998b) with environment feature definitions of secondary structure type and solvent accessibility (Fig. 1a):

    # name of feature (string);\\
    # values adopted in .tem (alignment) file (string);\\
    # class labels assigned for each value (string);\\
    # constrained or not (T or F);\\
    # silent (used as masks)? (T or F)
    secondary structure and phi angle;HEPC;HEPC;F;F
    solvent accessibility;TF;Aa;F;F

Actual annotations for the environment features need to be provided in PIR format. JOY (Mizuguchi et al., 1998a) is useful to annotate the alignments with the structural environments, but Ulla recognizes any environment feature definition which conforms to the format above. Paths for an environment definition file and a file containing the list of environment feature annotated alignments are given to Ulla as input:

    $ ulla -c feature.def -l alignments.lst

Ulla produces three different types of substitution tables: raw counts tables, substitution probability tables and log-odds ratio tables. Heat maps also can be generated to visualize resultant substitution tables (Fig. 1b).

Fig. 1. Environment feature combinations and ESST generation. (a) The environment features are secondary structure type (H: helix, E: beta sheet, P: positive phi, C: coil) and solvent accessibility (A: solvent accessible, a: solvent inaccessible). Eight sets of combinations of environment features are generated. (b) Heat maps from each of resultant ESSTs. Blue to red indicates log-odds ratio of substitution probabilities.
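To make the feature definition format in this section concrete, the following Python sketch parses the two definition lines given in the text and enumerates the resulting environment combinations. It is an illustration only, not Ulla's own parser: the `Feature` type and both function names are hypothetical, and the sketch simply records the constrained and silent flags without modelling their effect on counting.

```python
from itertools import product
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str          # name of the feature
    symbols: str       # values used in the alignment annotation
    labels: str        # class label assigned to each value
    constrained: bool  # T or F in the definition file
    silent: bool       # T or F in the definition file


def parse_feature_definitions(text: str) -> List[Feature]:
    """Parse semicolon-separated definition lines, skipping '#' comments."""
    features = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, symbols, labels, constrained, silent = line.split(";")
        features.append(
            Feature(name, symbols, labels, constrained == "T", silent == "T")
        )
    return features


def environment_labels(features: List[Feature]) -> List[str]:
    """Return one label per combination of feature classes.

    ASSUMPTION: every listed feature contributes to the combinations;
    how Ulla treats silent (mask) features is not modelled here.
    """
    # Keep each class label once, in the order it was defined.
    classes = [list(dict.fromkeys(f.labels)) for f in features]
    return ["".join(combo) for combo in product(*classes)]


# The two data lines from the definition file shown in the text.
DEFINITIONS = """
# name;values;class labels;constrained;silent
secondary structure and phi angle;HEPC;HEPC;F;F
solvent accessibility;TF;Aa;F;F
"""

features = parse_feature_definitions(DEFINITIONS)
labels = environment_labels(features)
print(len(labels), labels)
```

Four secondary structure classes and two solvent accessibility classes give 4 x 2 = 8 labels, which agrees with the eight combinations described in the caption of Fig. 1.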

4 CONCLUSION

Ulla generates ESSTs from a sparse data set using entropy-based smoothing procedures. It allows us to conduct analyses of amino acid substitution patterns under various environmental restraints. The resultant ESSTs can be exploited in many ways such as binding site prediction, remote homology detection, and protein stability estimation. Ulla is publicly available on the web site http://github.com/semin/ulla, where the code is maintained in a Git repository, and its pre-built RubyGems package can be obtained from http://rubyforge.org/projects/ulla.
REFERENCES

1.  Amino acid substitution matrices from protein blocks.

Authors:  S Henikoff; J G Henikoff
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

2.  Distinguishing structural and functional restraints in evolution in order to identify interaction sites.

Authors:  Vijayalakshmi Chelliah; Lan Chen; Tom L Blundell; Simon C Lovell
Journal:  J Mol Biol       Date:  2004-10-01       Impact factor: 5.469

3.  Functional restraints on the patterns of amino acid substitutions: application to sequence-structure homology recognition.

Authors:  Vijayalakshmi Chelliah; Tom Blundell; Kenji Mizuguchi
Journal:  Proteins       Date:  2005-12-01

4.  Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds.

Authors:  J Overington; D Donnelly; M S Johnson; A Sali; T L Blundell
Journal:  Protein Sci       Date:  1992-02       Impact factor: 6.725

5.  HOMSTRAD: a database of protein structure alignments for homologous families.

Authors:  K Mizuguchi; C M Deane; T L Blundell; J P Overington
Journal:  Protein Sci       Date:  1998-11       Impact factor: 6.725

6.  Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins.

Authors:  M J Sippl
Journal:  J Mol Biol       Date:  1990-06-20       Impact factor: 5.469

7.  Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables.

Authors:  C M Topham; A McLeod; F Eisenmenger; J P Overington; M S Johnson; T L Blundell
Journal:  J Mol Biol       Date:  1993-01-05       Impact factor: 5.469

8.  JOY: protein sequence-structure representation and analysis.

Authors:  K Mizuguchi; C M Deane; T L Blundell; M S Johnson; J P Overington
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

9.  Discarding functional residues from the substitution table improves predictions of active sites within three-dimensional structures.

Authors:  Sungsam Gong; Tom L Blundell
Journal:  PLoS Comput Biol       Date:  2008-10-03       Impact factor: 4.475

