
FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins.

Abstract

Conformational switches observed in the protein backbone play a key role in a variety of fundamental biological activities. This paper describes a web-server that implements a pattern recognition algorithm trained on the examples from the Database of Macromolecular Movements to predict residue positions involved in conformational switches. Prediction can be performed at an adjustable false positive rate using a user-supplied protein sequence in FASTA format or a structure in a Protein Data Bank (PDB) file. If a protein sequence is submitted, then the web-server uses sequence-derived information only (such as evolutionary conservation of residue positions). If a PDB file is submitted, then the web-server uses sequence-derived information and residue solvent accessibility calculated from this file.

Availability: FlexPred is publicly available at http://flexpred.rit.albany.edu.

Keywords: conformational variability; prediction; protein flexibility; structural transition; support vector machine

Year: 2008 PMID: 19238251 PMCID: PMC2639688 DOI: 10.6026/97320630003134

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Proteins are flexible macromolecules. The protein backbone can switch from one specific folded conformation to another. Conformational switches have been shown to be involved in a variety of biological functions, such as catalysis, macromolecular recognition, signal transduction, locomotion, and a number of pathogenic disorders [1-2]. Molecular dynamics simulations of long time scale conformational transitions are very computationally expensive and therefore impractical for large-scale studies [3]. Several bioinformatics methods that attempt to predict conformational switches from sequence information alone have been developed. Most of these methods were not trained to predict conformational switches directly by using a dataset of experimental examples of such switches, but rather to identify them indirectly by predicting certain structural properties related to protein flexibility in general. Flexibility-related properties used to train these methods include the crystallographic B-factor [4-5], the ambiguity in secondary structure assignment [6-7], and the magnitude of large-scale fluctuations obtained from coarse grained protein dynamics modeling [8]. A dataset of experimental examples of flexible linkers connecting structurally rigid domains was used directly to develop a sequence-based method for predicting hinge points [2]. Recently, we used a dataset of experimentally characterized conformational switches to develop predictors of flexible residue positions and studied the performance of these predictors. The results of this study showed that positions involved in conformational switches can be predicted with balanced sensitivity and specificity for all types of secondary structure and all types of protein movements [9]. Here, we use predictive models from our previous work to develop FlexPred, a web-server that uses a protein sequence alone or in combination with solvent accessibility to predict residue positions involved in conformational switches. 

Methodology

A detailed description of the training and testing methods was provided in our original paper [9]. We therefore only briefly describe the methodology here. We used a non-redundant dataset obtained from the Database of Macromolecular Movements that contains examples of conformational switches derived by comparing experimental atomic-level structures of the same protein solved under different conditions [10]. This dataset was used to train a supervised pattern recognition method, Support Vector Machine (SVM), to distinguish between flexible and rigid residue positions. We implemented two types of encoding of the input sequence. One is the binary encoding which utilizes the input sequence alone and represents the 20 amino acid types as 20 mutually orthogonal binary vectors. The other is the PSSM encoding which accounts for evolutionary conservation of the input sequence and is based on the PSI-BLAST position-specific scoring matrix (PSSM). If the user submits a protein structure in a Protein Data Bank (PDB) file, then the normalized residue solvent accessibility calculated from this file is also used for prediction along with one of the two types of sequence encoding. Thus, we have four possible ways of encoding protein sequence with or without solvent accessibility. Accordingly, four SVM predictors, one for each of the four combinations, were implemented.
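
The binary encoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `binary_encode`, the residue ordering in `AMINO_ACIDS`, and the handling of non-standard residues are assumptions made for illustration. How the per-residue vectors are assembled into the actual SVM input is not described in this section and is left out.

```python
# Illustrative sketch only; names and residue ordering are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid types


def binary_encode(sequence, accessibility=None):
    """Encode each residue as a 20-dimensional one-hot (binary) vector.

    If per-residue normalized solvent accessibility values are given,
    each value is appended to the vector of its residue (21 features).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for pos, residue in enumerate(sequence.upper()):
        vector = [0] * len(AMINO_ACIDS)
        # Assumption: residues outside the 20 standard types stay all-zero.
        if residue in index:
            vector[index[residue]] = 1
        if accessibility is not None:
            vector.append(accessibility[pos])
        encoded.append(vector)
    return encoded


# Sequence-only encoding, and sequence plus (made-up) accessibility values.
features = binary_encode("ACD")
features_acc = binary_encode("ACD", accessibility=[0.1, 0.5, 0.9])
```

Vectors for two different amino acid types have a zero dot product, which is what "mutually orthogonal" means here. The PSSM encoding would replace each one-hot vector with the corresponding row of the PSI-BLAST matrix.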

Input

FlexPred is freely available at http://flexpred.rit.albany.edu. It has a simple, intuitive user interface that consists of the four input fields described below. Instructions for each field and general information about the methodology and the output format can be found by clicking the corresponding help hyperlink on the input page.

Field 1: Protein sequence or PDB file to be analyzed

For sequence-based prediction, the user can paste or upload an amino acid sequence in FASTA format. For the prediction based on a protein sequence and solvent accessibility of its residue positions, the user can either upload a PDB file or provide a four-character PDB id and let the server automatically download a corresponding file from ftp://ftp.wwpdb.org.
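
For readers preparing input for sequence-based prediction, the following sketch shows how FASTA-formatted text is structured by reading it back into header and sequence pairs. The function name `read_fasta` and the example record are hypothetical, and the sketch says nothing about how the server itself validates its input.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record used only to show the layout of a FASTA entry.
example = """>example_protein hypothetical header
MKTAYIAKQR
QISFVKSHFS
"""
records = read_fasta(example)
```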

Field 2: Selection of encoding method

The user can select either binary or PSSM sequence encoding. The PSSM encoding performs better if protein sequence information alone is used for prediction, whereas the binary encoding performs better if both protein sequence and residue solvent accessibility are used. Therefore, the PSSM encoding is the default method for the sequence-based submissions, while the binary encoding is the default method for the PDB-based submissions.

Field 3: Selection of false positive rate

The false positive rate (FPr) gives the fraction of rigid positions incorrectly predicted as flexible, whereas the true positive rate (TPr) gives the fraction of flexible positions correctly predicted as flexible. For any prediction method, when FPr is decreased, TPr is also decreased, and vice versa. The user can choose FPr of 5%, 10%, 15%, or 20%. Since most statistical tests consider the 5% chance of false positive prediction to be an acceptable level, the FPr of 5% is selected by default.
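
The two rates defined above, and the trade-off between them, can be made concrete with a small sketch. The labels and scores below are made up, the helper names `confusion_rates` and `predict` are hypothetical, and the use of a score threshold to move between operating points is an assumption about how such a rate could be adjusted, not a description of the server's internals.

```python
def confusion_rates(true_labels, predicted_labels):
    """Return (FPr, TPr) for labels 'R' (rigid) and 'F' (flexible)."""
    false_pos = true_pos = n_rigid = n_flexible = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "F":
            n_flexible += 1
            if predicted == "F":
                true_pos += 1
        else:
            n_rigid += 1
            if predicted == "F":
                false_pos += 1
    return false_pos / n_rigid, true_pos / n_flexible


def predict(scores, threshold):
    """Label a position 'F' when its score reaches the threshold."""
    return "".join("F" if s >= threshold else "R" for s in scores)


# Made-up data: four rigid positions followed by four flexible ones.
true_labels = "RRRRFFFF"
scores = [0.1, 0.6, 0.2, 0.3, 0.9, 0.7, 0.4, 0.8]

strict = confusion_rates(true_labels, predict(scores, 0.65))   # (0.0, 0.75)
lenient = confusion_rates(true_labels, predict(scores, 0.35))  # (0.25, 1.0)
```

With the stricter threshold no rigid position is mislabeled, but one flexible position is missed. The more lenient threshold recovers it at the cost of one false positive.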

Field 4: Selection of retrieval method

The user can choose to receive results by E-mail (default) or manually retrieve them using a temporary URL provided upon submission. The results of prediction are kept on the web-server for one day from the moment of submission, and deleted afterwards.

Output

The output from FlexPred consists of a header that describes the output format itself, the selected encoding type, the selected false positive rate, the submitted sequence in FASTA format, and the predicted labels for each residue position (Figure 1). The labels are 'R' (rigid) and 'F' (flexible). The column 'S_PRB' shows the probability of label 'F' for each residue position. The probability of label 'R' is (1-P), where P is the probability of label 'F'. The probabilities are in range [0.0, 1.0]. Higher probability corresponds to a greater prediction confidence.
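
The relation between the two label probabilities can be sketched as follows. The sketch works on in-memory (label, probability) pairs with made-up values; it does not parse the actual output file, whose exact layout is not reproduced in this text, and the name `summarize` is hypothetical.

```python
def summarize(predictions):
    """Attach the complementary probability of label 'R' to each position.

    `predictions` is a list of (label, p_flexible) pairs, where p_flexible
    plays the role of the S_PRB column described in the text.
    """
    rows = []
    for label, p_flexible in predictions:
        if not 0.0 <= p_flexible <= 1.0:
            raise ValueError("probability must lie in [0.0, 1.0]")
        rows.append({
            "label": label,
            "p_flexible": p_flexible,
            "p_rigid": 1.0 - p_flexible,  # P('R') = 1 - P('F')
        })
    return rows


# Made-up values for two residue positions.
rows = summarize([("R", 0.25), ("F", 0.75)])
```

The sketch deliberately does not derive the label from the probability, since the decision threshold depends on the false positive rate selected at submission.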

Figure 1. Sample FlexPred output

Future development

To the best of the authors' knowledge, FlexPred is the only on-line method for predicting conformational switches in proteins directly trained on a large dataset of experimentally characterized examples. We will continue updating FlexPred by adding new predictive models and new experimental examples as they become available.

References

1. M Young; K Kirshenbaum; K A Dill; S Highsmith. Predicting conformational switches in proteins. Protein Sci. 1999-09.

2. M Gross. Proteins that convert from alpha helix to beta sheet: implications for folding and disease. Curr Protein Pept Sci. 2000-12.

3. Zheng Yuan; Timothy L Bailey; Rohan D Teasdale. Prediction of protein B-factor profiles. Proteins. 2005-03-01.

4. Avner Schlessinger; Guy Yachdav; Burkhard Rost. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics. 2006-02-02.

5. Mikael Bodén; Timothy L Bailey. Identifying sequence regions undergoing conformational change via predicted continuum secondary structure. Bioinformatics. 2006-05-23.

6. Lei Yang; Guang Song; Robert L Jernigan. How well can we understand large-scale protein motions using normal modes of elastic network models? Biophys J. 2007-05-04.

7. Igor B Kuznetsov. Ordered conformational change in the protein backbone: prediction of conformationally variable positions from sequence and low-resolution structural data. Proteins. 2008-07.

8. Samuel Flores; Nathaniel Echols; Duncan Milburn; Brandon Hespenheide; Kevin Keating; Jason Lu; Stephen Wells; Eric Z Yu; Michael Thorpe; Mark Gerstein. The Database of Macromolecular Motions: new features added at the decade mark. Nucleic Acids Res. 2006-01-01.

9. Jenny Gu; Michael Gribskov; Philip E Bourne. Wiggle-predicting functionally flexible regions from primary sequence. PLoS Comput Biol. 2006-06-05.

10. Samuel C Flores; Long J Lu; Julie Yang; Nicholas Carriero; Mark B Gerstein. Hinge Atlas: relating protein sequence to sites of structural flexibility. BMC Bioinformatics. 2007-05-22.
