CFP: a web-server for constructing sequence-based protein conformational flexibility profiles.

Igor B Kuznetsov, Shalom Rackovsky

Abstract

Many proteins contain conformationally flexible segments that undergo significant changes in backbone conformation or completely lack a well-defined conformation. Previously, we developed the generalized local propensity (GLP), a quantitative sequence-based measure of protein backbone flexibility. In this paper, we present the CFP (Conformational Flexibility Profile) web-server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high backbone flexibility. The statistical significance of a flexible sequence segment is assessed using discrete scan statistics based on the density of flexible residues observed in the segment. AVAILABILITY: CFP is publicly available at http://cfp.rit.albany.edu.

Keywords:  conformational variability; flexibility; local propensity; protein backbone; sequence

Year:  2009        PMID: 20461153      PMCID: PMC2859570          DOI: 10.6026/97320630004176

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Many proteins contain conformationally flexible segments. These segments undergo significant changes in backbone conformation, or are completely disordered (lack a well-defined structure) [1-3]. A quantitative representation of the conformational flexibility of the protein backbone is important for many applications. Previously, we developed the generalized local propensity (GLP), a quantitative sequence-based measure of backbone flexibility [4]. The GLP can be used to construct sequence-based protein flexibility profiles, and provides an objective numeric threshold for defining conformationally flexible segments [5]. For a given sequence position k, the GLP, glp(k), measures the width of the context-dependent distribution of backbone conformations accessible to this position (see references [4-5] for details). A value of glp(k) ≥ 1 indicates that sequence position k is conformationally flexible. Here, we present the CFP (Conformational Flexibility Profile) web server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high conformational flexibility.

Below is a brief outline of the steps implemented in CFP:

1.  The GLP flexibility profile is constructed for the query sequence and then smoothed using a sliding window of size W1.
2.  Consecutive positions with GLP above a threshold T1 are merged into seed flexible segments.
3.  Each seed flexible segment is extended by adding extension windows of size W2 until its average GLP drops below an extension threshold T2. An extension window is added only if its average GLP is above a threshold T3. This extension procedure is similar to that used in the SEG program [6].
4.  The extended flexible segments are reported in the final table. If the number of flexible residues observed in a given final flexible segment is unusually high (p-value < 0.05), the segment is marked as statistically significant. The significance of the number of flexible residues is estimated using the discrete scan statistic. This statistical procedure is the same as the one we previously implemented in the BIAS software to identify statistically significant clusters of user-specified amino acid types [7-8].

The web-server is publicly available at http://cfp.rit.albany.edu.
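The seed-and-extend steps outlined above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function names, the half-open index convention, and the exact stopping rule (a window is accepted only if its own average exceeds T3 and the grown segment's average stays at or above T2) are assumptions made for illustration, and the input is taken to be an already smoothed GLP profile.

```python
from statistics import mean


def find_seeds(profile, t1):
    """Return half-open (start, end) runs of positions whose value exceeds t1."""
    seeds, start = [], None
    for i, value in enumerate(profile):
        if value > t1:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            seeds.append((start, i))
            start = None
    if start is not None:          # run reaches the end of the profile
        seeds.append((start, len(profile)))
    return seeds


def extend_segment(profile, start, end, w2, t2, t3):
    """Grow the segment [start, end) by windows of up to w2 positions per side.

    Assumed rule: a window is added only if its own average is above t3 and
    the average of the grown segment does not fall below t2.
    """
    n = len(profile)
    grew = True
    while grew:
        grew = False
        if start > 0:
            new_start = max(0, start - w2)
            if (mean(profile[new_start:start]) > t3
                    and mean(profile[new_start:end]) >= t2):
                start, grew = new_start, True
        if end < n:
            new_end = min(n, end + w2)
            if (mean(profile[end:new_end]) > t3
                    and mean(profile[start:new_end]) >= t2):
                end, grew = new_end, True
    return start, end
```

Each pass through the loop either enlarges the segment or stops, so the procedure always terminates once neither flanking window qualifies or the sequence ends are reached.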

Methodology

Input

The only mandatory input is the query protein sequence. All other input fields have default values that can be modified by advanced users, if desired. These input fields are described below. Instructions for each field and general information about the methodology and the output format can be found by clicking a corresponding help hyperlink on the input page.

Smoothing window size

The size of the sliding window (W1) used to smooth the raw profile. High values of W1 tend to reveal long flexible segments and mask short ones. Lower values tend to reveal short segments.

GLP threshold for seed segments

The threshold (T1) used to identify seed flexible segments. Contiguous sequence positions whose smoothed GLP values are above this threshold are merged into a seed flexible segment.

Extension threshold

Each seed segment with high flexibility is extended on both sides until its average GLP drops below this threshold (T2).

Extension window threshold

The ends of a seed flexible segment are extended only if the extension window has an average GLP above this threshold (T3).

Extension window size

The size of the extension window (W2).

Hat-shaped local smoother

Positions in the center of the smoothing window contribute more to the smoothed GLP score than positions at the ends of the window.

Equal weights smoother

The smoothed GLP score is the unweighted average computed over all positions in the window.
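The difference between the two smoothers can be sketched in Python as follows. The text does not give the exact weights of the hat-shaped smoother, so the triangular weights used here are an assumption, as are the function names, the restriction to odd window sizes, and the handling of sequence ends (the window is truncated and the remaining weights are renormalised).

```python
def window_weights(size, hat_shaped):
    """Return normalised weights for a window spanning 2 * (size // 2) + 1 positions."""
    half = size // 2
    if hat_shaped:
        # Assumed triangular shape: weight falls off linearly from the centre.
        raw = [half + 1 - abs(offset) for offset in range(-half, half + 1)]
    else:
        # Equal weights: every position in the window counts the same.
        raw = [1.0] * (2 * half + 1)
    total = sum(raw)
    return [w / total for w in raw]


def smooth(profile, size, hat_shaped=False):
    """Smooth a raw profile with a centred sliding window."""
    half = size // 2
    weights = window_weights(size, hat_shaped)
    smoothed = []
    for k in range(len(profile)):
        acc = norm = 0.0
        for offset in range(-half, half + 1):
            j = k + offset
            if 0 <= j < len(profile):   # skip positions beyond the sequence ends
                w = weights[offset + half]
                acc += w * profile[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed
```

With a single high value in an otherwise flat profile, the hat-shaped variant keeps more of the peak at its own position, while the equal-weights variant spreads it evenly over the window.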

Minimum seed segment

Seed segments shorter than this threshold are not extended.

Maximum separation between merged segments

Flexible segments separated by this number of positions or fewer are merged into one.
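The minimum seed length and maximum separation settings can be illustrated with a short Python sketch. The function names and the half-open (start, end) representation of segments are assumptions for illustration, not part of CFP.

```python
def drop_short_seeds(segments, min_len):
    """Keep only seed segments whose length is at least min_len."""
    return [(start, end) for start, end in segments if end - start >= min_len]


def merge_close_segments(segments, max_gap):
    """Merge segments separated by max_gap positions or fewer.

    Segments are half-open (start, end) pairs, so the number of positions
    between two neighbours is next_start - previous_end.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, with a maximum separation of 2, segments covering positions 0-4 and 7-9 (two positions apart) would be reported as a single segment covering positions 0-9.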

Flexible residues

A set of flexible residues used in the scan statistics to estimate the statistical significance of flexible segments (G, H, D, N by default).

SWISS-PROT or PDB background frequencies

The amino acid frequencies of either Swiss-Prot or the Protein Data Bank are used as the background when estimating statistical significance.
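To make the role of the flexible residue set and the background frequencies concrete, the sketch below estimates a scan-statistic p-value by Monte Carlo simulation. This is a stand-in chosen for clarity, not the procedure used by CFP or BIAS (see [7-8] for that). It assumes each position is independently a flexible residue with probability `p_flex`, the summed background frequency of the selected residue types; `p_flex`, the function names, and the trial count are hypothetical.

```python
import random


def max_window_count(flags, window):
    """Largest number of flexible residues in any `window` consecutive positions."""
    window = min(window, len(flags))
    count = sum(flags[:window])
    best = count
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]   # slide the window by one position
        best = max(best, count)
    return best


def scan_p_value(seq_len, window, observed, p_flex, trials=2000, seed=0):
    """Estimate P(some window of the given size holds >= observed flexible residues).

    Random sequences are drawn under the assumed independent background model;
    the +1 terms keep the estimate away from an exact zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flags = [rng.random() < p_flex for _ in range(seq_len)]
        if max_window_count(flags, window) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

Here `window` would be the length of the reported flexible segment and `observed` the number of flexible residues found in it; a segment would be flagged when the estimate falls below 0.05.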

X axis size, Y axis size

The sizes of the X and Y axes of the plot, in pixels.

Create a plot

Display the smoothed GLP profile in the web browser.

Create a text file

Save the raw and smoothed GLP profiles in a text file.

Output

The CFP output consists of two parts. The first part shows the smoothed GLP plot of the input sequence (Figure 1A). The second part shows detailed information for every flexible segment found in the input sequence, together with the p-values that estimate its statistical significance (Figure 1B). If the p-value for a given segment is less than 0.05, the segment has an unusually high density of residues with a high degree of backbone flexibility.
Figure 1

A) First part of the output: the GLP flexibility plot; B) Second part of the output: the summary of all flexible segments.

References

1.  On the properties and sequence context of structurally ambivalent fragments in proteins.

Authors:  Igor B Kuznetsov; S Rackovsky
Journal:  Protein Sci       Date:  2003-11

2.  Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool.

Authors:  Vadim Alexandrov; Ursula Lehnert; Nathaniel Echols; Duncan Milburn; Donald Engelman; Mark Gerstein
Journal:  Protein Sci       Date:  2005-03

3.  A novel sensitive method for the detection of user-defined compositional bias in biological sequences.

Authors:  Igor B Kuznetsov; Seungwoo Hwang
Journal:  Bioinformatics       Date:  2006-02-24

4.  ProBias: a web-server for the identification of user-specified types of compositionally biased segments in protein sequences.

Authors:  Igor B Kuznetsov
Journal:  Bioinformatics       Date:  2008-05-14

5.  Analysis of compositionally biased regions in sequence databases.

Authors:  J C Wootton; S Federhen
Journal:  Methods Enzymol       Date:  1996

6.  Comparative computational analysis of prion proteins reveals two fragments with unusual structural properties and a pattern of increase in hydrophobicity associated with disease-promoting mutations.

Authors:  Igor B Kuznetsov; Shalom Rackovsky
Journal:  Protein Sci       Date:  2004-12

7.  The unfoldomics decade: an update on intrinsically disordered proteins.

Authors:  A Keith Dunker; Christopher J Oldfield; Jingwei Meng; Pedro Romero; Jack Y Yang; Jessica Walton Chen; Vladimir Vacic; Zoran Obradovic; Vladimir N Uversky
Journal:  BMC Genomics       Date:  2008-09-16

8.  Sequence-similar, structure-dissimilar protein pairs in the PDB.

Authors:  Mickey Kosloff; Rachel Kolodny
Journal:  Proteins       Date:  2008-05-01