Z R Li, H H Lin, L Y Han, L Jiang, X Chen, Y Z Chen.
Abstract
Sequence-derived structural and physicochemical features have frequently been used in the development of statistical learning models for predicting proteins and peptides of different structural, functional and interaction profiles. PROFEAT (Protein Features) is a web server for computing commonly used structural and physicochemical features of proteins and peptides from amino acid sequence. It computes six feature groups composed of ten features that include 51 descriptors and 1447 descriptor values. The computed features include amino acid composition, dipeptide composition, normalized Moreau-Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors and the composition, transition and distribution of various structural and physicochemical properties. In addition, it can compute the above autocorrelation descriptors based on user-defined properties. Our computational algorithms were extensively tested, and the computed protein features have been used in a number of published works for predicting protein functional classes, protein-protein interactions and MHC-binding peptides. PROFEAT is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/prof/prof.cgi.
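As a rough illustration of the two simplest features listed above, amino acid composition (20 values) and dipeptide composition (400 values), here is a minimal Python sketch. It is not PROFEAT's code: the function names are hypothetical, and reporting each value as a percentage of residues (or of adjacent residue pairs) is an assumption, since the exact normalization PROFEAT uses is not given here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _validate(seq, min_len):
    """Upper-case the sequence and reject non-standard residues or short input."""
    seq = seq.upper()
    bad = set(seq) - set(AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residues: {sorted(bad)}")
    if len(seq) < min_len:
        raise ValueError(f"sequence must have at least {min_len} residue(s)")
    return seq


def amino_acid_composition(seq):
    """Percentage of each of the 20 amino acids (20 values)."""
    seq = _validate(seq, 1)
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percentage of each adjacent residue pair among the N - 1 pairs (400 values)."""
    seq = _validate(seq, 2)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    n_pairs = len(seq) - 1
    return {dp: 100.0 * c / n_pairs for dp, c in counts.items()}


# Usage on a short hypothetical peptide.
aac = amino_acid_composition("MKTAYIAKQRQISFVKSHFSRQ")
dpc = dipeptide_composition("MKTAYIAKQRQISFVKSHFSRQ")
print(len(aac), len(dpc))  # 20 400
```

Returning dictionaries keyed by residue or pair keeps the 420 values self-describing; a fixed-order vector for a learning model can be built by iterating over `AMINO_ACIDS` and `DIPEPTIDES`.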
Year: 2006 PMID: 16845018 PMCID: PMC1538821 DOI: 10.1093/nar/gkl305
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. PROFEAT feature-display options window.
List of structural and physicochemical features of proteins and peptides commonly used for predicting proteins and peptides of specific properties with statistical learning methods
| Feature group | Feature | Feature index | No. of descriptors | No. of descriptor values |
|---|---|---|---|---|
| Amino acid, dipeptide composition | Amino acid composition | F1.1 | 1 | 20 |
| | Dipeptide composition | F1.2 | 1 | 400 |
| Autocorrelation 1 | Normalized Moreau–Broto autocorrelation | F2.1 | 8 | 240 |
| Autocorrelation 2 | Moran autocorrelation | F3.1 | 8 | 240 |
| Autocorrelation 3 | Geary autocorrelation | F4.1 | 8 | 240 |
| Composition, transition, distribution | Composition | F5.1 | 7 | 21 |
| | Transition | F5.2 | 7 | 21 |
| | Distribution | F5.3 | 7 | 105 |
| Sequence-order | Sequence-order-coupling number | F6.1 | 2 | 60 |
| | Quasi-sequence-order descriptors | F6.2 | 2 | 100 |
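The three autocorrelation groups in the table can be sketched in Python using the Moreau–Broto, Moran and Geary formulas as they are commonly written for sequence descriptors; these formulas are assumed here rather than taken from this text. Property values are first standardized over the 20 amino acids. The Kyte–Doolittle hydropathy scale stands in for a user-defined property and is not claimed to be one of PROFEAT's eight built-in properties; all function names are hypothetical. If each of the 8 descriptors is evaluated at lags 1 to 30 (an assumption consistent with 8 × 30 = 240), this yields the 240 values per autocorrelation type shown in the table.

```python
from statistics import fmean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Example user-defined property (Kyte-Doolittle hydropathy); an illustrative
# choice, not necessarily one of PROFEAT's built-in properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize_scale(scale):
    """Rescale a 20-residue property table to mean 0 and population std 1."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sd = fmean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sd for aa in AMINO_ACIDS}


def _check_lag(p, d):
    if not 1 <= d < len(p):
        raise ValueError(f"lag {d} must satisfy 1 <= d < sequence length {len(p)}")


def _centred_sum_sq(p):
    """Return the sequence mean and the sum of squared deviations from it."""
    mean = fmean(p)
    ss = sum((x - mean) ** 2 for x in p)
    if ss == 0:
        raise ValueError("property values are constant along the sequence")
    return mean, ss


def moreau_broto(p, d):
    """Normalized Moreau-Broto: mean of P_i * P_{i+d} over the N - d pairs."""
    _check_lag(p, d)
    n = len(p)
    return sum(p[i] * p[i + d] for i in range(n - d)) / (n - d)


def moran(p, d):
    """Moran: lagged covariance divided by the variance of the sequence values."""
    _check_lag(p, d)
    n = len(p)
    mean, ss = _centred_sum_sq(p)
    cov = sum((p[i] - mean) * (p[i + d] - mean) for i in range(n - d)) / (n - d)
    return cov / (ss / n)


def geary(p, d):
    """Geary: (N-1) * sum of squared lagged differences / (2 (N-d) * sum of squares)."""
    _check_lag(p, d)
    n = len(p)
    _, ss = _centred_sum_sq(p)
    diff_sq = sum((p[i] - p[i + d]) ** 2 for i in range(n - d))
    return (n - 1) * diff_sq / (2 * (n - d) * ss)


def autocorrelation_descriptors(seq, scale, max_lag=30):
    """Return the three autocorrelation types for lags 1..max_lag (one property)."""
    std = standardize_scale(scale)
    p = [std[aa] for aa in seq.upper()]  # KeyError for non-standard residues
    lags = range(1, max_lag + 1)
    return {
        "moreau_broto": [moreau_broto(p, d) for d in lags],
        "moran": [moran(p, d) for d in lags],
        "geary": [geary(p, d) for d in lags],
    }


# Usage: a short hypothetical peptide with lags 1..5.
desc = autocorrelation_descriptors("MKTAYIAKQRQISFVKSHFSRQ", KYTE_DOOLITTLE, max_lag=5)
print({name: [round(v, 3) for v in vals] for name, vals in desc.items()})
```

Raising an error on constant property values is a choice made in this sketch; Moran and Geary are undefined in that case because their denominators vanish.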
Amino acid attributes and the division of the amino acids into three groups for each attribute
| Attribute | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Hydrophobicity | Polar: R, K, E, D, Q, N | Neutral: G, A, S, T, P, H, Y | Hydrophobic: C, L, V, I, M, F, W |
| Normalized van der Waals volume | Volume range 0–2.78: G, A, S, T, P, D | Volume range 2.95–4.0: N, V, E, Q, I, L | Volume range 4.03–8.08: M, H, K, F, R, Y, W |
| Polarity | Polarity value 4.9–6.2: L, I, F, W, C, M, V, Y | Polarity value 8.0–9.2: P, A, T, G, S | Polarity value 10.4–13.0: H, Q, R, K, N, E, D |
| Polarizability | Polarizability value 0–0.108: G, A, S, D, T | Polarizability value 0.128–0.186: C, P, N, V, E, Q, I, L | Polarizability value 0.219–0.409: K, M, H, F, R, Y, W |
| Charge | Positive: K, R | Neutral: A, N, C, Q, G, H, I, L, M, F, P, S, T, W, Y, V | Negative: D, E |
| Secondary structure | Helix: E, A, L, M, Q, K, R, H | Strand: V, I, Y, C, W, F, T | Coil: G, N, P, S, D |
| Solvent accessibility | Buried: A, L, F, C, G, I, V, W | Exposed: R, K, Q, E, N, D | Intermediate: M, P, S, T, H, Y |
The division is based on the clusters of amino acid indices of Tomii and Kanehisa (5,40) for each of the seven attributes. For attributes such as secondary structure and solvent accessibility, the division is based on the statistical frequency with which each amino acid appears in a specific state.
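A minimal Python sketch of how one attribute's three-group division could be turned into composition, transition and distribution values, following the commonly used CTD scheme; the exact PROFEAT conventions are not spelled out here, so treat this as an assumption. It uses the hydrophobicity division from the table above. The function names are hypothetical, and the rule for picking the residues at the 25%, 50% and 75% marks is one choice among several.

```python
import math

# Three-group division for hydrophobicity, copied from the table above.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def encode(seq, groups):
    """Map each residue to '1', '2' or '3' according to its group."""
    lookup = {aa: str(k + 1) for k, members in enumerate(groups) for aa in members}
    try:
        return "".join(lookup[aa] for aa in seq.upper())
    except KeyError as err:
        raise ValueError(f"residue {err.args[0]!r} is not in any group") from err


def ctd(seq, groups):
    """Composition (3), transition (3) and distribution (15) values for one attribute."""
    s = encode(seq, groups)
    n = len(s)
    if n < 2:
        raise ValueError("sequence must have at least 2 residues")

    # Composition: percentage of residues in each group.
    composition = [100.0 * s.count(g) / n for g in "123"]

    # Transition: percentage of adjacent pairs that switch between two groups,
    # counting both directions (e.g. '12' and '21').
    pairs = [s[i:i + 2] for i in range(n - 1)]
    transition = [
        100.0 * sum(p in (a + b, b + a) for p in pairs) / (n - 1)
        for a, b in (("1", "2"), ("1", "3"), ("2", "3"))
    ]

    # Distribution: position (as % of sequence length) of the first residue and
    # of the residues at the 25%, 50%, 75% and 100% marks of each group.
    # Assumed rule: take the k-th occurrence with k = max(1, floor(q * count)).
    distribution = []
    for g in "123":
        positions = [i + 1 for i, ch in enumerate(s) if ch == g]
        if not positions:
            distribution.extend([0.0] * 5)  # group absent: zeros by convention here
            continue
        m = len(positions)
        for q in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.floor(q * m))
            distribution.append(100.0 * positions[k - 1] / n)
    return composition, transition, distribution


# Usage on a tiny hypothetical sequence: R is polar, A neutral, C hydrophobic.
comp, trans, dist = ctd("RAAC", HYDROPHOBICITY_GROUPS)
print(comp)  # [25.0, 50.0, 25.0]
```

Running `ctd` once per attribute for all seven attributes gives 7 × 3 = 21 composition, 7 × 3 = 21 transition and 7 × 15 = 105 distribution values, which matches the counts listed for F5.1 to F5.3 in the feature table.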