
Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

H B Rao¹, F Zhu, G B Yang, Z R Li, Y Z Chen.

Abstract

Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT has been developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of proteins and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein-protein and protein-small molecule interactions, segment descriptors for local properties of protein sequences, and topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, and a feature group of more than 400 descriptors for small molecules, plus the derived features for protein-protein and protein-small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein-protein interactions, peptides of specific functions and quantitative structure activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.

Substances:
Ligands
Peptides
Proteins

Year: 2011 PMID: 21609959 PMCID: PMC3125735 DOI: 10.1093/nar/gkr284

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Sequence-derived structural and physicochemical features are highly useful for representing and distinguishing proteins or peptides of different structural, functional and interaction properties, and have been widely used in developing methods and software for predicting protein structural and functional classes (1–7), protein–protein interactions (8–10), protein–ligand interactions (11,12), protein substrates (13,14), molecular binding sites on proteins (15–20), subcellular locations (21), protein crystallization propensity (22–24) and peptides of specific properties (25–30). Web servers, such as PROFEAT (31) and PseAAC (http://www.csbio.sjtu.edu.cn/bioinf/PseAA/) (32), have been built to facilitate the computation of protein and peptide features. Nonetheless, some features important for studying proteins, peptides and molecular interactions are not provided by these web servers. Examples include atomic-level topological descriptors that are useful for structure–property correlations (33) and descriptors of total amino acid properties (TAAPs) that have been used for modeling protein conformational stability (34), ligand binding site structural features (35) and interaction with small molecules (36). Moreover, the descriptors provided by the available web servers are not suitable for analyzing local properties of sequence subsections, and additional work is needed before they can be used to study protein–protein and protein–ligand interactions. Therefore, it is desirable to provide segment descriptors for local properties of subsections of protein sequences, and descriptors that can be straightforwardly used for exploring protein–protein and protein–small molecule interactions.
We updated PROFEAT by adding new functions for computing descriptors of protein–protein and protein–small molecule interactions, segment descriptors for local properties of subsections of protein sequences, atomic-level topological descriptors for peptide sequences and small molecule structures, and topological polar surface areas of small molecules. Moreover, we added new feature groups such as pseudo-amino acid composition (PAAC), amphiphilic PAAC (APAAC), TAAPs, and atomic-level topological descriptors. The computational algorithms of these newly added feature groups have been extensively tested and used in a number of published works for predicting proteins and peptides of specific properties, protein–protein interactions, and quantitative structure activity relationships of small molecules. A list of publications using features covered by PROFEAT is provided in Supplementary Table S1 and on the PROFEAT online server, where it can be accessed at http://bidd.cz3.nus.edu.sg/prof/part_of_publications.htm. The PROFEAT homepage is shown in Figure 1. The features for proteins and peptides covered by this version of PROFEAT are summarized in Table 1, and the topological descriptors for peptides and small molecules computed by PROFEAT are summarized in Supplementary Table S2.

Figure 1.

PROFEAT new web page.

Table 1.

List of PROFEAT computed features for proteins, peptides and protein–protein interactions

Feature group	Features	No. of descriptors	No. of descriptor values
Composition-1	Amino acid composition	1	20
Composition-2	Dipeptide composition	1	400
Autocorrelation 1	Normalized Moreau–Broto autocorrelation	^a	^a
Autocorrelation 2	Moran autocorrelation	^a	^a
Autocorrelation 3	Geary autocorrelation	^a	^a
Composition, Transition, Distribution	Composition	7	21
	Transition	7	21
	Distribution	7	105
Quasi-sequence order descriptors	Sequence order coupling number	2	90
Quasi-sequence order descriptors	Quasi-sequence order descriptors	2	150
PAAC	PAAC	^b	^b
APAAC	APAAC	^c	^c
Topological descriptors	Topological descriptors		405
TAAPs	TAP	^d	^d

^a The number depends on the choice of the number of properties of amino acid and the choice of the maximum values of the lag.

^b The number depends on the choice of the number of the set of amino acid properties and the choice of the λ value.

^c The number depends on the choice of the λ value.

^d The numbers depend on the choice of the number of properties of amino acid.
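The descriptor-value counts in the first two rows of Table 1 (20 and 400) follow directly from the size of the amino acid alphabet. The sketch below is a minimal illustration of that arithmetic, not PROFEAT's own code; it assumes compositions are reported as fractions (the server may scale them differently), and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(seq):
    """Fraction of each of the 20 residue types in seq: 20 values."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among the N-1 adjacent pairs: 400 values."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total for a, b in product(AMINO_ACIDS, repeat=2)]


example = "MKTAYIAKQR"  # arbitrary example sequence
aac = amino_acid_composition(example)
dpc = dipeptide_composition(example)
```

Each vector sums to 1 for a sequence made only of the 20 standard residues, which gives a quick sanity check on the output.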


METHODS FOR NEWLY ADDED FEATURES AND FUNCTIONS

PAAC descriptors

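First, three variables are derived from the original hydrophobicity values $H_1^o(i)$, hydrophilicity values $H_2^o(i)$ and side chain masses $M^o(i)$ of the 20 amino acids (i = 1, 2, … , 20) (32):

$$H_1(i) = \frac{H_1^o(i) - \bar{H}_1^o}{\sqrt{\frac{1}{20}\sum_{k=1}^{20}\left[H_1^o(k) - \bar{H}_1^o\right]^2}} \tag{1}$$

$$H_2(i) = \frac{H_2^o(i) - \bar{H}_2^o}{\sqrt{\frac{1}{20}\sum_{k=1}^{20}\left[H_2^o(k) - \bar{H}_2^o\right]^2}} \tag{2}$$

$$M(i) = \frac{M^o(i) - \bar{M}^o}{\sqrt{\frac{1}{20}\sum_{k=1}^{20}\left[M^o(k) - \bar{M}^o\right]^2}} \tag{3}$$

where a bar denotes the mean over the 20 amino acids, e.g. $\bar{H}_1^o = \frac{1}{20}\sum_{k=1}^{20} H_1^o(k)$. Then, a correlation function can be computed as:

$$\Theta(R_i, R_j) = \frac{1}{3}\left\{\left[H_1(R_i) - H_1(R_j)\right]^2 + \left[H_2(R_i) - H_2(R_j)\right]^2 + \left[M(R_i) - M(R_j)\right]^2\right\}$$

from which sequence order-correlated factors are defined as:

$$\theta_\lambda = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda}\Theta(R_i, R_{i+\lambda})$$

where $R_i$ is the residue at position i, $N$ is the sequence length and $\lambda$ ($\lambda < N$) is a parameter. Let $f_i$ be the normalized occurrence frequency of amino acid i in the protein sequence. A set of descriptors called the PAAC is then defined as:

$$X_u = \frac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, \quad 1 \le u \le 20$$

$$X_u = \frac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, \quad 21 \le u \le 20 + \lambda$$

where $w$ is the weighting factor for the sequence-order effect and is set to w = 0.05 as suggested by Shen (32).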
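The PAAC procedure described above (standardize each property, average squared property differences between residues λ positions apart, then combine with the composition) can be sketched as follows. This is an illustrative sketch and not the PROFEAT implementation: the function names are hypothetical, and the property tables are placeholder values on a four-letter toy alphabet rather than real amino acid data. With real tables for all 20 residues the output has 20 + λ values.

```python
from math import sqrt


def standardize(raw):
    """Rescale a residue -> value table to zero mean and unit (population) SD."""
    values = list(raw.values())
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {res: (v - mean) / sd for res, v in raw.items()}


def paac(seq, raw_tables, lam, w=0.05):
    """Return the composition terms followed by lam sequence-order terms."""
    n = len(seq)
    if not 0 < lam < n:
        raise ValueError("lam must satisfy 0 < lam < len(seq)")
    tables = [standardize(t) for t in raw_tables]
    alphabet = sorted(raw_tables[0])

    def correlation(a, b):
        # Mean squared difference between two residues over all property tables
        return sum((t[a] - t[b]) ** 2 for t in tables) / len(tables)

    # theta_k: average correlation between residues k positions apart
    thetas = [
        sum(correlation(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    freqs = [seq.count(res) / n for res in alphabet]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder tables on a 4-letter toy alphabet; NOT real amino acid data.
TOY_TABLES = [
    {"A": 0.6, "C": 0.3, "D": -0.9, "E": -0.7},    # stand-in for hydrophobicity
    {"A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0},    # stand-in for hydrophilicity
    {"A": 15.0, "C": 47.0, "D": 59.0, "E": 73.0},  # stand-in for side chain mass
]
paac_values = paac("AACCDDEE", TOY_TABLES, lam=2)
```

Because every term shares the same denominator, the returned values always sum to 1, and a larger w shifts weight from the composition terms to the sequence-order terms.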

APAAC

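From the normalized hydrophobicity $H_1(i)$ and hydrophilicity $H_2(i)$ values defined in Equations (1) and (2), the hydrophobicity and hydrophilicity correlation functions are defined (32), respectively, as:

$$H^1_{i,j} = H_1(R_i)\,H_1(R_j), \qquad H^2_{i,j} = H_2(R_i)\,H_2(R_j)$$

from which sequence order factors can be defined as:

$$\tau_{2\lambda-1} = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda} H^1_{i,i+\lambda}, \qquad \tau_{2\lambda} = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda} H^2_{i,i+\lambda}$$

and APAAC are defined as:

$$P_u = \frac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, \quad 1 \le u \le 20$$

$$P_u = \frac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, \quad 21 \le u \le 20 + 2\lambda$$

where $R_i$ is the residue at position i, $N$ is the sequence length, $f_i$ is the normalized occurrence frequency of amino acid i, and $w$ is the weighting factor.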
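A hedged sketch of the APAAC calculation, for comparison with PAAC: the sequence-order terms are averaged products of standardized hydrophobicity (and, separately, hydrophilicity) values for residues a fixed lag apart, giving two terms per lag. The function names are hypothetical, the property tables are toy placeholders rather than real amino acid data, and the weighting factor is left as a required argument because its value is not given in the text above; the 0.5 used in the example call is only an assumption.

```python
from math import sqrt


def standardize(raw):
    """Rescale a residue -> value table to zero mean and unit (population) SD."""
    values = list(raw.values())
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {res: (v - mean) / sd for res, v in raw.items()}


def apaac(seq, raw_hydrophobicity, raw_hydrophilicity, lam, w):
    """Return the composition terms followed by 2 * lam sequence-order terms."""
    n = len(seq)
    if not 0 < lam < n:
        raise ValueError("lam must satisfy 0 < lam < len(seq)")
    h1 = standardize(raw_hydrophobicity)
    h2 = standardize(raw_hydrophilicity)
    alphabet = sorted(raw_hydrophobicity)

    taus = []
    for k in range(1, lam + 1):
        for h in (h1, h2):  # hydrophobicity term first, then hydrophilicity
            taus.append(sum(h[seq[i]] * h[seq[i + k]] for i in range(n - k)) / (n - k))

    freqs = [seq.count(res) / n for res in alphabet]
    # Products can be negative, so the denominator is not guaranteed to be positive.
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


# Placeholder tables on a 4-letter toy alphabet; NOT real amino acid data.
TOY_H1 = {"A": 0.6, "C": 0.3, "D": -0.9, "E": -0.7}
TOY_H2 = {"A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0}
apaac_values = apaac("AACCDDEE", TOY_H1, TOY_H2, lam=2, w=0.5)  # w=0.5 is an assumed value
```

Unlike the PAAC factors, the product-based factors can take either sign, so individual sequence-order terms may be negative.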

Topological descriptors at atomic level

Topological descriptors are based on graph theory and encode information about the types of atoms and bonds in a molecule and the nature of their connections. Examples of topological descriptors include counts of atom and bond types and indexes that encode the size, shape and types of branching in a molecule (37). These descriptors can be calculated from the 2D structure of a peptide automatically generated from its sequence based on the molecular structures of the amino acid residues in the sequence. Supplementary Table S2 gives a list of the topological descriptors computed by PROFEAT.

TAAP

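The TAAP descriptor for a specific physicochemical property i is defined as:

$$P_{tot}(i) = \frac{1}{N}\sum_{j=1}^{N} P^i_{norm}(R_j)$$

where $P^i_{norm}(R_j)$ represents the property i of amino acid $R_j$, normalized between 0 and 1 using the following expression:

$$P^i_{norm}(R_j) = \frac{P^i_{orig}(R_j) - P^i_{min}}{P^i_{max} - P^i_{min}}$$

Here $P^i_{orig}(R_j)$ is the original amino acid property i for residue j, $P^i_{min}$ and $P^i_{max}$ are, respectively, the minimum and maximum values of the original amino acid property i, and N is the length of the sequence (38–40).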
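The TAAP definition above amounts to min-max scaling one property table to the range 0 to 1 and averaging the scaled values along the sequence. A minimal sketch follows, with a hypothetical function name and a toy property table that is not real amino acid data.

```python
def taap(seq, raw_property):
    """Average of a min-max normalized residue property over the sequence."""
    lo = min(raw_property.values())
    hi = max(raw_property.values())
    if hi == lo:
        raise ValueError("property table is constant; cannot normalize")
    normalized = {res: (v - lo) / (hi - lo) for res, v in raw_property.items()}
    return sum(normalized[res] for res in seq) / len(seq)


# Toy property table on a 3-letter alphabet; NOT real amino acid data.
TOY_PROPERTY = {"A": 0.0, "C": 5.0, "D": 10.0}
taap_acd = taap("ACD", TOY_PROPERTY)  # normalized values 0, 0.5, 1 average to 0.5
```

One such value is produced per property, which is why the number of TAAP descriptors in Table 1 depends on how many amino acid properties are selected.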

Protein–protein interaction descriptors

Protein–protein interaction descriptors can be computed from the descriptors $V_a = \{V_a(i), i = 1, 2, … , n\}$ and $V_b = \{V_b(i), i = 1, 2, … , n\}$ of individual proteins A and B by three methods. In the first method, two protein-pair vectors $V_{ab}$ and $V_{ba}$ with dimension of 2n are constructed, with $V_{ab} = (V_a, V_b)$ for interaction between proteins A and B and $V_{ba} = (V_b, V_a)$ for interaction between proteins B and A (8,9). In the second method, one vector $V$ with dimension of 2n is constructed: $V = \{V_a(i) + V_b(i), V_a(i) \times V_b(i), i = 1, 2, … , n\}$, which has the property that $V$ is unchanged when a and b are exchanged. In the third method, one vector $V$ with dimension of $n^2$ is constructed by the tensor product: $V = \{V(k) = V_a(i) \times V_b(j), i = 1, 2, … , n, j = 1, 2, … , n, k = (i − 1) \times n + j\}$.
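The three pair constructions can be illustrated with plain Python lists. This is a sketch under assumptions: the function names are hypothetical, and for the second method the sums are placed before the products, an ordering the text does not specify.

```python
def pair_concat(va, vb):
    """Method 1: two order-dependent vectors of length 2n, (Va, Vb) and (Vb, Va)."""
    va, vb = list(va), list(vb)
    return va + vb, vb + va


def pair_symmetric(va, vb):
    """Method 2: element-wise sums then products (length 2n), unchanged if A and B swap."""
    if len(va) != len(vb):
        raise ValueError("both proteins need the same number of descriptors")
    return [a + b for a, b in zip(va, vb)] + [a * b for a, b in zip(va, vb)]


def pair_tensor(va, vb):
    """Method 3: all products Va(i) * Vb(j), flattened row by row (length n * n)."""
    return [a * b for a in va for b in vb]


va, vb = [1.0, 2.0], [3.0, 4.0]
v_ab, v_ba = pair_concat(va, vb)
v_sym = pair_symmetric(va, vb)
v_tensor = pair_tensor(va, vb)
```

The row-by-row flattening in `pair_tensor` matches the index k = (i − 1) × n + j, with the 1-based k mapping to list position k − 1. Note that the first and third methods depend on the order of the two proteins, whereas the second does not.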

Protein–ligand interaction descriptors

The protein–ligand interaction descriptor vector $V$ can be constructed from the protein descriptor vector $V_p$ ($V_p(i)$, i = 1, … , n) and the ligand descriptor vector $V_l$ ($V_l(i)$, i = 1, … , $n_l$) by two methods similar to the first and third methods for constructing protein-pair descriptors. In the first method, one vector $V$ with dimension of $n + n_l$ is constructed as $V = (V_p, V_l)$ for interaction between protein and ligand. In the second method, one vector $V$ with dimension of $n \times n_l$ is constructed by the tensor product: $V = \{V(k) = V_p(i) \times V_l(j), i = 1, 2, … , n, j = 1, 2, … , n_l, k = (i − 1) \times n_l + j\}$.
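A short sketch of the two protein–ligand constructions, where the protein and ligand vectors generally have different lengths. The function names are hypothetical. The flattened tensor product is laid out row by row, so the 1-based position of the product of protein element i and ligand element j is (i − 1) × n_l + j, which runs from 1 to n × n_l without collisions.

```python
def protein_ligand_concat(vp, vl):
    """Method 1: protein descriptors followed by ligand descriptors (length n + n_l)."""
    return list(vp) + list(vl)


def protein_ligand_tensor(vp, vl):
    """Method 2: all products Vp(i) * Vl(j), flattened row by row (length n * n_l)."""
    return [p * l for p in vp for l in vl]


vp = [1.0, 2.0]        # toy protein descriptors, n = 2
vl = [3.0, 4.0, 5.0]   # toy ligand descriptors, n_l = 3
pl_concat = protein_ligand_concat(vp, vl)
pl_tensor = protein_ligand_tensor(vp, vl)
```

In practice the two descriptor sets are often on very different scales, so some normalization before taking products may be worth considering; the text above does not say whether PROFEAT applies any.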

Segmented sequence descriptors

To characterize the local feature of a protein sequence, a protein sequence can be divided into several segments and descriptors are calculated for each segment.
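As an illustration of the segment idea, the sketch below splits a sequence into k contiguous, nearly equal-length segments and applies a descriptor function to each. The segmentation rule is an assumption, since the text does not state how PROFEAT chooses segment boundaries, and the function names are hypothetical; amino acid composition is used only as an example descriptor.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def split_into_segments(seq, k):
    """Split seq into k contiguous segments; the first len(seq) % k get one extra residue."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    base, extra = divmod(len(seq), k)
    segments, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        segments.append(seq[start:end])
        start = end
    return segments


def composition(seq):
    """Fraction of each of the 20 residue types in seq."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def segment_descriptors(seq, k, descriptor):
    """Apply a descriptor function to each of the k segments of seq."""
    return [descriptor(segment) for segment in split_into_segments(seq, k)]


per_segment = segment_descriptors("AACCDDEE", 2, composition)
```

Any sequence-level descriptor can be passed in place of `composition`, so the total number of values is k times the descriptor's own length.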

Topological descriptors for small molecules

For small molecules, topological descriptors are calculated from the input 2D structures of small molecules in mol or sdf format. Names of these descriptors are the same as those for protein segments which are listed in Supplementary Table S2.

REMARKS

Compared with its earlier version, the updated PROFEAT is significantly enhanced in both the number of newly added features useful for representing various protein properties, and newly added functions for computing features for local properties of protein segments, protein–protein interactions, protein–small molecule interactions and small molecules. These enhancements are intended to provide more comprehensive features for facilitating the analysis and prediction of proteins, peptides, small molecules of different properties and molecular interactions involving proteins, peptides and small molecules. With continued interest in using molecular and interaction features and developing new algorithms for representing these features, new descriptors and functions such as those involving DNA, RNA and other nucleotides can be integrated into PROFEAT in the near future to better facilitate the study of molecular and bio-molecular functions and interactions.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for open access charge: National Natural Science Foundation of China (grant number 20973118).

Conflict of interest statement. None declared.

39 in total

1. Bioinformatics predictions of localization and targeting.

Authors: Shruti Rastogi; Burkhard Rost
Journal: Methods Mol Biol Date: 2010

2. Prediction of protein-RNA binding sites by a random forest method with combined features.

Authors: Zhi-Ping Liu; Ling-Yun Wu; Yong Wang; Xiang-Sun Zhang; Luonan Chen
Journal: Bioinformatics Date: 2010-05-18 Impact factor: 6.937

3. Influence of amino acid properties for discriminating outer membrane proteins at better accuracy.

Authors: M Michael Gromiha; Makiko Suwa
Journal: Biochim Biophys Acta Date: 2006-07-31

4. Amino acid sequence autocorrelation vectors and Bayesian-regularized genetic neural networks for modeling protein conformational stability: gene V protein mutants.

Authors: Leyden Fernández; Julio Caballero; José Ignacio Abreu; Michael Fernández
Journal: Proteins Date: 2007-06-01

5. Efficient peptide-MHC-I binding prediction for alleles with few known binders.

Authors: Laurent Jacob; Jean-Philippe Vert
Journal: Bioinformatics Date: 2007-12-14 Impact factor: 6.937

6. Protease substrate site predictors derived from machine learning on multilevel substrate phage display data.

Authors: Ching-Tai Chen; Ei-Wen Yang; Hung-Ju Hsu; Yi-Kun Sun; Wen-Lian Hsu; An-Suei Yang
Journal: Bioinformatics Date: 2008-10-29 Impact factor: 6.937

7. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences.

Authors: Yoichi Murakami; Ruth V Spriggs; Haruki Nakamura; Susan Jones
Journal: Nucleic Acids Res Date: 2010-05-27 Impact factor: 16.971

8. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

Authors: Z R Li; H H Lin; L Y Han; L Jiang; X Chen; Y Z Chen
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

9. DescFold: a web server for protein fold recognition.

Authors: Ren-Xiang Yan; Jing-Na Si; Chuan Wang; Ziding Zhang
Journal: BMC Bioinformatics Date: 2009-12-14 Impact factor: 3.169

10. Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families.

Authors: Marc Röttig; Christian Rausch; Oliver Kohlbacher
Journal: PLoS Comput Biol Date: 2010-01-08 Impact factor: 4.475
