Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices.

Cristian Robert Munteanu, Humberto González-Díaz, Alexandre L Magalhães.

Abstract

The large number of new proteins that need fast characterization of their enzymatic activity creates a demand for theoretical protein QSAR models. The protein parameters that can be used for enzyme/non-enzyme classification include simpler indices such as composition, sequence and connectivity (the last also called topological indices, TIs), as well as the computationally expensive 3D descriptors. A comparison of 3D versus lower-dimension indices has not been reported with respect to their power to discriminate proteins according to enzyme action. A set of 966 proteins (enzymes and non-enzymes) whose structural characteristics are provided by PDB/DSSP files was analyzed with Python/Biopython scripts, STATISTICA and Weka. The list of indices includes, but is not restricted to, pure composition indices (residue fractions), DSSP secondary structure protein composition and 3D indices (surface and access). We also used mixed indices such as composition-sequence indices (Chou's pseudo-amino acid compositions or coupling numbers), 3D-composition indices (surface fractions) and DSSP secondary structure amino acid composition/propensities (obtained with our Prot-2S Web tool). In addition, we extend and test for the first time several classic TIs for Randić's protein sequence Star graphs using our Sequence to Star Graph (S2SG) Python application. All the indices were processed with general discriminant analysis (GDA) models, neural networks (NN) and machine learning (ML) methods, and the results are presented versus complexity, average Shannon information entropy (Sh) and data/method type. This study compares all these classes of indices for the first time to assess the ratios between model accuracy and index/model complexity in enzyme/non-enzyme discrimination. The use of different methods and data complexity shows that one cannot establish a direct relation between the complexity and the accuracy of the model.
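
As an illustration of the simplest index class named in the abstract, the pure composition indices (residue fractions), the following is a minimal Python sketch using only the standard library. It is not the authors' code: the function names, the toy sequence and the restriction to the 20 standard residues are assumptions made for this example. The entropy function computes the Shannon entropy of the residue distribution purely as an example; the abstract does not state how the average Shannon entropy (Sh) reported in the study is defined, so this should not be read as that quantity.

```python
from collections import Counter
from math import log2

# Assumption: only the 20 standard amino acids are counted; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_fractions(sequence):
    """Return the fraction of each standard residue in a one-letter sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def shannon_entropy(fractions):
    """Shannon entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * log2(p) for p in fractions.values() if p > 0)


# Hypothetical toy sequence, not taken from the 966-protein data set.
example = "MKTAYIAKQR"
fractions = residue_fractions(example)
entropy = shannon_entropy(fractions)
```

The resulting 20-element vector of fractions is the kind of low-complexity descriptor that could be passed to a classifier, in contrast to the 3D descriptors, which require PDB/DSSP structure files.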

Substances: Enzymes; Proteins

Year: 2008 PMID: 18606172 DOI: 10.1016/j.jtbi.2008.06.003

Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691

Cited

16 in total

1. Prediction of ketoacyl synthase family using reduced amino acid alphabets.

Authors: Wei Chen; Pengmian Feng; Hao Lin
Journal: J Ind Microbiol Biotechnol Date: 2011-10-26 Impact factor: 3.346

2. [Review] Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM).

Authors: Michael Fernandez; Julio Caballero; Leyden Fernandez; Akinori Sarai
Journal: Mol Divers Date: 2010-03-20 Impact factor: 2.943

3. Protein sequence analysis based on hydropathy profile of amino acids.

Authors: Xiao-li Xie; Li-fei Zheng; Ying Yu; Li-ping Liang; Man-cai Guo; John Song; Zhi-fa Yuan
Journal: J Zhejiang Univ Sci B Date: 2012-02 Impact factor: 3.066

4. Biomacromolecular quantitative structure-activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein-protein binding affinity.

Authors: Peng Zhou; Congcong Wang; Feifei Tian; Yanrong Ren; Chao Yang; Jian Huang
Journal: J Comput Aided Mol Des Date: 2013-01-10 Impact factor: 3.686

5. Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method.

6. Computational Approaches for Automated Classification of Enzyme Sequences.

7. Predicting subcellular location of proteins using integrated-algorithm method.

8. Fragment-based optimization of small molecule CXCL12 inhibitors for antagonizing the CXCL12/CXCR4 interaction.

9. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids.

10. Using feature optimization-based support vector machine method to recognize the β-hairpin motifs in enzymes.