
High performance set of PseAAC and sequence based descriptors for protein classification.

Loris Nanni, Sheryl Brahnam, Alessandra Lumini.

Abstract

The study of reliable automatic systems for protein classification is important for several domains, including finding novel drugs and vaccines. The last decade has seen a number of advances in the development of reliable systems for classifying proteins. Of particular interest has been the exploration of new methods for extracting features from a protein that enhance classification for a given problem. Most methods developed to date, however, have been evaluated in only one or two application areas; methods that generalize well across a number of application areas and datasets have not been explored. The aim of this study is to find a general method, or an ensemble of methods, that works well on different protein classification datasets and problems. Towards this end, we evaluate several feature extraction approaches for representing proteins starting from their amino acid sequence, as well as different feature descriptor combinations, using an ensemble of classifiers (support vector machines). In our experiments, more than ten different protein descriptors are compared using nine different datasets. We develop our system using a blind testing protocol, where the parameters of the system are optimized using one dataset and then validated using the other datasets (and so on for each dataset). Although different stand-alone classifiers work well on some datasets and not on others, we have found that fusion among different methods achieves good performance across all the tested datasets, especially when using the weighted sum rule. Our feature descriptor combinations include two new descriptors, one based on wavelets and the other based on amino acid groups. Using our system, both outperform their standard implementations. We also consider as baselines the simple amino acid composition (AC) and dipeptide composition (2G), since they have been widely used for protein classification. Our proposed method outperforms AC and 2G.
Copyright 2010 Elsevier Ltd. All rights reserved.
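
The baseline descriptors and the fusion rule named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual textbook definitions: AC as the 20 residue frequencies, 2G as the 400 frequencies of adjacent residue pairs, and the weighted sum rule as a weighted sum of per-classifier scores. The function names, the handling of non-standard residues, and the example weights are hypothetical; the abstract gives neither the weights nor any score normalization.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Map each of the 400 ordered residue pairs to a feature index.
DIPEPTIDE_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def amino_acid_composition(seq):
    """AC: 20 values, the fraction of each standard residue in seq.

    Assumption: non-standard letters (e.g. 'X') are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """2G: 400 values, the fraction of each adjacent residue pair.

    Assumption: pairs containing a non-standard letter are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDE_INDEX)
    for a, b in zip(seq, seq[1:]):
        k = DIPEPTIDE_INDEX.get(a + b)
        if k is not None:
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def weighted_sum_fusion(score_lists, weights):
    """Weighted sum rule: fused[i] = sum_k weights[k] * score_lists[k][i].

    score_lists holds one list of scores per classifier, all of equal
    length. The scores are assumed to be on comparable scales already.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all score lists must have the same length")
    return [
        sum(w * s[i] for w, s in zip(weights, score_lists))
        for i in range(n)
    ]


if __name__ == "__main__":
    ac = amino_acid_composition("MKVLAAGIVGL")
    dp = dipeptide_composition("MKVLAAGIVGL")
    print(len(ac), len(dp))
    # Hypothetical scores from two classifiers for two samples.
    print(weighted_sum_fusion([[0.2, 0.8], [0.6, 0.4]], [0.75, 0.25]))
```

In a setup like the one described, each descriptor would feed its own support vector machine, and the resulting classifier scores would be passed to `weighted_sum_fusion`. How the weights are chosen is left open here.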

Year:  2010        PMID: 20558184     DOI: 10.1016/j.jtbi.2010.06.006

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  9 in total

1.  Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins.

Authors:  Die Chen; Hua Zhang; Zeqi Chen; Bo Xie; Ye Wang
Journal:  Comput Math Methods Med       Date:  2022-06-28       Impact factor: 2.809

2.  Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.

Authors:  Ruifeng Xu; Jiyun Zhou; Hongpeng Wang; Yulan He; Xiaolong Wang; Bin Liu
Journal:  BMC Syst Biol       Date:  2015-02-06

3.  An empirical study of different approaches for protein classification.

Authors:  Loris Nanni; Alessandra Lumini; Sheryl Brahnam
Journal:  ScientificWorldJournal       Date:  2014-06-15

4.  Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics.

Authors:  Lisa M Breckels; Sean B Holden; David Wojnar; Claire M Mulvey; Andy Christoforou; Arnoud Groen; Matthew W B Trotter; Oliver Kohlbacher; Kathryn S Lilley; Laurent Gatto
Journal:  PLoS Comput Biol       Date:  2016-05-13       Impact factor: 4.475

5.  UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

Authors:  Pu-Feng Du; Wei Zhao; Yang-Yang Miao; Le-Yi Wei; Likun Wang
Journal:  Int J Mol Sci       Date:  2017-11-14       Impact factor: 5.923

6.  Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.

Authors:  Wei Wang; Lin Sun; Shiguang Zhang; Hongjun Zhang; Jinling Shi; Tianhe Xu; Keliang Li
Journal:  BMC Bioinformatics       Date:  2017-06-12       Impact factor: 3.169

7.  PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences.

Authors:  Yanbin Wang; Zhuhong You; Xiao Li; Xing Chen; Tonghai Jiang; Jingting Zhang
Journal:  Int J Mol Sci       Date:  2017-05-11       Impact factor: 5.923

8.  Consistency and variation of protein subcellular location annotations.

Authors:  Ying-Ying Xu; Hang Zhou; Robert F Murphy; Hong-Bin Shen
Journal:  Proteins       Date:  2020-09-26

9.  An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis.

Authors:  Chuanxin Zou; Jiayu Gong; Honglin Li
Journal:  BMC Bioinformatics       Date:  2013-03-09       Impact factor: 3.169
