Feature selection and the class imbalance problem in predicting protein function from sequence.

Ali Al-Shahib, Rainer Breitling, David Gilbert.

Abstract

When the standard approach of predicting protein function by sequence homology fails, alternative methods that require only the amino acid sequence can be used. One such approach uses machine learning to predict protein function directly from amino acid sequence features. However, two issues must be addressed before functional prediction can succeed: identifying discriminatory features, and overcoming a large class imbalance in the training data. We show that applying feature subset selection followed by undersampling of the majority class generates support vector machine (SVM) classifiers that perform significantly better than those produced by standard machine learning approaches. We also show that undersampling to produce fully balanced data significantly improves performance, and that the selected features may help advance our understanding of the relationship between sequence and function. The best discriminating ability is achieved by combining SVMs with feature selection and full undersampling; this approach strongly outperforms other competitive learning algorithms. We conclude that this combined approach can generate powerful machine learning classifiers for predicting protein function directly from sequence.
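The two preprocessing steps named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say which feature subset selection method was used, so a simple Fisher-score filter stands in for it, and the helper names (`fisher_scores`, `select_top_k`, `undersample`) and the toy data are hypothetical. Features are selected first; then majority-class examples are randomly discarded until both classes are the same size ("full undersampling").

```python
import random
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature by (mean_pos - mean_neg)^2 / (var_pos + var_neg).

    Hypothetical stand-in for the feature subset selection step: a simple
    filter criterion, not necessarily the method used in the paper.
    Labels are assumed to be 1 (positive) and 0 (negative).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        num = (mean(p) - mean(n)) ** 2
        denom = pvariance(p) + pvariance(n)
        if denom > 0:
            scores.append(num / denom)
        else:
            # Zero within-class variance: perfect separation if the means differ.
            scores.append(float("inf") if num > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring features; return their indices and the reduced X."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def undersample(X, y, seed=0):
    """Randomly discard majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    idx_pos = [i for i, label in enumerate(y) if label == 1]
    idx_neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted((idx_pos, idx_neg), key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


# Hypothetical toy data: 2 positives, 8 negatives, 3 sequence-derived features.
# Feature 0 separates the classes; features 1 and 2 carry no class signal.
X = [
    [5.0, 1.0, 3.0], [5.2, 2.0, 1.0],                    # positives
    [1.0, 1.0, 2.0], [1.1, 2.0, 2.0], [0.9, 1.0, 3.0],   # negatives
    [1.0, 2.0, 1.0], [1.2, 1.0, 2.0], [0.8, 2.0, 2.0],
    [1.0, 1.0, 1.0], [1.0, 2.0, 3.0],
]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

keep, X_sel = select_top_k(X, y, k=1)   # step 1: feature subset selection
X_bal, y_bal = undersample(X_sel, y)    # step 2: full undersampling
print(keep, len(y_bal), sum(y_bal))     # prints: [0] 4 2
```

The balanced, reduced data would then be passed to an SVM learner, for example scikit-learn's `sklearn.svm.SVC`. In a cross-validated evaluation, both steps would normally be fitted on the training folds only, so that the test folds keep their natural class ratio and play no part in choosing features.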

Substances: Proteins

Year: 2005 PMID: 16231961 DOI: 10.2165/00822942-200504030-00004

Source DB: PubMed Journal: Appl Bioinformatics ISSN: 1175-5636
