The utility of different representations of protein sequence for predicting functional class.

R D King, A Karwath, A Clare, L Dehaspe.

Abstract

MOTIVATION: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model.
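The residue-frequency representation mentioned above can be illustrated with a minimal Python sketch that turns a protein sequence into a 20-value composition vector. The function name `residue_frequencies` and the choice to ignore non-standard symbols are assumptions made for illustration; the abstract does not say exactly how DMP computes these attributes.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each standard residue in `sequence`.

    Hypothetical helper: non-standard symbols (e.g. X, B, Z) are skipped,
    which is an assumption of this sketch, not a documented DMP rule.
    """
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


freqs = residue_frequencies("MKKLLA")
print(freqs["K"])  # 2 of 6 residues are lysine
```

Each sequence becomes a fixed-length attribute vector, which is the kind of input a rule learner can work with regardless of sequence length.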
RESULTS: With every type of representation, DMP learnt prediction rules that were more accurate than the default at every level of function. The most effective representation was phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. These improved both the accuracy and the coverage of predictions: e.g. 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% at an estimated accuracy of 86%.
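The trade-off between accuracy and coverage reported in the results can be sketched in Python: predictions from several representations are combined by keeping the most confident one per ORF, and only predictions at or above a confidence threshold are counted. The max-confidence combination rule, the helper names and the toy data are hypothetical and are not the combination methods tested in the paper. Here both measures are computed on a labelled test set, whereas the abstract reports coverage of unassigned ORFs together with an estimated accuracy.

```python
from typing import Optional

# (predicted_class, confidence) from one representation (hypothetical format).
Prediction = tuple[str, float]


def combine_max_confidence(preds: list[Prediction]) -> Optional[Prediction]:
    """Keep the single most confident prediction, or None if there is none."""
    return max(preds, key=lambda p: p[1]) if preds else None


def accuracy_and_coverage(orfs, truth, threshold):
    """Accuracy and coverage of combined predictions at `threshold`.

    orfs:  dict of ORF id -> list of (class, confidence) predictions
    truth: dict of ORF id -> known functional class (labelled test data)
    """
    made = correct = 0
    for orf_id, preds in orfs.items():
        best = combine_max_confidence(preds)
        if best is None or best[1] < threshold:
            continue  # no prediction is made for this ORF at this threshold
        made += 1
        if best[0] == truth[orf_id]:
            correct += 1
    coverage = made / len(orfs) if orfs else 0.0
    accuracy = correct / made if made else 0.0
    return accuracy, coverage


# Toy data, invented purely for illustration.
orfs = {
    "orf1": [("transport", 0.9), ("metabolism", 0.6)],
    "orf2": [("metabolism", 0.7)],
    "orf3": [("transport", 0.55)],
    "orf4": [],
}
truth = {"orf1": "transport", "orf2": "transport",
         "orf3": "transport", "orf4": "metabolism"}

print(accuracy_and_coverage(orfs, truth, 0.5))  # (0.6666666666666666, 0.75)
print(accuracy_and_coverage(orfs, truth, 0.8))  # (1.0, 0.25)
```

Raising the threshold discards less confident predictions, so coverage falls while accuracy rises, which mirrors the two operating points quoted in the results (40% coverage at about 60% accuracy versus 5% coverage at about 86%).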

Year:  2001        PMID: 11331239     DOI: 10.1093/bioinformatics/17.5.445

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Feature amplified voting algorithm for functional analysis of protein superfamily.

Authors:  Che-Lun Hung; Chihan Lee; Chun-Yuan Lin; Chih-Hung Chang; Yeh-Ching Chung; Chuan Yi Tang
Journal:  BMC Genomics       Date:  2010-12-01       Impact factor: 3.969

2.  Predicting protein function by machine learning on amino acid sequences--a critical evaluation.

Authors:  Ali Al-Shahib; Rainer Breitling; David R Gilbert
Journal:  BMC Genomics       Date:  2007-03-20       Impact factor: 3.969

3.  Gene function classification using Bayesian models with hierarchy-based priors.

Authors:  Babak Shahbaba; Radford M Neal
Journal:  BMC Bioinformatics       Date:  2006-10-12       Impact factor: 3.169

4.  Homology induction: the use of machine learning to improve sequence similarity searches.

Authors:  Andreas Karwath; Ross D King
Journal:  BMC Bioinformatics       Date:  2002-04-23       Impact factor: 3.169

5.  Hierarchical ensemble methods for protein function prediction. (Review)

Authors:  Giorgio Valentini
Journal:  ISRN Bioinform       Date:  2014-05-04