Using genetic algorithms to select most predictive protein features.

Abstract

Many important characteristics of proteins, such as biochemical activity and subcellular localization, present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active-site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features, that is, the step from the fraction of serine and the fraction of buried residues (input space 20 + 2) to the fraction of buried serine (input space 20 × 2), that implicitly shifts the search space from global feature inputs to features that can capture very local evidence, such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general, resulting in poor performance, nor too specific, leading to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to the prediction of protein enzymatic activity. (c) 2008 Wiley-Liss, Inc.
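
The two ideas in the abstract, expanding 20 + 2 global features into 20 × 2 coexistence features and searching feature subsets with a GA, can be illustrated with a minimal Python sketch. It assumes a two-state (buried/exposed) annotation is already available for every residue. All function names, the GA operators, and the toy fitness are hypothetical: in the paper the fitness comes from neural networks and support vector machines, which this sketch replaces with a made-up scoring function that rewards an invented "catalytic triad" signal and crudely penalizes subset size.

```python
import random
from itertools import product

# Assumption: 20 standard amino acids and a two-state burial annotation per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BURIAL_STATES = ("buried", "exposed")
# Names of the combined features in a fixed order (20 x 2 = 40 entries).
FEATURE_NAMES = [f"frac_{st}_{aa}" for aa, st in product(AMINO_ACIDS, BURIAL_STATES)]


def global_features(seq, burial):
    """20 + 2 global features: amino-acid fractions plus burial-state fractions."""
    n = len(seq)
    feats = {f"frac_{aa}": seq.count(aa) / n for aa in AMINO_ACIDS}
    for st in BURIAL_STATES:
        feats[f"frac_{st}"] = burial.count(st) / n
    return feats


def combined_features(seq, burial):
    """20 x 2 coexistence features: fraction of residues that are amino acid X AND in state Y."""
    n = len(seq)
    counts = dict.fromkeys(product(AMINO_ACIDS, BURIAL_STATES), 0)
    for aa, st in zip(seq, burial):
        if (aa, st) in counts:  # skip non-standard residues or unknown states
            counts[(aa, st)] += 1
    return {f"frac_{st}_{aa}": c / n for (aa, st), c in counts.items()}


def select_features_ga(n_features, fitness, pop_size=20, generations=30,
                       mutation_rate=0.05, seed=0):
    """Evolve boolean feature masks; return the fittest mask seen.

    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # truncation selection
        children = [ranked[0][:]]               # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each feature toggled with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


# Hypothetical stand-in for classifier performance: a mask earns a point per
# "informative" feature (an invented buried Ser-His-Asp signal) and loses a
# little for every feature it selects.
INFORMATIVE = {"frac_buried_S", "frac_buried_H", "frac_buried_D"}


def toy_fitness(mask):
    chosen = {name for name, keep in zip(FEATURE_NAMES, mask) if keep}
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen)


if __name__ == "__main__":
    burial = ["buried", "exposed", "buried", "buried"]
    print(combined_features("SSAH", burial)["frac_buried_S"])  # 0.25
    best = select_features_ga(len(FEATURE_NAMES), toy_fitness)
    print([name for name, keep in zip(FEATURE_NAMES, best) if keep])
```

Truncation selection, one-point crossover, bit-flip mutation, and elitism are common GA choices picked here for brevity; the abstract does not state which operators the authors used. In practice `toy_fitness` would be replaced by a cross-validated score of a classifier trained only on the selected features.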

Year: 2009; PMID: 18798568; DOI: 10.1002/prot.22211

Source DB: PubMed; Journal: Proteins; ISSN: 0887-3585

Cited by (8 in total)

1. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). (Review)

Authors: Michael Fernandez; Julio Caballero; Leyden Fernandez; Akinori Sarai
Journal: Mol Divers; Date: 2010-03-20; Impact factor: 2.943

2. Contrastive learning on protein embeddings enlightens midnight zone.

Authors: Michael Heinzinger; Maria Littmann; Ian Sillitoe; Nicola Bordin; Christine Orengo; Burkhard Rost
Journal: NAR Genom Bioinform; Date: 2022-06-11

3. Machine learning on normalized protein sequences.

Authors: Dominik Heider; Jens Verheyen; Daniel Hoffmann
Journal: BMC Res Notes; Date: 2011-03-31

4. Insights into the classification of small GTPases.

Authors: Dominik Heider; Sascha Hauke; Martin Pyka; Daniel Kessler
Journal: Adv Appl Bioinform Chem; Date: 2010-05-21

5. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers.

Authors: J Nikolaj Dybowski; Mona Riemenschneider; Sascha Hauke; Martin Pyka; Jens Verheyen; Daniel Hoffmann; Dominik Heider
Journal: BioData Min; Date: 2011-11-14; Impact factor: 2.522

6. Predicting Bevirimat resistance of HIV-1 from genotype.

Authors: Dominik Heider; Jens Verheyen; Daniel Hoffmann
Journal: BMC Bioinformatics; Date: 2010-01-20; Impact factor: 3.169

7. Automatic quantitative MRI texture analysis in small-for-gestational-age fetuses discriminates abnormal neonatal neurobehavior.

Authors: Magdalena Sanz-Cortes; Giuseppe A Ratta; Francesc Figueras; Elisenda Bonet-Carne; Nelly Padilla; Angela Arranz; Nuria Bargallo; Eduard Gratacos
Journal: PLoS One; Date: 2013-07-26; Impact factor: 3.240

8. Effective automated feature construction and selection for classification of biological sequences.

Authors: Uday Kamath; Kenneth De Jong; Amarda Shehu
Journal: PLoS One; Date: 2014-07-17; Impact factor: 3.240
