P Baldi (1), S Brunak, P Frasconi, G Soda, G Pollastri.
(1) Department of Information and Computer Science, College of Medicine, University of California, Irvine 92697-3425, USA.
Abstract
MOTIVATION: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, relatively short input window of amino acids centered at the prediction site. Although a small fixed window avoids overfitting problems, it cannot capture variable long-range information.
RESULTS: We introduce a family of novel architectures that can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, at both the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic ideas.
AVAILABILITY: The executable program for predicting protein secondary structure is available from the authors free of charge.
CONTACT: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it.
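To make the idea of non-causal bidirectional dynamics concrete, the following is a minimal sketch in plain Python and not the authors' program. It assumes one common formulation of a bidirectional recurrent network: a forward state computed left to right, a backward state computed right to left, and a three-class output at each position that sees the current residue plus both states. The one-hot encoding, the state size, the random untrained weights, and all function names (`brnn_forward`, `predict`, and so on) are illustrative assumptions; the abstract does not specify them, and the multiple-alignment inputs and mixtures of estimators are omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("helix", "sheet", "coil")  # the three classes named in the abstract


def one_hot(residue):
    """Encode one residue as a 20-dim vector (all zeros if the letter is unknown)."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def rand_matrix(rows, cols, rng):
    """Random weights; a real predictor would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def brnn_forward(inputs, Wf, Wb, Wo, n_state):
    """Return per-position class probabilities from a bidirectional pass."""
    T = len(inputs)
    fwd, bwd = [None] * T, [None] * T

    # Forward chain: the state at t summarizes upstream residues 0..t.
    state = [0.0] * n_state
    for t in range(T):
        state = [math.tanh(a) for a in matvec(Wf, inputs[t] + state)]
        fwd[t] = state

    # Backward chain: the state at t summarizes downstream residues t..T-1.
    state = [0.0] * n_state
    for t in reversed(range(T)):
        state = [math.tanh(a) for a in matvec(Wb, inputs[t] + state)]
        bwd[t] = state

    # Output at t depends on the residue and on both context summaries.
    return [softmax(matvec(Wo, inputs[t] + fwd[t] + bwd[t])) for t in range(T)]


def predict(sequence, n_state=8, seed=0):
    """Run the untrained sketch on an amino-acid string (hypothetical helper)."""
    rng = random.Random(seed)
    n_in = len(AMINO_ACIDS)
    Wf = rand_matrix(n_state, n_in + n_state, rng)
    Wb = rand_matrix(n_state, n_in + n_state, rng)
    Wo = rand_matrix(len(CLASSES), n_in + 2 * n_state, rng)
    return brnn_forward([one_hot(r) for r in sequence], Wf, Wb, Wo, n_state)


if __name__ == "__main__":
    for residue, probs in zip("MKVLA", predict("MKVLA")):
        print(residue, dict(zip(CLASSES, (round(p, 3) for p in probs))))
```

Because the weights are random, the printed probabilities carry no biological meaning. The point of the sketch is structural: unlike a fixed window, the prediction at each position can in principle depend on every residue in the sequence, upstream through the forward chain and downstream through the backward chain.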