Training HMM structure with genetic algorithm for biological sequence analysis.

Kyoung-Jae Won, Adam Prügel-Bennett, Anders Krogh.

Abstract

SUMMARY: Hidden Markov models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic means of optimizing the structure of HMMs would be highly desirable. However, this raises two important issues: first, the new HMMs should be biologically interpretable, and second, we need to control the complexity of the HMM so that it has good generalization performance on unseen sequences. In this paper, we explore the possibility of using a genetic algorithm (GA) for optimizing the HMM structure. GAs are sufficiently flexible to allow incorporation of other techniques such as Baum-Welch training within their evolutionary cycle. Furthermore, operators that alter the structure of HMMs can be designed to favour interpretable and simple structures. A training strategy using GAs is proposed and tested on finding HMM structures for the promoter and coding region of the bacterium Campylobacter jejuni. The proposed GA for hidden Markov models (GA-HMM) allows HMMs with different numbers of states to evolve. To prevent over-fitting, the HMMs are compared on a dataset separate from the one used for the Baum-Welch training. The GA-HMM was capable of finding an HMM comparable to a previously published hand-coded HMM designed for the same task.
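The evolutionary cycle described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the DNA alphabet, the pseudocount, the two structural operators (add a state, delete a state), the single Baum-Welch iteration per generation, and the truncation selection are all assumptions made for illustration. It only shows the general idea of mutating HMM structure, refining parameters with Baum-Welch on training sequences, and ranking candidates by log-likelihood on a separate validation set.

```python
import math
import random

ALPHABET = "ACGT"  # assumed DNA alphabet

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def rand_row(k, rng):
    return normalize([rng.random() + 0.1 for _ in range(k)])

def random_hmm(n, rng):
    """An HMM is a tuple (start, trans, emit) of row-stochastic lists."""
    return (rand_row(n, rng), [rand_row(n, rng) for _ in range(n)],
            [rand_row(len(ALPHABET), rng) for _ in range(n)])

def forward(hmm, obs):
    """Scaled forward pass; returns per-step alphas and scaling factors."""
    start, trans, emit = hmm
    n, alphas, scales, prev = len(start), [], [], None
    for t, o in enumerate(obs):
        if t == 0:
            a = [start[i] * emit[i][o] for i in range(n)]
        else:
            a = [sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        c = sum(a)
        prev = [x / c for x in a]
        alphas.append(prev)
        scales.append(c)
    return alphas, scales

def log_likelihood(hmm, seqs):
    return sum(math.log(c) for s in seqs
               for c in forward(hmm, [ALPHABET.index(x) for x in s])[1])

def baum_welch_step(hmm, seqs, eps=1e-3):
    """One Baum-Welch (EM) iteration; eps is an assumed pseudocount."""
    start, trans, emit = hmm
    n, m = len(start), len(ALPHABET)
    s_cnt = [eps] * n
    t_cnt = [[eps] * n for _ in range(n)]
    e_cnt = [[eps] * m for _ in range(n)]
    for s in seqs:
        obs = [ALPHABET.index(x) for x in s]
        alphas, scales = forward(hmm, obs)
        beta = [1.0] * n  # scaled backward variable at the last position
        for t in range(len(obs) - 1, -1, -1):
            for i in range(n):
                g = alphas[t][i] * beta[i]  # posterior of state i at time t
                e_cnt[i][obs[t]] += g
                if t == 0:
                    s_cnt[i] += g
            if t > 0:
                w = [emit[j][obs[t]] * beta[j] / scales[t] for j in range(n)]
                for i in range(n):
                    for j in range(n):
                        t_cnt[i][j] += alphas[t - 1][i] * trans[i][j] * w[j]
                beta = [sum(trans[i][j] * w[j] for j in range(n))
                        for i in range(n)]
    return (normalize(s_cnt), [normalize(r) for r in t_cnt],
            [normalize(r) for r in e_cnt])

def add_state(hmm, rng):
    """Structural mutation (illustrative): append a weakly connected state."""
    start, trans, emit = hmm
    new_trans = [normalize(r + [0.05]) for r in trans]
    new_trans.append(rand_row(len(start) + 1, rng))
    return (normalize(start + [0.05]), new_trans,
            emit + [rand_row(len(ALPHABET), rng)])

def delete_state(hmm, rng):
    """Structural mutation (illustrative): drop one random state."""
    start, trans, emit = hmm
    if len(start) <= 1:
        return hmm
    k = rng.randrange(len(start))
    keep = [i for i in range(len(start)) if i != k]
    return (normalize([start[i] for i in keep]),
            [normalize([trans[i][j] for j in keep]) for i in keep],
            [emit[i] for i in keep])

def evolve(train, valid, pop_size=6, generations=5, seed=0):
    """Mutate structure, refine on `train`, select by fitness on `valid`."""
    rng = random.Random(seed)
    pop = [random_hmm(rng.randint(1, 3), rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = [rng.choice([add_state, delete_state])(h, rng) for h in pop]
        cands = [baum_welch_step(h, train) for h in pop + children]
        cands.sort(key=lambda h: log_likelihood(h, valid), reverse=True)
        pop = cands[:pop_size]
    return pop[0]
```

Calling `evolve(train_seqs, valid_seqs)` with two lists of DNA strings returns the fittest `(start, trans, emit)` tuple in the final population. Ranking on the validation set rather than the training set is what keeps extra states from being rewarded merely for fitting the training sequences more closely.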

Year:  2004        PMID: 15297297     DOI: 10.1093/bioinformatics/bth454

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  Learning biophysically-motivated parameters for alpha helix prediction.

Authors:  Blaise Gassend; Charles W O'Donnell; William Thies; Andrew Lee; Marten van Dijk; Srinivas Devadas
Journal:  BMC Bioinformatics       Date:  2007-05-24       Impact factor: 3.169

2.  The construction and use of log-odds substitution scores for multiple sequence alignment.

Authors:  Stephen F Altschul; John C Wootton; Elena Zaslavsky; Yi-Kuo Yu
Journal:  PLoS Comput Biol       Date:  2010-07-15       Impact factor: 4.475

3.  ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

Authors:  Mohsen Hajiloo; Yadav Sapkota; John R Mackey; Paula Robson; Russell Greiner; Sambasivarao Damaraju
Journal:  BMC Bioinformatics       Date:  2013-02-22       Impact factor: 3.169

4.  Analysis of an optimal hidden Markov model for secondary structure prediction.

Authors:  Juliette Martin; Jean-François Gibrat; François Rodolphe
Journal:  BMC Struct Biol       Date:  2006-12-13

5.  Predicting the Outcome of Patient-Provider Communication Sequences using Recurrent Neural Networks and Probabilistic Models.

Authors:  Mehedi Hasan; Alexander Kotov; April Idalski Carcone; Ming Dong; Sylvie Naar
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2018-05-18

6.  An evolutionary method for learning HMM structure: prediction of protein secondary structure.

Authors:  Kyoung-Jae Won; Thomas Hamelryck; Adam Prügel-Bennett; Anders Krogh
Journal:  BMC Bioinformatics       Date:  2007-09-21       Impact factor: 3.169

7.  Breast cancer prediction using genome wide single nucleotide polymorphism data.

Authors:  Mohsen Hajiloo; Babak Damavandi; Metanat Hooshsadat; Farzad Sangi; John R Mackey; Carol E Cass; Russell Greiner; Sambasivarao Damaraju
Journal:  BMC Bioinformatics       Date:  2013-10-01       Impact factor: 3.169
