A new algorithm to train hidden Markov models for biological sequences with partial labels.

Jiefu Li, Jung-Youn Lee, Li Liao.

Abstract

BACKGROUND: Hidden Markov models (HMMs) are a powerful tool for analyzing biological sequences in a wide variety of applications, from profiling functional protein families to identifying functional domains. HMMs are typically trained either by maximum likelihood using counting, when sequences are labelled, or by expectation maximization, such as the Baum-Welch algorithm, when sequences are unlabelled. Increasingly, however, sequences are only partially labelled. In this paper, we design a new training method, based on the Baum-Welch algorithm, to train HMMs for biological problems in which only partial labelling is available.
RESULTS: Compared with a similar, previously reported method designed for active learning in text mining, our method achieves significant improvements in model training, as demonstrated by higher decoding accuracy when the trained models are tested on both synthetic and real data.
CONCLUSIONS: We developed a novel training method that improves the training of hidden Markov models by utilizing partially labelled data. The method will aid the detection of de novo motifs and signals in biological sequence data. In particular, it will be deployed in active learning mode in ongoing research on detecting plasmodesmata targeting signals, and its performance will be assessed with validation from wet-lab experiments.
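
To make the idea of training with partial labels concrete, the sketch below shows one generic way to constrain the expectation step of Baum-Welch: a scaled forward-backward pass in which every labelled position is clamped to its labelled state. This is an illustration under stated assumptions, not the algorithm from the paper, whose details the abstract does not give. The function name `constrained_posteriors`, the two-state toy model, and all parameter values are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def constrained_posteriors(
    obs: List[int],
    labels: List[Optional[int]],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> Tuple[List[List[float]], float]:
    """Scaled forward-backward with labelled positions clamped to their state.

    obs[t] is a symbol index; labels[t] is a state index, or None if unlabelled.
    Returns (gamma, loglik): gamma[t][k] is the posterior of state k at t given
    the observations and the labels; loglik is log P(obs, labelled states).
    Assumes the labels have non-zero probability under the model.
    """
    n, T = len(start), len(obs)

    def allowed(t: int, k: int) -> float:
        # 1.0 if state k is compatible with the label at position t.
        return 1.0 if labels[t] is None or labels[t] == k else 0.0

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    for k in range(n):
        alpha[0][k] = start[k] * emit[k][obs[0]] * allowed(0, k)
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        for k in range(n):
            incoming = sum(alpha[t - 1][j] * trans[j][k] for j in range(n))
            alpha[t][k] = incoming * emit[k][obs[t]] * allowed(t, k)
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, using the same mask and the same scaling factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(n):
            beta[t][j] = sum(
                trans[j][k] * emit[k][obs[t + 1]] * allowed(t + 1, k) * beta[t + 1][k]
                for k in range(n)
            ) / scale[t + 1]

    gamma = [[alpha[t][k] * beta[t][k] for k in range(n)] for t in range(T)]
    loglik = sum(math.log(s) for s in scale)
    return gamma, loglik


# Hypothetical two-state, two-symbol model; only position 1 carries a label.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 1, 0]
labels = [None, 1, None, None]

gamma, loglik = constrained_posteriors(obs, labels, start, trans, emit)
gamma_free, loglik_free = constrained_posteriors(obs, [None] * 4, start, trans, emit)
```

In this sketch the two standard regimes named in the background are the extremes of the same computation: with no labels the mask has no effect and the pass is the ordinary Baum-Welch expectation step, while with every position labelled each row of `gamma` is one-hot and re-estimating parameters from it reduces to counting.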

Keywords:  Biological sequences; Constrained Baum-Welch algorithm; Hidden Markov model; Partial label

Year:  2021        PMID: 33771095      PMCID: PMC7995745          DOI: 10.1186/s12859-021-04080-0

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


